
Amazon Web Services BDS-C00 AWS Certified Big Data - Specialty Exam Practice Test

Note: The BDS-C00 exam has been retired. Please select the replacement exam for your certification. The new exam code is DAS-C01.
Page: 1 / 26
Total 264 questions

AWS Certified Big Data - Specialty Questions and Answers

Question 1

A game company needs to properly scale its game application, which is backed by DynamoDB.

Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on factors such as season, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.

How should the administrator accomplish this task?

Options:

A.

Feed the data into Amazon Machine Learning and build a regression model

B.

Feed the data into Spark MLlib and build a random forest model

C.

Feed the data into Apache Mahout and build a multi-classification model

D.

Feed the data into Amazon Machine Learning and build a binary classification model
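Since forecasting a continuous quantity (throughput) from historical demand is a regression problem, a minimal sketch of the idea behind option A is an ordinary least-squares fit. The weekly numbers below are invented purely for illustration; Amazon Machine Learning automated this kind of model at scale.

```python
# Sketch: forecasting next week's DynamoDB read throughput with simple
# linear regression (ordinary least squares on one feature). The weekly
# demand figures here are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical history: week number -> peak reads/sec observed
weeks = [1, 2, 3, 4, 5, 6]
reads = [1200, 1350, 1500, 1700, 1800, 2000]

slope, intercept = fit_line(weeks, reads)
forecast_week_7 = slope * 7 + intercept
# Provision with headroom, e.g. 20% above the forecast
provisioned_reads = round(forecast_week_7 * 1.2)
print(provisioned_reads)
```

A real model would use many features (season, releases, holidays), but the output is the same kind of number: a per-week capacity to provision in advance.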

Question 2

Which of the following are characteristics of Amazon VPC subnets? Choose 2 answers

Options:

A.

Each subnet maps to a single Availability Zone

B.

A CIDR block mask of /25 is the smallest range supported

C.

Instances in a private subnet can communicate with the internet only if they have an Elastic IP.

D.

By default, all subnets can route between each other, whether they are private or public

E.

Each subnet spans at least 2 Availability zones to provide a high-availability environment

Question 3

You have been asked to use your department’s existing continuous integration (CI) tool to test a three- tier web architecture defined in an AWS CloudFormation template. The tool already supports AWS APIs and can launch new AWS CloudFormation stacks after polling version control. The CI tool reports on the success of the AWS CloudFormation stack creation by using the DescribeStacks API to look for the CREATE_COMPLETE status.

The architecture tiers defined in the template consist of:

. One load balancer

. Five Amazon EC2 instances running the web application

. One multi-AZ Amazon RDS instance

How would you implement this? Choose 2 answers

Options:

A.

Define a WaitCondition and a WaitConditionHandle for the output of a UserData command that does sanity checking of the application’s post-install state

B.

Define a CustomResource and write a script that runs architecture-level integration tests through the load balancer to the application and database for the state of multiple tiers

C.

Define a WaitCondition and use a WaitConditionHandle that leverages the AWS SDK to run the DescribeStacks API call until the CREATE_COMPLETE status is returned

D.

Define a CustomResource that leverages the AWS SDK to run the DescribeStacks API call until the CREATE_COMPLETE status is returned

E.

Define a UserDataHandle for the output of a UserData command that does sanity checking of the application’s post-install state and runs integration tests on the state of multiple tiers through load balancer to the application

F.

Define a UserDataHandle for the output of a CustomResource that does sanity checking of the application’s post-install state
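Several options above hinge on polling DescribeStacks until CREATE_COMPLETE appears. A minimal sketch of that loop, with the status fetcher injected so it runs without AWS credentials (in practice the callable would wrap `boto3.client("cloudformation").describe_stacks`):

```python
import time

# Sketch of the polling logic a CI script might run: keep calling
# DescribeStacks (abstracted here as get_status) until the stack reports
# CREATE_COMPLETE or a terminal failure state.

TERMINAL_FAILURES = {"CREATE_FAILED", "ROLLBACK_COMPLETE", "ROLLBACK_FAILED"}

def wait_for_stack(get_status, interval=0.0, max_polls=30):
    """Poll get_status() until CREATE_COMPLETE; raise on failure/timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "CREATE_COMPLETE":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"stack failed with status {status}")
        time.sleep(interval)
    raise TimeoutError("stack did not complete in time")

# Simulated DescribeStacks responses, for demonstration only
responses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
result = wait_for_stack(lambda: next(responses))
print(result)  # CREATE_COMPLETE
```

Note that CREATE_COMPLETE only proves the resources exist; the architecture-level integration tests the question asks about still have to run against the live tiers.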

Question 4

You have a load balancer configured for VPC, and all backend Amazon EC2 instances are in service. However, your web browser times out when connecting to the load balancer’s DNS name. Which options are probable causes of this behavior?

Options:

A.

The load balancer was not configured to use a public subnet with an Internet gateway configured

B.

The Amazon EC2 instances do not have a dynamically allocated private IP address

C.

The security groups or network ACLs are not properly configured for web traffic

D.

The load balancer is not configured in a private subnet with a NAT instance

E.

The VPC does not have a VGW configured

Question 5

A user is planning to set up infrastructure on AWS for the Christmas sales. The user plans to use Auto Scaling with scheduled scaling for proactive capacity. What advice would you give the user?

Options:

A.

It is good to schedule now because if the user forgets later on it will not scale up

B.

The scaling should be setup only one week before Christmas

C.

Wait till end of November before scheduling the activity

D.

It is not advisable to use scheduled based scaling

Question 6

In AWS, which security aspects are the customer’s responsibility? Choose 4 answers

Options:

A.

Life-Cycle management of IAM credentials

B.

Security Group and ACL settings

C.

Controlling physical access to compute resources

D.

Path management on the EC2 instance’s operating system

E.

Encryption of EBS volumes

F.

Decommissioning storage devices

Question 7

Your social media marketing application has a component written in Ruby running on AWS Elastic Beanstalk. This application component posts messages to social media sites in support of various marketing campaigns. Your management now requires you to record replies to these social media messages to analyze the effectiveness of the marketing campaign in comparison to past and future efforts. You have already developed a new application component to interface with the social media site APIs in order to read the replies.

Which process should you use to record the social media replies in a durable data store that can be accessed at any time for analysis of historical data?

Options:

A.

Deploy the new application component in an Auto Scaling group of Amazon Elastic Compute Cloud (EC2) instances, read the data from the social media sites, store it with Amazon Elastic Block Store, and use AWS Data Pipeline to publish it to Amazon Kinesis for analytics

B.

Deploy the new application component as an Elastic Beanstalk application, read the data from the social media sites, store it in Amazon DynamoDB, and use Apache Hive with Amazon Elastic MapReduce for analytics

C.

Deploy the new application component in an Auto Scaling group of Amazon EC2 instances, read the data from the social media sites, store it in Amazon Glacier, and use AWS Data Pipeline to publish it to Amazon Redshift for analytics

D.

Deploy the new application component as an Amazon Elastic Beanstalk application, read the data from the social media sites, store it with Amazon Elastic Block Store, and use Amazon Kinesis to stream the data to Amazon CloudWatch for analytics

Question 8

A telecommunications company needs to predict customer churn (i.e., customers who decide to switch to a competitor). The company has historic records for each customer, including monthly consumption patterns, calls to customer service, and whether the customer ultimately quit the service. All of this data is stored in Amazon S3. The company needs to know which customers are likely to churn soon so that it can win back their loyalty.

What is the optimal approach to meet these requirements?

Options:

A.

Use the Amazon Machine Learning service to build the binary classification model based on the dataset stored in Amazon S3. The model will be used regularly to predict churn attribute for existing customers

B.

Use Amazon QuickSight to connect to the data stored in Amazon S3 to obtain the necessary business insight. Plot the churn trend graph to extrapolate churn likelihood for existing customers

C.

Use EMR to run the Hive queries to build a profile of a churning customer. Apply the profile to existing customers to determine the likelihood of churn

D.

Use a Redshift cluster to COPY the data from Amazon S3. Create a user-defined function in Redshift that computes the likelihood of churn
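Churn prediction with a yes/no label is a textbook binary classification problem. A toy sketch of what such a model learns, using logistic regression trained by gradient descent in pure Python; the features and records are fabricated, and Amazon ML (or any ML library) would automate this on the real S3 dataset:

```python
import math

# Toy binary classifier: logistic regression on two made-up churn
# features (calls to support, months active). Label 1 = churned.

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

data = [
    ([8, 2], 1), ([7, 3], 1), ([6, 1], 1), ([5, 2], 1),
    ([1, 24], 0), ([0, 36], 0), ([2, 18], 0), ([1, 30], 0),
]

w = [0.0, 0.0]
b = 0.0
lr = 0.05
for _ in range(2000):                      # stochastic gradient descent
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        err = p - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

def churn_probability(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

print(round(churn_probability([7, 2]), 3))   # high: resembles churners
print(round(churn_probability([1, 28]), 3))  # low: resembles loyal users
```

The output of such a model is exactly what the question asks for: a per-customer churn likelihood that can be refreshed regularly as new records land in S3.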

Question 9

An organization has configured a VPC with an Internet Gateway (IGW), pairs of public and private subnets (one subnet per Availability Zone), and an Elastic Load Balancer (ELB) configured to use the public subnets. The application’s web tier leverages the ELB, Auto Scaling, and a multi-AZ RDS database instance. The organization would like to eliminate any potential single points of failure in this design.

What step should you take to achieve this organization's objective?

Options:

A.

Nothing, there are no single points of failure in this architecture.

B.

Create and attach a second IGW to provide redundant internet connectivity.

C.

Create and configure a second Elastic Load Balancer to provide a redundant load balancer.

D.

Create a second multi-AZ RDS instance in another Availability Zone and configure replication to provide a redundant database.

Question 10

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema.

How should the company determine the most appropriate distribution key for the ORDERS table?

Options:

A.

Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution

B.

Identify the target dimension table and designate the key of this dimension table as the distribution key of the ORDERS table

C.

Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table

D.

Identify the largest and most frequently joined dimension table and designate the key of this dimension table as the distribution key for the orders table
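Designating a distribution key in Redshift is a DDL-level choice: giving the fact table the same DISTKEY as its largest, most frequently joined dimension collocates joining rows on the same slices. A sketch that generates such DDL; the table and column names are hypothetical:

```python
# Sketch: Redshift DDL collocating a fact table with its most frequently
# joined dimension by sharing a distribution key. Names are hypothetical.

def fact_table_ddl(table, dist_key, columns):
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"DISTSTYLE KEY\nDISTKEY ({dist_key});"
    )

ddl = fact_table_ddl(
    "orders",
    dist_key="customer_id",   # key of the largest joined dimension
    columns=[("order_id", "BIGINT"), ("customer_id", "BIGINT"),
             ("order_total", "DECIMAL(12,2)")],
)
print(ddl)
```

The dimension table would carry `DISTKEY (customer_id)` as well, so the join becomes node-local instead of requiring data redistribution at query time.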

Question 11

In the 'Detailed' monitoring data available for your Amazon EBS volumes, Provisioned IOPS volumes automatically send _____ minute metrics to Amazon CloudWatch.

Options:

A.

5

B.

2

C.

1

D.

3

Question 12

If your DB instance runs out of storage space or file system resources, its status will change to_____ and your DB Instance will no longer be available.

Options:

A.

storage-overflow

B.

storage-full

C.

storage-exceed

D.

storage-overage

Question 13

Managers in a company need access to the human resources database that runs on Amazon Redshift, to run reports about their employees. Managers must only see information about their direct reports.

Which technique should be used to address this requirement with Amazon Redshift?

Options:

A.

Define an IAM group for each manager, with each employee as an IAM user in that group, and use that to limit the access.

B.

Use Amazon Redshift snapshot to create one cluster per manager. Allow the managers to access only their designated clusters.

C.

Define a key for each manager in AWS KMS and encrypt the data for their employees with their private keys.

D.

Define a view that uses the employee’s manager name to filter the records based on current user names.

Question 14

An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring.

What should be done to ensure that all data is processed? (Choose two.)

Options:

A.

Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

B.

Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

C.

Create multiple Lambda functions that will consume the same Amazon Kinesis stream.

D.

Increase the number of vCores allocated for the Lambda function.

E.

Increase the number of shards on the Amazon Kinesis stream.

Question 15

Which of the following instance types are available as Amazon EBS-backed only?

Options:

A.

General purpose T2

B.

General purpose M3

C.

Compute-optimized C4

D.

Compute-optimized C3

E.

Storage-optimized I2

Question 16

An organization is setting up a data catalog and metadata management environment for their numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog.

How can this be accomplished?

Options:

A.

Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.

B.

Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.

C.

Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the database.

D.

Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.

Question 17

Which data store should the organization choose?

Options:

A.

Amazon Relational Database Service (RDS)

B.

Amazon Redshift

C.

Amazon DynamoDB

D.

Amazon Elasticsearch

Question 18

A data engineer is running a data warehouse (DWH) for a SaaS service on a 25-node Redshift cluster. The data engineer needs to build a dashboard that will be used by customers. Five big customers represent 80% of usage, and there is a long tail of dozens of smaller customers. The data engineer has already selected the dashboarding tool.

How should the data engineer make sure that the larger customer workloads do NOT interfere with the smaller customer workloads?

Options:

A.

Apply query filters based on customer ID that cannot be changed by the user, and apply distribution keys on customer ID

B.

Place the largest customers into a single user group with a dedicated query queue and place the rest of the customers into a different query queue

C.

Push aggregations into an RDS for Aurora instance. Connect the dashboard application to Aurora rather than Redshift for faster queries

D.

Route the largest customers to a dedicated Redshift cluster. Raise the concurrency of the multi-tenant Redshift cluster to accommodate the remaining customers

Question 19

A user has set up an RDS DB with Oracle. The user wants to get notifications when someone modifies the security group of that DB. How can the user configure that?

Options:

A.

It is not possible to get the notifications on a change in the security group

B.

Configure SNS to monitor security group changes

C.

Configure event notification on the DB security group

D.

Configure the CloudWatch alarm on the DB for a change in the security group
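RDS exposes event notifications through subscriptions that fan out to an SNS topic. A sketch of the parameters such a subscription might take (boto3's `rds.create_event_subscription` accepts these keyword arguments); the name, topic ARN, and event category are assumptions, and the real call requires AWS credentials:

```python
# Sketch: RDS event subscription watching DB security group changes.
# SubscriptionName and SnsTopicArn are placeholders.

subscription_params = {
    "SubscriptionName": "db-secgroup-changes",
    "SnsTopicArn": "arn:aws:sns:us-east-1:123456789012:db-alerts",
    "SourceType": "db-security-group",
    "EventCategories": ["configuration change"],  # assumed category name
}

# With credentials configured, the call would be roughly:
#   import boto3
#   boto3.client("rds").create_event_subscription(**subscription_params)
print(sorted(subscription_params))
```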

Question 20

Which of the following requires a custom CloudWatch metric to monitor?

Options:

A.

Memory utilization of an EC2 instance

B.

CPU utilization of an EC2 instance

C.

Disk usage activity of an EC2 instance

D.

Data transfer of an EC2 instance
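Memory utilization is visible only from inside the guest OS, so an on-instance agent has to compute it and publish it as a custom metric. A sketch of that computation and the shape of the metric payload; the numbers and namespace are illustrative:

```python
# Sketch: computing memory utilization on-instance and shaping it as a
# custom CloudWatch metric. Values and namespace are hypothetical.

def memory_utilization_percent(total_kb, available_kb):
    return round(100.0 * (total_kb - available_kb) / total_kb, 1)

used_pct = memory_utilization_percent(total_kb=16_384_000,
                                      available_kb=4_096_000)

metric = {
    "Namespace": "Custom/EC2",          # hypothetical namespace
    "MetricName": "MemoryUtilization",
    "Value": used_pct,
    "Unit": "Percent",
}
# With credentials, publishing would look roughly like:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace=metric["Namespace"],
#       MetricData=[{k: metric[k] for k in ("MetricName", "Value", "Unit")}])
print(used_pct)
```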

Question 21

Is there a limit to the number of groups you can have?

Options:

A.

Yes for all users

B.

Yes for all users except root

C.

No

D.

Yes unless special permission granted

Question 22

Multiple rows in an Amazon Redshift table were accidentally deleted. A System Administrator is restoring the table from the most recent snapshot. The snapshot contains all rows that were in the table before the deletion.

What is the SIMPLEST solution to restore the table without impacting users?

Options:

A.

Restore the snapshot to a new Amazon Redshift cluster, then UNLOAD the table to Amazon S3. In the original cluster, TRUNCATE the table, then load the data from Amazon S3 by using a COPY command.

B.

Use the Restore Table from a Snapshot command and specify a new table name. DROP the original table, then RENAME the new table to the original table name.

C.

Restore the snapshot to a new Amazon Redshift cluster. Create a DBLINK between the two clusters in the original cluster, TRUNCATE the destination table, then use an INSERT command to copy the data from the new cluster.

D.

Use the ALTER TABLE REVERT command and specify a time stamp of immediately before the data deletion. Specify the Amazon Resource Name of the snapshot as the SOURCE and use the OVERWRITE REPLACE option.

Question 23

Which of these configuration or deployment practices is a security risk for RDS?

Options:

A.

Storing SQL function code in plaintext

B.

Non-Multi-AZ RDS instance

C.

Having RDS and EC2 instances exist in the same subnet

D.

RDS in a public subnet

Question 24

A user has set up an RDS DB with Oracle. The user wants to get notifications when someone modifies the security group of that DB. How can the user configure that?

Options:

A.

It is not possible to get the notifications on a change in the security group

B.

Configure SNS to monitor security group changes

C.

Configure event notification on the DB security group

D.

Configure the CloudWatch alarm on the DB for a change in the security group

Question 25

When using the following AWS services, which should be implemented in multiple Availability Zones for high availability solutions? Choose 2 answers

Options:

A.

Amazon Simple Storage Service

B.

Amazon Elastic Load Balancing

C.

Amazon Elastic Compute Cloud

D.

Amazon Simple Notification Service

E.

Amazon DynamoDB

Question 26

You have a video transcoding application running on Amazon EC2. Each instance polls a queue to find out which video should be transcoded, and then runs a transcoding process.

If this process is interrupted, the video will be transcoded by another instance based on the queuing system. You have a large backlog of videos that need to be transcoded and would like to reduce this backlog by adding more instances. You will need these instances only until the backlog is reduced. Which type of Amazon EC2 instance should you use to reduce the backlog in the most cost-effective way?

Options:

A.

Dedicated instances

B.

Spot instances

C.

On-demand instances

D.

Reserved instances

Question 27

By default, what are ENIs that are automatically created and attached to instances using the EC2 console set to do when the attached instance terminates?

Options:

A.

Remain as is

B.

Terminate

C.

Hibernate

D.

Pause

Question 28

An organization is designing an Amazon DynamoDB table for an application that must meet the following requirements:

Item size is 40 KB

Sustained read/write rates of 2000/500 per second, respectively

Heavily read-oriented and requires low latencies in the order of milliseconds

The application runs on an Amazon EC2 instance

Access to the DynamoDB table must be secure within the VPC

Minimal changes to application code to improve performance using write-through cache

Which design options will BEST meet these requirements?

Options:

A.

Size the DynamoDB table with 10000 RCUs/20000 WCUs, implement the DynamoDB Accelerator (DAX) for read performance, use VPC endpoints for DynamoDB, and implement an IAM role on the EC2 instance to secure DynamoDB access.

B.

Size the DynamoDB table with 20000 RCUs/20000 WCUs, implement the DynamoDB Accelerator (DAX) for read performance, leverage VPC endpoints for DynamoDB, and implement an IAM user on the EC2 instance to secure DynamoDB access.

C.

Size the DynamoDB table with 10000 RCUs/20000 WCUs, implement Amazon ElastiCache for read performance, set up a NAT gateway on VPC for the EC2 instance to access DynamoDB, and implement an IAM role on the EC2 instance to secure DynamoDB access.

D.

Size the DynamoDB table with 20000 RCUs/20000 WCUs, implement Amazon ElastiCache for read performance, leverage VPC endpoints for DynamoDB, and implement an IAM user on the EC2 instance to secure DynamoDB access.
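The RCU/WCU figures in the options above fall out of the standard DynamoDB sizing rules, which can be checked by hand: one RCU covers one strongly consistent read per second of up to 4 KB, and one WCU covers one write per second of up to 1 KB.

```python
import math

# Back-of-the-envelope capacity math for the table above:
# 40 KB items, 2000 reads/sec, 500 writes/sec.

item_kb = 40
reads_per_sec = 2000
writes_per_sec = 500

rcus = math.ceil(item_kb / 4) * reads_per_sec    # 10 units per 40 KB read
wcus = math.ceil(item_kb / 1) * writes_per_sec   # 40 units per 40 KB write
eventually_consistent_rcus = rcus // 2           # eventual consistency halves RCUs

print(rcus, wcus, eventually_consistent_rcus)    # 20000 20000 10000
```

This explains why the options cluster around 10000/20000 RCUs and 20000 WCUs: the read figure depends on whether reads are strongly or eventually consistent.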

Question 29

An Administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived.

How should the administrator recommend storing the log data?

Options:

A.

Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders

B.

Create an Amazon DynamoDB table partitioned on device and sorted on date, and write log data to the table. Execute the EMR job on the Amazon DynamoDB table

C.

Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder

D.

Create an Amazon DynamoDB table partitioned on EventID, write log data to table. Execute the EMR job on the table

Question 30

A user is running one instance for only 3 hours every day. The user wants to save some cost with the instance. Which of the below mentioned Reserved Instance categories is advised in this case?

Options:

A.

The user should not use RI; instead only go with the on-demand pricing

B.

The user should use the AWS high utilized RI

C.

The user should use the AWS medium utilized RI

D.

The user should use the AWS low utilized RI
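The legacy light/medium/heavy-utilization RI tiers traded a larger upfront fee for a lower hourly rate (and heavy-utilization RIs were billed for every hour of the term, whether used or not). A sketch of the break-even arithmetic for an instance used 3 hours a day; every price below is made up purely to show the shape of the comparison:

```python
# Hypothetical yearly cost comparison for an instance used 3 hours/day.
# All prices are invented for illustration only.

HOURS_USED = 3 * 365        # hours actually consumed per year
HOURS_IN_YEAR = 8760

def yearly_cost(upfront, hourly, billed_hours):
    return upfront + hourly * billed_hours

on_demand = yearly_cost(0, 0.10, HOURS_USED)
light_ri  = yearly_cost(25, 0.06, HOURS_USED)      # low upfront, modest discount
medium_ri = yearly_cost(80, 0.04, HOURS_USED)      # more upfront, bigger discount
heavy_ri  = yearly_cost(200, 0.025, HOURS_IN_YEAR) # billed for every hour of the term

print(on_demand, light_ri, medium_ri, heavy_ri)
```

With such low daily usage, the low-upfront tier tends to win: the bigger discounts never amortize their upfront fees over only ~1100 hours a year.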

Question 31

Your application uses CloudFormation to orchestrate your application’s resources. During your testing phase, before the application went live, your Amazon RDS instance type was changed, which caused the instance to be re-created and resulted in the loss of test data.

How should you prevent this from occurring in the future?

Options:

A.

Within the AWS CloudFormation parameter with which users can select the Amazon RDS instance type, set AllowedValues to contain only the current instance type

B.

Use an AWS CloudFormation stack policy to deny updates to the instance. Only allow UpdateStack permission to IAM principals that are denied SetStackPolicy

C.

In the AWS CloudFormation template, set the AWS::RDS::DBInstance’s DBInstanceClass property to be read-only

D.

Subscribe to the AWS CloudFormation notification “BeforeResourceUpdate” and call CancelStackUpdate if the resource identified is the Amazon RDS instance

E.

In the AWS CloudFormation template, set the AWS::RDS::DBInstance’s DeletionPolicy property to “Retain”

Question 32

A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from aggregations of all the sensor data taken each minute.

What is the most cost-effective solution for creating this visualization each day?

Options:

A.

Use Kinesis Aggregators Library to generate reports for reviewing the patient sensor data and generate a QuickSight visualization on the new data each morning for the physician to review

B.

Use a Transient EMR cluster that shuts down after use to aggregate the patient sensor data each night and generate a QuickSight visualization on the new data each morning for the physician to review

C.

Use Spark streaming on EMR to aggregate the sensor data coming in every 15 minutes and generate a QuickSight visualization on the new data each morning for the physician to review

D.

Use an EMR cluster to aggregate the patient sensor data each night and provide Zeppelin notebooks that look at the new data residing on the cluster each morning

Question 33

The Amazon EC2 web service can be accessed using the _____ web services messaging protocol. This interface is described by a Web Services Description Language (WSDL) document.

Options:

A.

SOAP

B.

DCOM

C.

CORBA

D.

XML-RPC

Question 34

A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers. The architect is building a platform where suppliers can view the status of one or more of their shipments. Each supplier can have multiple roles that will only allow access to specific fields in the resulting information.

Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?

Options:

A.

Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB Table. Generate temporary AWS credentials for the supplier’s users with AWS STS, specifying fine-grained security policies to limit access only to their application data.

B.

Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each supplier’s roles. Generate temporary AWS credentials for the suppliers’ users with AWS STS. Limit access to the appropriate files through security policies.

C.

Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers’ users. Use HBase ACL-based security to limit access to the roles to their specific table and columns.

D.

Send the tracking data to Amazon Kinesis Firehose. Store the data in an Amazon Redshift cluster. Create views for the supplier’s users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the application view.

Question 35

An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-load (ETL) steps that run in sequence. The output of each step must be fully processed in subsequent steps but will not be retained.

Which of the following techniques will meet this requirement most efficiently?

Options:

A.

Use the EMR File System (EMRFS) to store the outputs from each step as objects in Amazon Simple Storage Service (S3).

B.

Use the s3n URI to store the data to be processed as objects in Amazon S3.

C.

Define the ETL steps as separate AWS Data Pipeline activities.

D.

Load the data to be processed into HDFS and then write the final output to Amazon S3.

Question 36

A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support messages into an Amazon Kinesis stream. This meets a company objective of using only managed services to reduce operational overhead.

The company needs an appropriate architecture that allows support engineers to search historic cases to find similar issues and their associated responses.

Which AWS Lambda action is most appropriate?

Options:

A.

Ingest and index the content into an Amazon Elasticsearch domain

B.

Stem and tokenize the input and store the results into Amazon ElastiCache

C.

Write data as JSON into Amazon DynamoDB with primary and secondary indexes

D.

Aggregate feedback in Amazon S3 using a columnar format with partitioning

Question 37

What is web identity federation?

Options:

A.

Use of an identity provider like Google or Facebook to become an AWS IAM User.

B.

Use of an identity provider like Google or Facebook to exchange for temporary AWS security credentials.

C.

Use of AWS IAM User tokens to log in as a Google or Facebook user.

D.

Use of AWS STS Tokens to log in as a Google or Facebook user.
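The mechanism behind web identity federation is an STS exchange: the app presents a token from the identity provider and receives temporary AWS credentials. A sketch of the call's shape (boto3's `sts.assume_role_with_web_identity` takes these parameters); the ARN and token are placeholders:

```python
# Sketch: parameter shape for exchanging a web identity token for
# temporary AWS credentials. RoleArn and WebIdentityToken are placeholders.

params = {
    "RoleArn": "arn:aws:iam::123456789012:role/WebAppRole",
    "RoleSessionName": "web-user-session",
    "WebIdentityToken": "<token-from-identity-provider>",
    "DurationSeconds": 3600,
}

# With a real token, the call would be roughly:
#   import boto3
#   creds = boto3.client("sts").assume_role_with_web_identity(**params)
#   # creds["Credentials"] then holds AccessKeyId / SecretAccessKey / SessionToken
print(sorted(params))
```

The key point for the question: no IAM user is ever created for the Google/Facebook identity; only short-lived credentials are issued.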

Question 38

You are working with a customer who has 10 TB of archival data to migrate to Amazon Glacier. The customer has a 1 Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?

Options:

A.

Amazon Glacier multipart upload

B.

AWS Storage Gateway

C.

VM Import/Export

D.

AWS Import/Export
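The arithmetic that rules out uploading over the wire is worth doing once: 10 TB over a 1 Mbps link takes years, not days, which is why shipping physical media (AWS Import/Export) wins.

```python
# Rough transfer time for 10 TB over a 1 Mbps link, ignoring
# protocol overhead.

TB = 1024 ** 4                  # bytes in a tebibyte
bits_to_send = 10 * TB * 8
link_bps = 1_000_000            # 1 Mbps

seconds = bits_to_send / link_bps
days = seconds / 86_400
print(round(days))              # on the order of a thousand days
```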

Question 39

A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 million files per hour. Each source file is automatically deleted from the local server after one hour. What is the most cost-efficient option to meet these requirements?

Options:

A.

Write file contents to an Amazon DynamoDB table

B.

Copy files to Amazon S3 standard storage

C.

Write file content to Amazon ElastiCache

D.

Copy files to Amazon S3 Infrequent Access storage
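For workloads with many tiny objects, per-request charges dominate storage charges, which is the lens to apply to the options above. A sketch of the monthly volume math; the prices are placeholders purely to show the shape of the comparison:

```python
# Rough monthly volume math for the spool-file workload:
# 1 million 2 KB files per hour. Prices below are hypothetical.

files_per_hour = 1_000_000
hours_per_month = 24 * 30

puts_per_month = files_per_hour * hours_per_month             # 720 M PUTs
data_gb = files_per_hour * 2 / (1024 ** 2) * hours_per_month  # ~1.4 TB/month

# Hypothetical prices: $0.005 per 1,000 PUT requests, $0.023 per GB-month
put_cost = puts_per_month / 1000 * 0.005
storage_cost = data_gb * 0.023
print(round(put_cost), round(storage_cost))
```

Under these assumptions the request bill is two orders of magnitude larger than the storage bill, and note that Infrequent Access tiers typically charge *more* per request than standard storage.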
