Pre-Summer Sale Limited Time Flat 70% Discount offer - Ends in 0d 00h 00m 00s - Coupon code: 70spcl

Amazon Web Services MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Practice Test

Page: 1 / 24
Total 241 questions

AWS Certified Machine Learning Engineer - Associate Questions and Answers

Question 1

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Question 2

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team ' s permissions.

Which solution will meet this requirement?

Options:

A.

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Question 3

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question 4

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network Ad for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Question 5

An ML engineer is setting up a continuous integration and continuous delivery (CI/CD) pipeline for an ML workflow in Amazon SageMaker AI. The pipeline needs to automate model re-training, testing, and deployment whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer wants to track model versions for auditing.

Which solution will meet these requirements?

Options:

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and to track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Create an AWS Lambda function to re-train and deploy the model. Use Amazon EventBridge to invoke the Lambda function. Reference the Lambda logs to track model versions.

D.

Use SageMaker AI notebook instances to manually re-train and deploy the model when needed. Reference AWS CloudTrail logs to track model versions.

Question 6

A hospital wants to predict patient outcomes for the coming year An ML engineer must improve several existing ML models that currently perform poorly.

Select the correct regularization method from the following list to improve each model Select each regularization method one time, more than one time, or not at all. (Select THREE.)

• L1 regularization

• L2 regularization

• Early stopping

Question # 6

Options:

Question 7

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

Options:

A.

Asynchronous inference

B.

Batch transform

C.

Serverless inference

D.

Real-time inference

Question 8

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

Options:

A.

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

B.

Create a model group for each category. Move the existing models into these category model groups.

C.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

D.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Question 9

A company has a custom extract, transform, and load (ETL) process that runs on premises. The ETL process is written in the R language and runs for an average of 6 hours. The company wants to migrate the process to run on AWS.

Which solution will meet these requirements?

Options:

A.

Use an AWS Lambda function created from a container image to run the ETL jobs.

B.

Use Amazon SageMaker AI processing jobs with a custom Docker image stored in Amazon Elastic Container Registry (Amazon ECR).

C.

Use Amazon SageMaker AI script mode to build a Docker image. Run the ETL jobs by using SageMaker Notebook Jobs.

D.

Use AWS Glue to prepare and run the ETL jobs.

Question 10

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

Options:

A.

Apply statistics from a well-known dataset to normalize the production samples.

B.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

C.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

D.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.

Question 11

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.

Run the primary node, core nodes, and task nodes on Spot Instances.

C.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question 12

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

A.

Define the average F1 score in the TrainingInputMode parameter.

B.

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

C.

Publish the average F1 score as a custom Amazon CloudWatch metric.

D.

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.

Question 13

A company is using Amazon SageMaker AI to develop a credit risk assessment model. During model validation, the company finds that the model achieves 82% accuracy on the validation data. However, the model achieved 99% accuracy on the training data. The company needs to address the model accuracy issue before deployment.

Which solution will meet this requirement?

Options:

A.

Add more dense layers to increase model complexity. Implement batch normalization. Use early stopping during training.

B.

Implement dropout layers. Use L1 or L2 regularization. Perform k-fold cross-validation.

C.

Use principal component analysis (PCA) to reduce the feature dimensionality. Decrease model layers. Implement cross-entropy loss functions.

D.

Augment the training dataset. Remove duplicate records from the training dataset. Implement stratified sampling.

Question 14

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question 15

A company is using Amazon SageMaker AI to build an ML model to predict customer behavior. The company needs to explain the bias in the model to an auditor. The explanation must focus on demographic data of the customers.

Which solution will meet these requirements?

Options:

A.

Use SageMaker Clarify to generate a bias report. Send the report to the auditor.

B.

Use AWS Glue DataBrew to create a job to detect drift in the model ' s data quality. Send the job output to the auditor.

C.

Use Amazon QuickSight integration with SageMaker AI to generate a bias report. Send the report to the auditor.

D.

Use Amazon CloudWatch metrics from the SageMaker AI namespace to create a bias dashboard. Share the dashboard with the auditor.

Question 16

A company runs an Amazon SageMaker AI domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker AI domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Question 17

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company ' s main competitor.

Which solution will meet this requirement?

Options:

A.

Configure the competitor ' s name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor ' s name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor ' s name.

Question 18

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model ' s performance? (Select TWO.)

Options:

A.

Increase the model complexity by increasing the number of features in the dataset.

B.

Decrease the model complexity by reducing the number of features in the dataset.

C.

Decrease the model complexity by reducing the number of samples in the dataset.

D.

Increase the value of the L2 regularization parameter.

E.

Decrease the value of the L2 regularization parameter.

Question 19

An ML engineer is analyzing potential biases in a customer dataset before training an ML model. The dataset contains customer age (numeric), product reviews (text), and purchase outcomes (categorical).

Which statistical metrics should the ML engineer use to identify potential biases in the dataset before model training?

Options:

A.

Calculate the statistical mean and standard deviation of customer age distribution. Count word frequencies in product reviews.

B.

Calculate the class imbalance metric of purchase outcomes. Use product reviews to check sentiment distribution to capture bias.

C.

Calculate the class imbalance metric of purchase outcomes and the difference in proportions of labels (DPL) across customer age groups.

D.

Calculate the correlation coefficient between customer age and purchase outcomes. Calculate unique word counts in product reviews.

Question 20

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.

The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.

Which solution will meet this requirement?

Options:

A.

Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.

B.

Change the target metric to CPU utilization.

C.

Modify the scaling policy target value to one.

D.

Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Question 21

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

B.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

C.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Question 22

An ML engineer is training a simple neural network model. The ML engineer tracks the performance of the model over time on a validation dataset. The model ' s performance improves substantially at first and then degrades after a specific number of epochs.

Which solutions will mitigate this problem? (Choose two.)

Options:

A.

Enable early stopping on the model.

B.

Increase dropout in the layers.

C.

Increase the number of layers.

D.

Increase the number of neurons.

E.

Investigate and reduce the sources of model bias.

Question 23

A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.

Which technique for feature engineering should the ML engineer use for the model?

Options:

A.

Apply label encoding to the color categories. Automatically assign each color a unique integer.

B.

Implement padding to ensure that all color feature vectors have the same length.

C.

Perform dimensionality reduction on the color categories.

D.

One-hot encode the color categories to transform the color scheme feature into a binary matrix.

Question 24

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.

The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

B.

Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

C.

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

D.

Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Question 25

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

Options:

A.

Increase the temperature parameter and the top_k parameter.

B.

Increase the temperature parameter. Decrease the top_k parameter.

C.

Decrease the temperature parameter. Increase the top_k parameter.

D.

Decrease the temperature parameter and the top_k parameter.

Question 26

A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support.

Which modeling approach should the company use to meet this requirement?

Options:

A.

Anomaly detection

B.

Linear regression

C.

Logistic regression

D.

Semantic segmentation

Question 27

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images.

B.

Optimize for F1 score. Use image augmentation on the less common images.

C.

Optimize for accuracy. Use SMOTE to generate synthetic images.

D.

Optimize for F1 score. Use SMOTE to generate synthetic images.

Question 28

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)

Question 29

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Options:

A.

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

B.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

C.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

D.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.

Question 30

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

Options:

A.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Question 31

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company ' s main competitor.

Which solution will meet this requirement?

Options:

A.

Configure the competitor ' s name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor ' s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor ' s name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor ' s name.

Question 32

A bank needs to use Amazon SageMaker AI to create an ML model to determine which customers qualify for a new product. The bank must use algorithms that SageMaker AI directly supports. The model must be explainable to the bank ' s regulators.

Which modeling approach will meet these requirements?

Options:

A.

Train the model by using the Object2Vec algorithm.

B.

Train the model by using the linear learner algorithm.

C.

Train a neural network.

D.

Train the model by using the k-means algorithm.

Question 33

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all (Select three.)

Question # 33

Options:

Question 34

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Create code to evaluate each instance ' s memory and compute usage.

B.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C.

Check AWS CloudTrail event history for the creation of the resources.

D.

Run AWS Compute Optimizer.

Question 35

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

Options:

A.

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

B.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

C.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

D.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Question 36

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from SQS queue.

Which IAM permissions should the company assign to the application ' s IAM role? (Select TWO.)

Options:

A.

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

B.

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

C.

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

D.

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

E.

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.

Question 37

An ML engineer is tuning an image classification model that shows poor performance on one of two available classes during prediction. Analysis reveals that the images whose class the model performed poorly on represent an extremely small fraction of the whole training dataset.

The ML engineer must improve the model ' s performance.

Which solution will meet this requirement?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images to generate new samples.

B.

Optimize for F1 score. Use image augmentation on the less common images to generate new samples.

C.

Optimize for accuracy. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.

D.

Optimize for F1 score. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.

Question 38

An ML engineer uses an Amazon SageMaker AI notebook instance to run a training job that trains a neural network model with an estimator. The training job loads data iteratively from an Amazon S3 path that is configured as an environment variable. The ML engineer viewed a profiling report of the training job. The ML engineer discovered that a substantial amount of the training time is spent during data loading.

How can the ML engineer improve the training speed?

Options:

A.

Provision Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD io1 storage during the estimator initialization. Download the training data from the S3 path to Amazon EBS. Point the data loader to the EBS location.

B.

Provision Amazon Elastic File System (Amazon EFS) storage during the estimator initialization. Download the training data to Amazon EFS by using the S3 path. Point the data loader to the EFS location.

C.

Download the training data to the estimator by using fast file mode. Point the data loader to the location specified by the S3 path.

D.

Configure the path to the S3 bucket that contains the training data as a hyperparameter instead of an environment variable.

Question 39

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model ' s performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

A.

Set up SageMaker Debugger and create a custom rule.

B.

Set up blue/green deployments with all-at-once traffic shifting.

C.

Set up blue/green deployments with canary traffic shifting.

D.

Set up shadow testing with a shadow variant of the new model.

Question 40

A logistics company has installed in-vehicle cameras for basic monitoring of its drivers. The company wants to improve driver safety by identifying distractions that could lead to accidents.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.

Use Amazon Rekognition eye gaze direction detection to monitor driver behavior and identify distractions.

B.

Use Amazon SageMaker AI to customize an AI model to monitor driver behavior and identify distractions.

C.

Integrate a third-party driver monitoring system with Amazon Rekognition to monitor driver behavior and identify distractions.

D.

Use Amazon Comprehend to analyze text-based driver feedback and identify distractions.

Question 41

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

A.

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B.

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

C.

Add resource tagging by editing each user ' s IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D.

Add resource tagging by editing each user ' s IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Question 42

A company ' s dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

Options:

A.

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

B.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

C.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

D.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Question 43

An ML engineer wants to use, prepare, and load data from Amazon S3 for analytics. The ML engineer must run an extract, transform, and load (ETL) job to discover the schema of the data and to store the metadata.

Which solution will meet these requirements with the LEAST manual effort?

Options:

A.

Use AWS Glue to run the ETL job. Use the job to discover the schema and to store the associated metadata in the AWS Glue Data Catalog.

B.

Create an Amazon SageMaker Data Wrangler flow to run the ETL job. Use the job to discover the schema and to store the associated metadata in an S3 bucket.

C.

Create an ETL pipeline by using Amazon Athena integrated with AWS Step Functions. Use the pipeline to run the ETL job to discover the schema and to store the associated metadata in an S3 bucket.

D.

Launch an Amazon EC2 instance that includes the scikit-learn library to run the ETL job. Use the job to discover the schema and to store the associated metadata in Amazon Redshift.

Question 44

A company has significantly increased the amount of data stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than before.

An ML engineer must implement a solution to optimize the data for query performance with the LEAST operational overhead.

Which solution will meet this requirement?

Options:

A.

Configure an AWS Lambda function to split the .csv files into smaller objects.

B.

Configure an AWS Glue job to drop string-type columns and save the results to S3.

C.

Configure an AWS Glue ETL job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data in S3.

Question 45

A company uses Amazon SageMakerAI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

Question # 45

Options:

Question 46

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Options:

A.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

B.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

C.

Use the SageMaker Model Registry and model groups to catalog the models.

D.

Use the SageMaker Model Registry and unique tags for each model version.

Question 47

A company runs an ML model on Amazon SageMaker AI. The company uses an automatic process that makes API calls to create training jobs for the model. The company has new compliance rules that prohibit the collection of aggregated metadata from training jobs.

Which solution will prevent SageMaker AI from collecting metadata from the training jobs?

Options:

A.

Opt out of metadata tracking for any training job that is submitted.

B.

Ensure that training jobs are running in a private subnet in a custom VPC.

C.

Encrypt the training data with an AWS Key Management Service (AWS KMS) customer managed key.

D.

Reconfigure the training jobs to use only AWS Nitro instances.

Question 48

A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.

Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B.

Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C.

Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D.

Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Question 49

An ML engineer is collecting data to train a classification ML model by using Amazon SageMaker AI. The target column can have two possible values: Class A or Class B. The ML engineer wants to ensure that the number of samples for both Class A and Class B are balanced, without losing any existing training data. The ML engineer must test the balance of the training data.

Which solution will meet this requirement?

Options:

A.

Use SageMaker Clarify to check for class imbalance (CI). If the value is equal to 0, then use random undersampling in SageMaker Data Wrangler to balance the classes.

B.

Use SageMaker Clarify to check for class imbalance (CI). If the value is greater than 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Data Wrangler to balance the classes.

C.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is greater than 0, then use random undersampling in SageMaker Studio to balance the classes.

D.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is equal to 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Studio to balance the classes.

Question 50

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

Options:

A.

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

B.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

C.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

D.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Question 51

An ML engineer is using Amazon SageMaker AI to train an ML model. The ML engineer needs to use SageMaker AI automatic model tuning (AMT) features to tune the model hyperparameters over a large parameter space.

The model has 20 categorical hyperparameters and 7 continuous hyperparameters that can be tuned. The ML engineer needs to run the tuning job a maximum of 1,000 times. The ML engineer must ensure that each parameter trial is built based on the performance of the previous trial.

Which solution will meet these requirements?

Options:

A.

Define the search space as categorical parameters of 1,000 possible combinations. Use grid search.

B.

Define the search space as continuous parameters. Use random search. Set the maximum number of tuning jobs to 1,000.

C.

Define the search space as categorical parameters and continuous parameters. Use Bayesian optimization. Set the maximum number of training jobs to 1,000.

D.

Define the search space as categorical parameters and continuous parameters. Use grid search. Set the maximum number of tuning jobs to 1,000.

Question 52

A music streaming company constantly streams song ratings from an application to an Amazon S3 bucket. The company wants to use the ratings as an input for training and inference of an Amazon SageMaker AI model.

The company has an AWS Glue Data Catalog that is configured with the S3 bucket as the source. An ML engineer needs to implement a solution to create a repository for this data. The solution must ensure that the data stays synchronized during batch training and real-time inference.

Which solution will meet these requirements?

Options:

A.

Ingest data into SageMaker Feature Store from the S3 bucket. Apply tags and indexes.

B.

Use Amazon Athena. Create tables by using CREATE TABLE AS SELECT (CTAS) queries to group data.

C.

Use AWS Lake Formation. Apply tag-based control on the data.

D.

Use the Generate Data Insights function in SageMaker Data Wrangler.

Question 53

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

Options:

A.

Lack of training data

B.

Drift in production data distribution

C.

Compute resource constraints

D.

Model overfitting

Question 54

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.

Question # 54

Options:

Question 55

A company has developed a computer vision model. The company needs to deploy the model into production on Amazon SageMaker AI. The company has not hosted a model on SageMaker AI previously.

An ML engineer needs to implement a solution to track model versions. The solution also must provide recommendations about which Amazon EC2 instance types to use to host the model.

Which solution will meet these requirements?

Options:

A.

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use AWS Compute Optimizer for recommendations about instance types.

B.

Register the model in the SageMaker Model Registry. Use SageMaker Autopilot for recommendations about instance types.

C.

Register the model in the SageMaker Model Registry. Use SageMaker Inference Recommender for recommendations about instance types.

D.

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use SageMaker Experiments for recommendations about instance types.

Question 56

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

Options:

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question 57

A gaming company needs to deploy a natural language processing (NLP) model to moderate a chat forum in a game. The workload experiences heavy usage during evenings and weekends but minimal activity during other hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Use an Amazon SageMaker AI batch transform job with fixed capacity.

B.

Use Amazon SageMaker Serverless Inference.

C.

Use a single Amazon EC2 GPU instance with reserved capacity.

D.

Use Amazon SageMaker Asynchronous Inference.

Question 58

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

Options:

A.

Manually install the libraries in the SageMaker AI containers.

B.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

C.

Use a SageMaker AI notebook instance and install libraries at startup.

D.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Question 59

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

Options:

A.

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

B.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

C.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

D.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Question 60

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline

Question 61

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

Options:

A.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

B.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

C.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

D.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Question 62

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

C.

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

D.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Question 63

A company ' s ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model ' s predictions?

Options:

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/В testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Question 64

A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups.

The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker.

Which solution will provide the ML engineers with the appropriate access?

Options:

A.

Enable S3 bucket versioning.

B.

Configure S3 Object Lock settings for each user.

C.

Add cross-origin resource sharing (CORS) policies to the S3 buckets.

D.

Create IAM policies. Attach the policies to IAM users or IAM roles.

Question 65

A company uses a training job on Amazon SageMaker Al to train a neural network. The job first trains a model and then evaluates the model ' s performance ag

test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model ' s final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

. Change the epoch count.

. Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

. Use early stopping on the training job.

· Use the SageMaker Al distributed data parallelism (SMDDP) library.

. Stop the training job.

Question # 65

Options:

Question 66

A company needs an AWS solution that will automatically create versions of ML models as the models are created. Which solution will meet this requirement?

Options:

A.

Amazon Elastic Container Registry (Amazon ECR)

B.

Model packages from Amazon SageMaker Marketplace

C.

Amazon SageMaker ML Lineage Tracking

D.

Amazon SageMaker Model Registry

Question 67

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model ' s performance?

Options:

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)

Question 68

A company has built more than 50 models and deployed the models on Amazon SageMaker Al as real-time inference

endpoints. The company needs to reduce the costs of the SageMaker Al inference endpoints. The company used the same

ML framework to build the models. The company ' s customers require low-latency access to the models.

Select and order the correct steps from the following list to reduce the cost of inference and keep latency low. Select each

step one time or not at all. (Select and order FIVE.)

· Create an endpoint configuration that references a multi-model container.

. Create a SageMaker Al model with multi-model endpoints enabled.

. Deploy a real-time inference endpoint by using the endpoint configuration.

. Deploy a serverless inference endpoint configuration by using the endpoint configuration.

· Spread the existing models to multiple different Amazon S3 bucket paths.

. Upload the existing models to the same Amazon S3 bucket path.

. Update the models to use the new endpoint ID. Pass the model IDs to the new endpoint.

Question # 68

Options:

Question 69

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline ' s correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

Question # 69

Options:

Question 70

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

A.

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D.

Put each day ' s time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Question 71

An ML engineering team is spread across multiple locations. When the lead ML engineer opens an Amazon SageMaker AI notebook, the ML engineer does not see the latest merged notebook made by other team members from a Git repository.

The lead ML engineer must see the latest SageMaker AI notebook updates.

Which solution will meet this requirement?

Options:

A.

Run the !git pull origin master command.

B.

Run the !git commit command.

C.

Run the !git push origin master command.

D.

Run the !git branch command.

Question 72

An ML engineer develops a neural network model to predict whether customers will continue to subscribe to a service. The model performs well on training data. However, the accuracy of the model decreases significantly on evaluation data.

The ML engineer must resolve the model performance issue.

Which solution will meet this requirement?

Options:

A.

Penalize large weights by using L1 or L2 regularization.

B.

Remove dropout layers from the neural network.

C.

Train the model for longer by increasing the number of epochs.

D.

Capture complex patterns by increasing the number of layers.

Page: 1 / 24
Total 241 questions