CompTIA DY0-001 CompTIA DataX Exam Practice Test

Page: 1 / 9
Total 85 questions

CompTIA DataX Exam Questions and Answers

Question 1

Which of the following types of layers is used to downsample feature detection when using a convolutional neural network?

Options:

A.

Pooling

B.

Input

C.

Output

D.

Hidden
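
For readers unfamiliar with downsampling in a convolutional network, here is a minimal sketch of 2x2 max pooling with stride 2, written in plain Python. The function name `max_pool_2x2` and the sample feature map are hypothetical and only illustrate the idea; real frameworks provide their own pooling layers.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D list by keeping the max of each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Step by 2 in both directions; a trailing odd row or column is dropped.
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map, reduced to 2x2.
sample = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
print(max_pool_2x2(sample))  # [[4, 2], [2, 8]]
```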

Question 2

Which of the following image data augmentation techniques allows a data scientist to increase the size of a data set?

Options:

A.

Clipping

B.

Cropping

C.

Masking

D.

Scaling

Question 3

Which of the following does k represent in the k-means model?

Options:

A.

Number of model tests

B.

Number of data splits

C.

Number of clusters

D.

Distance between features

Question 4

Which of the following is the naive assumption in Bayes' rule?

Options:

A.

Normal distribution

B.

Independence

C.

Uniform distribution

D.

Homoskedasticity

Question 5

Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?

Options:

A.

Word cloud

B.

Edit distance

C.

String indexing

D.

k-nearest neighbors
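
As a point of reference for the string-similarity measures listed above, the following is a minimal sketch of edit distance in its common Levenshtein form (insertions, deletions, and substitutions each cost 1). The function name `edit_distance` is an illustrative choice, not taken from the exam material.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the dynamic-programming table."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more similar; identical strings have distance 0.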

Question 6

An analyst is examining data from an array of temperature sensors and sees that one sensor consistently returns values that are much higher than the values from the other sensors. Which of the following terms best describes this type of error?

Options:

A.

Synthetic

B.

Systematic

C.

Heteroskedastic

D.

Idiosyncratic

Question 7

Which of the following modeling tools is appropriate for solving a scheduling problem?

Options:

A.

One-armed bandit

B.

Constrained optimization

C.

Decision tree

D.

Gradient descent

Question 8

A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?

Options:

A.

Undersampling

B.

Multicollinearity

C.

Oversampling

D.

Overfitting

Question 9

In a modeling project, people evaluate phrases and provide reactions as the target variable for the model. Which of the following best describes what this model is doing?

Options:

A.

Sentiment analysis

B.

Named-entity recognition

C.

TF-IDF vectorization

D.

Part-of-speech tagging

Question 10

A data scientist is preparing to brief a non-technical audience that is focused on analysis and results. During the modeling process, the data scientist produced the following artifacts:

Which of the following artifacts should the data scientist include in the briefing? (Choose two.)

Options:

A.

Final charts and dashboards

B.

Model selection, justification, and purpose

C.

Code documentation

D.

Mathematical descriptions of clustering algorithms included in the selected model

E.

Model performance statistics (accuracy, precision, recall, F1 score, etc.)

F.

Data dictionary

Question 11

A company created a very popular collectible card set. Collectors attempt to collect the entire set, but the availability of each card varies, because some cards have higher production volumes than others. The set contains a total of 12 cards. The attributes of the cards are shown.

[Image: table of card attributes, not reproduced]

The data scientist is tasked with designing an initial model iteration to predict whether the animal on the card lives in the sea or on land, given the card's features: Wrapper color, Wrapper shape, and Animal.

Which of the following is the best way to accomplish this task?

Options:

A.

ARIMA

B.

Linear regression

C.

Association rules

D.

Decision trees

Question 12

Given matrix

[Image: matrix, not reproduced]

Which of the following is $A^T$?

Options:

A.

Option A12

B.

Option B12

C.

Option C12

D.

Option D12

Question 13

A team is building a spam detection system. The team wants a probability-based identification method without complex, in-depth training from the historical data set. Which of the following methods would best serve this purpose?

Options:

A.

Logistic regression

B.

Random forest

C.

Naive Bayes

D.

Linear regression

Question 14

Which of the following techniques enables automation and iteration of code releases?

Options:

A.

Virtualization

B.

Markdown

C.

Code isolation

D.

CI/CD

Question 15

A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?

Options:

A.

SOAP

B.

RPC

C.

JSON

D.

REST

Question 16

Which of the following compute delivery models allows packaging of only critical dependencies while developing a reusable asset?

Options:

A.

Thin clients

B.

Containers

C.

Virtual machines

D.

Edge devices

Question 17

Given the equation:

$X_t = \delta + \phi_1 X_{t-1} + \omega_t$, where $\omega_t \sim N(0, \sigma_\omega^2)$

Which of the following time series models best represents this process?

Options:

A.

ARIMA(1,1,1)

B.

ARMA(1,1)

C.

SARIMA(1,1,1) × (1,1,1)_1

D.

AR(1)
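
The recursion in the equation above can be simulated directly, which may help in seeing how each value depends only on the previous value plus Gaussian noise. This is a minimal sketch using the standard library; the function name `simulate_process`, the starting value `x0`, and the parameter values are assumptions chosen for illustration.

```python
import random


def simulate_process(delta, phi1, sigma, n, x0=0.0, seed=0):
    """Generate n steps of X_t = delta + phi1 * X_{t-1} + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same series
    xs = [x0]
    for _ in range(n):
        noise = rng.gauss(0.0, sigma)
        xs.append(delta + phi1 * xs[-1] + noise)
    return xs


# Hypothetical parameters; with sigma = 0 the recursion is deterministic.
print(simulate_process(delta=1.0, phi1=0.5, sigma=0.0, n=3))  # [0.0, 1.0, 1.5, 1.75]
```

With |phi1| < 1, the noise-free series converges toward delta / (1 - phi1), which is 2.0 for the values used here.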

Question 18

A data scientist is working with a data set that covers a two-year period for a large number of machines. The data set contains:

    Machine system ID numbers

    Sensor measurement values

    Daily timestamps for each machine

The data scientist needs to plot the total measurements from all the machines over the entire time period. Which of the following is the best way to present this data?

Options:

A.

Scatter plot

B.

Line plot

C.

Histogram

D.

Box-and-whisker plot

Question 19

A data scientist uses a large data set to build multiple linear regression models to predict the likely market value of a real estate property. The selected new model has an RMSE of 995 on the holdout set and an adjusted R² of 0.75. The benchmark model has an RMSE of 1,000 on the holdout set. Which of the following is the best business statement regarding the new model?

Options:

A.

The model should be deployed because it has a lower RMSE.

B.

The model's adjusted R² is exceptionally strong for such a complex relationship.

C.

The model fails to improve meaningfully on the benchmark model.

D.

The model's adjusted R² is too low for the real estate industry.
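
The size of the gap between the two RMSE figures in this question can be checked with a quick calculation. The helper name `relative_rmse_improvement` is hypothetical; the inputs are the holdout RMSE values stated in the question.

```python
def relative_rmse_improvement(benchmark_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the benchmark model."""
    return (benchmark_rmse - new_rmse) / benchmark_rmse


improvement = relative_rmse_improvement(1000, 995)
print(f"{improvement:.1%}")  # 0.5%
```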

Question 20

A data scientist is analyzing a data set with categorical features and would like to make those features more useful when building a model. Which of the following data transformation techniques should the data scientist use? (Choose two.)

Options:

A.

Normalization

B.

One-hot encoding

C.

Linearization

D.

Label encoding

E.

Scaling

F.

Pivoting
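
To make two of the listed transformations concrete, here is a small sketch of label encoding and one-hot encoding in plain Python. The function names and the sample `colors` column are hypothetical, and categories are ordered alphabetically as an arbitrary choice; libraries such as pandas or scikit-learn offer their own encoders.

```python
def label_encode(values):
    """Map each category to an integer code (alphabetical order, an arbitrary choice)."""
    categories = sorted(set(values))
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]  # hypothetical categorical feature
codes, mapping = label_encode(colors)
print(codes)    # [2, 1, 0, 1]
rows, cats = one_hot_encode(colors)
print(cats)     # ['blue', 'green', 'red']
print(rows)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding implies an ordering among the integer codes, while one-hot encoding avoids that at the cost of one column per category.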

Question 21

A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?

Options:

A.

One-hot encoding

B.

Binning

C.

Geocoding

D.

Imputing

Question 22

Which of the following distributions would be best to use for hypothesis testing on a data set with 20 observations?

Options:

A.

Power law

B.

Normal

C.

Uniform

D.

Student's t

Question 23

Which of the following is the layer that is responsible for the depth in deep learning?

Options:

A.

Convolution

B.

Dropout

C.

Pooling

D.

Hidden

Question 24

A data scientist has built a model that provides the likelihood of an error occurring in a factory. The historical accuracy of the model is 90%. At a specific factory, the model is reporting a likelihood score of 0.90. Which of the following explains a confidence score of 0.90?

Options:

A.

Running this model for all known factory issues, it is expected the model will identify 90 out of 100 known factory issues.

B.

Running this model on 100 samples of factories, a certain model performance is expected for 90 out of the 100 samples.

C.

Running this model 100 times on a factory, it is expected the model will predict 90 out of 100 factory errors.

D.

Running this model 100 times within a factory, it is expected the model will predict an error 90 out of the 100 times the model is run.

Question 25

A data analyst wants to generate the most data using tables from a database. Which of the following is the best way to accomplish this objective?

Options:

A.

INNER JOIN

B.

LEFT OUTER JOIN

C.

RIGHT OUTER JOIN

D.

FULL OUTER JOIN
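
The difference in how many rows each join type returns can be seen with a small sketch that mimics an inner join and a full outer join on two keyed tables. The dictionaries `customers` and `orders` are hypothetical, and the sketch assumes each key appears at most once per table, which real SQL tables do not guarantee.

```python
def inner_join(left, right):
    """Keep only keys present in both tables."""
    return {k: (left[k], right[k]) for k in left if k in right}


def full_outer_join(left, right):
    """Keep every key from either table; a missing side becomes None (like SQL NULL)."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys}


customers = {1: "Ana", 2: "Ben", 3: "Cy"}   # hypothetical table keyed by id
orders = {2: "book", 3: "pen", 4: "lamp"}   # hypothetical table keyed by id

print(inner_join(customers, orders))
# {2: ('Ben', 'book'), 3: ('Cy', 'pen')}
print(full_outer_join(customers, orders))
# {1: ('Ana', None), 2: ('Ben', 'book'), 3: ('Cy', 'pen'), 4: (None, 'lamp')}
```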
