Databricks Certified Professional Data Scientist Exam (Databricks-Certified-Professional-Data-Scientist) Practice Test

Page: 1 / 14
Total 138 questions

Databricks Certified Professional Data Scientist Exam Questions and Answers

Question 1

What are the advantages of mutual information over Pearson correlation for text classification problems?

Options:

A.

The mutual information has a meaningful test for statistical significance.

B.

The mutual information can signal non-linear relationships between the dependent and independent variables.

C.

The mutual information is easier to parallelize.

D.

The mutual information doesn't assume that the variables are normally distributed.
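
To illustrate how the two measures can differ, here is a minimal sketch on a toy data set where y depends on x through a square. The data and the helper names `pearson` and `mutual_information` are assumptions made for this illustration; they are not part of the exam material, and a real text classification task would use term and class counts instead.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical toy data: y is fully determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)              # zero: no linear relationship
mi = mutual_information(xs, ys)  # about 1.52 bits: strong dependence
print(r, mi)
```

On this data the correlation is zero while the mutual information is clearly positive, because y is a deterministic but non-linear function of x.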

Question 2

A problem statement is given below.

Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?

Which of the following models will you use to solve it?

Options:

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
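
As a sketch of the arithmetic, assume each patient recovers independently with probability 1 - 0.75 = 0.25, so that the number of recoveries among 6 patients can be treated as a count of successes in fixed, independent trials. The function name `binomial_pmf` is introduced only for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_recover = 1 - 0.75  # 75% die, so 25% recover
prob = binomial_pmf(4, 6, p_recover)
print(round(prob, 4))  # roughly 0.033
```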

Question 3

A fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the

Options:

A.

Presence of the other features.

B.

Absence of the other features.

C.

Presence or absence of the other features

D.

None of the above

Question 4

Select the statement that correctly applies to Naive Bayes.

Options:

A.

Works with a small amount of data

B.

Sensitive to how the input data is prepared

C.

Works with nominal values

Question 5

Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 10% of the time. Which of the following will you use to calculate the probability that it will rain on the day of Marie's wedding?

Options:

A.

Naive Bayes

B.

Logistic Regression

C.

Random Decision Forests

D.

All of the above
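
The numbers in this scenario can be combined with Bayes' rule. The sketch below assumes a 365-day year, so the prior probability of rain is 5/365; the function name `posterior` is hypothetical and used only here.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a single yes/no hypothesis."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

p_rain = 5 / 365  # assumed prior: 5 rainy days in a 365-day year
p_rain_given_forecast = posterior(p_rain, 0.90, 0.10)
print(round(p_rain_given_forecast, 3))  # about 0.111
```

Even with a forecast of rain, the posterior stays low because rain is so rare to begin with.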

Question 6

You are working on a data science project, and during the project you have been given the responsibility to interview all the stakeholders. Which phase of the project are you in?

Options:

A.

Discovery

B.

Data Preparations

C.

Creating Models

D.

Executing Models

E.

Creating visuals from the outcome

F.

Operationalize the models

Question 7

Which analytical method is considered unsupervised?

Options:

A.

Naive Bayesian classifier

B.

Decision tree

C.

Linear regression

D.

K-means clustering

Question 8

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

Options:

A.

Expected value

B.

Variance

C.

Linear regression

D.

Quantiles

Question 9

Which of the following is a correct example of the target variable in regression (supervised learning)?

Options:

A.

Nominal values like true, false

B.

Reptile, fish, mammal, amphibian, plant, fungi

C.

Infinite number of numeric values, such as 0.100, 42.001, 1000.743..

D.

All of the above

Question 10

Select the correct option from the list below.

Options:

A.

If you're trying to predict or forecast a target value, then you need to look into supervised learning.

B.

If you've chosen supervised learning, with a discrete target value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black, then look into classification.

C.

If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or +∞ to -∞, then you need to look into unsupervised learning.

D.

If you're not trying to predict a target value, then you need to look into unsupervised learning

E.

Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.

Question 11

A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.

The above is an example of

Options:

A.

Linear Regression

B.

Logistic Regression

C.

Recommendation system

D.

Maximum likelihood estimation

E.

Hierarchical linear models

Question 12

Of all the smokers in a particular district, 40% prefer brand A and 60% prefer brand B. Of those smokers who prefer brand A, 30% are female, and of those who prefer brand B, 40% are female. What is the probability that a randomly selected smoker prefers brand A, given that the person selected is female?

Which of the following is the best way to solve this problem?

Options:

A.

Bayes' Theorem

B.

Poisson Distribution

C.

Binomial Distribution

D.

None of the above
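
The conditional probability asked for here can be worked out from the joint probabilities. The sketch below uses exact fractions; the dictionary names are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Share of smokers preferring each brand.
prior = {"A": Fraction(40, 100), "B": Fraction(60, 100)}
# Share of each brand's smokers who are female.
p_female_given_brand = {"A": Fraction(30, 100), "B": Fraction(40, 100)}

# Joint probability of (brand, female) for each brand.
joint = {brand: prior[brand] * p_female_given_brand[brand] for brand in prior}
p_female = sum(joint.values())

p_a_given_female = joint["A"] / p_female
print(p_a_given_female)  # 1/3
```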

Question 13

In which lifecycle stage are test and training data sets created?

Options:

A.

Model planning

B.

Discovery

C.

Model building

D.

Data preparation

Question 14

You are creating a regression model with the inputs income, education, and current debt of a customer. What could be the possible output from this model?

Options:

A.

Customer fit as a good

B.

Customer fit as acceptable or average category

C.

The likelihood, expressed as a percent, that the customer will default on a loan

D.

1 and 3 are correct

E.

2 and 3 are correct

Question 15

Which of the following are point estimation methods?

Options:

A.

MAP

B.

MLE

C.

MMSE

Question 16

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?

Options:

A.

1/3

B.

2/3

C.

1/6

D.

2/6
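
A conditional probability like this one can be checked by enumerating the sample space. The following is a minimal sketch assuming two fair six-sided dice; the variable names are chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Condition on the first die showing a 6.
given = [r for r in rolls if r[0] == 6]
# Among those, keep outcomes whose total exceeds 8.
favorable = [r for r in given if sum(r) > 8]

p = Fraction(len(favorable), len(given))
print(p)  # 2/3
```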

Question 17

Assume some output variable "y" is a linear combination of some independent input variables "A" plus some independent noise "e". The way the independent variables are combined is defined by a parameter vector B: y = AB + e, where A is an m x n matrix, B is a vector of n unknowns, and y is a vector of m values. Assuming that m is not equal to n and the columns of A are linearly independent, which expression correctly solves for B?

[Figure: answer options for Question 17]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D
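
One standard way to handle an overdetermined linear system of this kind is the least-squares normal equations, B = (A^T A)^(-1) A^T y. The sketch below is an assumption-laden illustration, not the exam's own answer expression: it handles only a matrix A with two columns, and the function name `solve_normal_equations` and the sample data are hypothetical.

```python
def solve_normal_equations(A, y):
    """Least-squares B = (A^T A)^(-1) A^T y for an m x 2 matrix A."""
    # Entries of the symmetric 2 x 2 matrix A^T A.
    s00 = sum(r[0] * r[0] for r in A)
    s01 = sum(r[0] * r[1] for r in A)
    s11 = sum(r[1] * r[1] for r in A)
    # Entries of the vector A^T y.
    t0 = sum(r[0] * yi for r, yi in zip(A, y))
    t1 = sum(r[1] * yi for r, yi in zip(A, y))
    # The determinant is non-zero when the columns of A are independent.
    det = s00 * s11 - s01 * s01
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

# Hypothetical data: m = 4 observations, n = 2 unknowns, y = 1 + 2x.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

B = solve_normal_equations(A, y)
print(B)
```

With noise-free data generated from y = 1 + 2x, the recovered parameters are the intercept 1 and the slope 2.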

Question 18

You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years worth of content of the magazine (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers?

Options:

A.

Linear regression

B.

Logistic regression

C.

Decision trees

D.

TF-IDF

Question 19

In which of the following scenarios can you use regression to predict values?

Options:

A.

Samsung can use it for mobile sales forecast

B.

Mobile companies can use it to forecast manufacturing defects

C.

Probability of the celebrity divorce

D.

Only 1 and 2

E.

All of 1, 2, and 3

Question 20

Select the correct statement that applies to supervised learning.

Options:

A.

We ask the machine to learn from our data when we specify a target variable.

B.

It reduces the machine's task to only divining some pattern from the input data to get the target variable.

C.

Instead of telling the machine Predict Y for our data X, we're asking What can you tell me about X?
