Databricks Certified Professional Data Scientist Exam (Databricks-Certified-Professional-Data-Scientist) Practice Test

Databricks Certified Professional Data Scientist Exam Questions and Answers

Question 1

Which of the following are advantages of support vector machines?

Options:

A.

Effective in high dimensional spaces.

B.

It is memory efficient

C.

It is possible to specify custom kernels

D.

Effective in cases where number of dimensions is greater than the number of samples

E.

Even when the number of features is much greater than the number of samples, the method still gives good performance

F.

SVMs directly provide probability estimates

Question 2

In which of the following scenarios should you apply Bayes' theorem?

Options:

A.

The sample space is partitioned into a set of mutually exclusive events {A1, A2, ..., An}.

B.

Within the sample space, there exists an event B, for which P(B) > 0.

C.

The analytical goal is to compute a conditional probability of the form P(Ak | B).

D.

In all of the above cases
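
The conditions listed in the options above (a partition A1, ..., An, an event B with P(B) > 0, and a target of the form P(Ak | B)) are the ingredients of Bayes' theorem. The following is a minimal sketch of that computation; the function name `posterior` and all of the numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods, k):
    """Return P(A_k | B) for a partition A_1..A_n using Bayes' theorem.

    priors[i]      = P(A_i); the A_i are mutually exclusive and exhaustive
    likelihoods[i] = P(B | A_i)
    """
    # Law of total probability: P(B) = sum_i P(A_i) * P(B | A_i)
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return priors[k] * likelihoods[k] / p_b


# Hypothetical values, not taken from the exam material.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

print(posterior(priors, likelihoods, 2))  # 10/21
```

Exact fractions are used so that the posteriors over the whole partition visibly sum to 1.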

Question 3

You are creating a model for recommending books at Amazon.com. Which of the following recommender systems would you use so that you don't have a cold-start problem?

Options:

A.

Naive Bayes classifier

B.

Item-based collaborative filtering

C.

User-based collaborative filtering

D.

Content-based filtering

Question 4

In which of the following scenarios can you use regression to predict values?

Options:

A.

Samsung can use it for mobile sales forecast

B.

Mobile companies can use it to forecast manufacturing defects

C.

Predicting the probability of a celebrity divorce

D.

Only 1 and 2

E.

All of 1, 2 and 3

Question 5

Which of the following is not a correct application of classification?

Options:

A.

credit scoring

B.

tumor detection

C.

image recognition

D.

drug discovery

Question 6

Select the correct statement that applies to K-Nearest Neighbors

Options:

A.

Makes no assumptions about the data

B.

Computationally expensive

C.

Requires less memory

D.

Works with Numeric Values

Question 7

You have data for 1000 patients, with height and age as attributes. Age is in years and height is in meters. You want to create clusters using these two attributes, and you want age and height to have a nearly equal effect on the clustering. What can you do?

Options:

A.

You will add the numeric value 100 to each height

B.

You will convert each height value to centimeters

C.

You will divide both age and height by their respective standard deviations

D.

You will take the square root of height
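
One of the options above describes dividing each attribute by its own standard deviation. As a rough sketch of what that rescaling does, the snippet below applies it to a handful of made-up patients; the helper name `scale_by_std` and the sample values are assumptions for illustration only.

```python
import statistics


def scale_by_std(values):
    """Divide every value in a column by that column's standard deviation."""
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        raise ValueError("column has zero variance and cannot be scaled")
    return [v / sd for v in values]


# Hypothetical sample: age in years, height in meters.
ages_years = [25, 40, 60, 35, 50]
heights_m = [1.60, 1.75, 1.82, 1.68, 1.90]

scaled_ages = scale_by_std(ages_years)
scaled_heights = scale_by_std(heights_m)

# Before scaling, the spread of age dwarfs the spread of height.
print(statistics.pstdev(ages_years), statistics.pstdev(heights_m))
# After scaling, both columns have a standard deviation of about 1.
print(statistics.pstdev(scaled_ages), statistics.pstdev(scaled_heights))
```

After this step both columns vary on a comparable scale, so a Euclidean distance used for clustering is no longer dominated by the attribute with the larger raw units.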

Question 8

Clustering is a type of unsupervised learning with the following goals

Options:

A.

Maximize a utility function

B.

Find similarities in the training data

C.

Not to maximize a utility function

D.

1 and 2

E.

2 and 3

Question 9

A fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the

Options:

A.

Presence of the other features.

B.

Absence of the other features.

C.

Presence or absence of the other features

D.

None of the above

Question 10

A problem statement is given below:

Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?

Which of the following models would you use to solve it?

Options:

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
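
If each of the 6 patients is treated as an independent trial with the same recovery probability (an assumption, not something the problem statement spells out), the count of recoveries follows a binomial distribution. A minimal sketch of the arithmetic under that assumption, with a hypothetical helper `binomial_pmf`:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_die = 0.75
p_recover = 1 - p_die  # 0.25, since 75% die of the disease

prob = binomial_pmf(4, 6, p_recover)
print(round(prob, 4))  # 0.033
```

Here "success" means recovery, so the calculation is C(6, 4) * 0.25^4 * 0.75^2 = 135/4096.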

Question 11

If E1 and E2 are two events, how do you represent the conditional probability that E2 occurs given that E1 has occurred?

Options:

A.

P(E1)/P(E2)

B.

P(E1+E2)/P(E1)

C.

P(E2)/P(E1)

D.

P(E2)/P(E1+E2)
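
For reference, the standard definition of conditional probability is P(E2 | E1) = P(E1 and E2) / P(E1), provided P(E1) > 0. The sketch below checks that definition by brute-force enumeration of two dice; the two events are hypothetical examples and are not part of the question.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def e1(o):
    return o[0] % 2 == 0  # hypothetical E1: the first die is even


def e2(o):
    return sum(o) == 8  # hypothetical E2: the two dice sum to 8


# P(E2 | E1) = P(E1 and E2) / P(E1)
p_e2_given_e1 = prob(lambda o: e1(o) and e2(o)) / prob(e1)
print(p_e2_given_e1)  # 1/6
```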

Question 12

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?

Options:

A.

Association rules

B.

Decision trees

C.

Linear regression

D.

K-means clustering

Question 13

What is the best way to ensure that the k-means algorithm will find a good clustering of a collection of vectors?

Options:

A.

Only consider values of k larger than log(N), where N is the number of observations in the data set

B.

Run at least log(N) iterations of Lloyd's algorithm, where N is the number of observations in the data set

C.

Choose the initial centroids so that they all lie along different axes

D.

Choose the initial centroids so that they are far away from each other
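
One of the options above mentions choosing initial centroids that are far away from each other. One simple way to do that is a farthest-first traversal, sketched below. This is an illustrative technique chosen here, not a method named in the question, and the function name and sample points are hypothetical.

```python
from math import dist


def farthest_first_centroids(points, k):
    """Pick k initial centroids that are spread far apart.

    Starts from the first point (a deterministic choice for illustration)
    and repeatedly adds the point whose distance to its nearest
    already-chosen centroid is largest.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(
            points, key=lambda p: min(dist(p, c) for c in centroids)
        )
        centroids.append(next_point)
    return centroids


# Hypothetical 2-D data with three visibly separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 1)]
initial = farthest_first_centroids(points, 3)
print(initial)
```

With this sample data the three chosen centroids land in three different groups, which gives Lloyd's algorithm a reasonable starting point. Randomized variants of the same idea, such as k-means++, are commonly used in practice.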

Question 14

Let's say you have two cases as below for the movie ratings

1. You recommend a movie to a user with a predicted rating of four stars, but he really doesn't like it and would rate it two stars.

2. You recommend a movie with a predicted rating of three stars, but the user loves it and would rate it five stars.

Which statement correctly applies?

Options:

A.

In both cases, the contribution to the RMSE is the same

B.

In both cases, the contribution to the RMSE is different

C.

In both cases, the contribution to the RMSE could vary

D.

None of the above
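
RMSE is built from squared differences between predicted and actual ratings, so the sign of an error does not matter. The sketch below works through the two cases from the question; the helper name `squared_error` is hypothetical, and it assumes the star rating shown with the recommendation is the predicted rating.

```python
from math import sqrt


def squared_error(predicted, actual):
    """Squared difference that one prediction contributes inside the RMSE."""
    return (predicted - actual) ** 2


case_1 = squared_error(predicted=4, actual=2)  # over-prediction by 2 stars
case_2 = squared_error(predicted=3, actual=5)  # under-prediction by 2 stars

# RMSE over just these two predictions.
rmse = sqrt((case_1 + case_2) / 2)

print(case_1, case_2, rmse)  # 4 4 2.0
```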

Question 15

A data scientist is asked to implement an article recommendation feature for an on-line magazine.

The magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only the style and subject matter of the current article are available for making recommendations. All of the magazine's articles are stored in a database in a format suitable for analytics.

Which method should the data scientist try first?

Options:

A.

K Means Clustering

B.

Naive Bayesian

C.

Logistic Regression

D.

Association Rules

Question 16

Which of the following questions fall under the data science category?

Options:

A.

What happened in the last six months?

B.

How many products were sold in the last month?

C.

Where is the problem for sales?

D.

Which is the optimal scenario for selling this product?

E.

What happens if this scenario continues?

Question 17

In which of the following scenarios can you use a linear regression model?

Options:

A.

Predicting Home Price based on the location and house area

B.

Predicting demand of the goods and services based on the weather

C.

Predicting tumor size reduction based on the number of radiation treatments

D.

Predicting sales of a textbook based on the number of students in the state

Question 18

As a data science consultant at ABC Corp, you are working on a recommendation engine for learning resources for end users. Which recommender system technique benefits most from additional user preference data?

Options:

A.

Naive Bayes classifier

B.

Item-based collaborative filtering

C.

Logistic Regression

D.

Content-based filtering

Question 19

You are doing advanced analytics for a medical application using regression. You have two input variables, weight and height, which are very important and cannot be ignored, but they are also highly correlated. What is the best solution?

Options:

A.

You will take cube root of height

B.

You will take square root of weight

C.

You will take square of the height.

D.

You would consider using BMI (Body Mass Index)
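
One of the options above refers to BMI, which combines weight and height into a single derived feature using the standard formula: weight in kilograms divided by the square of height in meters. A minimal sketch follows; the function name `bmi` and the sample records are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m**2


# Hypothetical patients as (weight in kg, height in m).
patients = [(70.0, 1.75), (85.0, 1.80), (55.0, 1.60)]

# Replace the two correlated inputs with one combined feature per patient.
bmi_feature = [bmi(w, h) for w, h in patients]
print([round(b, 2) for b in bmi_feature])
```

Using one combined feature in place of two highly correlated inputs is a common way to reduce multicollinearity in a regression model while still using the information from both measurements.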

Question 20

You are creating a regression model with the income, education, and current debt of a customer as inputs. What could be a possible output of this model?

Options:

A.

Customer classified as a good fit

B.

Customer classified in the acceptable or average category

C.

The probability, expressed as a percent, that the customer will default on a loan

D.

1 and 3 are correct

E.

2 and 3 are correct