
Microsoft DP-100 Designing and Implementing a Data Science Solution on Azure Exam Practice Test

Page: 1 / 44
Total 441 questions

Designing and Implementing a Data Science Solution on Azure Questions and Answers

Question 1

You need to implement a scaling strategy for the local penalty detection data.

Which normalization type should you use?

Options:

A.

Streaming

B.

Weight

C.

Batch

D.

Cosine

Question 2

You need to use the Python language to build a sampling strategy for the global penalty detection models.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 3

You need to define a modeling strategy for ad response.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 4

You need to resolve the local machine learning pipeline performance issue. What should you do?

Options:

A.

Increase Graphic Processing Units (GPUs).

B.

Increase the learning rate.

C.

Increase the training iterations.

D.

Increase Central Processing Units (CPUs).

Question 5

You need to implement a feature engineering strategy for the crowd sentiment local models.

What should you do?

Options:

A.

Apply an analysis of variance (ANOVA).

B.

Apply a Pearson correlation coefficient.

C.

Apply a Spearman correlation coefficient.

D.

Apply a linear discriminant analysis.

Question 6

You need to implement a model development strategy to determine a user’s tendency to respond to an ad.

Which technique should you use?

Options:

A.

Use a Relative Expression Split module to partition the data based on centroid distance.

B.

Use a Relative Expression Split module to partition the data based on distance travelled to the event.

C.

Use a Split Rows module to partition the data based on distance travelled to the event.

D.

Use a Split Rows module to partition the data based on centroid distance.

Question 7

You need to define an evaluation strategy for the crowd sentiment models.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 8

You need to modify the inputs for the global penalty event model to address the bias and variance issue.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 9

You need to implement a new cost factor scenario for the ad response models as illustrated in the performance curve exhibit.

Which technique should you use?

Options:

A.

Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.

B.

Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.

C.

Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.

D.

Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.

Question 10

You need to build a feature extraction strategy for the local models.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 11

You need to define an evaluation strategy for the crowd sentiment models.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 12

You need to define a process for penalty event detection.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 13

You need to define a process for penalty event detection.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 14

You need to select an environment that will meet the business and data requirements.

Which environment should you use?

Options:

A.

Azure HDInsight with Spark MLlib

B.

Azure Cognitive Services

C.

Azure Machine Learning Studio

D.

Microsoft Machine Learning Server

Question 15

You create a multi-class image classification deep learning model.

The model must be retrained monthly with the new image data fetched from a public web portal. You create an Azure Machine Learning pipeline to fetch new data, standardize the size of images and retrain the model.

You need to use the Azure Machine Learning Python SDK v2 to configure the schedule for the pipeline. The schedule should be defined by using the frequency and interval properties, with frequency set to "month" and interval set to 1.

Which three classes should you instantiate in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 16

You use the designer to create a training pipeline for a classification model. The pipeline uses a dataset that includes the features and labels required for model training.

You create a real-time inference pipeline from the training pipeline. You observe that the schema for the generated web service input is based on the dataset and includes the label column that the model predicts. Client applications that use the service must not be required to submit this value.

You need to modify the inference pipeline to meet the requirement.

What should you do?

Options:

A.

Add a Select Columns in Dataset module to the inference pipeline after the dataset and use it to select all columns other than the label.

B.

Delete the dataset from the training pipeline and recreate the real-time inference pipeline.

C.

Delete the Web Service Input module from the inference pipeline.

D.

Replace the dataset in the inference pipeline with an Enter Data Manually module that includes data for the feature columns but not the label column.

Question 17

You deploy a model as an Azure Machine Learning real-time web service using the following code.

The deployment fails.

You need to troubleshoot the deployment failure by determining the actions that were performed during deployment and identifying the specific action that failed.

Which code segment should you run?

Options:

A.

service.get_logs()

B.

service.state

C.

service.serialize()

D.

service.update_deployment_state()

Question 18

You manage an Azure Machine Learning workspace.

An MLflow model is already registered. You plan to customize how the deployment does inference. You need to deploy the MLflow model to a batch endpoint for batch inferencing. What should you create first?

Options:

A.

scoring script

B.

deployment

C.

environment

D.

deployment definition

Question 19

You are performing feature scaling by using the scikit-learn Python library for x1, x2, and x3 features.

Original and scaled data is shown in the following image.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.

Options:
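The exhibit for this question is not reproduced here, so the scalers it shows are unknown. As a rough illustration only, the sketch below implements two common scaling rules in plain Python: min-max scaling (the rule scikit-learn's MinMaxScaler applies by default) and standardization (the rule StandardScaler applies, using the population standard deviation). The sample values for x1 are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column; the real x1, x2, x3 data is in the missing image.
x1 = [2.0, 4.0, 6.0, 8.0]
x1_min_max = min_max_scale(x1)
x1_standard = standardize(x1)
```

Min-max output always lies in [0, 1], while standardized output is centered on 0 and can be negative, which is usually enough to tell the two apart in a plot.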

Question 20

You create a binary classification model using Azure Machine Learning Studio.

You must use a Receiver Operating Characteristic (ROC) curve and an F1 score to evaluate the model.

You need to create the required business metrics.

How should you complete the experiment? To answer, select the appropriate options in the dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Options:
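For readers who want to see what the F1 score measures, here is a minimal sketch in plain Python. It is not the Studio module configuration the question asks about; the labels, scores, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
def f1_at_threshold(labels, scores, threshold):
    """Compute F1 for binary labels given predicted scores and a cut-off."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]
f1 = f1_at_threshold(labels, scores, 0.5)
```

An ROC curve is built from the same counts: sweeping the threshold and plotting the true positive rate against the false positive rate at each value.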

Question 21

You are moving a large dataset from Azure Machine Learning Studio to a Weka environment.

You need to format the data for the Weka environment.

Which module should you use?

Options:

A.

Convert to CSV

B.

Convert to Dataset

C.

Convert to ARFF

D.

Convert to SVMLight
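As background, ARFF is the attribute-relation text format that Weka reads. The sketch below shows the basic layout (a relation name, one attribute declaration per column, then comma-separated data rows). The helper name to_arff and the sample relation are hypothetical, and the sketch ignores quoting, missing values, and sparse data.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text; attributes is a list of (name, arff_type) pairs."""
    lines = [f"@RELATION {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@ATTRIBUTE {name} {arff_type}")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines)


# Hypothetical two-column dataset: one numeric and one nominal attribute.
arff_text = to_arff(
    "sales",
    [("units", "NUMERIC"), ("region", "{north,south}")],
    [(10, "north"), (7, "south")],
)
print(arff_text)
```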

Question 22

You train a machine learning model.

You must deploy the model as a real-time inference service for testing. The service requires low CPU utilization and less than 48 MB of RAM. The compute target for the deployed service must initialize automatically while minimizing cost and administrative overhead.

Which compute target should you use?

Options:

A.

Azure Kubernetes Service (AKS) inference cluster

B.

Azure Machine Learning compute cluster

C.

Azure Container Instance (ACI)

D.

attached Azure Databricks cluster

Question 23

You have a Python data frame named salesData in the following format:

The data frame must be unpivoted to a long data format as follows:

You need to use the pandas.melt() function in Python to perform the transformation.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:
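The salesData layout is shown only in images that are not reproduced here. To illustrate what a wide-to-long unpivot does, the sketch below reimplements the idea in plain Python so it runs without pandas; pandas.melt performs the equivalent operation through its id_vars, value_vars, var_name, and value_name parameters. The column names (shop, 2017, 2018) are hypothetical.

```python
def melt(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Unpivot a list of dict rows from wide to long format."""
    long_rows = []
    for column in value_vars:  # column-major order, matching pandas.melt
        for row in rows:
            record = {key: row[key] for key in id_vars}
            record[var_name] = column
            record[value_name] = row[column]
            long_rows.append(record)
    return long_rows


# Hypothetical wide data: one row per shop, one column per year.
sales_wide = [
    {"shop": "A", "2017": 100, "2018": 120},
    {"shop": "B", "2017": 80, "2018": 95},
]
sales_long = melt(
    sales_wide,
    id_vars=["shop"],
    value_vars=["2017", "2018"],
    var_name="year",
    value_name="sales",
)
```

Each identifier column is repeated on every output row, and each former column header becomes a value in the new variable column.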

Question 24

You create an Azure Machine Learning workspace named ML-workspace. You also create an Azure Databricks workspace named DB-workspace. DB-workspace contains a cluster named DB-cluster.

You must use DB-cluster to run experiments from notebooks that you import into DB-workspace.

You need to use ML-workspace to track MLflow metrics and artifacts generated by experiments running on DB-cluster. The solution must minimize the need for custom code.

What should you do?

Options:

A.

From DB-cluster, configure the Advanced Logging option.

B.

From DB-workspace, configure the Link Azure ML workspace option.

C.

From ML-workspace, create an attached compute.

D.

From ML-workspace, create a compute cluster.

Question 25

You use Azure Machine Learning Studio to build a machine learning experiment.

You need to divide data into two distinct datasets.

Which module should you use?

Options:

A.

Split Data

B.

Load Trained Model

C.

Assign Data to Clusters

D.

Group Data into Bins

Question 26

You have an Azure Machine Learning workspace. The workspace contains a dataset with data in a tabular form.

You plan to use the Azure Machine Learning SDK for Python v1 to create a control script that will load the dataset into a pandas dataframe in preparation for model training. The script will accept a parameter designating the dataset.

You need to complete the script.

How should you complete the script? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 27

You manage an Azure Machine Learning workspace.

You must log multiple metrics by using MLflow.

You need to maximize logging performance.

What are two possible ways to achieve this goal? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

Options:

A.

MlflowClient.log_batch

B.

mlflow.log_metrics

C.

mlflow.log_param

D.

mlflow.log_metric

Question 28

A set of CSV files contains sales records. All the CSV files have the same data schema.

Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:

At the end of each month, a new folder with that month’s sales file is added to the sales folder.

You plan to use the sales data to train a machine learning model based on the following requirements:

You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.

You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.

You must register the minimum number of datasets possible.

You need to register the sales data as a dataset in Azure Machine Learning service workspace.

What should you do?

Options:

A.

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.

B.

Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.

C.

Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.

D.

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
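To make the difference between a wildcard path and an explicit file list concrete, the sketch below uses Python's fnmatch module to show which blob paths a pattern such as 'sales/*/sales.csv' would select. This only approximates wildcard matching; the exact globbing rules of Azure Machine Learning datastores may differ. The folder names are hypothetical examples of the mm-yyyy convention.

```python
from fnmatch import fnmatchcase

# Hypothetical blob paths following the sales/mm-yyyy/sales.csv layout.
blob_paths = [
    "sales/01-2019/sales.csv",
    "sales/02-2019/sales.csv",
    "sales/03-2019/sales.csv",
    "sales/readme.txt",
]

pattern = "sales/*/sales.csv"
matched = [path for path in blob_paths if fnmatchcase(path, pattern)]

# An explicit list, by contrast, is frozen at the moment it is written.
explicit_files = ["sales/01-2019/sales.csv", "sales/02-2019/sales.csv"]
```

A wildcard picks up folders added later without any change to the definition, whereas an explicit list captures a fixed snapshot of the data. Note that fnmatch's `*` also matches path separators, so deeper nesting would match here as well.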

Question 29

You manage an Azure Machine Learning workspace named workspace1 by using the Python SDK v2. You create a General Purpose v2 Azure storage account named mlstorage1. The storage account includes a publicly accessible container named mlcontainer1. The container stores 10 blobs with files in the CSV format.

You must develop Python SDK v2 code to create a data asset referencing all blobs in the container named mlcontainer1.

You need to complete the Python SDK v2 code.

How should you complete the code? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 30

You have a dataset that includes confidential data. You use the dataset to train a model.

You must use a differential privacy parameter to keep the data of individuals safe and private.

You need to reduce the effect of user data on aggregated results.

What should you do?

Options:

A.

Decrease the value of the epsilon parameter to reduce the amount of noise added to the data

B.

Increase the value of the epsilon parameter to decrease privacy and increase accuracy

C.

Decrease the value of the epsilon parameter to increase privacy and reduce accuracy

D.

Set the value of the epsilon parameter to 1 to ensure maximum privacy
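The relationship between epsilon and noise can be illustrated with the Laplace mechanism, one standard way to achieve differential privacy for numeric queries. The sketch below is an assumption-laden illustration, not the mechanism of any particular Azure library: the function names, the sensitivity of 1, and the true count of 1000 are all hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: larger when epsilon is smaller."""
    return sensitivity / epsilon


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return the count plus Laplace noise calibrated to epsilon."""
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only to make the demonstration repeatable
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, laplace_scale(1.0, epsilon), noisy_count(1000, epsilon, rng=rng))
```

Because the scale is sensitivity divided by epsilon, lowering epsilon adds more noise: any single individual's contribution is hidden better, at the cost of less accurate aggregates.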

Question 31

You create a binary classification model by using Azure Machine Learning Studio.

You must tune hyperparameters by performing a parameter sweep of the model. The parameter sweep must meet the following requirements:

iterate all possible combinations of hyperparameters

minimize computing resources required to perform the sweep

You need to perform a parameter sweep of the model.

Which parameter sweep mode should you use?

Options:

A.

Random sweep

B.

Sweep clustering

C.

Entire grid

D.

Random grid

E.

Random seed
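The difference between exhaustive and sampled sweeps can be sketched in a few lines of plain Python. The hyperparameter names and ranges below are hypothetical, and the sketch only enumerates configurations; it does not train anything or reproduce the Studio module's behavior.

```python
import random
from itertools import product

# Hypothetical hyperparameter ranges for illustration.
param_grid = {
    "learning_rate": [0.01, 0.1],
    "num_leaves": [8, 16, 32],
    "min_samples": [1, 5],
}

names = list(param_grid)

# Exhaustive grid: the Cartesian product of every range (2 * 3 * 2 = 12 runs).
all_combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

# Random sweep: a fixed budget of runs drawn from the same space.
random_subset = random.Random(0).sample(all_combinations, 4)
```

An exhaustive grid guarantees every combination is tried, while a random sweep caps the number of runs but may miss some combinations.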

Question 32

You manage an Azure Machine Learning workspace. You have an environment for training jobs which uses an existing Docker image. A new version of the Docker image is available.

You need to use the latest version of the Docker image for the environment configuration by using the Azure Machine Learning SDK v2. What should you do?

Options:

A.

Modify the conda file to specify the new version of the Docker image.

B.

Use the Environment class to create a new version of the environment.

C.

Use the create_or_update method to change the tag of the image.

D.

Change the description parameter of the environment configuration.

Question 33

You manage an Azure Machine Learning workspace.

You must set up an event-driven process to trigger a retraining pipeline.

You need to configure an Azure service that will trigger a retraining pipeline in response to data drift in Azure Machine Learning datasets. Which Azure service should you use?

Options:

A.

Event Grid

B.

Azure Functions

C.

Event Hubs

D.

Logic Apps

Question 34

You create an Azure Machine Learning compute target named ComputeOne by using the STANDARD_D1 virtual machine image.

You define a Python variable named ws that references the Azure Machine Learning workspace. You run the following Python code:

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Options:

Question 35

You create a script for training a machine learning model in Azure Machine Learning service.

You create an estimator by running the following code:

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Options:

Question 36

You create an Azure Machine Learning pipeline named pipeline1 with two steps that contain Python scripts. Data processed by the first step is passed to the second step.

You must update the content of the downstream data source of pipeline1 and run the pipeline again.

You need to ensure the new run of pipeline1 fully processes the updated content.

Solution: Set the allow_reuse parameter of the PythonScriptStep object of both steps to False.

Does the solution meet the goal?

Options:

A.

Yes

B.

No

Question 37

You are working with a time series dataset in Azure Machine Learning Studio.

You need to split your dataset into training and testing subsets by using the Split Data module.

Which splitting mode should you use?

Options:

A.

Regular Expression Split

B.

Split Rows with the Randomized split parameter set to true

C.

Relative Expression Split

D.

Recommender Split
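One concern with time series data is that a shuffled split can leak future observations into the training set. The sketch below shows a chronological split in plain Python as a general illustration of that idea; it is not a rendering of any Split Data mode, and the readings and the 80% fraction are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical daily readings as (timestamp, value) pairs.
readings = [(date(2024, 1, 1) + timedelta(days=i), float(i)) for i in range(10)]


def chronological_split(rows, train_fraction):
    """Split rows so every training timestamp precedes every test timestamp."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


train, test = chronological_split(readings, 0.8)
```

The same effect can be expressed as a condition on the timestamp column, for example keeping every row dated before a chosen cut-off in the training set.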

Question 38

You run a script as an experiment in Azure Machine Learning.

You have a Run object named run that references the experiment run. You must review the log files that were generated during the experiment run.

You need to download the log files to a local folder for review.

Which two code segments can you run to achieve this goal? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

Options:

A.

run.get_details()

B.

run.get_file_names()

C.

run.get_metrics()

D.

run.download_files(output_directory='./runfiles')

E.

run.get_all_logs(destination='./runlogs')

Question 39

You create an Azure Machine Learning workspace and an Azure Synapse Analytics workspace with a Spark pool. The workspaces are contained within the same Azure subscription.

You must manage the Synapse Spark pool from the Azure Machine Learning workspace.

You need to attach the Synapse Spark pool in Azure Machine Learning by using the Python SDK v2.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 40

You create an Azure Machine Learning workspace.

You plan to write an Azure Machine Learning SDK for Python v2 script that logs an image for an experiment. The logged image must be available from the images tab in Azure Machine Learning Studio.

You need to complete the script.

Which code segments should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 41

You are creating a compute target to train a machine learning experiment.

The compute target must support automated machine learning, machine learning pipelines, and Azure Machine Learning designer training.

You need to configure the compute target.

Which option should you use?

Options:

A.

Azure HDInsight

B.

Azure Machine Learning compute cluster

C.

Azure Batch

D.

Remote VM

Question 42

You previously deployed a model that was trained using a tabular dataset named training-dataset, which is based on a folder of CSV files.

Over time, you have collected the features and predicted labels generated by the model in a folder containing a CSV file for each month. You have created two tabular datasets based on the folder containing the inference data: one named predictions-dataset with a schema that matches the training data exactly, including the predicted label; and another named features-dataset with a schema containing all of the feature columns and a timestamp column based on the filename, which includes the day, month, and year.

You need to create a data drift monitor to identify any changing trends in the feature data since the model was trained. To accomplish this, you must define the required datasets for the data drift monitor.

Which datasets should you use to configure the data drift monitor? To answer, drag the appropriate datasets to the correct data drift monitor options. Each source may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Options:

Question 43

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:

• /data/2018/Q1.csv

• /data/2018/Q2.csv

• /data/2018/Q3.csv

• /data/2018/Q4.csv

• /data/2019/Q1.csv

All files store data in the following format:

id,f1,f2,l

1,1,2,0

2,1,1,1

3,2,1,0

You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by using the following code:

Solution: Run the following code:

Does the solution meet the goal?

Options:

A.

Yes

B.

No

Question 44

You manage an Azure Machine Learning workspace. You create an experiment named experiment1 by using the Azure Machine Learning Python SDK v2 and MLflow. You are reviewing the results of experiment1 by using the following code segment:

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

Options:

Question 45

You need to identify the methods for dividing the data according to the testing requirements.

Which properties should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Options:

Question 46

You need to configure the Filter Based Feature Selection module based on the experiment requirements and datasets.

How should you configure the module properties? To answer, select the appropriate options in the dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 47

You need to implement early stopping criteria as stated in the model training requirements.

Which three code segments should you use to develop the solution? To answer, move the appropriate code segments from the list of code segments to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

Options:

Question 48

You need to configure the Permutation Feature Importance module for the model training requirements.

What should you do? To answer, select the appropriate options in the dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 49

You need to visually identify whether outliers exist in the Age column and quantify the outliers before the outliers are removed.

Which three Azure Machine Learning Studio modules should you use in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arrange them in the correct order.

Options:

Question 50

You need to produce a visualization for the diagnostic test evaluation according to the data visualization requirements.

Which three modules should you recommend be used in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arrange them in the correct order.

Options:

Question 51

You need to select a feature extraction method.

Which method should you use?

Options:

A.

Spearman correlation

B.

Mutual information

C.

Mann-Whitney test

D.

Pearson’s correlation

Question 52

You need to configure the Edit Metadata module so that the structure of the datasets match.

Which configuration options should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 53

You need to replace the missing data in the AccessibilityToHighway columns.

How should you configure the Clean Missing Data module? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 54

You need to identify the methods for dividing the data according to the testing requirements.

Which properties should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 55

You need to select a feature extraction method.

Which method should you use?

Options:

A.

Mutual information

B.

Mood’s median test

C.

Kendall correlation

D.

Permutation Feature Importance

Question 56

You need to correct the model fit issue.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 57

You need to set up the Permutation Feature Importance module according to the model training requirements.

Which properties should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:
