Offering Free access to Databricks Certification Databricks-Certified-Data-Engineer-Associate Exam Questions Pool Bank

Databricks Certified Data Engineer Associate Exam Questions and Answers


Question 1

A new data engineering team, named team, has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT USAGE ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Question 2

What is stored in a Databricks customer's cloud account?

Options:

Data

Cluster management metadata

Databricks web application

Notebooks

Question 3

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

Replace predict with a stream-friendly prediction function

Replace schema(schema) with option("maxFilesPerTrigger", 1)

Replace "transactions" with the path to the location of the Delta table

Replace format("delta") with format("stream")

Replace spark.read with spark.readStream

Question 4

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:

Jobs

Dashboards

Catalog Explorer

Repos

Question 5

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

SELECT * FROM my_table WHERE age > 25;

UPDATE my_table WHERE age > 25;

DELETE FROM my_table WHERE age > 25;

UPDATE my_table WHERE age <= 25;

DELETE FROM my_table WHERE age <= 25;
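The practical difference between reading rows and removing them can be checked on a small table. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a Delta table; the sample rows are invented for illustration, and nothing here runs against Databricks.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a Delta table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("ana", 19), ("bo", 25), ("cy", 31), ("di", 47)],  # invented sample rows
)

# SELECT only reads the matching rows; the table is unchanged afterwards.
conn.execute("SELECT * FROM my_table WHERE age > 25").fetchall()
rows_after_select = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# DELETE removes the matching rows from the table itself.
conn.execute("DELETE FROM my_table WHERE age > 25")
remaining = conn.execute(
    "SELECT name, age FROM my_table ORDER BY age"
).fetchall()

print(rows_after_select)
print(remaining)
```

Note that the predicate names the rows to remove, so reversing the comparison would remove the rows that were meant to be kept.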

Question 6

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:

Manually programming in an alert system in each cell of the Notebook

Setting up an Alert in the Job page

Setting up an Alert in the Notebook

There is no way to notify the Job owner in the case of Job failure

MLflow Model Registry Webhooks

Question 7

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

Options:

There is no way to identify the owner of the table

Review the Owner field in the table's page in the cloud storage solution

Review the Permissions tab in the table's page in Data Explorer

Review the Owner field in the table’s page in Data Explorer

Question 8

In which of the following file formats is data from Delta Lake tables primarily stored?

Options:

Delta

CSV

Parquet

JSON

A proprietary, optimized format specific to Databricks

Question 9

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

Records that violate the expectation cause the job to fail.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
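The violation policies an expectation can carry (retain the record, drop it, or fail the update) can be modelled in a few lines. This is a toy sketch in plain Python, not the Delta Live Tables API: the names apply_expectation and UpdateFailed and the sample batch are hypothetical and exist only for illustration.

```python
from datetime import date


class UpdateFailed(Exception):
    """Raised when a record violates an expectation under the 'fail' policy."""


def apply_expectation(records, predicate, on_violation=None):
    """Toy model of three expectation policies (hypothetical helper).

    on_violation=None   -> keep every record, only count the violations
    on_violation="drop" -> leave violating records out, and count them
    on_violation="fail" -> abort the whole update on the first violation
    Returns (kept_records, violation_count).
    """
    if on_violation not in (None, "drop", "fail"):
        raise ValueError(f"unknown policy: {on_violation!r}")
    kept, violations = [], 0
    for rec in records:
        if predicate(rec):
            kept.append(rec)
            continue
        violations += 1
        if on_violation == "fail":
            raise UpdateFailed(f"expectation violated by {rec!r}")
        if on_violation is None:
            kept.append(rec)  # retained; only the count records the problem
    return kept, violations


# Invented batch: one valid record, one that violates the constraint.
batch = [{"timestamp": date(2021, 5, 1)}, {"timestamp": date(2019, 3, 9)}]


def valid_timestamp(rec):
    return rec["timestamp"] > date(2020, 1, 1)


kept_default, n_default = apply_expectation(batch, valid_timestamp)
kept_drop, n_drop = apply_expectation(batch, valid_timestamp, "drop")
try:
    apply_expectation(batch, valid_timestamp, "fail")
    failed = False
except UpdateFailed:
    failed = True
```

In this model the "fail" policy stops at the first bad record, so none of the batch is returned.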

Question 10

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the start up time for the clusters used for the Job?

Options:

They can use endpoints available in Databricks SQL

They can use jobs clusters instead of all-purpose clusters

They can configure the clusters to be single-node

They can use clusters that are from a cluster pool

They can configure the clusters to autoscale for larger data sizes

Answer: D

Explanation:

The best action that the data engineer can perform to improve the start up time for the clusters used for the Job is to use clusters that are from a cluster pool. A pool is a set of idle, ready-to-use instances that clusters can draw from. Because the instances are already provisioned, a cluster attached to a pool can skip most of the instance acquisition time, which reduces the start up latency of each task. Pools can also be shared by multiple users and jobs.

Option A is not relevant, as endpoints available in Databricks SQL are used for creating and managing SQL analytics workloads, not for improving cluster start up time.

Option B is not correct, as jobs clusters and all-purpose clusters have similar start up times. Jobs clusters are clusters that are dedicated to run a single job and are terminated when the job is completed. All-purpose clusters are clusters that can be used for multiple purposes, such as interactive sessions, notebooks, or multiple jobs. Both types of clusters can benefit from using a cluster pool.

Option C is not advisable, as configuring the clusters to be single-node will reduce the parallelism and performance of the tasks. Single-node clusters consist of a driver only, with no worker nodes, and are typically used for testing, development, or small workloads. They are not suitable for running production jobs that require high scalability and fault tolerance.

Option E is not helpful, as configuring the clusters to autoscale for larger data sizes will not affect the start up time of the clusters. Autoscaling is a feature that allows clusters to dynamically adjust the number of worker nodes based on the workload. It can help optimize the resource utilization and cost efficiency of the clusters, but it does not speed up the cluster creation process.

References: Cluster Pools, Jobs, Clusters, Databricks Data Engineer Professional Exam Guide
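As a rough illustration of attaching a job cluster to a pool, the sketch below builds a cluster specification as a plain Python dictionary. The field names spark_version, instance_pool_id, and num_workers follow the Databricks cluster specification as I understand it; the helper name, pool ID, and runtime label are placeholders, and no API call is made.

```python
import json


def job_cluster_from_pool(pool_id, spark_version, num_workers):
    """Build a cluster spec whose nodes come from an instance pool.

    Hypothetical helper: it only assembles a dictionary. No node type is
    set here, on the assumption that the pool determines it.
    """
    return {
        "spark_version": spark_version,
        "instance_pool_id": pool_id,
        "num_workers": num_workers,
    }


spec = job_cluster_from_pool(
    pool_id="pool-0000-example",       # placeholder pool ID
    spark_version="13.3.x-scala2.12",  # example label; use one your workspace offers
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```

The same pool ID could be reused across the cluster definitions of every task in the Job, so that each task draws from the same set of warm instances.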

Question 11

Which of the following tools is used by Auto Loader to process data incrementally?

Options:

Checkpointing

Spark Structured Streaming

Data Explorer

Unity Catalog

Databricks SQL

Question 12

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

PIVOT

CONVERT

WHERE

TRANSFORM

SUM
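For readers unsure what "long to wide" means, here is a minimal sketch of the reshaping in plain Python. The pivot helper and the sample rows are hypothetical; the sketch shows the shape change, not how any SQL engine implements it.

```python
from collections import defaultdict


def pivot(rows, index, columns, values):
    """Reshape long-format rows into one wide entry per `index` value.

    Each distinct value of `columns` becomes a key in the wide entry, and
    duplicate (index, column) pairs are summed.
    """
    wide = defaultdict(dict)
    for row in rows:
        cell = wide[row[index]]
        cell[row[columns]] = cell.get(row[columns], 0) + row[values]
    return {key: dict(cols) for key, cols in wide.items()}


# Long format: one row per (store, quarter) observation. Invented data.
long_rows = [
    {"store": "north", "quarter": "Q1", "sales": 10},
    {"store": "north", "quarter": "Q2", "sales": 15},
    {"store": "south", "quarter": "Q1", "sales": 7},
    {"store": "south", "quarter": "Q1", "sales": 3},
]

# Wide format: one entry per store, one column per quarter.
wide_rows = pivot(long_rows, index="store", columns="quarter", values="sales")
print(wide_rows)
```

The aggregation step matters: when several long rows land in the same wide cell, something has to combine them, and this sketch uses a sum.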

Question 13

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

Options:

Databricks Filesystem

Jobs

Dashboards

Repos

Data Explorer

Question 14

A new data engineering team, named team, has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the database to the new data engineering team?

Options:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Question 15

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

Worker node

JDBC data source

Databricks web application

Databricks Filesystem

Driver node

Answer: C

Explanation:

The Databricks web application is the user interface that allows you to create and manage workspaces, clusters, notebooks, jobs, and other resources. It is hosted completely in the control plane of the classic Databricks architecture, which includes the backend services that Databricks manages in your Databricks account. The other options are part of the compute plane, which is where your data is processed by compute resources such as clusters. The compute plane is in your own cloud account and network. References: Databricks architecture overview, Security and Trust Center

QUESTION NO: 4

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?

A. The ability to manipulate the same data using a variety of languages

B. The ability to collaborate in real time on a single notebook

C. The ability to set up alerts for query failures

D. The ability to support batch and streaming workloads

E. The ability to distribute complex data operations

Answer: D

Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is fully compatible with Apache Spark APIs, and was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations and providing incremental processing at scale [1]. Delta Lake supports upserts using the merge operation, which enables you to efficiently update existing data or insert new data into your Delta tables [2]. Delta Lake also provides time travel capabilities, which allow you to query previous versions of your data or roll back to a specific point in time [3].

References: [1] What is Delta Lake? | Databricks on AWS; [2] Upsert into a table using merge | Databricks on AWS; [3] Query an older snapshot of a table (time travel) | Databricks on AWS
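The upsert behaviour described above (update rows whose key already exists, insert the rest) can be sketched in plain Python. This models only the semantics: merge_upsert and the sample rows are hypothetical, and the real merge operation works on Delta tables rather than lists of dictionaries.

```python
def merge_upsert(target, source, key):
    """Return a new table in which source rows are upserted into target.

    Rows whose `key` already exists in target are updated in place of the
    old row; rows with a new key are inserted. Hypothetical helper.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: overwrite the changed fields. Unmatched key: insert.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return sorted(merged.values(), key=lambda r: r[key])


target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
source = [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}]

result = merge_upsert(target, source, key="id")
print(result)

# Applying the same source again changes nothing: no duplicate rows appear.
rerun = merge_upsert(result, source, key="id")
```

Matching on a key is what keeps the write from producing duplicates, in contrast to a plain append, which would add the source rows again on every run.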


Question 16

An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left running and cost the organization a lot of money beyond the first week of the project’s release.

Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the project’s release?

Options:

They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.

They can set the query’s refresh schedule to end after a certain number of refreshes.

They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.

They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.

They can set the query’s refresh schedule to end on a certain date in the query scheduler.

Question 17

Which file format is used for storing a Delta Lake table?

Options:

Parquet

Delta

JSON

Question 18

Which of the following data workloads will utilize a Gold table as its source?

Options:

A job that enriches data by parsing its timestamps into a human-readable format

A job that aggregates uncleaned data to create standard summary statistics

A job that cleans data by removing malformatted records

A job that queries aggregated data designed to feed into a dashboard

A job that ingests raw data from a streaming source into the Lakehouse

Question 19

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

Question 20

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

DROP

IGNORE

MERGE

APPEND

INSERT

Question 21

Which of the following commands will return the number of null values in the member_id column?

Options:

SELECT count(member_id) FROM my_table;

SELECT count(member_id) - count_null(member_id) FROM my_table;

SELECT count_if(member_id IS NULL) FROM my_table;

SELECT null(member_id) FROM my_table;

SELECT count_null(member_id) FROM my_table;
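How counting functions treat NULL can be verified on a tiny table. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption of this example: SQLite has no count_if function, so SUM over the boolean expression member_id IS NULL plays that role here. The sample values are invented.

```python
import sqlite3

# In-memory SQLite table standing in for my_table; two IDs and three NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(1,), (None,), (3,), (None,), (None,)],
)

# COUNT(column) skips NULLs, so it returns the number of non-null values.
non_null = conn.execute("SELECT COUNT(member_id) FROM my_table").fetchone()[0]

# COUNT(*) counts every row regardless of NULLs.
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# Summing the 0/1 result of IS NULL counts the rows where the value is NULL.
nulls = conn.execute(
    "SELECT SUM(member_id IS NULL) FROM my_table"
).fetchone()[0]

print(non_null, total, nulls)
```

The three numbers are related by total = non_null + nulls, which gives a second way to derive the null count from the two COUNT forms.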

Question 22

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

trigger("5 seconds")

trigger(continuous="5 seconds")

trigger(once="5 seconds")

trigger(processingTime="5 seconds")

Question 23

A data architect has determined that a table of the following format is necessary:

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Options:

Option A

Option B

Option C

Option D

Option E

Question 24

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

They can set up the dashboard’s SQL endpoint to be serverless.

They can turn on the Auto Stop feature for the SQL endpoint.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard’s SQL endpoint is not one of the included queries’ SQL endpoints.

Question 25

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

Options:

Silver tables contain a less refined, less clean view of data than Bronze data.

Silver tables contain aggregates while Bronze data is unaggregated.

Silver tables contain more data than Bronze tables.

Silver tables contain a more refined and cleaner view of data than Bronze tables.

Silver tables contain less data than Bronze tables.

Question 26

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

Options:

It is not possible to use SQL in a Python notebook

They can attach the cell to a SQL endpoint rather than a Databricks cluster

They can simply write SQL syntax in the cell

They can add %sql to the first line of the cell

They can change the default language of the notebook to SQL

Question 27

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

There was a type mismatch between the specific schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data

Question 28

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

The pipeline can have different notebook sources in SQL & Python.

The pipeline will need to be written entirely in SQL.

The pipeline will need to be written entirely in Python.

The pipeline will need to use a batch source in place of a streaming source.

Question 29

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

They can turn on the Auto Stop feature for the SQL endpoint.

They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

They can set up the dashboard's SQL endpoint to be serverless.

Question 30

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Options:

Databricks Repos automatically saves development progress

Databricks Repos supports the use of multiple branches

Databricks Repos allows users to revert to previous versions of a notebook

Databricks Repos provides the ability to comment on specific changes

Databricks Repos is wholly housed within the Databricks Lakehouse Platform

Question 31

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

Unity Catalog

Data Explorer

Delta Lake

Delta Live Tables

Auto Loader

Question 32

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
