Databricks Certified Data Analyst Associate Exam Questions and Answers

Question 1

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication.

Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

Options:

A. CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;

B. CREATE TABLE table_silver AS INSERT * FROM table_bronze;

C. CREATE TABLE table_silver AS MERGE DEDUPLICATE * FROM table_bronze;

D. INSERT INTO TABLE table_silver SELECT * FROM table_bronze;

E. INSERT OVERWRITE TABLE table_silver SELECT * FROM table_bronze;
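As a rough illustration of how `SELECT DISTINCT *` behaves when writing to a new table, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Databricks SQL. The column names and sample rows are assumptions made up for the example; only the table names come from the question.

```python
import sqlite3

# SQLite stands in for Databricks SQL; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_bronze (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table_bronze VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

# CREATE TABLE ... AS SELECT DISTINCT * keeps one copy of each fully identical row.
conn.execute("CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze")

bronze_count = conn.execute("SELECT COUNT(*) FROM table_bronze").fetchone()[0]
silver_rows = sorted(conn.execute("SELECT * FROM table_silver").fetchall())
print(bronze_count, silver_rows)
```

Note that `DISTINCT *` only collapses rows that match in every column; rows that differ in any column are kept.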

Question 2

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute.

A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables.

Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?

Options:

The required compute resources could be costly

The gold-level tables are not appropriately clean for business reporting

The streaming data is not an appropriate data source for a dashboard

The streaming cluster is not fault tolerant

The dashboard cannot be refreshed that quickly

Answer: A

Explanation:

A Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables every minute requires a high level of compute resources to handle the frequent data ingestion, processing, and writing. This could result in a significant cost for the organization, especially if the data volume and velocity are large. Therefore, the data analyst should share this caution with the project stakeholders before setting up the dashboard and evaluate the trade-offs between the desired refresh rate and the available budget. The other options are not valid cautions because:

B. The gold-level tables are assumed to be appropriately clean for business reporting, as they are the final output of the data engineering pipeline. If the data quality is not satisfactory, the issue should be addressed at the source or silver level, not at the gold level.

C. The streaming data is an appropriate data source for a dashboard, as it can provide near real-time insights and analytics for the business users. Structured Streaming supports various sources and sinks for streaming data, including Delta Lake, which can enable both batch and streaming queries on the same data.

D. The streaming cluster is fault tolerant, as Structured Streaming provides end-to-end exactly-once fault-tolerance guarantees through checkpointing and write-ahead logs. If a query fails, it can be restarted from the last checkpoint and resume processing.

E. The dashboard can be refreshed within one minute or less of new data becoming available in the gold-level tables, as Structured Streaming can trigger micro-batches as fast as possible (every few seconds) and update the results incrementally. However, this may not be necessary or optimal for the business use case, as it could cause frequent changes in the dashboard and consume more resources.

References: Streaming on Databricks; Monitoring Structured Streaming queries on Databricks; A look at the new Structured Streaming UI in Apache Spark 3.0; Run your first Structured Streaming workload

Question 3

A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table. The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person.

Which of the following code blocks will accomplish this task?

Options:

Option
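One way to picture an entity that "always stays up-to-date" is a view, which re-runs its query each time it is read. The sketch below is only an illustration using Python's `sqlite3` module in place of Databricks SQL. The column names `employee_name` and `sales_amount` and the sample rows are assumptions, since the original answer options are not shown here.

```python
import sqlite3

# SQLite stands in for Databricks SQL; employee_name, sales_amount and the rows
# are hypothetical. Only the table names and employee_id come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, employee_name TEXT)")
conn.execute("CREATE TABLE sales (employee_id INTEGER, sales_amount REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ana"), (2, "Bo")])
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# A view stores the query, not the result, so it reflects new rows in sales.
conn.execute("""
    CREATE VIEW sales_by_employee AS
    SELECT e.employee_name AS sales_person, SUM(s.sales_amount) AS sales
    FROM sales AS s
    JOIN employees AS e ON s.employee_id = e.employee_id
    GROUP BY e.employee_name
""")

def totals():
    """Read the view and return {sales_person: sales}."""
    return dict(conn.execute("SELECT sales_person, sales FROM sales_by_employee").fetchall())

before = totals()
conn.execute("INSERT INTO sales VALUES (1, 25.0)")  # new data arrives
after = totals()
print(before, after)
```

A table created with `CREATE TABLE ... AS SELECT` would instead capture a snapshot and would not pick up the new sale.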

Question 4

A data analyst is processing a complex aggregation on a table with zero null values and their query returns the following result:

Which of the following queries did the analyst run to obtain the above result?

Options:

Option A

Option B

Option C

Option D

Option E

Question 5

Which location can be used to determine the owner of a managed table?

Options:

Review the Owner field in the table page using Catalog Explorer

Review the Owner field in the database page using Data Explorer

Review the Owner field in the schema page using Data Explorer

Review the Owner field in the table page using the SQL Editor

Question 6

Which statement about subqueries is correct?

Options:

Subqueries are not available in Databricks SQL

Subqueries can be used like other user-defined functions to transform data into different data types.

Subqueries can retrieve data without requiring the creation of a table or view.

Subqueries can be used like other built-in functions to transform data into different data types.
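To make the idea of a subquery concrete, here is a small sketch that nests one `SELECT` inside another. It uses Python's `sqlite3` module as a stand-in for Databricks SQL, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Databricks SQL; the orders table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 30.0), (3, 50.0)]
)

# The inner SELECT computes the average on the fly; nothing is saved
# as a table or view in order to use its result.
above_average = conn.execute("""
    SELECT order_id
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

# Only the base table exists in the catalog after the query has run.
objects = conn.execute("SELECT name FROM sqlite_master ORDER BY name").fetchall()
print(above_average, objects)
```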

Question 7

Which of the following approaches can be used to ingest data directly from cloud-based object storage?

Options:

Create an external table while specifying the DBFS storage path to FROM

Create an external table while specifying the DBFS storage path to PATH

It is not possible to directly ingest data from cloud-based object storage

Create an external table while specifying the object storage path to FROM

Create an external table while specifying the object storage path to LOCATION

Question 8

A data analyst has been asked to provide a list of options on how to share a dashboard with a client. It is a security requirement that the client does not gain access to any other information, resources, or artifacts in the database.

Which of the following approaches cannot be used to share the dashboard and meet the security requirement?

Options:

Download the Dashboard as a PDF and share it with the client.

Set a refresh schedule for the dashboard and enter the client's email address in the "Subscribers" box.

Take a screenshot of the dashboard and share it with the client.

Generate a Personal Access Token that is good for 1 day and share it with the client.

Download a PNG file of the visualizations in the dashboard and share them with the client.

Question 9

A data analyst wants the following output:

| customer_name | number_of_orders |
| --- | --- |
| John Doe | 388 |
| Zhang San | 234 |

Which statement will produce this output?

Options:

SELECT customer_name, count(order_id) AS number_of_orders

FROM customers

JOIN orders

ON customers.customer_id = orders.customer_id

GROUP BY customer_name;

SELECT customer_name, count(order_id) number_of_orders

FROM customers

JOIN orders

ON customers.customer_id = orders.customer_id USE customer_name;

SELECT customer_name, (order_id) number_of_orders

FROM customers

JOIN orders

ON customers.customer_id = orders.customer_id;

SELECT customer_name, count(order_id)

FROM customers

JOIN orders

ON customers.customer_id = orders.customer_id GROUP BY customer_name;
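The shape of the desired output (one row per customer, with an aliased count column) can be reproduced with a join, an aggregate, and `GROUP BY`. The sketch below runs such a query with Python's `sqlite3` module in place of Databricks SQL. The sample rows are made up and use much smaller counts than the table in the question.

```python
import sqlite3

# SQLite stands in for Databricks SQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "John Doe"), (2, "Zhang San")]
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (20, 2), (21, 2)],
)

# count() needs GROUP BY to yield one row per customer; AS names the column.
cursor = conn.execute("""
    SELECT customer_name, count(order_id) AS number_of_orders
    FROM customers
    JOIN orders ON customers.customer_id = orders.customer_id
    GROUP BY customer_name
    ORDER BY customer_name
""")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

Without the alias, the second column would not be named `number_of_orders`; without `GROUP BY`, the aggregate would not be computed per customer.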

Question 10

Which statement describes descriptive statistics?

Options:

A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.

A branch of statistics that uses summary statistics to categorically describe and summarize data.

A branch of statistics that uses summary statistics to quantitatively describe and summarize data.

A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.

Question 11

Where in the Databricks SQL workspace can a data analyst configure a refresh schedule for a query when the query is not attached to a dashboard or alert?

Options:

Data Explorer

The Visualization editor

The Query Editor

The Dashboard Editor

Question 12

Which of the following benefits of using Databricks SQL is provided by Data Explorer?

Options:

It can be used to run UPDATE queries to update any tables in a database.

It can be used to view metadata and data, as well as view/change permissions.

It can be used to produce dashboards that allow data exploration.

It can be used to make visualizations that can be shared with stakeholders.

It can be used to connect to third-party BI tools.

Question 13

A data analyst has created a user-defined function using the following line of code:

CREATE FUNCTION price(spend DOUBLE, units DOUBLE)

RETURNS DOUBLE

RETURN spend / units;

Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?

Options:

SELECT PRICE customer_spend, customer_units AS customer_price FROM customer_summary

SELECT price FROM customer_summary

SELECT function(price(customer_spend, customer_units)) AS customer_price FROM customer_summary

SELECT double(price(customer_spend, customer_units)) AS customer_price FROM customer_summary

SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary
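A scalar user-defined function is called like any built-in function, with the columns passed as arguments inside parentheses. The sketch below illustrates the calling pattern with Python's `sqlite3` module. SQLite has no `CREATE FUNCTION` statement, so `price` is registered from Python here as an assumed stand-in for the SQL UDF in the question, and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Databricks SQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Stand-in for: CREATE FUNCTION price(spend DOUBLE, units DOUBLE)
#               RETURNS DOUBLE RETURN spend / units;
conn.create_function("price", 2, lambda spend, units: spend / units)

conn.execute(
    "CREATE TABLE customer_summary (customer_spend REAL, customer_units REAL)"
)
conn.executemany(
    "INSERT INTO customer_summary VALUES (?, ?)", [(100.0, 4.0), (9.0, 3.0)]
)

# Call the function by name with both columns as arguments, then alias the result.
cursor = conn.execute(
    "SELECT price(customer_spend, customer_units) AS customer_price "
    "FROM customer_summary"
)
column_name = cursor.description[0][0]
prices = [row[0] for row in cursor.fetchall()]
print(column_name, prices)
```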

Question 14

In which of the following situations will the mean value and median value of a variable be meaningfully different?

Options:

When the variable contains no outliers

When the variable contains no missing values

When the variable is of the boolean type

When the variable is of the categorical type

When the variable contains a lot of extreme outliers
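The effect of outliers on the two measures is easy to check numerically. The sketch below uses Python's standard `statistics` module on made-up values: the mean is pulled toward the extreme values, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical measurements clustered around 50.
typical = [48, 50, 51, 52, 49]

# The same values plus two extreme outliers.
with_outliers = typical + [5000, 9000]

typical_mean, typical_median = mean(typical), median(typical)
outlier_mean, outlier_median = mean(with_outliers), median(with_outliers)

print(typical_mean, typical_median)
print(outlier_mean, outlier_median)
```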

Question 15

A data analyst created and is the owner of the managed table my_table. They now want to change ownership of the table to a single other user using Data Explorer.

Which of the following approaches can the analyst use to complete the task?

Options:

Edit the Owner field in the table page by removing their own account

Edit the Owner field in the table page by selecting All Users

Edit the Owner field in the table page by selecting the new owner's account

Edit the Owner field in the table page by selecting the Admins group

Edit the Owner field in the table page by removing all access

Question 16

The stakeholders.customers table has 15 columns and 3,000 rows of data. The following command is run:

After running SELECT * FROM stakeholders.eur_customers, 15 rows are returned. After the command executes completely, the user logs out of Databricks.

After logging back in two days later, what is the status of the stakeholders.eur_customers view?

Options:

The view remains available and SELECT * FROM stakeholders.eur_customers will execute correctly.

The view has been dropped.

The view is not available in the metastore, but the underlying data can be accessed with SELECT * FROM delta.`stakeholders.eur_customers`.

The view remains available but attempting to SELECT from it results in an empty result set because data in views are automatically deleted after logging out.

The view has been converted into a table.

Question 17

Which of the following approaches can be used to connect Databricks to Fivetran for data ingestion?

Options:

Use Workflows to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with

Use Delta Live Tables to establish a cluster for Fivetran to interact with

Use Partner Connect's automated workflow to establish a cluster for Fivetran to interact with

Use Partner Connect's automated workflow to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with

Use Workflows to establish a cluster for Fivetran to interact with

Question 18

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every 10 minutes.

A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within 10 minutes or less of new data becoming available within the gold-level tables.

What is needed to ensure the streamed data is included in the dashboard at the standard requested by the project stakeholders?

Options:

A refresh schedule with an interval of 10 minutes or less

A refresh schedule with an always-on SQL Warehouse (formerly known as SQL Endpoint)

A refresh schedule with stakeholders included as subscribers

A refresh schedule with a Structured Streaming cluster

Question 19

A data analyst runs the following command:

INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;

What is the result of running this command?

Options:

The suppliers table now contains both the data it had before the command was run and the data from the new suppliers table, and any duplicate data is deleted.

The command fails because it is written incorrectly.

The suppliers table now contains both the data it had before the command was run and the data from the new suppliers table, including any duplicate data.

The suppliers table now contains the data from the new suppliers table, and the new suppliers table now contains the data from the suppliers table.

The suppliers table now contains only the data from the new suppliers table.
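In Databricks SQL, `TABLE table_name` can be used as a query and is shorthand for `SELECT * FROM table_name`. The sketch below shows the append behavior of `INSERT INTO` using Python's `sqlite3` module. SQLite does not accept the `TABLE` shorthand or the `stakeholders` schema prefix, so the equivalent `SELECT *` form and unqualified table names are used, and the rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Databricks SQL; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE new_suppliers (supplier_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
conn.executemany("INSERT INTO new_suppliers VALUES (?, ?)", [(2, "Bolt"), (3, "Core")])

# Equivalent of: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
conn.execute("INSERT INTO suppliers SELECT * FROM new_suppliers")

suppliers = sorted(conn.execute("SELECT * FROM suppliers").fetchall())
new_suppliers = sorted(conn.execute("SELECT * FROM new_suppliers").fetchall())
print(suppliers)
print(new_suppliers)
```

`INSERT INTO` appends rows without checking for duplicates, and the source table is left unchanged.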
