
Databricks Certified Associate Developer for Apache Spark 3.0 (Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0) Exam Practice Test

Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 1

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

Options:

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

1.itemsDf.withColumnRenamed("attributes", "feature0")

2.itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
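
For reference, a minimal sketch of the chained-rename approach, assuming a SparkSession named spark is active and itemsDf contains columns attributes and supplier:

# rename two columns by chaining withColumnRenamed calls
renamedDf = itemsDf.withColumnRenamed("attributes", "feature0") \
                   .withColumnRenamed("supplier", "feature1")
renamedDf.printSchema()  # shows feature0 and feature1 in place of the old names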

Question 2

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

Options:

A.

DataFrame.repartition(12)

B.

DataFrame.coalesce(6).shuffle()

C.

DataFrame.coalesce(6)

D.

DataFrame.coalesce(6, shuffle=True)

E.

DataFrame.repartition(6)
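
As background, repartition() always performs a full shuffle and can both increase and decrease the number of partitions, while coalesce() only merges existing partitions without a full shuffle. A minimal sketch, assuming a DataFrame df currently holding 12 partitions:

df.rdd.getNumPartitions()        # 12
shuffledDf = df.repartition(6)   # full shuffle down to 6 partitions
mergedDf = df.coalesce(6)        # merges partitions locally, no full shuffle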

Question 3

The code block displayed below contains an error. When the code block below has executed, it should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and transactionDate (in this order). Find the error.

Code block:

transactionsDf.coalesce(14, ("storeId", "transactionDate"))

Options:

A.

The parentheses around the column names need to be removed and .select() needs to be appended to the code block.

B.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .count() needs to be appended to the code block.

(Correct)

C.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .select() needs to be appended to the code block.

D.

Operator coalesce needs to be replaced by repartition and the parentheses around the column names need to be replaced by square brackets.

E.

Operator coalesce needs to be replaced by repartition.
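
For context, repartition() takes the target partition count followed by column references passed as separate arguments rather than a tuple, and it is lazy, so nothing executes until an action runs. A minimal sketch, assuming transactionsDf has columns storeId and transactionDate:

repartitionedDf = transactionsDf.repartition(14, "storeId", "transactionDate")
repartitionedDf.count()   # an action is required to actually trigger the repartitioning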

Question 4

The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the answer that correctly fills the blanks in the code block to accomplish this.

1.from pyspark import StorageLevel

2.transactionsDf.__1__(StorageLevel.__2__).__3__

Options:

A.

1. cache

2. MEMORY_ONLY_2

3. count()

B.

1. persist

2. DISK_ONLY_2

3. count()

C.

1. persist

2. MEMORY_ONLY_2

3. select()

D.

1. cache

2. DISK_ONLY_2

3. count()

E.

1. persist

2. MEMORY_ONLY_2

3. count()
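
A minimal sketch of the replicated in-memory caching described above, assuming transactionsDf exists; the trailing action is only there to materialize the cache:

from pyspark import StorageLevel
# MEMORY_ONLY_2 keeps two in-memory replicas and writes nothing to disk
transactionsDf.persist(StorageLevel.MEMORY_ONLY_2)
transactionsDf.count()   # caching is lazy, so an action is needed to populate it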

Question 5

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

Options:

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")
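
For reference, spark.read is a property (not a method) that returns a DataFrameReader, so a parquet file can be loaded as follows, assuming the path exists:

importsDf = spark.read.parquet("/FileStore/imports.parquet")
importsDf.show(5)   # quick sanity check of the loaded data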

Question 6

Which of the following statements about RDDs is incorrect?

Options:

A.

An RDD consists of a single partition.

B.

The high-level DataFrame API is built on top of the low-level RDD API.

C.

RDDs are immutable.

D.

RDD stands for Resilient Distributed Dataset.

E.

RDDs are great for precisely instructing Spark on how to do a query.

Question 7

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

Options:

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes
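
A minimal sketch, assuming transactionsDf contains a column storeId; selecting the single column and printing its schema reveals the column's data type:

transactionsDf.select("storeId").printSchema()
# alternatively, transactionsDf.dtypes returns (column, type) tuples for all columns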

Question 8

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet
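
A hedged sketch of the DataFrameWriter chain described above, assuming storeDir holds a writable path and the brotli codec is available on the cluster:

transactionsDf.write.format("parquet") \
    .mode("overwrite") \
    .option("compression", "brotli") \
    .save(storeDir)
# mode("overwrite") replaces any file previously written to the target path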

Question 9

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)

Question 10

Which of the following statements about the differences between actions and transformations is correct?

Options:

A.

Actions are evaluated lazily, while transformations are not evaluated lazily.

B.

Actions generate RDDs, while transformations do not.

C.

Actions do not send results to the driver, while transformations do.

D.

Actions can be queued for delayed execution, while transformations can only be processed immediately.

E.

Actions can trigger Adaptive Query Execution, while transformations cannot.
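
To illustrate the distinction, transformations only build up a lazy query plan, while actions trigger execution and return a result to the driver. A minimal sketch, assuming transactionsDf exists:

filteredDf = transactionsDf.filter("predError > 0")   # transformation: nothing runs yet
rowCount = filteredDf.count()                         # action: triggers the job and returns a value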

Question 11

The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')

Options:

A.

1. withColumn

2. 'associateId'

3. 5

4. remove

5. 'productId'

B.

1. withNewColumn

2. associateId

3. lit(5)

4. drop

5. productId

C.

1. withColumn

2. 'associateId'

3. lit(5)

4. drop

5. 'productId'

D.

1. withColumnRenamed

2. 'associateId'

3. 5

4. drop

5. 'productId'

E.

1. withColumn

2. col(associateId)

3. lit(5)

4. drop

5. col(productId)
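
A minimal sketch of adding a literal column and dropping the others, assuming transactionsDf contains columns value and productId:

from pyspark.sql.functions import lit
resultDf = transactionsDf.withColumn('associateId', lit(5)).drop('productId', 'value')
# lit() is needed because withColumn expects a Column expression, not a plain Python value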

Question 12

Which of the following is not a feature of Adaptive Query Execution?

Options:

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.
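
For context, Adaptive Query Execution is toggled through Spark SQL configuration at runtime; a hedged sketch of the relevant Spark 3 settings:

spark.conf.set("spark.sql.adaptive.enabled", "true")                     # enable AQE
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # coalesce small shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed partitions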

Question 13

The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error.

Code block:

transactionsDf.where("col(predError) >= 5")

Options:

A.

The argument to the where method should be "predError >= 5".

B.

Instead of where(), filter() should be used.

C.

The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").

D.

The argument to the where method cannot be a string.

E.

Instead of >=, the SQL operator GEQ should be used.
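
For reference, where() accepts either a SQL expression string or a Column expression, but col() is not valid inside a SQL string. A minimal sketch, assuming transactionsDf has a column predError:

from pyspark.sql.functions import col
transactionsDf.where("predError >= 5")         # SQL expression string
transactionsDf.where(col("predError") >= 5)    # equivalent Column expression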

Question 14

Which of the following describes tasks?

Options:

A.

A task is a command sent from the driver to the executors in response to a transformation.

B.

Tasks transform jobs into DAGs.

C.

A task is a collection of slots.

D.

A task is a collection of rows.

E.

Tasks get assigned to the executors by the driver.

Question 15

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?

Options:

A.

spark.read.schema(fileSchema).format("parquet").load(filePath)

B.

spark.read.schema("fileSchema").format("parquet").load(filePath)

C.

spark.read().schema(fileSchema).parquet(filePath)

D.

spark.read().schema(fileSchema).format(parquet).load(filePath)

E.

spark.read.schema(fileSchema).open(filePath)
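
A minimal sketch, assuming fileSchema is a StructType and filePath points to an existing parquet file:

df = spark.read.schema(fileSchema).format("parquet").load(filePath)
# spark.read.schema(fileSchema).parquet(filePath) is an equivalent shorthand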

Question 16

Which of the following DataFrame operators is never classified as a wide transformation?

Options:

A.

DataFrame.sort()

B.

DataFrame.aggregate()

C.

DataFrame.repartition()

D.

DataFrame.select()

E.

DataFrame.join()

Question 17

The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.

Code block:

transactionsDf.filter(col('predError').in([3, 6])).count()

Options:

A.

The number of rows cannot be determined with the count() operator.

B.

Instead of filter, the select method should be used.

C.

The method used on column predError is incorrect.

D.

Instead of a list, the values need to be passed as single arguments to the in operator.

E.

Numbers 3 and 6 need to be passed as string variables.
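
For reference, membership tests on a Column use the isin() method rather than Python's in operator. A minimal sketch, assuming transactionsDf has a column predError:

from pyspark.sql.functions import col
transactionsDf.filter(col('predError').isin([3, 6])).count()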

Question 18

Which of the following statements about garbage collection in Spark is incorrect?

Options:

A.

Garbage collection information can be accessed in the Spark UI's stage detail view.

B.

Optimizing garbage collection performance in Spark may limit caching ability.

C.

Manually persisting RDDs in Spark prevents them from being garbage collected.

D.

In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.

E.

Serialized caching is a strategy to increase the performance of garbage collection.

Question 19

Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

Options:

A.

transactionsDf.sort(asc_nulls_last("predError"))

B.

transactionsDf.orderBy("predError").desc_nulls_last()

C.

transactionsDf.sort("predError", ascending=False)

D.

transactionsDf.desc_nulls_last("predError")

E.

transactionsDf.orderBy("predError").asc_nulls_last()
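
As background, descending sorts in Spark place nulls last by default, and pyspark.sql.functions also offers explicit null-ordering helpers. A minimal sketch, assuming transactionsDf has a column predError:

from pyspark.sql.functions import desc_nulls_last
transactionsDf.sort(desc_nulls_last("predError"))
# equivalent shorthand: transactionsDf.sort("predError", ascending=False)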

Question 20

Which of the following statements about storage levels is incorrect?

Options:

A.

The cache operator on DataFrames is evaluated like a transformation.

B.

In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

C.

Caching can be undone using the DataFrame.unpersist() operator.

D.

MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

E.

DISK_ONLY will not use the worker node's memory.

Question 21

Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?

Options:

A.

spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

B.

spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

C.

1. from pyspark.sql import types as T

2. spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))

D.

spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

E.

spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
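
A minimal sketch; when column names are passed as a list, createDataFrame infers the string and double types from the Python values:

df = spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
df.printSchema()   # season: string, wind_speed_ms: double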

Question 22

Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate format for this kind of data?

Options:

A.

1.spark.read.schema(

2. StructType(

3. StructField("transactionId", IntegerType(), True),

4. StructField("predError", IntegerType(), True)

5. )).load(filePath)

B.

1.spark.read.schema([

2. StructField("transactionId", NumberType(), True),

3. StructField("predError", IntegerType(), True)

4. ]).load(filePath)

C.

1.spark.read.schema(

2. StructType([

3. StructField("transactionId", StringType(), True),

4. StructField("predError", IntegerType(), True)]

5. )).parquet(filePath)

D.

1.spark.read.schema(

2. StructType([

3. StructField("transactionId", IntegerType(), True),

4. StructField("predError", IntegerType(), True)]

5. )).format("parquet").load(filePath)

E.

1.spark.read.schema([

2. StructField("transactionId", IntegerType(), True),

3. StructField("predError", IntegerType(), True)

4. ]).load(filePath, format="parquet")
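
A hedged sketch of reading parquet with an explicit schema; StructType wraps a list of StructField objects, and IntegerType is the appropriate type for whole numbers:

from pyspark.sql.types import StructType, StructField, IntegerType
fileSchema = StructType([
    StructField("transactionId", IntegerType(), True),
    StructField("predError", IntegerType(), True)
])
df = spark.read.schema(fileSchema).format("parquet").load(filePath)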

Question 23

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__)

Options:

A.

1. filter

2. "transactionId", "predError", "value", "f"

B.

1. select

2. "transactionId, predError, value, f"

C.

1. select

2. ["transactionId", "predError", "value", "f"]

D.

1. where

2. col("transactionId"), col("predError"), col("value"), col("f")

E.

1. select

2. col(["transactionId", "predError", "value", "f"])
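
For reference, select() accepts column names either as separate string arguments or as a single list of strings. A minimal sketch:

transactionsDf.select(["transactionId", "predError", "value", "f"])
# equivalent: transactionsDf.select("transactionId", "predError", "value", "f")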

Question 24

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:

A.

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.

Question 25

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__(__4__)

Options:

A.

1. filter

2. "storeId"==25

3. collect

4. 5

B.

1. filter

2. col("storeId")==25

3. toLocalIterator

4. 5

C.

1. select

2. storeId==25

3. head

4. 5

D.

1. filter

2. col("storeId")==25

3. take

4. 5

E.

1. filter

2. col("storeId")==25

3. collect

4. 5
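
A minimal sketch; take(5) returns at most five rows as a Python list of Row objects, whereas collect() accepts no row limit:

from pyspark.sql.functions import col
rows = transactionsDf.filter(col("storeId") == 25).take(5)
print(rows)   # a Python list of up to 5 Row objects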

Question 26

Which of the following code blocks generally causes a great amount of network traffic?

Options:

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Question 27

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively. Find the error.

Code block:

1.spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

Options:

A.

The commas in the tuples with the colors should be eliminated.

B.

The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

C.

Instead of color, a data type should be specified.

D.

The "color" expression needs to be wrapped in brackets, so it reads ["color"].

E.

Instead of calling spark.createDataFrame, just DataFrame should be called.