Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier is renamed to feature1?
Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?
The code block displayed below contains an error. When the code block below has executed, it should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and
transactionDate (in this order). Find the error.
Code block:
transactionsDf.coalesce(14, ("storeId", "transactionDate"))
The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
from pyspark import StorageLevel
transactionsDf.__1__(StorageLevel.__2__).__3__
Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?
Which of the following statements about RDDs is incorrect?
The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code
block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__
The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that
correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)
Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?
Which of the following statements about the differences between actions and transformations is correct?
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
Which of the following is not a feature of Adaptive Query Execution?
The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is
at least 5. Find the error.
Code block:
transactionsDf.where("col(predError) >= 5")
Which of the following describes tasks?
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
Which of the following DataFrame operators is never classified as a wide transformation?
The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.
Code block:
transactionsDf.filter(col('predError').in([3, 6])).count()
Which of the following statements about garbage collection in Spark is incorrect?
Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?
Which of the following statements about storage levels is incorrect?
Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type
double?
Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate
format for this kind of data?
The code block shown below should return a DataFrame with columns transactionsId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the
code block to accomplish this.
transactionsDf.__1__(__2__)
Which of the following is the idea behind dynamic partition pruning in Spark?
The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in
the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__(__4__)
Which of the following code blocks generally causes a great amount of network traffic?
The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
Instead of calling spark.createDataFrame, just DataFrame should be called.