Special Summer Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70special

Google Associate-Data-Practitioner Google Cloud Associate Data Practitioner (ADP Exam) Exam Practice Test

Google Cloud Associate Data Practitioner (ADP Exam) Questions and Answers

Testing Engine

  • Product Type: Testing Engine
$37.5  $124.99

PDF Study Guide

  • Product Type: PDF Study Guide
$33  $109.99
Question 1

Your organization uses a BigQuery table that is partitioned by ingestion time. You need to remove data that is older than one year to reduce your organization’s storage costs. You want to use the most efficient approach while minimizing cost. What should you do?

Options:

A.

Create a scheduled query that periodically runs an update statement in SQL that sets the “deleted" column to “yes” for data that is more than one year old. Create a view that filters out rows that have been marked deleted.

B.

Create a view that filters out rows that are older than one year.

C.

Require users to specify a partition filter using the alter table statement in SQL.

D.

Set the table partition expiration period to one year using the ALTER TABLE statement in SQL.

Question 2

Your organization needs to store historical customer order data. The data will only be accessed once a month for analysis and must be readily available within a few seconds when it is accessed. You need to choose a storage class that minimizes storage costs while ensuring that the data can be retrieved quickly. What should you do?

Options:

A.

Store the data in Cloud Storaqe usinq Nearline storaqe.

B.

Store the data in Cloud Storaqe usinq Coldline storaqe.

C.

Store the data in Cloud Storage using Standard storage.

D.

Store the data in Cloud Storage using Archive storage.

Question 3

You manage an ecommerce website that has a diverse range of products. You need to forecast future product demand accurately to ensure that your company has sufficient inventory to meet customer needs and avoid stockouts. Your company's historical sales data is stored in a BigQuery table. You need to create a scalable solution that takes into account the seasonality and historical data to predict product demand. What should you do?

Options:

A.

Use the historical sales data to train and create a BigQuery ML time series model. Use the ML.FORECAST function call to output the predictions into a new BigQuery table.

B.

Use Colab Enterprise to create a Jupyter notebook. Use the historical sales data to train a custom prediction model in Python.

C.

Use the historical sales data to train and create a BigQuery ML linear regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.

D.

Use the historical sales data to train and create a BigQuery ML logistic regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.

Question 4

You recently inherited a task for managing Dataflow streaming pipelines in your organization and noticed that proper access had not been provisioned to you. You need to request a Google-provided IAM role so you can restart the pipelines. You need to follow the principle of least privilege. What should you do?

Options:

A.

Request the Dataflow Developer role.

B.

Request the Dataflow Viewer role.

C.

Request the Dataflow Worker role.

D.

Request the Dataflow Admin role.

Question 5

Your data science team needs to collaboratively analyze a 25 TB BigQuery dataset to support the development of a machine learning model. You want to use Colab Enterprise notebooks while ensuring efficient data access and minimizing cost. What should you do?

Options:

A.

Export the BigQuery dataset to Google Drive. Load the dataset into the Colab Enterprise notebook using Pandas.

B.

Use BigQuery magic commands within a Colab Enterprise notebook to query and analyze the data.

C.

Create a Dataproc cluster connected to a Colab Enterprise notebook, and use Spark to process the data in BigQuery.

D.

Copy the BigQuery dataset to the local storage of the Colab Enterprise runtime, and analyze the data using Pandas.

Question 6

You are using your own data to demonstrate the capabilities of BigQuery to your organization’s leadership team. You need to perform a one-time load of the files stored on your local machine into BigQuery using as little effort as possible. What should you do?

Options:

A.

Write and execute a Python script using the BigQuery Storage Write API library.

B.

Create a Dataproc cluster, copy the files to Cloud Storage, and write an Apache Spark job using the spark-bigquery-connector.

C.

Execute the bq load command on your local machine.

D.

Create a Dataflow job using the Apache Beam FileIO and BigQueryIO connectors with a local runner.

Question 7

Your organization stores highly personal data in BigQuery and needs to comply with strict data privacy regulations. You need to ensure that sensitive data values are rendered unreadable whenever an employee leaves the organization. What should you do?

Options:

A.

Use AEAD functions and delete keys when employees leave the organization.

B.

Use dynamic data masking and revoke viewer permissions when employees leave the organization.

C.

Use customer-managed encryption keys (CMEK) and delete keys when employees leave the organization.

D.

Use column-level access controls with policy tags and revoke viewer permissions when employees leave the organization.

Question 8

You are designing an application that will interact with several BigQuery datasets. You need to grant the application’s service account permissions that allow it to query and update tables within the datasets, and list all datasets in a project within your application. You want to follow the principle of least privilege. Which pre-defined IAM role(s) should you apply to the service account?

Options:

A.

roles/bigquery.jobUser and roles/bigquery.dataOwner

B.

roles/bigquery.connectionUser and roles/bigquery.dataViewer

C.

roles/bigquery.admin

D.

roles/bigquery.user and roles/bigquery.filteredDataViewer

Question 9

You need to create a data pipeline for a new application. Your application will stream data that needs to be enriched and cleaned. Eventually, the data will be used to train machine learning models. You need to determine the appropriate data manipulation methodology and which Google Cloud services to use in this pipeline. What should you choose?

Options:

A.

ETL; Dataflow -> BigQuery

B.

ETL; Cloud Data Fusion -> Cloud Storage

C.

ELT; Cloud Storage -> Bigtable

D.

ELT; Cloud SQL -> Analytics Hub

Question 10

Your organization plans to move their on-premises environment to Google Cloud. Your organization’s network bandwidth is less than 1 Gbps. You need to move over 500 ТВ of data to Cloud Storage securely, and only have a few days to move the data. What should you do?

Options:

A.

Request multiple Transfer Appliances, copy the data to the appliances, and ship the appliances back to Google Cloud to upload the data to Cloud Storage.

B.

Connect to Google Cloud using VPN. Use Storage Transfer Service to move the data to Cloud Storage.

C.

Connect to Google Cloud using VPN. Use the gcloud storage command to move the data to Cloud Storage.

D.

Connect to Google Cloud using Dedicated Interconnect. Use the gcloud storage command to move the data to Cloud Storage.

Question 11

Your organization is building a new application on Google Cloud. Several data files will need to be stored in Cloud Storage. Your organization has approved only two specific cloud regions where these data files can reside. You need to determine a Cloud Storage bucket strategy that includes automated high availability. What should you do?

Options:

A.

Create a dual-region bucket, and upload the files to this bucket.

B.

Create a single-region bucket in each of the two regions, and use the gcloud storage command to replicate the data across the buckets in both regions.

C.

Create a multi-region bucket, and upload the files to this bucket.

D.

Create a single-region bucket in each of the two regions, and use Storage Transfer Service to replicate the data across the buckets in both regions.

Question 12

You have millions of customer feedback records stored in BigQuery. You want to summarize the data by using the large language model (LLM) Gemini. You need to plan and execute this analysis using the most efficient approach. What should you do?

Options:

A.

Query the BigQuery table from within a Python notebook, use the Gemini API to summarize the data within the notebook, and store the summaries in BigQuery.

B.

Use a BigQuery ML model to pre-process the text data, export the results to Cloud Storage, and use the Gemini API to summarize the pre- processed data.

C.

Create a BigQuery Cloud resource connection to a remote model in Vertex Al, and use Gemini to summarize the data.

D.

Export the raw BigQuery data to a CSV file, upload it to Cloud Storage, and use the Gemini API to summarize the data.

Question 13

Your organization needs to implement near real-time analytics for thousands of events arriving each second in Pub/Sub. The incoming messages require transformations. You need to configure a pipeline that processes, transforms, and loads the data into BigQuery while minimizing development time. What should you do?

Options:

A.

Use a Google-provided Dataflow template to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.

B.

Create a Cloud Data Fusion instance and configure Pub/Sub as a source. Use Data Fusion to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.

C.

Load the data from Pub/Sub into Cloud Storage using a Cloud Storage subscription. Create a Dataproc cluster, use PySpark to perform transformations in Cloud Storage, and write the results to BigQuery.

D.

Use Cloud Run functions to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.

Question 14

Your company has developed a website that allows users to upload and share video files. These files are most frequently accessed and shared when they are initially uploaded. Over time, the files are accessed and shared less frequently, although some old video files may remain very popular. You need to design a storage system that is simple and cost-effective. What should you do?

Options:

A.

Create a single-region bucket with custom Object Lifecycle Management policies based on upload date.

B.

Create a single-region bucket with Autoclass enabled.

C.

Create a single-region bucket. Configure a Cloud Scheduler job that runs every 24 hours and changes the storage class based on upload date.

D.

Create a single-region bucket with Archive as the default storage class.

Question 15

Your organization uses scheduled queries to perform transformations on data stored in BigQuery. You discover that one of your scheduled queries has failed. You need to troubleshoot the issue as quickly as possible. What should you do?

Options:

A.

Navigate to the Logs Explorer page in Cloud Logging. Use filters to find the failed job, and analyze the error details.

B.

Set up a log sink using the gcloud CLI to export BigQuery audit logs to BigQuery. Query those logs to identify the error associated with the failed job ID.

C.

Request access from your admin to the BigQuery information_schema. Query the jobs view with the failed job ID, and analyze error details.

D.

Navigate to the Scheduled queries page in the Google Cloud console. Select the failed job, and analyze the error details.

Question 16

Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?

Options:

A.

Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.

B.

Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.

C.

Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.

D.

Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.

Question 17

You are working with a large dataset of customer reviews stored in Cloud Storage. The dataset contains several inconsistencies, such as missing values, incorrect data types, and duplicate entries. You need toclean the data to ensure that it is accurate and consistent before using it for analysis. What should you do?

Options:

A.

Use the PythonOperator in Cloud Composer to clean the data and load it into BigQuery. Use SQL for analysis.

B.

Use BigQuery to batch load the data into BigQuery. Use SQL for cleaning and analysis.

C.

Use Storage Transfer Service to move the data to a different Cloud Storage bucket. Use event triggers to invoke Cloud Run functions to load the data into BigQuery. Use SQL for analysis.

D.

Use Cloud Run functions to clean the data and load it into BigQuery. Use SQL for analysis.

Question 18

Your organization has a BigQuery dataset that contains sensitive employee information such as salaries and performance reviews. The payroll specialist in the HR department needs to have continuous access to aggregated performance data, but they do not need continuous access to other sensitive data. You need to grant the payroll specialist access to the performance data without granting them access to the entire dataset using the simplest and most secure approach. What should you do?

Options:

A.

Use authorized views to share query results with the payroll specialist.

B.

Create row-level and column-level permissions and policies on the table that contains performance data in the dataset. Provide the payroll specialist with the appropriate permission set.

C.

Create a table with the aggregated performance data. Use table-level permissions to grant access to the payroll specialist.

D.

Create a SQL query with the aggregated performance data. Export the results to an Avro file in a Cloud Storage bucket. Share the bucket with the payroll specialist.

Question 19

You are a data analyst at your organization. You have been given a BigQuery dataset that includes customer information. The dataset contains inconsistencies and errors, such as missing values, duplicates, and formatting issues. You need to effectively and quickly clean the data. What should you do?

Options:

A.

Develop a Dataflow pipeline to read the data from BigQuery, perform data quality rules and transformations, and write the cleaned data back to BigQuery.

B.

Use Cloud Data Fusion to create a data pipeline to read the data from BigQuery, perform data quality transformations, and write the clean data back to BigQuery.

C.

Export the data from BigQuery to CSV files. Resolve the errors using a spreadsheet editor, and re-import the cleaned data into BigQuery.

D.

Use BigQuery's built-in functions to perform data quality transformations.

Question 20

Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations. What should you do?

Options:

A.

Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.

B.

Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.

C.

Use the “Pub/Sub to BigQuery” Dataflow template with a UDF, and write the results to BigQuery.

D.

Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.

Question 21

You are a database administrator managing sales transaction data by region stored in a BigQuery table. You need to ensure that each sales representative can only see the transactions in their region. What should you do?

Options:

A.

Add a policy tag in BigQuery.

B.

Create a row-level access policy.

C.

Create a data masking rule.

D.

Grant the appropriate 1AM permissions on the dataset.

Question 22

Your organization has several datasets in BigQuery. The datasets need to be shared with your external partners so that they can run SQL queries without needing to copy the data to their own projects. You have organized each partner’s data in its own BigQuery dataset. Each partner should be able to access only their data. You want to share the data while following Google-recommended practices. What should you do?

Options:

A.

Use Analytics Hub to create a listing on a private data exchange for each partner dataset. Allow each partner to subscribe to their respective listings.

B.

Create a Dataflow job that reads from each BigQuery dataset and pushes the data into a dedicated Pub/Sub topic for each partner. Grant each partner the pubsub. subscriber IAM role.

C.

Export the BigQuery data to a Cloud Storage bucket. Grant the partners the storage.objectUser IAM role on the bucket.

D.

Grant the partners the bigquery.user IAM role on the BigQuery project.

Question 23

You need to design a data pipeline that ingests data from CSV, Avro, and Parquet files into Cloud Storage. The data includes raw user input. You need to remove all malicious SQL injections before storing the data in BigQuery. Which data manipulation methodology should you choose?

Options:

A.

EL

B.

ELT

C.

ETL

D.

ETLT

Question 24

You work for a financial services company that handles highly sensitive data. Due to regulatory requirements, your company is required to have complete and manual control of data encryption. Which type of keys should you recommend to use for data storage?

Options:

A.

Use customer-supplied encryption keys (CSEK).

B.

Use a dedicated third-party key management system (KMS) chosen by the company.

C.

Use Google-managed encryption keys (GMEK).

D.

Use customer-managed encryption keys (CMEK).

Question 25

You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address

the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?

Options:

A.

Design a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.

B.

Design the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.

C.

Design the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.

D.

Design the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.

Question 26

You need to design a data pipeline to process large volumes of raw server log data stored in Cloud Storage. The data needs to be cleaned, transformed, and aggregated before being loaded into BigQuery for analysis. The transformation involves complex data manipulation using Spark scripts that your team developed. You need to implement a solution that leverages your team’s existing skillset, processes data at scale, and minimizes cost. What should you do?

Options:

A.

Use Dataflow with a custom template for the transformation logic.

B.

Use Cloud Data Fusion to visually design and manage the pipeline.

C.

Use Dataform to define the transformations in SQLX.

D.

Use Dataproc to run the transformations on a cluster.