Free Professional-Data-Engineer Exam Braindumps

Pass your Google Professional Data Engineer exam with these free Questions and Answers

QUESTION 46

- (Exam Topic 6)
Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of errors in the input data, and you need to improve the reliability of the pipeline (including being able to reprocess all failing data). What should you do?

A. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
B. Add a try… catch block to your DoFn that transforms the data, extract erroneous rows from logs.
C. Add a try… catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
D. Add a try… catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to PubSub later.

Correct Answer: D
A try… catch block combined with a side output routes failing elements into a separate PCollection without stopping the pipeline; that PCollection can then be written to Pub/Sub (or another sink) and reprocessed later. Writing to Pub/Sub directly inside the transform DoFn mixes I/O into the transform and does not give you a reusable collection of failed records.
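For reference, a minimal sketch of this dead-letter pattern in the Apache Beam Python SDK (the question's sideOutput terminology comes from the Java SDK, where TupleTags play the same role). The parsing logic, bucket paths, and tag names below are illustrative assumptions; the failing rows are written to text files here, but the same PCollection could instead be published to Pub/Sub for reprocessing.

```python
import json
import logging

import apache_beam as beam
from apache_beam import pvalue


class TransformRow(beam.DoFn):
    """Transforms one input row; routes failures to a 'dead_letter' side output."""

    DEAD_LETTER_TAG = 'dead_letter'

    def process(self, element):
        try:
            # Hypothetical transformation; replace with the real parsing/business logic.
            row = json.loads(element)
            yield {'id': row['id'], 'value': int(row['value'])}
        except Exception as err:
            # Route any bad record to the side output instead of failing the pipeline.
            logging.warning('Failed to process element %r: %s', element, err)
            yield pvalue.TaggedOutput(self.DEAD_LETTER_TAG, element)


with beam.Pipeline() as pipeline:
    results = (
        pipeline
        | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*.json')  # assumed path
        | 'Transform' >> beam.ParDo(TransformRow()).with_outputs(
            TransformRow.DEAD_LETTER_TAG, main='good')
    )
    # Good rows continue down the pipeline; failing rows go to a sink for later reprocessing.
    _ = results.good | 'WriteGood' >> beam.io.WriteToText('gs://my-bucket/output/good')
    _ = results.dead_letter | 'WriteDeadLetter' >> beam.io.WriteToText(
        'gs://my-bucket/output/dead_letter')
```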

QUESTION 47

- (Exam Topic 6)
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filter on timestamp and ID selects a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

A. Create a separate table for each ID.
B. Use the LIMIT keyword to reduce the number of rows returned.
C. Recreate the table with a partitioning column and clustering column.
D. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.

Correct Answer: C
A LIMIT clause does not reduce the amount of data BigQuery reads, and --maximum_bytes_billed only fails queries that would scan too much rather than reducing the scan. Recreating the table with a partitioning column (the timestamp) and a clustering column (the ID) lets BigQuery prune partitions and blocks, so the existing WHERE clause scans only the relevant data with minimal SQL changes.
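A rough sketch of what option C can look like in practice, using the google-cloud-bigquery Python client; the project, dataset, table, and column names (ts, id) are assumptions for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials

# Recreate the table partitioned on the timestamp column and clustered on id.
ddl = """
CREATE TABLE `my_project.my_dataset.events_partitioned`
PARTITION BY DATE(ts)
CLUSTER BY id AS
SELECT * FROM `my_project.my_dataset.events`
"""
client.query(ddl).result()

# Dry-run the same filter to confirm far fewer bytes would be scanned.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = """
SELECT *
FROM `my_project.my_dataset.events_partitioned`
WHERE ts BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
  AND id = 'abc123'
"""
dry_run_job = client.query(query, job_config=job_config)
print(f'Bytes that would be processed: {dry_run_job.total_bytes_processed}')
```

Because the partitioning and clustering columns match the columns already used in the WHERE clause, existing queries keep their SQL essentially unchanged while BigQuery prunes the data it reads.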

QUESTION 48

- (Exam Topic 6)
You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

A. Create a Directed Acyclic Graph (DAG) in Cloud Composer to schedule and monitor the jobs.
B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

Correct Answer: A
Cloud Composer (managed Apache Airflow) is built for exactly this: a DAG can schedule the three jobs, show their execution status, and be triggered manually on demand. Cron jobs on a Compute Engine instance provide scheduling but no built-in monitoring or manual-run support.
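As a rough illustration of option A, here is a minimal Cloud Composer (Airflow 2) DAG that schedules the three jobs, surfaces their status in the Airflow UI, and can also be triggered manually. The template names, bucket paths, and the use of BashOperator with gcloud/gsutil are assumptions; production DAGs would more likely use the Google provider's Dataflow and transfer operators.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='three_pipeline_workflow',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:

    # Job 2: copy on-premises exports into Cloud Storage (paths are illustrative).
    ingest_on_prem = BashOperator(
        task_id='ingest_on_prem_to_gcs',
        bash_command='gsutil -m rsync -r /mnt/onprem-export gs://my-bucket/landing/',
    )

    # Job 3: Dataflow pipeline pulling third-party data into Cloud Storage,
    # assumed to be staged as a classic Dataflow template.
    third_party_to_gcs = BashOperator(
        task_id='run_third_party_dataflow',
        bash_command=(
            'gcloud dataflow jobs run third-party-ingest-{{ ds_nodash }} '
            '--gcs-location gs://my-bucket/templates/third-party-ingest '
            '--region us-central1'
        ),
    )

    # Job 1: Dataflow pipeline transforming Cloud Storage data into BigQuery.
    transform_to_bq = BashOperator(
        task_id='run_transform_dataflow',
        bash_command=(
            'gcloud dataflow jobs run gcs-to-bq-{{ ds_nodash }} '
            '--gcs-location gs://my-bucket/templates/gcs-to-bq '
            '--region us-central1'
        ),
    )

    # The transform runs only after both ingestion jobs have landed data in Cloud Storage.
    [ingest_on_prem, third_party_to_gcs] >> transform_to_bq
```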

QUESTION 49

- (Exam Topic 5)
How would you query specific partitions in a BigQuery table?

A. Use the DAY column in the WHERE clause
B. Use the EXTRACT(DAY) clause
C. Use the _PARTITIONTIME pseudo-column in the WHERE clause
D. Use DATE BETWEEN in the WHERE clause

Correct Answer: C
Partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as Jan 1st and 2nd of 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column
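A small sketch of running such a query with the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, and _PARTITIONTIME applies only to ingestion-time partitioned tables.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials

# Filtering on _PARTITIONTIME limits the scan to the named partitions.
query = """
SELECT *
FROM `my_project.my_dataset.events`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
"""
for row in client.query(query).result():
    print(dict(row))
```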

QUESTION 50

- (Exam Topic 5)
Which of these is NOT a way to customize the software on Dataproc cluster instances?

A. Set initialization actions
B. Modify configuration files using cluster properties
C. Configure the cluster using Cloud Deployment Manager
D. Log into the master node and make changes from there

Correct Answer: C
Cloud Deployment Manager is an infrastructure-deployment service, not a mechanism for customizing the software on a cluster's instances; the other three options are documented ways to do that.
You can access the master node of the cluster by clicking the SSH button next to it in the Cloud Console and make changes from there.
You can use the --properties option of the dataproc command in the Google Cloud SDK to modify many common configuration files when creating a cluster.
When creating a Cloud Dataproc cluster, you can specify initialization actions as executables and/or scripts that Cloud Dataproc will run on all nodes in your cluster immediately after the cluster is set up. [https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions]
Reference: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties
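For illustration, a hedged sketch of options A and B using the google-cloud-dataproc Python client; the project, region, property value, and initialization-action script path are assumptions:

```python
from google.cloud import dataproc_v1

project_id = 'my-project'   # illustrative
region = 'us-central1'      # illustrative

# The cluster controller client must target the regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={'api_endpoint': f'{region}-dataproc.googleapis.com:443'}
)

cluster = {
    'project_id': project_id,
    'cluster_name': 'example-cluster',
    'config': {
        # Option B: override common configuration files via cluster properties
        # (file_prefix:property format, e.g. Spark's spark-defaults.conf).
        'software_config': {
            'properties': {'spark:spark.executor.memory': '4g'},
        },
        # Option A: initialization actions run on every node right after setup.
        'initialization_actions': [
            {'executable_file': 'gs://my-bucket/scripts/install-deps.sh'},  # assumed script
        ],
    },
}

operation = client.create_cluster(
    request={'project_id': project_id, 'region': region, 'cluster': cluster}
)
print(operation.result().cluster_name)
```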

