Free Professional-Data-Engineer Exam Braindumps

Pass your Google Professional Data Engineer exam with these free questions and answers.

QUESTION 16

- (Exam Topic 6)
You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? Choose 2 answers.

A. Publisher throughput quota is too small.
B. Total outstanding messages exceed the 10-MB maximum.
C. Error handling in the subscriber code is not handling run-time errors properly.
D. The subscriber code cannot keep up with the messages.
E. The subscriber code does not acknowledge the messages that it pulls.

Correct Answer: CE
If faulty error handling swallows a runtime error (C), or the subscriber simply never acknowledges the messages it pulls (E), nothing reaches Stackdriver Log Viewer, and the unacknowledged messages are redelivered after the acknowledgement deadline expires, inflating the observed processing rate. A subscriber that merely cannot keep up (D) would not multiply the message rate on the topic, as the sketch below illustrates.
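
Both failure modes are shown in this minimal sketch, written in Python rather than the question's Node.js; the project and subscription IDs and the load_to_bigquery helper are illustrative placeholders, not part of the original question.

```python
import logging

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def load_to_bigquery(data: bytes) -> None:
    """Hypothetical stand-in for the real BigQuery insert."""
    raise NotImplementedError

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    try:
        load_to_bigquery(message.data)
    except Exception:
        # Answer C: without this handler, runtime errors vanish silently and
        # nothing appears in Stackdriver Log Viewer.
        logging.exception("Failed to process message %s", message.message_id)
        message.nack()  # make redelivery explicit instead of silent
        return
    # Answer E: if ack() is never reached, Pub/Sub redelivers the message after
    # the acknowledgement deadline, inflating the processing rate on the topic.
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
# streaming_pull_future.result() would block the main thread to keep pulling.
```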

QUESTION 17

- (Exam Topic 6)
You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)

A. Configure your Cloud Dataflow pipeline to use local execution
B. Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions
C. Increase the number of nodes in the Cloud Bigtable cluster
D. Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable
E. Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable

Correct Answer: BC
Raising maxNumWorkers lets Dataflow autoscale to more workers and finish the writes sooner, and adding Cloud Bigtable nodes increases the cluster's read and write throughput, which both speeds the writes and supports more concurrent dashboard users. Flatten (D) and CoGroupByKey (E) only reshape PCollections and add no write capacity.
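
A minimal sketch of both changes, assuming the Beam Python SDK and the google-cloud-bigtable admin client; every project, bucket, instance, and cluster ID below is a placeholder.

```python
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import bigtable

# Answer B: raise the autoscaling ceiling for the Dataflow job
# (maxNumWorkers in the Java SDK, max_num_workers in Python).
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    max_num_workers=50,
)

# Answer C: add nodes to the Cloud Bigtable cluster that receives the writes.
client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("metrics-instance").cluster("metrics-cluster")
cluster.serve_nodes = 6  # scale up from the current node count
cluster.update()
```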

QUESTION 18

- (Exam Topic 6)
Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

A. Set a single global window to capture all the data.
B. Set sliding windows to capture all the lagged data.
C. Use watermarks and timestamps to capture the lagged data.
D. Ensure every data source type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.

Correct Answer: C
A watermark is Dataflow's estimate of how far event time has progressed; together with per-element timestamps, it lets the pipeline decide within a predictable bound when a window is complete and how to handle elements that arrive late or out of order. A windowing choice alone (A, B) does not address lateness.
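
For intuition, here is a minimal sketch assuming the Beam Python SDK; the topic, window size, lateness bound, and parse_event helper are illustrative placeholders.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.trigger import AccumulationMode, AfterCount, AfterWatermark

def parse_event(raw: bytes) -> dict:
    """Hypothetical parser returning e.g. {'key': ..., 'event_time': ...}."""
    raise NotImplementedError

with beam.Pipeline(options=PipelineOptions(streaming=True)) as pipeline:
    windowed = (
        pipeline
        | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | beam.Map(parse_event)
        # Attach each element's event timestamp; the watermark tracks event time.
        | beam.Map(lambda e: window.TimestampedValue(e, e["event_time"]))
        | beam.WindowInto(
            window.FixedWindows(60),                     # 1-minute event-time windows
            trigger=AfterWatermark(late=AfterCount(1)),  # re-fire when late data arrives
            allowed_lateness=300,                        # accept data up to 5 minutes late
            accumulation_mode=AccumulationMode.ACCUMULATING,
        )
    )
```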

QUESTION 19

- (Exam Topic 5)
Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model

Correct Answer: B
Evaluating a predictive model only on its training data tells you nothing about how well it generalizes to new, unseen data. A model selected for its accuracy on the training set is likely to perform worse on a held-out test set, because it has specialized to the structure of the training data rather than generalized from it. This is called overfitting.
Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
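
A minimal sketch of the point, assuming scikit-learn: the held-out test split is what reveals whether the model has generalized or merely memorized its training data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
# A large gap between the two scores is the signature of overfitting.
print("test accuracy:", model.score(X_test, y_test))
```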

QUESTION 20

- (Exam Topic 6)
You need to migrate a 2 TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database, and the cost to operate it is of primary concern.
Which service do you select for storing and serving your data?

A. Cloud Spanner
B. Cloud Bigtable
C. Cloud Firestore
D. Cloud SQL

Correct Answer: D

