Free DAS-C01 Exam Braindumps

Pass your AWS Certified Data Analytics - Specialty exam with these free Questions and Answers

Page 4 of 32
QUESTION 11

A company is reading data from various customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is place_id in one database is location_id in another database. The company wants to link customer records across different databases, even when many customer record fields do not match exactly.
Which solution will meet these requirements with the LEAST operational overhead?

  1. A. Create an Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook, and use the FindMatches transform to find duplicate records in the data.
  2. B. Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating performance and results of finding matches.
  3. C. Create an AWS Glue crawler to crawl the data in the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
  4. D. Create an Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook, and use Apache Spark ML to find duplicate records in the data. Evaluate and tune the model by evaluating performance and results of finding duplicates.

Correct Answer: B
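For context, the winning option applies the AWS Glue FindMatches ML transform to crawled data. Below is a minimal AWS Glue PySpark sketch of that step; the catalog database, table, transform ID, and S3 path are placeholders, and the FindMatches transform itself must first be created and trained in the Glue console.

```python
# Hypothetical AWS Glue job sketch: apply a trained FindMatches ML transform
# to customer records crawled into the Glue Data Catalog. Names and the
# transform ID are placeholders, not values from the question.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled customer records from the Glue Data Catalog.
customers = glue_context.create_dynamic_frame.from_catalog(
    database="customer_db",          # placeholder catalog database
    table_name="customer_records",   # placeholder crawled table
    transformation_ctx="customers",
)

# Apply the trained FindMatches transform to flag likely duplicate records.
matched = FindMatches.apply(frame=customers, transformId="tfm-EXAMPLE")

# Persist the labeled output to S3 for evaluation and tuning.
glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/find-matches-output/"},
    format="parquet",
)
job.commit()
```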

QUESTION 12

An airline has been collecting metrics on flight activities for analytics. A recently completed proof of concept demonstrates how the company provides insights to data analysts to improve on-time departures. The proof of concept used objects in Amazon S3, which contained the metrics in .csv format, and used Amazon Athena for querying the data. As the amount of data increases, the data analyst wants to optimize the storage solution to improve query performance.
Which options should the data analyst use to improve performance as the data lake grows? (Choose three.)

  1. A. Add a randomized string to the beginning of the keys in S3 to get more throughput across partitions.
  2. B. Use an S3 bucket in the same account as Athena.
  3. C. Compress the objects to reduce the data transfer I/O.
  4. D. Use an S3 bucket in the same Region as Athena.
  5. E. Preprocess the .csv data to JSON to reduce I/O by fetching only the document keys needed by the query.
  6. F. Preprocess the .csv data to Apache Parquet to reduce I/O by fetching only the data blocks needed for query predicates.

Correct Answer: CDF
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
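To illustrate option F, here is a minimal preprocessing sketch using pandas and pyarrow (not part of the question) that rewrites a .csv object as compressed, partitioned Parquet; the bucket names and partition column are hypothetical.

```python
# Hypothetical preprocessing sketch: convert flight-metrics CSV to compressed,
# partitioned Parquet so Athena can prune partitions and read only the column
# blocks a query needs. Bucket, prefix, and column names are made up.
import pandas as pd

# Read the raw CSV object from S3 (requires the s3fs package).
df = pd.read_csv("s3://example-raw-bucket/metrics/flights.csv")

# Write Snappy-compressed Parquet, partitioned by a query-friendly column.
df.to_parquet(
    "s3://example-analytics-bucket/metrics_parquet/",
    engine="pyarrow",
    compression="snappy",
    partition_cols=["flight_date"],   # hypothetical partition column
)
```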

QUESTION 13

A bank is using Amazon Managed Streaming for Apache Kafka (Amazon MSK) to populate real-time data into a data lake. The data lake is built on Amazon S3, and data must be accessible from the data lake within 24 hours. Different microservices produce messages to different topics in the cluster. The cluster is created with 8 TB of Amazon Elastic Block Store (Amazon EBS) storage and a retention period of 7 days.
The customer transaction volume has tripled recently, and disk monitoring has provided an alert that the cluster is almost out of storage capacity.
What should a data analytics specialist do to prevent the cluster from running out of disk space?

  1. A. Use the Amazon MSK console to triple the broker storage and restart the cluster.
  2. B. Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%.
  3. C. Create a custom Amazon MSK configuration. Set the log.retention.hours parameter to 48. Update the cluster with the new configuration file.
  4. D. Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic.

Correct Answer: B
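For reference, a minimal boto3 sketch of the CloudWatch alarm described in option B is shown below; the cluster name, broker ID, and SNS topic ARN are placeholders, and the action that actually flushes the oldest messages would still have to be implemented by whatever the alarm triggers.

```python
# Hypothetical boto3 sketch: alarm when KafkaDataLogsDiskUsed for one MSK
# broker exceeds 85%. Cluster name, broker ID, and the SNS topic ARN are
# placeholders; the remediation that removes old messages is not shown.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="msk-broker-1-disk-used",
    Namespace="AWS/Kafka",
    MetricName="KafkaDataLogsDiskUsed",
    Dimensions=[
        {"Name": "Cluster Name", "Value": "example-msk-cluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=85.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:example-alerts"],
)
```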

QUESTION 14

A financial company hosts a data lake in Amazon S3 and a data warehouse on an Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards and wants to secure access from its on-premises Active Directory to Amazon QuickSight.
How should the data be secured?

  1. A. Use an Active Directory connector and single sign-on (SSO) in a corporate network environment.
  2. B. Use a VPC endpoint to connect to Amazon S3 from Amazon QuickSight and an IAM role to authenticate Amazon Redshift.
  3. C. Establish a secure connection by creating an S3 endpoint to connect Amazon QuickSight and a VPC endpoint to connect to Amazon Redshift.
  4. D. Place Amazon QuickSight and Amazon Redshift in the security group and use an Amazon S3 endpoint to connect Amazon QuickSight to Amazon S3.

Correct Answer: A
https://docs.aws.amazon.com/quicksight/latest/user/directory-integration.html
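For context, answer A relies on an AD Connector in AWS Directory Service to link the on-premises Active Directory to AWS; a minimal boto3 sketch of creating one is below. The domain name, credentials, and network settings are placeholders, and the QuickSight directory integration and SSO setup themselves are completed in the console.

```python
# Hypothetical boto3 sketch: create an AD Connector pointing at the
# on-premises Active Directory, which QuickSight can then use for
# directory-integrated sign-in. All names, IDs, and credentials are placeholders.
import boto3

ds = boto3.client("ds")

response = ds.connect_directory(
    Name="corp.example.com",                      # placeholder on-premises domain
    ShortName="CORP",
    Password="placeholder-service-account-password",
    Size="Small",
    ConnectSettings={
        "VpcId": "vpc-0123456789abcdef0",
        "SubnetIds": ["subnet-0123456789abcdef1", "subnet-0123456789abcdef2"],
        "CustomerDnsIps": ["10.0.0.10", "10.0.0.11"],
        "CustomerUserName": "quicksight-connector",
    },
)
print("AD Connector directory ID:", response["DirectoryId"])
```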

QUESTION 15

A company is building a data lake and needs to ingest data from a relational database that has time-series data. The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.
What is the MOST cost-effective approach to meet these requirements?

  1. A. Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.
  2. B. Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.
  3. C. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use appropriate Apache Spark libraries to compare the dataset, and find the delta.
  4. D. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS DataSync to ensure the delta only is written into Amazon S3.

Correct Answer: A
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
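A minimal Glue PySpark sketch combining a catalog (JDBC-backed) source with job bookmarks, the continuation mechanism covered in the referenced documentation, follows; the database, table, bookmark key, and bucket names are placeholders, and the job must be created with bookmarks enabled.

```python
# Hypothetical AWS Glue job sketch: daily incremental ingestion from a JDBC
# source into S3 using job bookmarks. Database, table, key, and bucket names
# are placeholders, not values from the question.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read only rows newer than the last committed bookmark, keyed on a
# monotonically increasing column such as a timestamp or sequence ID.
source = glue_context.create_dynamic_frame.from_catalog(
    database="timeseries_db",              # placeholder catalog database
    table_name="source_table",             # placeholder JDBC-backed table
    transformation_ctx="source",
    additional_options={
        "jobBookmarkKeys": ["updated_at"],  # placeholder bookmark key
        "jobBookmarkKeysSortOrder": "asc",
    },
)

# Land the incremental slice in the data lake as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/timeseries/"},
    format="parquet",
)

# Committing the job advances the bookmark so the next scheduled run ingests
# only records newer than this one.
job.commit()
```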

