DAS-C01 Question #146: Real Exam Question with Answer & Explanation
The correct answer is B: Use an AWS Glue crawler to catalog the data in Amazon S3.
Question
A hospital uses an electronic health records (EHR) system to collect two types of data:
- Patient information, which includes a patient's name and address
- Diagnostic tests conducted and the results of these tests
Patient information is expected to change periodically. Existing diagnostic test data never changes and only new records are added.
The hospital runs an Amazon Redshift cluster with four dc2.large nodes and wants to automate the ingestion of the patient information and diagnostic test data into respective Amazon Redshift tables for analysis. The EHR system exports data as CSV files to an Amazon S3 bucket on a daily basis. Two sets of CSV files are generated. One set of files is for patient information with updates, deletes, and inserts. The other set of files is for new diagnostic test data only.
What is the MOST cost-effective solution to meet these requirements?
Options
- A. Use Amazon EMR with Apache Hudi.
- B. Use an AWS Glue crawler to catalog the data in Amazon S3
- C. Use an AWS Lambda function to run a COPY command that appends new diagnostic test data to
- D. Use AWS Database Migration Service (AWS DMS) to collect and process change data capture
Explanation
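An AWS Glue crawler scans the CSV files in Amazon S3, infers their schema, and registers the resulting tables in the AWS Glue Data Catalog. That schema discovery is the foundational step for automating ingestion of both data sets into Amazon Redshift tables: the patient information, which arrives with inserts, updates, and deletes, and the append-only diagnostic test data.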
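As a rough illustration of the crawler step, the sketch below builds the request for the Glue `CreateCrawler` API and, in a separate function, submits it with boto3. The crawler name, IAM role ARN, database name, S3 path, and schedule are all hypothetical placeholders rather than values from the question, and the schema change policy is only one reasonable choice.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_path: str, schedule: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": schedule,
        # Assumption: update catalog tables on schema change, only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Create the crawler and trigger a first run (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical names: bucket, role ARN, and database are placeholders.
patient_crawler = build_crawler_request(
    name="ehr-patient-info",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="ehr_catalog",
    s3_path="s3://example-ehr-exports/patient_info/",
    schedule="cron(0 2 * * ? *)",  # daily at 02:00 UTC
)
```

A second request with a different `s3_path` would cover the diagnostic test files. Calling `create_and_start_crawler(patient_crawler)` only makes sense in an account where the role and bucket actually exist.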
Common mistakes.
- A. Amazon EMR with Apache Hudi is excellent for managing mutable data in a data lake in S3 but requires additional integration or ETL steps to automate ingestion specifically into Amazon Redshift tables.
- C. An AWS Lambda function with a COPY command is suitable for appending new data to Redshift, like diagnostic tests, but it does not inherently provide an automated solution for handling updates and deletes required for patient information, nor does it automate schema inference.
- D. AWS Database Migration Service (DMS) is designed for migrating databases and continuous replication (CDC) between database systems. While powerful for CDC, using DMS directly for file-based CDC from S3 into Redshift for varying schemas can be more complex than an AWS Glue-centric approach.
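To make the distinction drawn for option C concrete, here is a minimal sketch of why a plain COPY covers append-only data but mutable data needs an extra merge step. It only generates Redshift SQL strings and does not connect to a cluster. The table names, column names, and in particular the `op` column marking deleted rows are assumptions for illustration; the question does not say how the export encodes deletes, or whether the CSV files have a header row.

```python
from typing import List


def copy_csv_sql(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """COPY statement that appends CSV rows from S3 into a table."""
    # Assumption: each CSV file starts with one header row.
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV IGNOREHEADER 1;"
    )


def merge_from_staging_sql(target: str, staging: str, key: str,
                           columns: List[str], op_column: str) -> List[str]:
    """Statements that apply inserts, updates, and deletes from a staging table."""
    cols = ", ".join(columns)
    return [
        "BEGIN;",
        # Remove every target row that also appears in staging (old versions and deletes).
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        # Re-insert the current versions, skipping rows flagged as deleted.
        f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {staging} "
        f"WHERE {op_column} <> 'D';",
        "COMMIT;",
    ]


# Hypothetical role, bucket, tables, and columns.
role = "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole"
append_only = copy_csv_sql(
    "diagnostic_tests", "s3://example-ehr-exports/diagnostic_tests/", role)
merge_statements = merge_from_staging_sql(
    target="patient_info",
    staging="patient_info_staging",
    key="patient_id",
    columns=["patient_id", "name", "address"],
    op_column="op",
)
```

The diagnostic test data needs only `append_only`, while the patient information would first be copied into the staging table and then merged with `merge_statements`.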
Concept tested. AWS Glue Crawler for schema discovery and cataloging
Reference. https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#crawler-overview