DAS-C01 Question #75: Real Exam Question with Answer & Explanation
The correct answer is B: Enable job bookmarks on the AWS Glue jobs.
Question
A company has developed several AWS Glue jobs to validate and transform its data from Amazon S3 and load it into Amazon RDS for MySQL in batches once every day. The ETL jobs read the S3 data using a DynamicFrame. Currently, the ETL developers are experiencing challenges in processing only the incremental data on every run, as the AWS Glue job processes all the S3 input data on each run. Which approach would allow the developers to solve the issue with minimal coding effort?
Options
- A. Have the ETL jobs read the data from Amazon S3 using a DataFrame.
- B. Enable job bookmarks on the AWS Glue jobs.
- C. Create custom logic on the ETL jobs to track the processed S3 objects.
- D. Have the ETL jobs delete the processed objects or data from Amazon S3 after each run.
Explanation
Enabling job bookmarks on the AWS Glue jobs solves the problem with the least code. A job bookmark persists state from the last successful run, so each daily run reads only the S3 data that is new or changed since then instead of reprocessing the full input. Bookmarks are a built-in job option: beyond turning the option on, the script only needs a transformation_ctx on the source read and a job.commit() call at the end of the run.
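As a rough illustration of how little code this involves, here is a minimal sketch of a bookmark-aware Glue script. The bucket path, the JSON format, and the names `run_job`, `BOOKMARK_ARGS`, and `source_s3` are assumptions made for the example; they do not come from the question.

```python
# Job parameter that turns bookmarks on. It is set on the job definition
# (console, CLI, or API), not inside the script itself.
BOOKMARK_ARGS = {"--job-bookmark-option": "job-bookmark-enable"}


def run_job(argv):
    """Sketch of a daily ETL run that relies on job bookmarks."""
    # awsglue and pyspark exist only inside the Glue runtime,
    # so they are imported lazily here.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # picks up bookmark state from the last run

    # transformation_ctx is the key under which bookmark state is stored
    # for this source; the S3 path below is a placeholder.
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/input/"]},
        format="json",
        transformation_ctx="source_s3",
    )

    # ... validate, transform, and write `source` to RDS for MySQL here ...

    job.commit()  # persists the updated bookmark state
```

Inside a Glue job, the script would end with `run_job(sys.argv)` after `import sys`. The existing DynamicFrame read stays in place; the only additions are the `transformation_ctx` argument, the `job.commit()` call, and the job parameter.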
Common mistakes.
- A. Switching from a DynamicFrame to a DataFrame does not add incremental processing. Neither abstraction tracks already processed S3 data on its own; job bookmarks provide that, and they work with the DynamicFrame reads the jobs already use. A DataFrame-based rewrite would still need custom tracking logic, so it fails the 'minimal coding effort' requirement.
- C. Creating custom logic to track processed S3 objects would solve the problem but directly contradicts the requirement for 'minimal coding effort.' This approach would involve significant development, testing, and maintenance overhead compared to using Glue's native job bookmark feature.
- D. Deleting processed objects or data from Amazon S3 after each run is a dangerous practice that can lead to data loss, make auditing impossible, and complicate disaster recovery or reruns. It is not a robust or recommended solution for managing incremental data processing in a data lake.
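To make the cost of option C more concrete, the sketch below models the core of what hand-written tracking has to do: remember a high-water mark and select only objects modified after it. It is a simplified illustration, not how Glue implements bookmarks internally, and the names `S3Object` and `select_incremental` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class S3Object:
    key: str
    last_modified: datetime


def select_incremental(objects, bookmark):
    """Return the objects modified after the bookmark, plus the new bookmark."""
    new = [o for o in objects if bookmark is None or o.last_modified > bookmark]
    new_bookmark = max((o.last_modified for o in new), default=bookmark)
    return new, new_bookmark


# First run: no bookmark yet, so everything is processed.
day1 = [
    S3Object("a.json", datetime(2024, 1, 1)),
    S3Object("b.json", datetime(2024, 1, 1, 12)),
]
batch1, mark = select_incremental(day1, None)

# Second run: only the object that arrived after the bookmark is selected.
day2 = day1 + [S3Object("c.json", datetime(2024, 1, 2))]
batch2, mark = select_incremental(day2, mark)
print([o.key for o in batch2])  # ['c.json']
```

A production version would also have to persist the bookmark between runs, advance it only after a successful load, and handle reruns, which is the development and maintenance overhead that job bookmarks avoid.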
Concept tested. AWS Glue Job Bookmarks for incremental data processing
Reference. https://docs.aws.amazon.com/glue/latest/dg/monitor-exceptions-jobs.html