DAS-C01 Question #89: Real Exam Question with Answer & Explanation
The correct answers are B, C, and F: add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data, convert the log files to Apache Parquet format, and drop and recreate the table with the PARTITIONED BY clause before running MSCK REPAIR TABLE.
Question
A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour. A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead. Which combination of steps should the data analyst take to meet these requirements? (Choose three.)
Options
- A. Convert the log files to Apache Avro format.
- B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
- C. Convert the log files to Apache Parquet format.
- D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
- E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
- F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Explanation
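Athena charges by the amount of data each query scans, so the goal is to scan less data without adding ongoing work. Converting the logs to Apache Parquet, a columnar format, lets queries read only the columns they reference instead of every full text row. Partitioning the S3 objects with a Hive-style 'key=value/' prefix (date=year-month-day/) lets queries that filter on the date skip all other partitions. Recreating the table with the PARTITIONED BY clause and running MSCK REPAIR TABLE loads those partitions with a single statement, which keeps maintenance overhead low.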
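The key change in option B can be sketched in a few lines of Python. This is only an illustration: the function name `partitioned_key` and the regular expression are assumptions, and the pattern assumes a four-digit year and two-digit month and day, which the question does not state. In the full solution the objects would also be rewritten as Parquet, so the file name itself would change as well.

```python
import re

# Assumed shape of the original keys from the question:
# year-month-day_log_HHmmss.txt, e.g. 2023-01-15_log_093000.txt
KEY_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})_log_(\d{6})\.txt$")


def partitioned_key(old_key: str, partition_column: str = "date") -> str:
    """Return the old key placed under a Hive-style key=value/ prefix."""
    match = KEY_PATTERN.match(old_key)
    if match is None:
        raise ValueError(f"unexpected key format: {old_key!r}")
    day = match.group(1)  # the year-month-day part of the file name
    return f"{partition_column}={day}/{old_key}"


if __name__ == "__main__":
    print(partitioned_key("2023-01-15_log_093000.txt"))
```

With this layout, every object for a given day sits under one date=.../ prefix, which is the form MSCK REPAIR TABLE can pick up as a partition.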
Common mistakes.
- A. While Apache Avro is a good data format, Apache Parquet (columnar) is generally more efficient for analytical queries in Athena, especially when querying a subset of columns, offering superior performance and cost benefits.
- D. A bare year-month-day/ prefix groups the objects by day, but it is not the Hive-style 'key=value/' format (for example, date=year-month-day/) that MSCK REPAIR TABLE needs to discover partitions automatically; with this layout each partition has to be added manually, which increases maintenance.
- E. ALTER TABLE ADD PARTITION requires each partition to be defined explicitly, which is high-maintenance for daily log files spanning 24 months. MSCK REPAIR TABLE (option F) is the lower-maintenance way to add many partitions.
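To make the maintenance difference between options E and F concrete, the sketch below generates the statements each approach would need for 24 months of daily partitions. The table name `logs`, the bucket name `example-log-bucket`, and the date range are hypothetical, and the helper functions are illustrative only. The partition column is quoted with backticks on the assumption that `date` is treated as a reserved word in Athena DDL.

```python
from datetime import date, timedelta
from typing import List


def add_partition_statements(table: str, bucket: str, start: date, end: date) -> List[str]:
    """Build one ALTER TABLE ADD PARTITION statement per day in [start, end]."""
    statements = []
    day = start
    while day <= end:
        iso_day = day.isoformat()  # e.g. 2022-01-01
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (`date` = '{iso_day}') "
            f"LOCATION 's3://{bucket}/{iso_day}/'"
        )
        day += timedelta(days=1)
    return statements


def repair_statement(table: str) -> str:
    """Build the single statement that discovers Hive-style partitions."""
    return f"MSCK REPAIR TABLE {table}"


if __name__ == "__main__":
    manual = add_partition_statements(
        "logs", "example-log-bucket", date(2022, 1, 1), date(2023, 12, 31)
    )
    print(len(manual), "ALTER TABLE statements versus 1:", repair_statement("logs"))
```

The manual list also has to grow by one statement every day, whereas the repair statement stays the same as new date=.../ prefixes appear.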
Concept tested. Optimizing Athena queries with partitioning and columnar formats
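A rough way to see why both techniques cut cost is to estimate how much data a query scans. The sketch below is arithmetic only. The price constant, the fraction of columns read, and the compression ratio are all assumptions chosen for illustration, not figures from the question; check current Athena pricing before relying on any result.

```python
# Assumption: price per TB scanned; verify against current regional pricing.
PRICE_PER_TB_USD = 5.00


def estimated_tb_scanned(
    total_tb: float,
    days_stored: int,
    days_queried: int,
    column_fraction: float = 1.0,    # share of the columns a query reads
    compression_ratio: float = 1.0,  # raw size divided by stored size
) -> float:
    """Estimate TB scanned, assuming data is spread evenly across days."""
    partition_fraction = days_queried / days_stored  # effect of partition pruning
    return total_tb * partition_fraction * column_fraction / compression_ratio


def scan_cost_usd(tb_scanned: float) -> float:
    """Convert scanned TB into an estimated query cost."""
    return tb_scanned * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Unpartitioned raw text: every query scans all 100 TB.
    full_scan = estimated_tb_scanned(100, 730, 730)
    # Hypothetical query on one day, reading a quarter of the columns
    # from Parquet stored at an assumed 3x compression.
    pruned = estimated_tb_scanned(100, 730, 1, column_fraction=0.25, compression_ratio=3.0)
    print(scan_cost_usd(full_scan), scan_cost_usd(pruned))
```

Partitioning shrinks the first factor and the columnar format shrinks the other two, which is why the question asks for both.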
Reference. https://docs.aws.amazon.com/athena/latest/ug/converting-to-columnar.html