DAS-C01 Question #89: Real Exam Question with Answer & Explanation

The correct answers are B, C, and F: add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data, convert the log files to Apache Parquet format, and drop and recreate the table with the PARTITIONED BY clause, then run MSCK REPAIR TABLE.

Storage and Data Management

Question

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour. A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead. Which combination of steps should the data analyst take to meet these requirements? (Choose three.)

Options

  • A. Convert the log files to Apache Avro format.
  • B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
  • C. Convert the log files to Apache Parquet format.
  • D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
  • E. Drop and recreate the table with the PARTITIONED BY clause. Run ALTER TABLE ADD PARTITION.
  • F. Drop and recreate the table with the PARTITIONED BY clause. Run MSCK REPAIR TABLE.

Explanation

To reduce costs and improve performance for Amazon Athena queries on 100 TB of log files with minimal maintenance, the data analyst should convert logs to Apache Parquet, partition S3 objects using a 'key=value/' prefix, and update the Athena table using MSCK REPAIR TABLE.
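The key renaming implied by option B can be sketched as a small helper. This is a minimal illustration, not part of the original question: the function name `partitioned_key`, the sample key, and the `.parquet` extension are assumptions, and the actual conversion to Parquet and the S3 copy are left out.

```python
import re

# Matches the key layout described in the question: year-month-day_log_HHmmss.txt
KEY_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})_log_\d{6}\.txt$")


def partitioned_key(key: str, extension: str = "parquet") -> str:
    """Map a flat log key to a Hive-style key with a date=YYYY-MM-DD/ prefix.

    Hypothetical helper: it only computes the new key name. Converting the
    file to Parquet and writing it to S3 would be separate steps.
    """
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key layout: {key!r}")
    day = match.group(1)
    stem = key.rsplit(".", 1)[0]  # drop the .txt extension
    return f"date={day}/{stem}.{extension}"


print(partitioned_key("2023-01-15_log_093000.txt"))
# date=2023-01-15/2023-01-15_log_093000.parquet
```

With keys in this shape, every object for a given day sits under one date=.../ prefix, which is the layout MSCK REPAIR TABLE looks for.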

Common mistakes.

  • A. While Apache Avro is a good data format, Apache Parquet (columnar) is generally more efficient for analytical queries in Athena, especially when querying a subset of columns, offering superior performance and cost benefits.
  • D. A key prefix of the form year-month-day/ does group the objects by day, but MSCK REPAIR TABLE only discovers partitions that use the Hive-style 'key=value/' format (e.g., date=year-month-day/). With the plain prefix, each partition would have to be added manually with its location, which increases maintenance.
  • E. Using ALTER TABLE ADD PARTITION requires manually defining each partition, which becomes high-maintenance for daily log files over 24 months. MSCK REPAIR TABLE (Option F) is the preferred low-maintenance approach for adding many partitions.
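The maintenance difference between options E and F can be made concrete by generating the statements each approach needs. This is a rough sketch under stated assumptions: the table name `logs`, the bucket `s3://example-log-bucket`, and the date range are hypothetical, and the statements are only built as strings, not sent to Athena.

```python
from datetime import date, timedelta

TABLE = "logs"                      # hypothetical table name
BUCKET = "s3://example-log-bucket"  # hypothetical bucket


def add_partition_statements(start: date, end: date) -> list:
    """Build one ALTER TABLE ADD PARTITION statement per day in [start, end].

    This is what non-Hive-style prefixes (option D) would require: each
    partition is registered explicitly with its own LOCATION.
    """
    days = (end - start).days + 1
    statements = []
    for offset in range(days):
        day = (start + timedelta(days=offset)).isoformat()
        # `date` is backticked on the assumption that it is a reserved word in DDL.
        statements.append(
            f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
            f"PARTITION (`date` = '{day}') LOCATION '{BUCKET}/{day}/'"
        )
    return statements


def repair_statement() -> str:
    """Single statement that discovers all date=.../ prefixes (options B and F)."""
    return f"MSCK REPAIR TABLE {TABLE}"


manual = add_partition_statements(date(2022, 1, 1), date(2023, 12, 31))
print(len(manual))         # 730
print(repair_statement())  # MSCK REPAIR TABLE logs
```

Two years of daily partitions means hundreds of explicit statements in the manual approach, while the Hive-style layout needs a single repair statement that can be rerun as new days arrive.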

Concept tested. Optimizing Athena queries with partitioning and columnar formats
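Because Athena bills by the amount of data scanned, the effect of partitioning and a columnar format can be approximated with simple arithmetic. The sketch below is illustrative only: the price per TB, the compression ratio, and the fraction of columns read are assumptions chosen for the example, not figures from the question or from AWS pricing for any particular region.

```python
PRICE_PER_TB_USD = 5.0  # assumed price per TB scanned; check current regional pricing


def estimated_query_cost(
    total_tb: float,
    partitions_total: int = 1,
    partitions_queried: int = 1,
    column_fraction: float = 1.0,
    compression_ratio: float = 1.0,
) -> float:
    """Rough cost of one query in USD.

    partitions_queried / partitions_total models partition pruning,
    column_fraction models reading only some columns from a columnar file,
    compression_ratio models the size reduction of the stored format.
    All three are simplifying assumptions.
    """
    scanned_tb = (
        total_tb
        * (partitions_queried / partitions_total)
        * column_fraction
        / compression_ratio
    )
    return scanned_tb * PRICE_PER_TB_USD


# Raw text, no partitions: every query scans all 100 TB.
full_scan = estimated_query_cost(100)

# Hypothetical optimized layout: one day out of 730, 20% of columns, 3x compression.
optimized = estimated_query_cost(
    100,
    partitions_total=730,
    partitions_queried=1,
    column_fraction=0.2,
    compression_ratio=3.0,
)

print(f"full scan: ${full_scan:.2f}")
print(f"optimized: ${optimized:.4f}")
```

The exact numbers depend entirely on the assumed inputs, but the structure of the formula shows why the three chosen steps compound: pruning reduces the rows read, the columnar format reduces the columns read, and compression reduces the bytes per value.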

Reference. https://docs.aws.amazon.com/athena/latest/ug/converting-to-columnar.html

Topics

#Athena Cost Optimization #S3 Data Partitioning #Columnar Data Formats #Partition Management
