DEA-C01 Question #302: Real Exam Question with Answer & Explanation

The correct answer is D.

Data Ingestion and Transformation

Question

An ecommerce company collects daily customer transaction logs in CSV format and stores the logs in Amazon S3. The company uses Amazon Athena to scan a subset of attributes from the logs on the same day the company receives each log. Query times are increasing because of increasing transaction volume. The company wants to improve query performance. Which solution will meet these requirements with the SHORTEST query times?

Options

  • A. Convert the CSV logs into multiple ORC files for better parallelism in Athena. Partition by date in
  • B. Convert the CSV logs to JSON. Partition by date in Amazon S3. Use Athena with dynamic
  • C. Convert the CSV logs to Avro. Partition by date in Amazon S3. Use Athena with projection-based
  • D. Convert the CSV logs to a single Apache Parquet file for each day. Partition the data by date in

Explanation

Partitioning by date limits each query to the relevant day’s data. Parquet’s columnar layout lets Athena read only the columns a query needs, and predicate pushdown lets it skip row groups that cannot match the filter. Writing a single Parquet file per day minimizes file-listing and small-file overhead, typically delivering the shortest Athena query times for same-day, subset-column analytics.
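As a rough illustration of the layout described above, the sketch below builds a Hive-style, date-partitioned S3 key for one Parquet file per day, plus an Athena query that filters on the partition column and names only the columns it needs. The bucket, table, prefix, file name, partition column `dt`, and column names are all hypothetical and are not taken from the question. The sketch only builds strings; it does not convert any files or call Athena.

```python
from datetime import date

# Hypothetical names for illustration only; none come from the question.
BUCKET = "example-transaction-logs"
TABLE = "transactions"


def daily_parquet_key(day: date, prefix: str = "logs") -> str:
    """Return a Hive-style partitioned S3 key for one Parquet file per day."""
    return f"{prefix}/dt={day.isoformat()}/transactions.parquet"


def same_day_query(day: date, columns: list[str]) -> str:
    """Build an Athena SQL query that reads only the given columns for one day.

    Filtering on the partition column (dt) lets Athena skip other days'
    prefixes; naming columns explicitly lets the Parquet reader skip the rest.
    """
    if not columns:
        raise ValueError("select at least one column")
    # Reject anything that is not a plain identifier to keep the SQL well formed.
    if not all(name.isidentifier() for name in columns):
        raise ValueError("column names must be plain identifiers")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {TABLE} WHERE dt = '{day.isoformat()}'"


if __name__ == "__main__":
    day = date(2024, 5, 17)
    print(f"s3://{BUCKET}/{daily_parquet_key(day)}")
    print(same_day_query(day, ["customer_id", "order_total"]))
```

The `WHERE dt = ...` predicate is what confines the scan to a single day's partition, and avoiding `SELECT *` is what lets the columnar format pay off.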

Topics

#Columnar Storage #Amazon Athena Performance #Query Optimization #Data Partitioning
