DAS-C01 · Question #125


Question

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time, and the query latency also has increased. Which solutions could the company implement to improve query performance? (Choose two.)

Options

  • A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or
  • B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the
  • C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the
  • D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the
  • E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the

Explanation

Correct answer: C, E

Option C converts the CSV files to Apache Parquet (a columnar format) and adds partitioning via AWS Glue ETL. Parquet drastically reduces the amount of data Athena scans per query through column pruning and predicate pushdown, and partitioning further limits scans to the relevant data segments. Both are among the most impactful Athena optimizations.

Option E compresses the CSV files using the LZO format, which, unlike gzip (option D), is splittable, meaning Athena can parallelize reads across multiple splits of a single file.

Option A (MySQL Workbench via JDBC) does not improve Athena performance. Option B uses Athena CTAS to produce Parquet, but Glue ETL (option C) is more suitable for scheduled daily batch transformation and also adds partitioning, making C the superior Parquet conversion approach.

Explanation generated by claude-sonnet (LLM judge score: 4).
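To make the partitioning argument concrete, here is a minimal sketch in plain Python of how date-based, Hive-style partition prefixes let a query engine skip data outside the requested range. It is an illustration only, not Athena's or Glue's actual implementation. The bucket name, the table path, the `year=/month=/day=` layout, and the helper names `partition_prefix` and `prune` are all assumptions made for this example.

```python
from datetime import date
from typing import List


def partition_prefix(table_root: str, day: date) -> str:
    """Build a Hive-style partition prefix for one day of data.

    The year=/month=/day= layout is an assumed convention for this sketch.
    """
    root = table_root.rstrip("/")
    return f"{root}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prune(prefixes: List[str], start: date, end: date) -> List[str]:
    """Keep only the partitions whose date lies within [start, end].

    This mimics partition pruning: partitions outside the range are
    never listed or read, so their bytes are never scanned.
    """
    kept = []
    for prefix in prefixes:
        # Collect key=value path segments such as year=2024.
        parts = dict(
            seg.split("=", 1) for seg in prefix.strip("/").split("/") if "=" in seg
        )
        day = date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
        if start <= day <= end:
            kept.append(prefix)
    return kept


# Hypothetical table root; ten daily partitions, query touches the last three.
root = "s3://example-bucket/events"
all_prefixes = [partition_prefix(root, date(2024, 1, d)) for d in range(1, 11)]
recent = prune(all_prefixes, date(2024, 1, 8), date(2024, 1, 10))
print(f"{len(recent)} of {len(all_prefixes)} partitions need to be scanned")
```

Because the analysts in the question only query a recent subset of the data, a layout like this means most daily partitions are skipped entirely, regardless of how much historical data has accumulated.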

Topics

#Amazon Athena Performance, #AWS Glue ETL, #Apache Parquet, #Data Partitioning
