
MLA-C01 Question #159: Real Exam Question with Answer & Explanation

The correct answer is C: Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet.

Data Preparation for Machine Learning

Question

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take. An ML engineer must implement a solution to optimize the data for query performance. Which solution will meet this requirement with the LEAST operational overhead?

Options

  • A. Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.
  • B. Configure an AWS Glue job to drop columns that have string type values and to save the results
  • C. Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet
  • D. Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Explanation

Converting the .csv files to Apache Parquet improves query performance because Parquet is a columnar storage format: a query reads only the columns it references instead of scanning every row in full, which reduces I/O. AWS Glue is a fully managed, serverless ETL service, so the conversion runs with minimal operational overhead.
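
To make the I/O argument concrete, here is a minimal sketch in plain Python that compares how many bytes a single-column query would have to scan in a row-oriented layout (CSV) and in a column-oriented layout. It is a toy model, not Parquet and not a Glue job: the sample data, the column names (user_id, country, amount), and the helper functions are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
COLUMNS = ["user_id", "country", "amount"]
ROWS = [
    {
        "user_id": str(i),
        "country": "US" if i % 2 else "DE",
        "amount": f"{i * 1.5:.2f}",
    }
    for i in range(1000)
]


def row_layout_bytes(rows, columns):
    """Serialize the rows as CSV and return the total size in bytes.

    To read one column from a CSV file, a reader still has to scan
    every byte of every row.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def column_layout_bytes(rows, columns):
    """Store each column contiguously and return the size of each chunk."""
    return {
        name: len("\n".join(row[name] for row in rows).encode("utf-8"))
        for name in columns
    }


total_csv = row_layout_bytes(ROWS, COLUMNS)
per_column = column_layout_bytes(ROWS, COLUMNS)

# A query such as SUM(amount) only needs the "amount" chunk
# when the data is laid out by column.
scanned_columnar = per_column["amount"]

print(f"row layout scan: {total_csv} bytes")
print(f"columnar scan:   {scanned_columnar} bytes")
```

The columnar scan is smaller because the other two columns are never read. Real Parquet files add per-column compression and encoding on top of this layout, so the gap in practice is typically larger than this toy comparison suggests.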

Topics

#Data format optimization  #Apache Parquet  #AWS Glue ETL  #Query performance
