nerdexam
AmazonAmazon

MLA-C01 · Question #169

MLA-C01 Question #169: Real Exam Question with Answer & Explanation

The correct answer is B: Apache Parquet. Apache Parquet is a columnar storage format optimized for analytics engines like Amazon Athena, providing efficient data scanning, reduced I/O, and faster query execution. These characteristics result in low latency for large-scale batch transformations and training, while also s

Data Preparation for Machine Learning

Question

An ML engineer is building an ML pipeline. The pipeline must process a dataset in two ways by using Amazon Athena. The pipeline must use batch processing to perform large-scale data transformations and for model training. The pipeline must also use near real-time processing to perform low-latency queries for inference and analytics. Which file format will provide the LEAST latency for both types of processing?

Options

  • ACSV
  • BApache Parquet
  • CNested JSON
  • DDeserialized JSON

Explanation

Apache Parquet is a columnar storage format optimized for analytics engines like Amazon Athena, providing efficient data scanning, reduced I/O, and faster query execution. These characteristics result in low latency for large-scale batch transformations and training, while also supporting fast, low-latency queries for near real-time inference and analytics.

Topics

#Data Formats#Amazon Athena#Latency Optimization#ML Data Processing

Community Discussion

No community discussion yet for this question.

Full MLA-C01 PracticeBrowse All MLA-C01 Questions