MLA-C01 · Question #169
MLA-C01 Question #169: Real Exam Question with Answer & Explanation
The correct answer is B: Apache Parquet. Apache Parquet is a columnar storage format optimized for analytics engines like Amazon Athena, providing efficient data scanning, reduced I/O, and faster query execution. These characteristics result in low latency for large-scale batch transformations and training, while also s
Question
An ML engineer is building an ML pipeline. The pipeline must process a dataset in two ways by using Amazon Athena. The pipeline must use batch processing to perform large-scale data transformations and for model training. The pipeline must also use near real-time processing to perform low-latency queries for inference and analytics. Which file format will provide the LEAST latency for both types of processing?
Options
- ACSV
- BApache Parquet
- CNested JSON
- DDeserialized JSON
Explanation
Apache Parquet is a columnar storage format optimized for analytics engines like Amazon Athena, providing efficient data scanning, reduced I/O, and faster query execution. These characteristics result in low latency for large-scale batch transformations and training, while also supporting fast, low-latency queries for near real-time inference and analytics.
Topics
Community Discussion
No community discussion yet for this question.