MLA-C01 Question #5: Real Exam Question with Answer & Explanation

Data Preparation for Machine Learning

Question

Case Study: An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?

Options

  • A. Amazon EMR Spark jobs
  • B. Amazon Kinesis Data Streams
  • C. Amazon DynamoDB
  • D. AWS Lake Formation

Explanation

AWS Lake Formation (D) is the correct choice because it is purpose-built to aggregate, catalog, and govern data from heterogeneous sources in a single data lake. It can register the S3 locations that already hold the transaction logs and customer profiles, and it relies on AWS Glue (JDBC connections, crawlers, and blueprint workflows) to import tables from on-premises databases such as MySQL. Once everything is cataloged in one place, the combined dataset can be queried and prepared for ML training.
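To make the two ingestion paths concrete, here is a minimal Python sketch of the request payloads involved: a Glue JDBC connection for the MySQL database and a Lake Formation registration for the S3 location. It is an illustration under assumptions, not a complete setup. The helper names, host, bucket, database, and secret ID are all hypothetical, and networking details that an on-premises connection would normally need (such as VPC subnet and security group settings) are left out. The payload builders run without AWS access; only `apply_to_aws` calls boto3.

```python
from typing import Any, Dict


def mysql_connection_input(name: str, host: str, database: str,
                           secret_id: str, port: int = 3306) -> Dict[str, Any]:
    """Build a ConnectionInput dict for Glue's create_connection (JDBC to MySQL)."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": f"jdbc:mysql://{host}:{port}/{database}",
            # Assumption: credentials live in a Secrets Manager secret.
            "SECRET_ID": secret_id,
        },
    }


def s3_registration(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's register_resource."""
    prefix = prefix.strip("/")
    path = f"{bucket}/{prefix}" if prefix else bucket
    return {"ResourceArn": f"arn:aws:s3:::{path}", "UseServiceLinkedRole": True}


def apply_to_aws(connection_input: Dict[str, Any],
                 registration: Dict[str, Any]) -> None:
    """Send both payloads to AWS. Requires boto3, credentials, and permissions."""
    import boto3  # imported lazily so the builders above work without it

    boto3.client("glue").create_connection(ConnectionInput=connection_input)
    boto3.client("lakeformation").register_resource(**registration)


# Hypothetical values, for illustration only.
conn = mysql_connection_input(
    name="onprem-mysql",
    host="mysql.example.internal",
    database="fraud",
    secret_id="example/mysql-credentials",
)
reg = s3_registration("example-fraud-data", "transactions/")
```

With both sources known to the data lake, a Glue crawler or a Lake Formation blueprint workflow would typically be the next step to populate the Data Catalog; that part is not shown here.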

Why the distractors are wrong:

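  • A (EMR Spark) is a processing engine, not a managed aggregation/ingestion service - Spark jobs can read from S3 and JDBC sources, but you would have to write and operate that ingestion code yourself, and EMR does not provide a central catalog or governance layer for the combined data.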
  • B (Kinesis Data Streams) is designed for real-time streaming ingestion, not batch aggregation of static tables and on-premises databases.
  • C (DynamoDB) is a NoSQL database - it stores data, but it has no native capability to pull in and unify data from S3 and MySQL sources.

Memory tip: Think of Lake Formation as the "librarian" - it doesn't generate or process the data, it collects, catalogs, and secures it from wherever it lives. When you see a question about unifying data from multiple heterogeneous sources (S3 + on-prem databases), Lake Formation is almost always the answer over compute or streaming services.

Topics

#Data Aggregation #Data Lake #AWS Lake Formation #Data Integration
