MLA-C01 Question #100: Real Exam Question with Answer & Explanation

The correct answer is D: Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Use …

Data Preparation for Machine Learning

Question

A company is building a real-time data processing pipeline for an ecommerce application. The application generates a high volume of clickstream data that must be ingested, processed, and visualized in near real time. The company needs a solution that supports SQL for data processing and Jupyter notebooks for interactive analysis. Which solution will meet these requirements?

Options

  • A. Use Amazon Data Firehose to ingest the data. Create an AWS Lambda function to process the …
  • B. Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Data Firehose to transform …
  • C. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Use AWS …
  • D. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Use …

Explanation

Option D pairs Amazon MSK (a managed Kafka service), which handles high-throughput clickstream ingestion, with a downstream service, most likely Amazon EMR, that satisfies both remaining requirements: SQL-based processing via Spark SQL or Hive, and interactive analysis in Jupyter notebooks through EMR Studio.
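To make "SQL-based processing" of clickstream data concrete, here is a minimal local sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL engine such as Spark SQL or Hive; it does not connect to MSK or EMR. The table name, column names, and sample events are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical clickstream events: (user_id, page, timestamp in epoch seconds).
EVENTS = [
    ("u1", "/home", 0),
    ("u2", "/home", 10),
    ("u1", "/cart", 65),
    ("u3", "/home", 70),
    ("u2", "/checkout", 125),
]


def page_views_per_minute(events):
    """Count views per page in 60-second tumbling windows using plain SQL."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in, not Spark SQL
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", events)
        rows = conn.execute(
            """
            SELECT (ts / 60) * 60 AS window_start, page, COUNT(*) AS views
            FROM clicks
            GROUP BY window_start, page
            ORDER BY window_start, page
            """
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for window_start, page, views in page_views_per_minute(EVENTS):
        print(window_start, page, views)
```

The query has the same shape an analyst might run from a notebook: bucket events into fixed time windows, then aggregate per page. On a real streaming engine, the windowing would typically use that engine's own window functions rather than integer division.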

Why the distractors fail:

  • Option A (Firehose + Lambda): Lambda is stateless and event-driven, so it is not suited to continuous SQL stream processing, and it offers no Jupyter notebook capability.
  • Option B (Kinesis Data Streams + Firehose): Firehose is a delivery service that routes data to destinations like S3 or Redshift; it is not a processing engine and supports neither SQL queries nor notebooks.
  • Option C (MSK + something else): While MSK is a valid ingestion layer, the paired service in option C likely addresses only one of the two required capabilities (SQL or Jupyter), not both.
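The elimination reasoning above can be sketched as a simple capability check. The capability sets below only encode how this explanation characterizes each service; they are an illustrative assumption, not an official AWS feature matrix, and the names REQUIRED, SERVICE_CAPABILITIES, and the helper functions are hypothetical. Option C is omitted because its paired service is not shown in full.

```python
from typing import Set

# The two processing-side requirements stated in the question.
REQUIRED: Set[str] = {"sql", "jupyter"}

# Capabilities as characterized in the explanation (illustrative assumption).
SERVICE_CAPABILITIES = {
    "AWS Lambda": set(),                # no continuous SQL, no notebooks
    "Amazon Data Firehose": set(),      # delivery service, not a processing engine
    "Amazon EMR": {"sql", "jupyter"},   # Spark SQL or Hive, notebooks via EMR Studio
}


def missing_capabilities(service: str) -> Set[str]:
    """Return the required capabilities that the given service does not cover."""
    return REQUIRED - SERVICE_CAPABILITIES[service]


def satisfies_requirements(service: str) -> bool:
    """True when the service covers every required capability."""
    return not missing_capabilities(service)


if __name__ == "__main__":
    for name in SERVICE_CAPABILITIES:
        print(name, "->", sorted(missing_capabilities(name)) or "covers all")
```

Framing each option as "which requirement is left uncovered" is a quick way to rule out distractors: any option with a non-empty missing set fails.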

Memory tip: Associate MSK → EMR as the "full stack" real-time pattern: MSK handles durable, high-throughput Kafka ingestion, and EMR covers both SQL processing and Jupyter notebooks in one managed cluster. When a question requires SQL and notebooks together, EMR is a strong signal for the correct answer.

Topics

#Real-time streaming  #Data ingestion  #Data processing (SQL)  #Interactive analysis (Jupyter)
