DEA-C01 Question #128: Real Exam Question with Answer & Explanation

The correct answer is A: Design the application so it can remove duplicates during processing by embedding a unique ID. Network outages can cause the producer to retry PutRecord and write the same transaction to Kinesis Data Streams more than once, so the pipeline must be able to recognize and discard duplicates.

Data Ingestion and Transformation

Question

A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams. A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline. Which solution will meet this requirement?

Options

  • A. Design the application so it can remove duplicates during processing by embedding a unique ID.
  • B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink.
  • C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
  • D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache

Explanation

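The data engineer needs exactly-once results from a pipeline that ingests transactional data through Kinesis Data Streams, which provides at-least-once delivery. If a network outage interrupts a PutRecord call after the stream has stored the record but before the application receives the response, the application cannot tell whether the write succeeded, so it retries and writes the same transaction again. Consumer restarts can likewise cause records to be read more than once. Because the stream itself cannot rule out duplicates, the application embeds a unique ID in each record, and downstream processing uses that ID to discard repeats.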
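As a rough illustration of option A, the sketch below assigns a unique ID to each transaction once, before the first send attempt, and drops repeats on the consumer side. It deliberately avoids calling AWS: the list stands in for the stream, and `make_record`, `Deduplicator`, and the `event_id` field are hypothetical names chosen for this example. The in-memory set is also an assumption; a real consumer would need a durable store so the set of seen IDs survives restarts.

```python
import json
import uuid


def make_record(payload: dict) -> bytes:
    """Serialize a transaction with a unique ID assigned before any send attempt."""
    # "event_id" is a hypothetical field name. A retry must resend these same
    # bytes rather than call make_record again, or the ID would change.
    return json.dumps({**payload, "event_id": str(uuid.uuid4())}).encode("utf-8")


class Deduplicator:
    """Drops records whose event_id has already been processed."""

    def __init__(self):
        # In-memory only (assumption for the sketch); not durable across restarts.
        self._seen = set()

    def process(self, data: bytes):
        record = json.loads(data)
        if record["event_id"] in self._seen:
            return None  # duplicate: already handled
        self._seen.add(record["event_id"])
        return record


# Simulate a timed-out PutRecord followed by a retry: the same bytes land twice.
record = make_record({"account": "acct-1", "amount": 25})
stream = [record, record]

dedup = Deduplicator()
processed = [r for r in (dedup.process(d) for d in stream) if r is not None]
print(len(processed))
```

In a real producer, the bytes returned by `make_record` would be what the application passes as the record data to PutRecord, and the key point is that the ID is generated outside the retry loop.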

Common mistakes.

  • B. The checkpoint configuration of Amazon Managed Service for Apache Flink controls how a Flink application recovers from failures and keeps its state consistent. That supports at-least-once or exactly-once processing inside Flink, but it does not stop the producer from writing duplicate records into Kinesis Data Streams.
  • C. Preventing the data source from sending an event more than once would be ideal, but it cannot be guaranteed when networks are unreliable or the application fails mid-request. The solution has to handle the duplicates that do occur.
  • D. Amazon EMR is a managed big data processing platform, not a streaming ingestion service. Moving to EMR with Apache Flink and Apache Spark does not remove duplicates caused by producer retries during network outages, and it is an unnecessary platform change.
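Since duplicates have to be handled wherever they show up, one complementary pattern is to make the final write idempotent, keyed on the embedded ID. The sketch below is an assumption-laden example, not part of the question: it uses an in-memory SQLite table as a stand-in for the destination store, and `apply_transaction`, the `transactions` table, and the `event_id` column are hypothetical names.

```python
import sqlite3

# In-memory database standing in for the pipeline's destination (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "event_id TEXT PRIMARY KEY, account TEXT NOT NULL, amount INTEGER NOT NULL)"
)


def apply_transaction(conn, event: dict) -> bool:
    """Insert the event once; return False if its event_id was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO transactions (event_id, account, amount) "
        "VALUES (?, ?, ?)",
        (event["event_id"], event["account"], event["amount"]),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cur.rowcount == 1


event = {"event_id": "e-1", "account": "acct-1", "amount": 25}
first = apply_transaction(conn, event)   # stored
second = apply_transaction(conn, event)  # replayed duplicate, ignored
total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
print(first, second, total)
```

With the primary key on the unique ID, a record replayed after a producer retry or a consumer restart does not change the stored totals.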

Concept tested. Exactly-once processing in streaming data pipelines

Reference. https://docs.aws.amazon.com/streams/latest/dev/kinesis-api-actions-putrecord.html

Topics

#Exactly-once delivery  #Kinesis Data Streams  #Data Deduplication  #Streaming Data Pipelines
