CERTIFIED-DATA-ENGINEER-PROFESSIONAL · Question #122
The correct answer is A: Duplicate records enqueued more than 2 hours apart may be retained and the orders table may. See the full explanation below for the reasoning.
Question
A task orchestrator has been configured to run two hourly tasks. First, an outside system writes Parquet data to a directory mounted at /mnt/raw_orders/. After this data is written, a Databricks job containing the following code is executed:

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order, and that the time field indicates when the record was queued in the source system.

If the upstream system is known to occasionally enqueue duplicate entries for a single order hours apart, which statement is correct?
Options
- A. Duplicate records enqueued more than 2 hours apart may be retained and the orders table may
- B. All records will be held in the state store for 2 hours before being deduplicated and committed to
- C. The orders table will contain only the most recent 2 hours of records and no duplicates will be
- D. Duplicate records arriving more than 2 hours apart will be dropped, but duplicates that arrive in
- E. The orders table will not contain duplicates, but records arriving more than 2 hours late will be
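The job's code is not reproduced in this copy of the question, so the reasoning behind answer A can only be sketched under assumptions. The sketch below assumes the job deduplicates on the composite key (customer_id, order_id) while remembering keys for only about 2 hours of event time, as the 2-hour figure in the options suggests. It is a simplified pure-Python model of watermark-bounded deduplication, not Spark's implementation; the function name dedupe_with_watermark and the sample records are hypothetical.

```python
from datetime import datetime, timedelta


def dedupe_with_watermark(records, delay=timedelta(hours=2)):
    """Simplified model of deduplication with a bounded state window.

    records: iterable of (customer_id, order_id, time) in arrival order.
    A key is remembered only while its event time is within `delay`
    of the latest event time seen so far (the assumed watermark).
    """
    seen = {}        # (customer_id, order_id) -> event time of the kept record
    max_time = None  # latest event time observed so far
    output = []

    for customer_id, order_id, time in records:
        max_time = time if max_time is None else max(max_time, time)
        watermark = max_time - delay

        # Forget keys whose event time has fallen behind the watermark.
        seen = {k: t for k, t in seen.items() if t >= watermark}

        key = (customer_id, order_id)
        if key not in seen:
            seen[key] = time
            output.append((customer_id, order_id, time))

    return output


t0 = datetime(2024, 1, 1, 0, 0)
sample = [
    ("c1", "o1", t0),                       # first copy: kept
    ("c1", "o1", t0 + timedelta(hours=1)),  # duplicate within 2 hours: dropped
    ("c1", "o1", t0 + timedelta(hours=3)),  # duplicate 3 hours later: kept again
]
orders = dedupe_with_watermark(sample)
```

In this model the second copy is dropped because the key is still in state, while the third copy is written again because the key was forgotten once the watermark moved past the first record. That is the situation option A describes: duplicates enqueued more than 2 hours apart may both end up in the orders table.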