CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #14: Real Exam Question with Answer & Explanation
Question
An hourly batch job is configured to ingest data files from a cloud object storage container, where each batch represents all records produced by the source system in a given hour. The batch job that processes these records into the Lakehouse is delayed enough to ensure no late-arriving data is missed. The user_id field is a unique key for the data, which has the following schema:

user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT

New records are all ingested into a table named account_history, which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id.

Assuming there are millions of user accounts and tens of thousands of records processed hourly, which implementation can be used to efficiently update the described account_current table as part of each hourly batch job?
Options
- A. Use Auto Loader to subscribe to new files in the account_history directory; configure a Structured
- B. Overwrite the account_current table with each batch using the results of a query against the
- C. Filter records in account_history using the last_updated field and the most recent hour processed,
- D. Use Delta Lake version history to get the difference between the latest version of account_history
- E. Filter records in account_history using the last_updated field and the most recent hour processed,
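The question turns on how a Type 1 table is maintained: only the latest value per key survives, so an hourly run can restrict itself to the newest hour of account_history, reduce those rows to one per user_id, and upsert them into account_current. In Delta Lake such an upsert is normally written as a MERGE INTO statement. The plain-Python sketch below models only that logic, with an in-memory dict standing in for account_current. It assumes last_updated holds Unix epoch seconds (the question does not say), and the names AccountRecord, latest_per_user and merge_type1 are hypothetical, not part of any Databricks API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

HOUR = 3600  # seconds; ASSUMES last_updated is Unix epoch seconds


@dataclass(frozen=True)
class AccountRecord:
    """Mirrors the source schema given in the question."""
    user_id: int
    username: str
    user_utc: str
    user_region: str
    last_login: int
    auto_pay: bool
    last_updated: int


def latest_per_user(history: Iterable[AccountRecord], hour_start: int) -> Dict[int, AccountRecord]:
    """Keep records updated in [hour_start, hour_start + HOUR), then keep
    only the row with the greatest last_updated for each user_id."""
    hour_end = hour_start + HOUR
    latest: Dict[int, AccountRecord] = {}
    for rec in history:
        if not (hour_start <= rec.last_updated < hour_end):
            continue  # outside the hour being processed
        seen = latest.get(rec.user_id)
        if seen is None or rec.last_updated > seen.last_updated:
            latest[rec.user_id] = rec
    return latest


def merge_type1(current: Dict[int, AccountRecord], updates: Dict[int, AccountRecord]) -> None:
    """Type 1 upsert: a matched user_id is overwritten, an unmatched one is
    inserted; no previous values are retained."""
    for user_id, rec in updates.items():
        current[user_id] = rec


# Usage: hypothetical data for one processed hour.
base = 1_700_000_000 - (1_700_000_000 % HOUR)  # start of an hour
account_current = {
    1: AccountRecord(1, "ann", "UTC", "us", base - 500, False, base - 500),
}
history = [
    AccountRecord(1, "ann", "UTC", "us", base + 10, True, base + 10),
    AccountRecord(1, "ann", "UTC", "us", base + 20, False, base + 20),
    AccountRecord(2, "bob", "UTC", "eu", base + 5, True, base + 5),
    AccountRecord(3, "cy", "UTC", "eu", base - 100, True, base - 100),  # earlier hour
]
updates = latest_per_user(history, base)
merge_type1(account_current, updates)
```

Restricting each run to the newest hour keeps the work proportional to the tens of thousands of hourly records rather than the millions of accounts. Deduplicating before the upsert also matters in Delta Lake, where a MERGE fails if multiple source rows match the same target row.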