Databricks

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #14: Real Exam Question with Answer & Explanation

Data Transformation

Question

An hourly batch job is configured to ingest data files from a cloud object storage container, where each batch represents all records produced by the source system in a given hour. The batch job that processes these records into the Lakehouse is sufficiently delayed to ensure no late-arriving data is missed. The user_id field represents a unique key for the data, which has the following schema:

user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT

New records are all ingested into a table named account_history, which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id.

Assuming there are millions of user accounts and tens of thousands of records processed hourly, which implementation can be used to efficiently update the described account_current table as part of each hourly batch job?

Options

  • A. Use Auto Loader to subscribe to new files in the account_history directory; configure a Structured
  • B. Overwrite the account_current table with each batch using the results of a query against the
  • C. Filter records in account_history using the last_updated field and the most recent hour processed,
  • D. Use Delta Lake version history to get the difference between the latest version of account_history
  • E. Filter records in account_history using the last_updated field and the most recent hour processed,
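
For readers unfamiliar with the term, the Type 1 behaviour described in the question (one row per user_id, always holding the most recent value) can be pictured with a small pure-Python sketch. This is an illustration of the semantics only, not the exam's answer and not Delta Lake code: the Account type uses a hypothetical subset of the schema, and the helper names latest_per_user and upsert_type1 are invented for this example. In Delta Lake, the same matched-update / unmatched-insert pattern is what the MERGE INTO statement listed under Topics expresses.

```python
from typing import Dict, Iterable, TypedDict


class Account(TypedDict):
    # Hypothetical subset of the schema in the question, kept small for clarity.
    user_id: int
    username: str
    last_updated: int


def latest_per_user(batch: Iterable[Account]) -> Dict[int, Account]:
    """Keep only the newest record per user_id within one batch."""
    latest: Dict[int, Account] = {}
    for rec in batch:
        seen = latest.get(rec["user_id"])
        if seen is None or rec["last_updated"] > seen["last_updated"]:
            latest[rec["user_id"]] = rec
    return latest


def upsert_type1(current: Dict[int, Account], batch: Iterable[Account]) -> None:
    """Type 1 upsert: overwrite rows whose key matches, insert rows whose key is new."""
    for user_id, rec in latest_per_user(batch).items():
        existing = current.get(user_id)
        if existing is None or rec["last_updated"] >= existing["last_updated"]:
            current[user_id] = rec


# Usage: one existing account, then an hourly batch with two updates for
# user 1 and one brand-new user 2.
account_current: Dict[int, Account] = {
    1: {"user_id": 1, "username": "ann", "last_updated": 100},
}
hourly_batch = [
    {"user_id": 1, "username": "ann_old", "last_updated": 150},
    {"user_id": 1, "username": "ann_new", "last_updated": 200},
    {"user_id": 2, "username": "bob", "last_updated": 180},
]
upsert_type1(account_current, hourly_batch)
```

The point of the sketch is the cost profile: only the tens of thousands of keys in the hourly batch are touched, while the millions of unchanged accounts are left alone, and no history is retained in the current table.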

Topics

#Delta Lake #SCD Type 1 #MERGE INTO #Data Transformation