nerdexam
Databricks

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #123: Real Exam Question with Answer & Explanation

Building and Maintaining Data Pipelines

Question

A nightly batch job is configured to ingest all data files from a cloud object storage container where records are stored in a nested directory structure YYYY/MM/DD. The data for each date represents all records that were processed by the source system on that date, noting that some records may be delayed as they await moderator approval.

Each entry represents a user review of a product and has the following schema:

user_id STRING, review_id BIGINT, product_id BIGINT, review_timestamp TIMESTAMP, review_text STRING

The ingestion job is configured to append all data for the previous date to a target table reviews_raw with an identical schema to the source system. The next step in the pipeline is a batch write to propagate all new records inserted into reviews_raw to a table where data is fully deduplicated, validated, and enriched.

Which solution minimizes the compute costs to propagate this batch of data?

Options

  • A. Perform a batch read on the reviews_raw table and perform an insert-only merge using the
  • B. Configure a Structured Streaming read against the reviews_raw table using the trigger once
  • C. Use Delta Lake version history to get the difference between the latest version of reviews_raw
  • D. Filter all records in the reviews_raw table based on the review_timestamp; batch append those
  • E. Reprocess all records in reviews_raw and overwrite the next table in the pipeline.
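
As a minimal sketch of the directory convention described in the question, the helper below computes the YYYY/MM/DD prefix that a nightly job would read for the previous date. The function name previous_day_prefix and the choice of the job's run date as input are assumptions made for illustration; the question does not say how the job derives the path.

```python
from datetime import date, timedelta


def previous_day_prefix(run_date: date) -> str:
    """Return the YYYY/MM/DD directory prefix for the day before run_date.

    Hypothetical helper: the question only states that data lands under
    YYYY/MM/DD and that the job ingests the previous date.
    """
    previous = run_date - timedelta(days=1)
    # Zero-padded month and day match the YYYY/MM/DD layout.
    return previous.strftime("%Y/%m/%d")


if __name__ == "__main__":
    # A run on 1 March 2024 would read the 29 February 2024 directory.
    print(previous_day_prefix(date(2024, 3, 1)))
```

Because each directory holds records by the date the source system processed them, a delayed review can land in a directory later than its review_timestamp, so the directory date and the timestamp need not match.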

Topics

#Structured Streaming, #Delta Lake, #Incremental Processing, #Cost Optimization