CERTIFIED-MACHINE-LEARNING-PROFESSIONAL Question #26: Real Exam Question with Answer & Explanation

The correct answer is B: Replace schema(schema) with option("maxFilesPerTrigger", 1).

Question

A machine learning engineer is using the following code block as part of a batch deployment pipeline: Which of the following changes needs to be made so this code block will work when the inference table is a stream source?

Options

  • A. Replace "inference" with the path to the location of the Delta table
  • B. Replace schema(schema) with option("maxFilesPerTrigger", 1)
  • C. Replace spark.read with spark.readStream
  • D. Replace format("delta") with format("stream")
  • E. Replace predict with a stream-friendly prediction function

Explanation

When reading a Delta table as a stream source, the Delta format is self-describing - it automatically infers its own schema from the table metadata, making .schema(schema) unnecessary and potentially conflicting. Replacing it with .option("maxFilesPerTrigger", 1) correctly configures the stream by telling Spark how many new Delta files to process per micro-batch trigger, which is the idiomatic way to control streaming throughput from a Delta source.
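The question's original code block is not reproduced on this page, so the following is only a minimal sketch of what a streaming read of the inference table might look like. The function name `read_inference_stream` and its parameters are hypothetical; the table name "inference" and the option value 1 come from the question. It assumes a Spark version where the streaming reader supports `.table()` (3.1 or later) and a Delta-enabled session. Note that a streaming DataFrame is obtained through `spark.readStream`, and that no `.schema(...)` call is made because the schema comes from the Delta table metadata.

```python
def read_inference_stream(spark, table_name="inference", max_files_per_trigger=1):
    """Return a streaming DataFrame over a Delta table (illustrative sketch).

    `spark` is expected to be an active SparkSession with Delta support.
    The helper name and its defaults are assumptions made for illustration.
    """
    return (
        spark.readStream                      # streaming reader, not spark.read
        .format("delta")                      # Delta is the format in batch and streaming
        # Upper bound on how many new files each micro-batch considers.
        .option("maxFilesPerTrigger", max_files_per_trigger)
        .table(table_name)                    # schema is taken from table metadata
    )
```

In a notebook this would be called as `stream_df = read_inference_stream(spark)`; raising `max_files_per_trigger` trades latency per micro-batch for higher throughput.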

Why the distractors are wrong:

  • A - Replacing the table name with a path is unrelated to streaming compatibility; the table reference itself is not the problem.
  • C - The code already includes spark.readStream (the streaming reader is implied by the question setup); simply swapping .read for .readStream is insufficient because the incompatible .schema() call remains.
  • D - "stream" is not a valid Spark format name; Delta tables are always read with format("delta"), whether in batch or streaming mode.
  • E - MLflow-based prediction functions (e.g., spark_udf) are already designed to work with both batch and streaming DataFrames; the function itself does not need to change.
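To illustrate the point in E, here is a hedged sketch of applying a prediction UDF to a DataFrame and writing the result with Structured Streaming. The helper `score_and_write` and all of its parameter names are hypothetical; `predict` stands for a Spark UDF such as one returned by `mlflow.pyfunc.spark_udf`, and `stream_df` is assumed to be a streaming DataFrame. The `withColumn` call is the same one a batch pipeline would use; only the write side is streaming-specific. It assumes Spark 3.1 or later for `toTable`.

```python
def score_and_write(stream_df, predict, feature_cols, output_table, checkpoint_path):
    """Score a streaming DataFrame and start a Delta sink (illustrative sketch).

    All names here are assumptions for illustration, not part of the question.
    """
    # Same call shape as in batch: the UDF is applied column-wise.
    scored = stream_df.withColumn("prediction", predict(*feature_cols))

    return (
        scored.writeStream
        .format("delta")
        # A checkpoint location lets the query resume where it left off.
        .option("checkpointLocation", checkpoint_path)
        .toTable(output_table)                # starts the streaming query
    )
```

The prediction function is passed in unchanged, which is why swapping it for a "stream-friendly" variant is not required.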

Memory tip: Think of Delta as "self-aware" - it carries its schema internally, so never hand it one manually in streaming mode. Instead, hand it a flow-control option like maxFilesPerTrigger to manage how fast it processes incoming files.
