CERTIFIED-MACHINE-LEARNING-PROFESSIONAL Question #46: Real Exam Question with Answer & Explanation
The answer key lists B: spark.read.format("delta").load(exp_id). Based on the Databricks MLflow documentation, this appears to be an error; the correct answer is E: spark.read.format("mlflow-experiment").load(exp_id).
Question
A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark. Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?
Options
- A. client.list_run_infos(exp_id)
- B. spark.read.format("delta").load(exp_id)
- C. There is no way to programmatically return row-level results from an MLflow Experiment.
- D. mlflow.search_runs(exp_id)
- E. spark.read.format("mlflow-experiment").load(exp_id)
Explanation
Note on the stated correct answer: based on the Databricks MLflow documentation, the answer key for this question likely contains an error. Option E, not B, is the correct answer. Here's why:
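spark.read.format("mlflow-experiment").load(exp_id) uses the MLflow experiment data source built into Databricks Runtime. It accepts an experiment ID directly and returns the experiment's runs, one row per run, as a Spark DataFrame. This data source is documented in the Databricks MLflow integration docs; note that it is Databricks-specific and is not part of open-source Spark or MLflow.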
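A minimal sketch of option E in use, assuming a Databricks cluster where the mlflow-experiment data source is available and spark is the active session. The helper name load_experiment_runs and the experiment ID in the usage comment are hypothetical, and the run_id and status columns are assumed from the data source's documented schema.

```python
def load_experiment_runs(spark, exp_id: str):
    """Return run-level data for one MLflow experiment as a Spark DataFrame.

    Relies on the Databricks-only "mlflow-experiment" data source; on plain
    open-source Spark this format name is not registered and the call fails.
    """
    # format() selects the data source; load() takes the experiment ID itself,
    # not a file path.
    return spark.read.format("mlflow-experiment").load(exp_id)


# Usage on a Databricks cluster (hypothetical experiment ID):
# runs_df = load_experiment_runs(spark, "1234567890")
# runs_df.select("run_id", "status").show()  # assumed column names
```

The helper only wraps the one-liner from option E; its point is that the experiment ID goes straight into load(), which is what distinguishes E from B.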
Why the other options are wrong:
- B - spark.read.format("delta").load(exp_id) reads a Delta table from a file path, not an experiment ID. Even if MLflow data were stored as Delta under the hood, you can't pass a bare experiment ID string as a path; this would fail or produce unexpected results.
- A - client.list_run_infos(exp_id) returns a Python list of RunInfo objects, not a Spark DataFrame.
- D - mlflow.search_runs(exp_id) returns a pandas DataFrame, not a Spark DataFrame.
- C - Clearly false; A, D, and E all show that programmatic access exists.
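Option D is one step away from a Spark DataFrame: its pandas result can be passed to spark.createDataFrame. The sketch below is a workaround, not an answer to the question as written, since D on its own still returns pandas. It assumes mlflow is installed and spark is an active session; the helper name runs_to_spark is hypothetical. search_runs names its columns like metrics.<name> and params.<name>, so those dotted names need backticks when referenced in Spark, and columns that are entirely null may need an explicit schema before conversion succeeds.

```python
def runs_to_spark(spark, exp_id: str):
    """Fetch runs with MLflow's fluent search API, then convert them to Spark.

    mlflow.search_runs returns a pandas DataFrame by default; wrapping it in
    spark.createDataFrame is what turns it into a Spark DataFrame.
    """
    import mlflow  # imported inside the function so this module loads without MLflow

    # experiment_ids expects a list of experiment ID strings.
    pdf = mlflow.search_runs(experiment_ids=[exp_id])
    # Convert the pandas result; schema inference may need help for all-null columns.
    return spark.createDataFrame(pdf)
```

Compared with option E, this route makes two hops (tracking server to pandas, pandas to Spark) and pulls every run through the driver, which is why the purpose-built data source is the expected answer.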
Memory tip: The "mlflow-experiment" format name is self-describing - when you want MLflow data in Spark, use the format that literally says "mlflow-experiment". If your exam question says B is correct instead of E, that's likely a question bank error worth flagging to your instructor.