nerdexam
Databricks


CERTIFIED-MACHINE-LEARNING-PROFESSIONAL Question #22: Real Exam Question with Answer & Explanation


Question

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df. Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

Options

  • B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.

Explanation

Option D is correct because MLflow provides mlflow.pyfunc.spark_udf(), which wraps a logged MLflow model as a Spark user-defined function (UDF). The correct pattern passes the SparkSession and the model's URI (runs:/{run_id}/random_forest_model) to spark_udf() and assigns the returned UDF to predict. That UDF can then be applied to the feature columns of spark_df, and Spark runs the model on each partition in parallel, bridging scikit-learn's single-node world with Spark's distributed execution.

Option B is wrong because deploying a scikit-learn model on a Spark DataFrame is entirely possible; it is precisely the use case mlflow.pyfunc.spark_udf() was designed for. MLflow's pyfunc flavor gives every logged model a uniform predict() interface, and spark_udf() takes care of making the model available on the Spark executors and feeding it batches of rows as pandas data.

The other distractors (A and C) likely show incorrect approaches, such as loading the model with mlflow.sklearn.load_model() and calling .predict() directly on the Spark DataFrame, or passing the wrong URI format or argument names to spark_udf(). The first approach would fail because scikit-learn's predict() expects in-memory, array-like input (a NumPy array or pandas DataFrame), not a distributed Spark DataFrame.

Memory tip: think "Spark needs a UDF". Whenever you need to apply any MLflow model (not just scikit-learn) to a Spark DataFrame, reach for mlflow.pyfunc.spark_udf(spark, model_uri=...). The SparkSession comes first because MLflow uses it to build the UDF and to make the model's artifacts available to the executors that run it.
