CERTIFIED-MACHINE-LEARNING-PROFESSIONAL · Question #48
The correct answer is E: There is no longer a need for pipeline-like machine learning objects.
Question
A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model. Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?
Options
- A. The pyfunc model can be used to deploy models in a parallelizable fashion
- B. The same preprocessing logic will automatically be applied when calling fit
- C. The same preprocessing logic will automatically be applied when calling predict
- D. This approach has no impact when loading the logged pyfunc model for downstream deployment
- E. There is no longer a need for pipeline-like machine learning objects
Explanation
Wrapping preprocessing inside a custom pyfunc model class creates a self-contained artifact: the preprocessing logic is bundled with the model itself, so any system loading the model for deployment gets preprocessing "for free" without needing a separate sklearn Pipeline, ColumnTransformer, or similar wrapper. The pyfunc interface effectively acts as its own pipeline abstraction, making external pipeline objects redundant - that's the core benefit captured in E.
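As a rough illustration of the wrapping described above, here is a minimal sketch of what a ModelWithPreprocess class could look like. Only the class name and the fit/predict split come from the question; the centimetre-to-metre preprocessing step, the one-feature least-squares model, and the preprocess helper are hypothetical stand-ins. The predict signature follows MLflow's PythonModel convention, and the import fallback is there only so the sketch runs where MLflow is not installed.

```python
try:
    from mlflow.pyfunc import PythonModel  # MLflow's base class for custom pyfunc models
except ImportError:
    PythonModel = object  # fallback so this sketch still runs without MLflow


class ModelWithPreprocess(PythonModel):
    """Hypothetical wrapper: preprocessing plus a one-feature least-squares line."""

    def preprocess(self, rows):
        # Illustrative preprocessing only (an assumption): centimetres to metres.
        return [x / 100.0 for x in rows]

    def fit(self, rows, targets):
        # Preprocessing runs before fitting, as in the question.
        xs = self.preprocess(rows)
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(targets) / n
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
        self.slope = cov / var
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, context, model_input, params=None):
        # The same preprocessing runs again on raw inputs at inference time.
        xs = self.preprocess(model_input)
        return [self.slope * x + self.intercept for x in xs]


# Fit on raw centimetre values, then predict on a raw centimetre value.
model = ModelWithPreprocess().fit([100, 200, 300], [1.0, 2.0, 3.0])
predictions = model.predict(None, [400])
```

In a real workflow the fitted instance would typically be passed to mlflow.pyfunc.log_model through its python_model argument, and the consumer would call predict on the object returned by mlflow.pyfunc.load_model; those steps are omitted here because they need a tracking setup.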
Why the distractors fail:
- A is wrong because pyfunc is a deployment interface, not a parallelization mechanism - it says nothing about how inference is distributed.
- B is wrong because a loaded pyfunc model is used for inference, not training; fit is irrelevant at deployment time, and pyfunc's predict interface doesn't expose a fit method at all.
- C is tempting but not the key benefit - yes, preprocessing runs during predict, but this is also true of any approach that includes preprocessing in predict logic. The specific benefit of the pyfunc approach is that no external pipeline wrapper is needed, which is what E describes.
- D is wrong because the approach clearly does have an impact: preprocessing travels with the model artifact.
Memory tip: Think of a pyfunc model as a lunchbox - everything you need (model + preprocessing) is packed inside. When you pick it up (load it), you don't need a separate container (Pipeline). If the preprocessing were outside the model, deployment teams would have to recreate it separately, which is a common source of train-serve skew bugs.