Submitted by diego_uy · Apr 18, 2026 · ML pipeline operationalization

Question

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed the data with PySpark and exported the result as a CSV file to Cloud Storage. After preprocessing, you execute additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

Options

A. Remove the data transformation step from your pipeline.
B. Containerize the PySpark transformation step, and add it to your pipeline.
C. Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then
D. Deploy Apache Spark at a separate node pool in a Google Kubernetes Engine cluster. Add a
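
For context on what "parametrize" means for the preprocessing step described in the question, here is a minimal sketch of a PySpark script whose inputs are passed as arguments instead of being hard-coded. It does not indicate which option is correct: a script like this could be submitted to a cluster or packaged in a container. The flag names `--hive-table` and `--output-uri`, the function names, and the example values are hypothetical and do not come from the question.

```python
import argparse


def parse_args(argv=None):
    """Parse the step's parameters (flag names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(description="Parametrized preprocessing step")
    parser.add_argument("--hive-table", required=True,
                        help="Hive table to read, e.g. my_db.my_table")
    parser.add_argument("--output-uri", required=True,
                        help="Cloud Storage prefix for the CSV output")
    return parser.parse_args(argv)


def run_transform(hive_table, output_uri):
    """Read the Hive table and write it out as CSV with a header row."""
    # Imported here so the argument parsing can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("preprocess")
        .enableHiveSupport()  # needed to resolve Hive table names
        .getOrCreate()
    )
    df = spark.table(hive_table)
    # Any real preprocessing logic would go here; this sketch only exports.
    df.write.mode("overwrite").option("header", True).csv(output_uri)
    spark.stop()
```

A pipeline step could then call `run_transform(args.hive_table, args.output_uri)` after `args = parse_args()`, so that each pipeline run supplies its own table name and output location.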

Topics

#Kubeflow Pipelines #Dataproc #PySpark #Data Transformation
