PROFESSIONAL-MACHINE-LEARNING-ENGINEER · Question #274
PROFESSIONAL-MACHINE-LEARNING-ENGINEER Question #274: Real Exam Question with Answer & Explanation
The correct answer is C: 1. Use TFX components with Dataflow to encode the text features and scale the numerical. For large-scale tabular data from BigQuery requiring MaxMin scaling and one-hot encoding for a custom TensorFlow model trained over multiple epochs, leverage TFX components with Dataflow to minimize effort and cost.
Question
You are developing a custom TensorFlow classification model based on tabular data. Your raw data is stored in BigQuery. contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names. Your model will be trained over multiple epochs. You want to minimize the effort and cost of your solution. What should you do?
Options
- A1. Write a SQL query to create a separate lookup table to scale the numerical features.
- B1. Use BigQuery to scale the numerical features.
- C1. Use TFX components with Dataflow to encode the text features and scale the numerical
- D1. Write a SQL query to create a separate lookup table to scale the numerical features.
Explanation
For large-scale tabular data from BigQuery requiring MaxMin scaling and one-hot encoding for a custom TensorFlow model trained over multiple epochs, leverage TFX components with Dataflow to minimize effort and cost.
Common mistakes.
- A. Using SQL queries in BigQuery to create lookup tables for scaling or performing encoding can be inefficient and complex to manage for a large number of features and rows, especially when ensuring consistency between training and serving, and for iterative model development.
- B. While BigQuery can perform some data transformations, complex preprocessing like robust one-hot encoding for many categorical features or global MaxMin scaling values consistent across large datasets and multiple training epochs is more efficiently and robustly handled by a dedicated ML preprocessing framework like TFX/TensorFlow Transform, which integrates well with TensorFlow models.
- D. Using SQL queries in BigQuery to create lookup tables for scaling or performing encoding can be inefficient and complex to manage for a large number of features and rows, especially when ensuring consistency between training and serving, and for iterative model development.
Concept tested. Large-scale data preprocessing with TFX/Dataflow
Reference. https://cloud.google.com/vertex-ai/docs/pipelines/build-tfx-pipeline
Topics
Community Discussion
No community discussion yet for this question.