MLS-C01 Question #189: Real Exam Question with Answer & Explanation
The correct answers are B (add new columns that store one-hot representation of colors) and E (replace each color name by its training set frequency). Wall_Color is a nominal categorical variable: its values have no inherent order.
Question
A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property. The following is the sample data, excluding all other variables: The specialist chose a model that needs numerical input data. Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)
Options
- A. Apply integer transformation and set Red = 1, White = 5, and Green = 10.
- B. Add new columns that store one-hot representation of colors.
- C. Replace the color name string by its length.
- D. Create three columns to encode the color in RGB format.
- E. Replace each color name by its training set frequency.
Explanation
Correct answer: B, E

Wall_Color is a nominal categorical variable: its values (Red, White, Green) have no inherent order, so any encoding must avoid implying a ranking.

- Option B (one-hot encoding) is correct. It creates a separate binary column per color and represents membership without any false ordinal relationship.
- Option E (frequency encoding) is correct. It replaces each color with how often it appears in the training set, which can capture prevalence signals the model may find useful.
- Option A (integer encoding with arbitrary values such as 1, 5, 10) is incorrect. It implies Red < White < Green and arbitrary distances between the colors, a false ordinal relationship that the regression model would treat as real.
- Option C (replacing the name with its string length) is incorrect. It discards the color information; for example, White and Green both map to 5.
- Option D (RGB encoding) is incorrect. It adds three correlated columns and engineering complexity without a clear benefit for a regression task predicting rental rates.
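As a rough illustration of the two accepted encodings, here is a minimal sketch in plain Python. The training values in `train_colors` are made up for the example, since the question's sample table is not reproduced here, and the helper names `one_hot` and `frequency_encode` are hypothetical. In practice a library routine such as pandas `get_dummies` or scikit-learn `OneHotEncoder` would normally handle the one-hot step.

```python
from collections import Counter

# Hypothetical training values; the question's sample table is not reproduced here.
train_colors = ["Red", "White", "Green", "White", "Red", "White"]

# Option B: one-hot encoding, one binary column per distinct color.
categories = sorted(set(train_colors))  # fixed column order


def one_hot(color):
    """Return a 0/1 vector with a single 1 at the color's column position."""
    return [1 if color == c else 0 for c in categories]


# Option E: frequency encoding, with frequencies taken from the training set only.
counts = Counter(train_colors)
frequency = {c: counts[c] / len(train_colors) for c in categories}


def frequency_encode(color):
    """Return the color's share of training rows (0.0 for an unseen color)."""
    return frequency.get(color, 0.0)


# Show each training row with both encodings side by side.
for color in train_colors:
    print(color, one_hot(color), frequency_encode(color))
```

One caveat with frequency encoding: two colors that happen to appear equally often in the training set receive the same value, so the model can no longer tell them apart from this feature alone.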