MLS-C01 Question #194: Real Exam Question with Answer & Explanation
The correct answer is B and C: Apply principal component analysis (PCA), and remove a portion of highly correlated features from the dataset.
Question
A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other. Which steps should the data engineer take to address this issue? (Choose two.)
Options
- A. Use a linear-based algorithm to train the model.
- B. Apply principal component analysis (PCA).
- C. Remove a portion of highly correlated features from the dataset.
- D. Apply min-max feature scaling to the dataset.
- E. Apply one-hot encoding to category-based variables.
Explanation
Correct answer: B, C. The question is about handling multicollinearity in a 100-feature tabular customer dataset before model training.

High inter-feature correlation (multicollinearity) inflates the variance of model coefficients, reduces interpretability, and can harm predictive performance.

Option B (principal component analysis, PCA) transforms the correlated features into a smaller set of uncorrelated principal components. This removes the multicollinearity while retaining as much of the original variance as the chosen number of components allows.

Option C (removing highly correlated features) is a straightforward, interpretable approach: keeping one feature from each correlated group reduces redundancy without changing the meaning of the remaining features.

Option A (using a linear algorithm) is counterproductive: linear models are among the most sensitive to multicollinearity, not immune to it.

Option D (min-max scaling) normalizes feature ranges but does nothing to reduce correlation between features, because correlation is unchanged by rescaling.

Option E (one-hot encoding) applies to categorical variables and is unrelated to the correlation problem described.
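The two correct options can be sketched in a few lines of NumPy. This is a minimal illustration rather than a prescribed method: the function names `drop_correlated` and `pca_transform`, the 0.9 threshold, and the synthetic data are all assumptions made for the example. The PCA step only centers the data; in practice features are often standardized first.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Option C sketch: greedily keep a column only if its absolute
    correlation with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

def pca_transform(X, n_components):
    """Option B sketch: project centered data onto the top principal
    components, obtained from the SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Synthetic data (hypothetical): 3 independent signals plus 3 noisy copies,
# so columns 3-5 are almost perfectly correlated with columns 0-2.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(500, 3))])

kept = drop_correlated(X, threshold=0.9)   # indices of retained columns
Z = pca_transform(X, n_components=3)       # uncorrelated component scores
pc_corr = np.corrcoef(Z, rowvar=False)     # off-diagonals are ~0
```

Feature removal keeps the original, interpretable columns, whereas the PCA scores in `Z` are linear mixtures of all six inputs. That trade-off is usually what decides between the two approaches.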