nerdexam
Amazon


MLS-C01 Question #194: Real Exam Question with Answer & Explanation

The correct answers are B and C: apply principal component analysis (PCA), and remove a portion of the highly correlated features from the dataset.

Modeling

Question

A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other. Which steps should the data engineer take to address this issue? (Choose two.)
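The check the data engineer performed can be sketched as follows: compute the Pearson correlation matrix over the feature columns and list every pair whose absolute correlation exceeds a cutoff. This is a minimal NumPy sketch, assuming synthetic data; the helper `correlated_pairs`, the feature names, and the 0.9 threshold are hypothetical choices, not part of the question.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, r) for feature pairs whose |Pearson r| exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)  # rowvar=False: columns are features
    pairs = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the matrix is symmetric
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in for the bank's customer table (hypothetical feature names).
rng = np.random.default_rng(0)
n_rows = 500
income = rng.normal(50_000, 10_000, n_rows)
monthly_spend = 0.02 * income + rng.normal(0, 20, n_rows)  # strongly tied to income
age = rng.normal(40, 10, n_rows)                           # generated independently

X = np.column_stack([income, monthly_spend, age])
names = ["income", "monthly_spend", "age"]

pairs = correlated_pairs(X, names, threshold=0.9)
print(pairs)
```

With 100 real features the same loop flags every redundant pair; the threshold is a judgment call and would normally be tuned to the dataset.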

Options

  • A. Use a linear-based algorithm to train the model.
  • B. Apply principal component analysis (PCA).
  • C. Remove a portion of highly correlated features from the dataset.
  • D. Apply min-max feature scaling to the dataset.
  • E. Apply one-hot encoding to category-based variables.

Explanation

High correlation among features (multicollinearity) inflates the variance of the model's coefficient estimates, makes individual coefficients hard to interpret, and can hurt predictive performance. Two of the options address it directly:

  • B. Principal component analysis (PCA) transforms the correlated features into a smaller set of mutually uncorrelated principal components. This removes the multicollinearity while keeping as much of the original variance as possible for the number of components retained.
  • C. Removing a portion of the highly correlated features, for example keeping one representative feature from each correlated group, reduces redundancy in a simple, interpretable way.

The other options do not solve the problem:

  • A. Switching to a linear algorithm is counterproductive. Linear models such as linear and logistic regression are especially sensitive to multicollinearity, because correlated inputs make their coefficient estimates unstable.
  • D. Min-max scaling rescales each feature to a common range but leaves the correlations between features unchanged.
  • E. One-hot encoding applies to categorical variables and does nothing about the correlation among the existing features.
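The reasoning behind B, C, and D can be checked on a toy dataset. This is a minimal NumPy sketch, assuming synthetic data with one near-duplicate column; `pca_transform`, `drop_correlated`, and `min_max` are hypothetical helpers written for illustration. PCA here is computed from the SVD of the centered data, and the feature removal uses one simple greedy heuristic among many possible ones.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its top principal components (PCA via SVD of centered data)."""
    Xc = X - X.mean(axis=0)  # PCA operates on centered columns
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep a feature only if it is not highly correlated with one already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

def min_max(X):
    """Rescale each column to the [0, 1] range."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

rng = np.random.default_rng(1)
n_rows = 500
base = rng.normal(size=n_rows)
X = np.column_stack([
    base,
    base + rng.normal(scale=0.05, size=n_rows),  # near-duplicate of column 0
    rng.normal(size=n_rows),                     # independent feature
])
names = ["f0", "f1", "f2"]

# Option B: the retained principal components are mutually uncorrelated.
Z, explained = pca_transform(X, n_components=2)
print(np.round(np.corrcoef(Z, rowvar=False), 6))
print(explained)

# Option C: keep one feature from each correlated group.
X_reduced, kept = drop_correlated(X, names)
print(kept)

# Option D: min-max scaling leaves the correlation matrix unchanged.
print(np.allclose(np.corrcoef(X, rowvar=False), np.corrcoef(min_max(X), rowvar=False)))
```

When features use different units (for example income and age), they would usually be standardized before PCA so that large-scale columns do not dominate the components; the toy columns above already share a similar scale.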

Topics

#Feature Preprocessing #Multicollinearity #Dimensionality Reduction #Principal Component Analysis
