A company is building a new supervised classification model in an AWS environment. The company's data science team notices that the dataset has a large quantity of variables. All the variables are numeric. The model accuracy for training and validation is low. The model's processing time is affected by high latency. The data science team needs to increase the accuracy of the model and decrease the processing time. What should the data science team do to meet these requirements?

Question

Accepted Answer

B. Use a principal component analysis (PCA) model. PCA is a dimensionality reduction technique that transforms a large set of correlated numeric variables into a smaller set of uncorrelated principal components that retain most of the variance. Training on fewer inputs reduces computational cost and processing time, and discarding low-variance, noisy directions can improve accuracy, so PCA addresses both requirements for a dataset with many numeric variables.
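To make the reduction concrete, here is a minimal sketch of PCA done by hand with NumPy's SVD. The helper name `pca_reduce` and the synthetic dataset (50 numeric features generated from 5 latent factors) are assumptions for illustration only; in practice a library or managed implementation would normally be used instead.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # PCA requires centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each direction, as a fraction of the total
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio

# Hypothetical data: 500 rows, 50 correlated numeric features plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 50))

X_reduced, ratio = pca_reduce(X, n_components=5)
print(X_reduced.shape)  # (500, 5)
print(ratio.sum())      # fraction of total variance kept by the 5 components
```

A downstream classifier would then be trained on `X_reduced` (5 columns) rather than `X` (50 columns), which is where the processing-time saving comes from.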

Answer

A. Create new features and interaction variables. Adding features and interaction variables increases the dataset's dimensionality, which would worsen the latency and could further reduce accuracy. This is the opposite of what the requirements call for.
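As a rough illustration of how quickly interaction variables inflate the feature count, the sketch below appends every pairwise product to a single row. The helper `with_pairwise_interactions` and the 50-feature row are hypothetical.

```python
from itertools import combinations

def with_pairwise_interactions(row):
    """Return the original features followed by every pairwise product."""
    return list(row) + [a * b for a, b in combinations(row, 2)]

# Hypothetical row with 50 numeric features
row = [float(i) for i in range(1, 51)]
expanded = with_pairwise_interactions(row)
print(len(row), len(expanded))  # 50 1275
```

With 50 original features there are 1,225 distinct pairs, so the model would have to process 1,275 inputs per row instead of 50.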

Answer

C. Apply normalization on the feature set. Applying normalization scales feature values but does not reduce the number of variables or directly address high latency caused by a large quantity of variables.
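A small sketch, assuming z-score standardization as the normalization method (the question does not say which one is meant), shows that the values are rescaled but the number of columns is unchanged. The `standardize` helper and the sample matrix are illustrative.

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

# Hypothetical feature matrix with very different column scales
X = np.array([[1.0, 200.0, 0.5],
              [2.0, 400.0, 0.1],
              [3.0, 600.0, 0.9]])
Z = standardize(X)
print(X.shape, Z.shape)  # (3, 3) (3, 3)
```

Normalization is still a common preprocessing step before PCA, because PCA is sensitive to feature scale, but on its own it leaves the dimensionality untouched.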

Answer

D. Use a multiple correspondence analysis (MCA) model. Multiple Correspondence Analysis (MCA) is a dimensionality reduction technique specifically designed for categorical variables, making it unsuitable for a dataset where all variables are numeric.
