nerdexam
AmazonAmazon

MLS-C01 · Question #379

MLS-C01 Question #379: Real Exam Question with Answer & Explanation

The correct answer is D: Encode the country codes into numeric variables by using one-hot encoding.. To transform categorical country codes for model training with the smallest dimensionality increase and no information loss, one-hot encoding is the appropriate solution.

Modeling

Question

A data scientist is conducting exploratory data analysis (EDA) on a dataset that contains information about product suppliers. The dataset records the country where each product supplier is located as a two-letter text code. For example, the code for New Zealand is "NZ." The data scientist needs to transform the country codes for model training. The data scientist must choose the solution that will result in the smallest increase in dimensionality. The solution must not result in any information loss. Which solution will meet these requirements?

Options

  • AAdd a new column of data that includes the full country name.
  • BEncode the country codes into numeric variables by using similarity encoding.
  • CMap the country codes to continent names.
  • DEncode the country codes into numeric variables by using one-hot encoding.

Explanation

To transform categorical country codes for model training with the smallest dimensionality increase and no information loss, one-hot encoding is the appropriate solution.

Common mistakes.

  • A. Adding a new column with full country names merely provides redundant text data and does not convert the categorical codes into a numerical format suitable for many machine learning models.
  • B. Similarity encoding (or embedding) can be used to represent categories in a lower-dimensional space, but it may involve some information loss depending on the specific method and desired dimensionality, and it's generally more complex than one-hot encoding for preserving all distinct categorical information.
  • C. Mapping country codes to continent names significantly reduces dimensionality but results in substantial information loss, as the distinct identity of individual countries is no longer preserved, directly violating the 'no information loss' requirement.

Concept tested. Categorical feature encoding

Reference. https://docs.aws.amazon.com/sagemaker/latest/dg/feature-processing.html

Topics

#Feature Encoding#Categorical Data#Dimensionality Reduction#Data Preprocessing

Community Discussion

No community discussion yet for this question.

Full MLS-C01 PracticeBrowse All MLS-C01 Questions