MLS-C01 Question #382: Real Exam Question with Answer & Explanation

The correct answer is A: Perform one-hot encoding on every possible option for each question of the survey.

Data Engineering

Question

A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question. A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model. Which solution will meet these requirements?

Options

  • A. Perform one-hot encoding on every possible option for each question of the survey.
  • B. Perform binning on all the answers each respondent selected for each question.
  • C. Use Amazon Mechanical Turk to create categorical labels for each set of possible responses.
  • D. Use Amazon Textract to create numeric features for each set of possible responses.

Explanation

Because respondents can select several options for each question, every possible option should become its own binary (0/1) feature: 1 if the respondent selected it, 0 otherwise. One-hot encoding every option this way records each selection without losing information and produces the numeric inputs that a logistic regression model expects.
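The encoding described above can be sketched in plain Python. This is a minimal illustration, not the exam's reference solution: the question names, options, and responses are invented, and `encode_response` and `feature_names` are hypothetical helpers. In practice a library transformer such as scikit-learn's `MultiLabelBinarizer` would likely serve the same purpose.

```python
# Hypothetical survey definition: question ids and options are made up for illustration.
SURVEY_OPTIONS = {
    "q1": ["email", "phone", "chat"],
    "q2": ["price", "quality"],
}


def feature_names(survey_options=SURVEY_OPTIONS):
    """Name one binary column per (question, option) pair."""
    return [f"{q}={o}" for q, opts in survey_options.items() for o in opts]


def encode_response(response, survey_options=SURVEY_OPTIONS):
    """Return a flat list of 0/1 indicators, one per (question, option) pair."""
    row = []
    for question, options in survey_options.items():
        selected = set(response.get(question, []))
        unknown = selected - set(options)
        if unknown:
            raise ValueError(f"unknown option(s) for {question}: {sorted(unknown)}")
        # 1 if the respondent ticked this option, 0 otherwise.
        row.extend(1 if option in selected else 0 for option in options)
    return row


# Two hypothetical respondents; each may select zero, one, or several options.
responses = [
    {"q1": ["email", "chat"], "q2": ["price"]},
    {"q1": [], "q2": ["price", "quality"]},
]
encoded = [encode_response(r) for r in responses]
```

Each row of `encoded` has one column per option across all questions, so a respondent who selects several options simply has several 1s in that question's columns. The rows can be passed to a logistic regression trainer as numeric features.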

Common mistakes.

  • B. Binning is used for categorizing continuous numerical data into ranges, not for representing discrete, multi-select categorical choices from a survey.
  • C. Amazon Mechanical Turk is a crowdsourcing service for human-based tasks like data labeling, not for the automated structuring of existing survey responses into a dataset.
  • D. Amazon Textract is an optical character recognition (OCR) service for extracting text from documents, which is irrelevant for processing digital survey responses.
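
For contrast with option B, here is a minimal sketch of what binning does, assuming a hypothetical continuous `age` value and illustrative range boundaries. `bin_index` is an invented helper name, not a library function.

```python
import bisect

# Hypothetical age boundaries, chosen only for illustration.
AGE_EDGES = [18, 35, 50, 65]


def bin_index(age, edges=AGE_EDGES):
    """Return the index of the range that a continuous value falls into."""
    return bisect.bisect_right(edges, age)


ages = [17, 18, 40, 70]
binned = [bin_index(a) for a in ages]
```

Binning maps one continuous number to one range index. It has no natural way to represent a set of selected options, which is why it does not fit multi-select survey answers.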

Concept tested. Data preparation - one-hot encoding for multi-label categorical features

Reference. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

Topics

#Data Preprocessing #One-Hot Encoding #Categorical Data
