nerdexam

MLS-C01 Question #365: Real Exam Question with Answer & Explanation

The correct answer is B: Apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.

Modeling

Question

A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer loss. The training dataset has 1,000 rows. The training dataset includes 100 instances of customers who left the bank. A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to train a churn prediction model by using a SageMaker training job. After training, the ML specialist notices that the model returns only false results. The ML specialist must correct the model so that it returns more accurate predictions. Which solution will meet these requirements?

Options

  • A. Apply anomaly detection to remove outliers from the training dataset before training.
  • B. Apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.
  • C. Apply normalization to the features of the training dataset before training.
  • D. Apply undersampling to the training dataset before training.

Explanation

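The training data holds 900 customers who stayed and 100 who left, a 9:1 class imbalance. A model that predicts "will not leave" for every customer is already 90% accurate on this data, so training can settle on that shortcut, which matches the all-false output the ML specialist observed. Applying Synthetic Minority Oversampling Technique (SMOTE) to the training dataset adds synthetic churn rows, each interpolated between an existing churner and one of its nearest churner neighbors, until the classes are balanced. With balanced classes the model can no longer score well by ignoring churners, so it is pushed to learn the patterns that distinguish them, which should produce more accurate predictions. Data Wrangler offers SMOTE as one of its data-balancing transforms.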
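The interpolation step can be sketched in a few lines of plain Python. This is a minimal illustration of the SMOTE idea under stated assumptions, not Data Wrangler's implementation: the function name `smote`, the default `k=5`, the brute-force neighbor search, and the two-feature toy data are all hypothetical choices made for the example. In practice you would use Data Wrangler's built-in transform or a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class rows by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from ONE class.
    For each new row: pick a random minority row, find its k nearest minority
    neighbours by Euclidean distance, pick one at random, and place the new
    row at a random point on the segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # can't use more neighbours than other rows
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest other minority rows (brute force; fine for small data).
        neighbours = sorted(
            (row for j, row in enumerate(minority) if j != i),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data with the question's class counts:
# 900 customers who stayed (label 0) and 100 who left (label 1).
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(900)]
minority = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(100)]

# Oversample the minority class until both classes have 900 rows.
new_rows = smote(minority, n_synthetic=len(majority) - len(minority))
X = majority + minority + new_rows
y = [0] * len(majority) + [1] * (len(minority) + len(new_rows))
print(y.count(0), y.count(1))  # 900 900
```

Because every synthetic row lies between two real churners, SMOTE adds variety to the minority class instead of duplicating rows verbatim. Generate synthetic rows from the training split only, so that evaluation data still reflects the real 9:1 distribution.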

Common mistakes.

  • A. Anomaly detection primarily identifies unusual data points or outliers, which is a different problem from class imbalance causing a model to predict only the majority class.
  • C. Normalizing features scales numerical data, which can improve an algorithm's performance or convergence but does not address the fundamental problem of class imbalance that leads to biased predictions.
  • D. Undersampling the majority class would cut it from 900 rows to 100, shrinking the training set from 1,000 rows to 200. Discarding 800 real examples throws away potentially useful information, which is a poor trade-off when the dataset is already this small.
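The row counts behind options B and D can be checked with simple arithmetic. The helper below is hypothetical and assumes each method resamples to a fully balanced 1:1 split (real tools usually let you choose the target ratio). It also reports the training accuracy of a constant predictor that always answers "no churn", which shows why the unbalanced data rewards the all-false behavior described in the question.

```python
def resample_summary(strategy, n_majority=900, n_minority=100):
    """Class counts after resampling to a 1:1 split, plus the training accuracy
    of a constant predictor that always answers "no churn" (label 0)."""
    if strategy == "none":
        majority, minority = n_majority, n_minority
    elif strategy == "undersample":
        # Randomly drop majority rows until they match the minority count.
        majority, minority = n_minority, n_minority
    elif strategy == "smote":
        # Add synthetic minority rows until they match the majority count.
        majority, minority = n_majority, n_majority
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    total = majority + minority
    return {
        "majority": majority,
        "minority": minority,
        "total": total,
        "always_false_accuracy": majority / total,
    }


for strategy in ("none", "undersample", "smote"):
    print(strategy, resample_summary(strategy))
```

With the defaults, undersampling leaves 200 training rows while SMOTE yields 1,800 (800 of them synthetic). In both balanced cases the always-"no churn" shortcut drops from 90% to 50% training accuracy, so the model has to learn the churn class to do better.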

Concept tested. Handling class imbalance in machine learning

Reference. https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transforms-smote.html

Topics

#Class Imbalance #SMOTE #Data Preprocessing #Classification
