nerdexam
Amazon

MLS-C01 · Question #8

MLS-C01 Question #8: Real Exam Question with Answer & Explanation

The correct answer is B: Replace the age field value for records with a value of 0 with the mean or median value from the dataset.

Data Engineering

Question

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age. Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population. How should the Data Scientist correct this issue?

Options

  • A. Drop all records from the dataset where age has been set to 0.
  • B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset.
  • C. Drop the age feature from the dataset and train the model using the rest of the features.
  • D. Use k-means clustering to handle missing features.

Explanation

Ages recorded as 0 for patients who are all over 65 are clearly data entry errors, not true values. The appropriate fix is imputation: replacing the erroneous 0s with the mean or median age computed from the valid records. This keeps all 4,000 observations, and mean imputation leaves the average age of the valid records unchanged. Dropping the 450 records (A) discards about 11% (450 of 4,000) of an already small labeled dataset. Dropping the age feature entirely (C) removes a potentially important predictor for a condition that worsens with age. K-means clustering (D) is an unsupervised grouping algorithm and is not a mechanism for imputing missing or invalid feature values.
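A minimal sketch of option B, using only the Python standard library. It assumes the ages are held in a plain list and that 0 is the only invalid sentinel; the function name impute_invalid_ages and the sample ages are hypothetical, for illustration only. The key detail is that the replacement statistic is computed from the valid records alone, so the 0s do not drag the mean or median down. In a real pipeline the statistic would normally be computed on the training split only and then reused for validation and test data, to avoid leakage.

```python
from statistics import mean, median

def impute_invalid_ages(ages, invalid=0, strategy="median"):
    """Hypothetical helper: replace sentinel ages with the mean or median
    of the valid ages. Returns (cleaned_ages, fill_value)."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    # Compute the statistic only from valid records so the 0s don't bias it.
    valid = [a for a in ages if a != invalid]
    if not valid:
        raise ValueError("no valid ages to compute a replacement from")
    fill = median(valid) if strategy == "median" else mean(valid)
    # Keep every record; only the invalid age values are overwritten.
    return [fill if a == invalid else a for a in ages], fill

# Hypothetical sample: two patients with age mis-entered as 0.
ages = [70, 0, 68, 81, 0, 75, 66]
cleaned, fill = impute_invalid_ages(ages)
print(fill)     # 70 (median of 66, 68, 70, 75, 81)
print(cleaned)  # [70, 70, 68, 81, 70, 75, 66]
```

The median is used by default because it is less sensitive to skewed age distributions than the mean; passing strategy="mean" gives the other variant named in option B.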

Topics

#Data Cleaning #Missing Data Imputation #Data Preprocessing

