nerdexam
Cloudera

DS-200 Question #62: Real Exam Question with Answer & Explanation

The correct answer is D. Perform matched sampling across other provided variables. See the full explanation below for the reasoning.

Question

There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups is as follows: Each individual has an expression value for each of 10000 different genes. The expression value for each gene is a continuous value between -1 and 1. You've built your model for discriminating between AML and ALL patients and you find that it works quite well on your current data. One month later, a collaborator tells you she has fresh data from 100 new AML/ALL patients. You run the samples through your model, and it turns out that your model has very poor predictive accuracy on the new samples; specifically, your model predicts that all males have ALL. What is the most reliable way to fix this problem?

Options

  • A. Change the distance metric
  • B. Reduce the number of dimensions
  • C. Use a Gibbs sampler on a Bayesian network
  • D. Perform matched sampling across other provided variables
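
Option D can be illustrated with a short sketch. A model that labels every male as ALL has probably learned sex rather than disease, which tends to happen when one diagnosis group in the training data is mostly male and the other mostly female. Matched sampling rebuilds the training set so that, within each level of the other variables, both diagnoses are equally represented. The Python below is a minimal sketch under stated assumptions, not a reference implementation: the function name `matched_sample`, the field names `diagnosis` and `sex`, and the sex split of the toy cohort are all hypothetical, since the original group makeup is not shown here.

```python
import random
from collections import defaultdict


def matched_sample(patients, label_key="diagnosis", match_keys=("sex",), seed=0):
    """Return a subset in which every stratum of the matching variables
    contains the same number of patients from each class.

    Hypothetical helper for illustration; the names are not from the source.
    """
    rng = random.Random(seed)

    # Group patients by stratum (e.g. sex), then by class label.
    strata = defaultdict(lambda: defaultdict(list))
    for p in patients:
        stratum = tuple(p[k] for k in match_keys)
        strata[stratum][p[label_key]].append(p)

    labels = sorted({p[label_key] for p in patients})
    matched = []
    for by_label in strata.values():
        # Largest count that every class can supply in this stratum.
        n = min(len(by_label[lbl]) for lbl in labels)
        for lbl in labels:
            matched.extend(rng.sample(by_label[lbl], n))
    return matched


# Toy cohort with an invented sex split (assumption): 20 ALL, 32 AML.
cohort = (
    [{"diagnosis": "ALL", "sex": "M"} for _ in range(15)]
    + [{"diagnosis": "ALL", "sex": "F"} for _ in range(5)]
    + [{"diagnosis": "AML", "sex": "M"} for _ in range(8)]
    + [{"diagnosis": "AML", "sex": "F"} for _ in range(24)]
)

matched = matched_sample(cohort)
```

In this toy cohort the matched set keeps 8 ALL and 8 AML males plus 5 ALL and 5 AML females, so sex no longer predicts the diagnosis in the training data. The cost is a smaller training set, and the same idea extends to several variables by passing more names in `match_keys`.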
