Cloudera
DS-200 Question #62: Real Exam Question with Answer & Explanation
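The correct answer is D. Perform matched sampling across other provided variables. A model that assigns every male patient to ALL has most likely learned sex rather than leukemia type, which suggests that sex was unevenly distributed between the ALL and AML groups in the original data. Matching the two groups on the other provided variables removes that imbalance, so the model has to rely on gene expression. Changing the distance metric, reducing dimensionality, or switching the inference method leaves the confounded sample unchanged.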
Question
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups is as follows: Each individual has an expression value for each of 10,000 different genes. The expression value for each gene is a continuous value between -1 and 1. You've built a model for discriminating between AML and ALL patients, and you find that it works quite well on your current data. One month later, a collaborator tells you she has fresh data from 100 new AML/ALL patients. You run the samples through your model, and it turns out that the model has very poor predictive accuracy on the new samples; specifically, it predicts that all males have ALL. What is the most reliable way to fix this problem?
Options
- A. Change the distance metric
- B. Reduce the number of dimensions
- C. Use a Gibbs sampler on a Bayesian network
- D. Perform matched sampling across other provided variables
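To make option D concrete, here is a minimal sketch of matched sampling in Python, written as stratified downsampling. It is an illustration under assumptions, not a reference implementation: the function name `matched_sample`, the record keys `diagnosis` and `sex`, and the sex split of the 20 ALL and 32 AML patients are all hypothetical, since the group makeup is not given here.

```python
import random
from collections import defaultdict


def matched_sample(records, label_key, match_key, seed=0):
    """Downsample so each label has equal counts within every stratum of match_key."""
    rng = random.Random(seed)

    # Group records by (stratum, label), e.g. ("M", "ALL").
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[match_key], rec[label_key])].append(rec)

    labels = sorted({rec[label_key] for rec in records})
    strata = sorted({rec[match_key] for rec in records})

    matched = []
    for stratum in strata:
        # The smallest label group in this stratum limits the matched size.
        n = min(len(groups[(stratum, label)]) for label in labels)
        for label in labels:
            matched.extend(rng.sample(groups[(stratum, label)], n))
    return matched


# Hypothetical cohort: 20 ALL and 32 AML patients with a made-up sex imbalance.
patients = (
    [{"diagnosis": "ALL", "sex": "M"} for _ in range(16)]
    + [{"diagnosis": "ALL", "sex": "F"} for _ in range(4)]
    + [{"diagnosis": "AML", "sex": "M"} for _ in range(8)]
    + [{"diagnosis": "AML", "sex": "F"} for _ in range(24)]
)

balanced = matched_sample(patients, "diagnosis", "sex")
```

In the matched subset, each sex contributes the same number of ALL and AML patients, so sex alone can no longer separate the two diagnoses. The cost is a smaller training set; with more matching variables, a real analysis would typically match on several of them at once.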