CERTIFIED-DATA-ANALYST-ASSOCIATE Question #65: Real Exam Question with Answer & Explanation
Question
In which circumstance will there be a substantial difference between the variable's mean and median values?
Options
- A. When the variable is of the categorical type
- B. When the variable is of the boolean type
- C. When the variable contains no outliers
- D. When the variable contains a lot of extreme outliers
Explanation
D is correct because extreme outliers pull the mean toward them, since the mean uses every value in its calculation, while the median, being the middle value of the sorted data, is resistant to outliers. A heavily skewed distribution, such as household incomes that include a few billionaires, will have a mean far higher than the median.
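The effect described above can be checked with a small sketch. The income figures below are made up for illustration (in thousands), and the variable names are hypothetical; only Python's standard `statistics` module is assumed.

```python
from statistics import mean, median

# Hypothetical household incomes in thousands; the values are invented for illustration.
incomes = [32, 41, 45, 48, 52, 55, 60, 67, 75]

# The same sample with two extreme outliers appended.
with_outliers = incomes + [250_000, 900_000]

base_mean, base_median = mean(incomes), median(incomes)
out_mean, out_median = mean(with_outliers), median(with_outliers)

# Without outliers the two measures sit close together.
print(f"no outliers:   mean={base_mean:.1f}  median={base_median}")

# With outliers the mean jumps by orders of magnitude; the median barely moves.
print(f"with outliers: mean={out_mean:.1f}  median={out_median}")
```

In this sample the median only shifts from 52 to 55, while the mean rises from roughly 53 to over 100,000, which is the kind of gap option D describes.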
Why the distractors are wrong:
- A (categorical): The mean is not defined for categorical variables, and the median is meaningful only when the categories are ordered, so there is nothing to compare.
- B (boolean): A boolean (0/1) variable has a mean (the proportion of 1s) and a median (0 or 1, or 0.5 on an exact tie), but the gap between them can never reach 0.5, so it stays bounded however unbalanced the values are.
- C (no outliers): Without outliers, nothing drags the mean far from the center of the data, so the mean and median usually stay close to each other, which is the opposite of what the question asks.
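As a rough check on options B and C, the sketch below measures the mean-median gap for a boolean column and for a symmetric sample with no outliers. Both datasets and all variable names are hypothetical, chosen only to illustrate the point.

```python
from statistics import mean, median

# Option B: a hypothetical boolean column encoded as 0/1, with 30% ones.
flags = [1] * 3 + [0] * 7
flag_gap = abs(mean(flags) - median(flags))  # mean is 0.3, median is 0

# Option C: a hypothetical symmetric sample with no outliers.
symmetric = [10, 12, 14, 16, 18, 20, 22]
symmetric_gap = abs(mean(symmetric) - median(symmetric))  # both are 16

print(f"boolean gap:   {flag_gap:.2f}")
print(f"symmetric gap: {symmetric_gap:.2f}")
```

Neither gap comes close to the jump produced by extreme outliers: the boolean gap stays under 0.5, and the symmetric sample has no gap at all.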
Memory tip: Think of the mean as a "sensitive" measure (easily disturbed by extremes) and the median as a "stubborn" one (largely unaffected by extremes). A big gap between them is a red flag that extreme values are dragging the mean away; statisticians describe such a distribution as skewed.