Cloudera
DS-200 · Question #38
DS-200 Question #38: Real Exam Question with Answer & Explanation
The correct answer is C. Method C. See the full explanation below for the reasoning.
Question
You have a large file of N records (one per line) and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit slow):

random_uniform() generates a uniformly distributed number in the interval [0, 1].
random_permutation(M) generates a random permutation of the numbers 0 through M-1.

Below are three different functions that implement the sampling.

Method A

    for line in file:
        if random_uniform() < 0.1:
            print line

Method B

    i = 0
    for line in file:
        if i % 10 == 0:
            print line
        i += 1

Method C

    idxs = random_permutation(N)[:(N/10)]
    i = 0
    for line in file:
        if i in idxs:
            print line
        i += 1

Which method might introduce unexpected correlations?
Options
- A. Method A
- B. Method B
- C. Method C
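For readers who want to experiment with the three methods from the question, here is a minimal runnable sketch in Python. It rests on several assumptions that are not part of the question: Python's `random.Random` stands in for the "perfect" generators (its `random()` returns values in [0, 1) rather than [0, 1]), a shuffled `range(n)` stands in for `random_permutation`, the file is replaced by an in-memory list of hypothetical records, and the functions return lists instead of printing. Method C also converts the chosen indices to a set so that membership tests stay fast.

```python
import random


def method_a(lines, rng, rate=0.1):
    """Keep each line independently with probability `rate`."""
    return [line for line in lines if rng.random() < rate]


def method_b(lines, step=10):
    """Keep every `step`-th line, starting with the first one."""
    return [line for i, line in enumerate(lines) if i % step == 0]


def method_c(lines, rng, n):
    """Keep the lines whose indices are the first n // 10 entries
    of a random permutation of 0 .. n-1."""
    perm = list(range(n))
    rng.shuffle(perm)            # stand-in for random_permutation(n)
    idxs = set(perm[: n // 10])  # set lookup instead of list lookup
    return [line for i, line in enumerate(lines) if i in idxs]


# Hypothetical input: 1000 records instead of a real file.
records = [f"record {i}" for i in range(1000)]
rng = random.Random(0)  # fixed seed so runs are repeatable

sample_a = method_a(records, rng)
sample_b = method_b(records)
sample_c = method_c(records, rng, len(records))

print(len(sample_a), len(sample_b), len(sample_c))
```

In this sketch, Method A returns a sample whose size varies from run to run around N/10, while Methods B and C return exactly N/10 lines when N is divisible by 10. Method B uses no randomness at all: which lines it keeps depends only on their position in the file.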