CompTIA
DA0-001 Question #152: Real Exam Question with Answer & Explanation
The correct answer is B: R. R is a statistical computing environment with built-in functions for all four measures, and a single 5,000,000-row column fits easily in memory.
Data Analysis
Question
Which of the following tools would be best to use to calculate the interquartile range, median, mean, and standard deviation of a column in a table that has 5,000,000 rows?
Options
- A. Microsoft Excel
- B. R
- C. Snowflake
- D. SQL
Explanation
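R is the best fit among these options. It is a language and environment built for statistical computing, with base functions for each requested measure: IQR(), median(), mean(), and sd(). A numeric column of 5,000,000 rows takes roughly 40 MB as double-precision values, which fits in memory on an ordinary workstation.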
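To make the four measures concrete, here is a minimal sketch in Python. Python is not one of the answer options and is used only for illustration. The helper name `summarize` and the synthetic data are hypothetical, and the demo uses 100,000 values rather than 5,000,000 so it finishes quickly.

```python
import random
import statistics


def summarize(values):
    """Return mean, median, sample standard deviation, and IQR of a numeric sequence."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "iqr": q3 - q1,  # interquartile range: Q3 minus Q1
    }


if __name__ == "__main__":
    # Hypothetical stand-in for the table column: synthetic values,
    # far fewer than 5,000,000 so the demo runs quickly.
    random.seed(0)
    column = [random.gauss(50.0, 10.0) for _ in range(100_000)]
    for name, value in summarize(column).items():
        print(f"{name}: {value:.3f}")
```

The `inclusive` quantile method is chosen on the assumption that it matches the linear-interpolation rule R uses by default. Other quantile definitions give slightly different interquartile ranges on small samples.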
Common mistakes.
- A. Microsoft Excel caps a worksheet at 1,048,576 rows, so a 5,000,000-row column cannot be loaded into a single sheet. Even near that limit, performance and memory use make these calculations impractical.
- C. Snowflake is a cloud data warehouse optimized for querying and storing large datasets; while it can perform some aggregate functions, it is not primarily a statistical analysis tool like R.
- D. SQL handles basic aggregates such as mean and standard deviation, and median through window functions, but the interquartile range is cumbersome to express. SQL is also less specialized than R for a comprehensive suite of statistical analyses on large datasets.
Concept tested. Statistical analysis tools for large datasets
Reference. www.r-project.org/about.html
Topics
#Statistical computing #Data analysis tools #Large datasets #R programming