DP-600 Question #120: Real Exam Question with Answer & Explanation
The correct answer is A: Yes. df.describe() returns the count, mean, standard deviation, min, and max of each numeric and string column, which covers the four required statistics.
Question
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have a Fabric tenant that contains a new semantic model in OneLake. You use a Fabric notebook to read the data into a Spark DataFrame. You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.

Solution: You use the following PySpark expression: df.describe().show()

Does this meet the goal?
Options
- A. Yes
- B. No
Explanation
In PySpark, df.describe() returns a DataFrame of summary statistics: count, mean, stddev (standard deviation), min, and max. Called with no arguments, it covers every numeric and string column. Numeric columns get all five statistics. String columns get count plus min and max in lexicographic order, while mean and stddev are null for non-numeric text. Calling .show() prints the result to the notebook cell output. A single expression therefore produces min, max, mean, and standard deviation for all string and numeric columns, which meets the goal. df.summary() reports the same statistics plus the 25%, 50%, and 75% percentiles, but the extra percentiles are not needed for the four metrics specified here.