Microsoft
DP-500 · Question #10
DP-500 Question #10: Real Exam Question with Answer & Explanation
The correct answer is D: describe. To display statistical summaries of a Spark DataFrame in a tabular view, the describe() method should be invoked.
Explore and visualize data
Question
You are using a Python notebook in an Apache Spark pool in Azure Synapse Analytics. You need to present the data distribution statistics from a DataFrame in a tabular view. Which method should you invoke on the DataFrame?
Options
- A. freqItems
- B. explain
- C. rollup
- D. describe
Explanation
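The describe() method computes basic statistics (count, mean, standard deviation, minimum, and maximum) for the numeric and string columns of a Spark DataFrame and returns them as a new DataFrame, with one row per statistic. Because the result is itself a DataFrame, it can be rendered directly as a table in the notebook, which is what the question asks for.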
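To make the contents of that summary table concrete, the following plain-Python sketch computes the same five statistics for a single numeric column. It is an illustration only, not Spark code: the helper name `describe_column` and the sample `ages` values are hypothetical, and it assumes the column has at least two non-null values so that a sample standard deviation is defined.

```python
import statistics


def describe_column(values):
    """Return the five summary statistics reported for one numeric column."""
    # Null (None) entries are left out, mirroring how the count and the
    # other statistics ignore missing values.
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "stddev": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


# Hypothetical sample column with one missing value.
ages = [23, 35, 41, None, 29]
summary = describe_column(ages)

# Print one row per statistic, similar in shape to the summary table.
for name, value in summary.items():
    print(f"{name:<8}{value}")
```

In the Synapse notebook itself none of this is written by hand: assuming `df` is the DataFrame being explored, `df.describe().show()` prints the table, and passing `df.describe()` to the notebook's `display` function renders it as a grid.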
Common mistakes.
- A. freqItems is used to find frequent items in a DataFrame, not to generate general statistical summaries.
- B. explain is used to print the physical and logical plans of a DataFrame operation, useful for debugging and optimization, not for data distribution statistics.
- C. rollup groups the DataFrame by a hierarchy of columns so that a following aggregation produces sub-total and grand-total rows, similar to SQL's ROLLUP. It returns grouped data to aggregate, not overall distribution statistics.
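The difference between rollup and a statistical summary is easier to see from the shape of the output. The sketch below imitates ROLLUP-style grouping in plain Python under stated assumptions: `rollup_sum`, the `sales_rows` data, and the column names are all hypothetical, and only a sum is aggregated. The result holds sub-totals per grouping level, not a count, mean, or standard deviation.

```python
from collections import defaultdict


def rollup_sum(rows, keys, value):
    """Sum `value` for every prefix of `keys`, as a ROLLUP would."""
    totals = defaultdict(int)
    for row in rows:
        full = tuple(row[k] for k in keys)
        # One grouping per prefix length: (a, b), then (a, None),
        # then (None, None) for the grand total.
        for n in range(len(keys), -1, -1):
            group = full[:n] + (None,) * (len(keys) - n)
            totals[group] += row[value]
    return dict(totals)


# Hypothetical sales records.
sales_rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]

totals = rollup_sum(sales_rows, ["region", "product"], "sales")
for group, total in totals.items():
    print(group, total)
```

Each key in `totals` is a grouping tuple, with `None` marking a rolled-up level, so the output answers "how much per group" and says nothing about how the values are distributed.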
Concept tested. PySpark DataFrame descriptive statistics
Reference. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.describe.html
Topics
#PySpark DataFrame #Descriptive Statistics #Azure Synapse Analytics #Data Exploration