DP-500 Question #10: Real Exam Question with Answer & Explanation

The correct answer is D: describe. To display statistical summaries of a Spark DataFrame in a tabular view, the describe() method should be invoked.

Explore and visualize data

Question

You are using a Python notebook in an Apache Spark pool in Azure Synapse Analytics. You need to present the data distribution statistics from a DataFrame in a tabular view. Which method should you invoke on the DataFrame?

Options

  • A. freqItems
  • B. explain
  • C. rollup
  • D. describe

Explanation

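To display statistical summaries of a Spark DataFrame in a tabular view, invoke the describe() method. It returns a new DataFrame with one row per statistic (count, mean, stddev, min, and max) and one column per numeric or string column of the source DataFrame. Calling show() on the result, or passing it to display() in a Synapse notebook, renders it as a table.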

Common mistakes.

  • A. freqItems is used to find frequent items in a DataFrame, not to generate general statistical summaries.
  • B. explain is used to print the physical and logical plans of a DataFrame operation, useful for debugging and optimization, not for data distribution statistics.
  • C. rollup is a grouping method that builds a multi-dimensional rollup over the specified columns so that a following aggregation produces sub-total and grand-total rows, similar to SQL's ROLLUP. It does not provide overall distribution statistics.

Concept tested. PySpark DataFrame descriptive statistics

Reference. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.describe.html

Topics

#PySpark DataFrame · #Descriptive Statistics · #Azure Synapse Analytics · #Data Exploration
