nerdexam
Microsoft

DP-203 Question #277: Real Exam Question with Answer & Explanation

Submitted by wei.xz · Mar 30, 2026

Design and implement data engineering workloads using Azure Databricks, including transforming semi-structured data (JSON) into structured tabular formats using PySpark DataFrame APIs. This is typically aligned with the 'Implement data transformation' or 'Process and serve data' domain of the DP-203 Azure Data Engineer Associate certification.

Question

Drag and Drop Question

You use PySpark in Azure Databricks to parse the following JSON input. You need to output the data in the following tabular format. How should you complete the PySpark code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Answer:

Explanation

The correct sequence uses 'createDataFrame' to initialize the DataFrame from JSON data, then 'select' to choose and transform columns, 'explode' to flatten the nested array structure (expanding each array element into its own row), 'alias' to rename the resulting exploded column to a meaningful name, and a final 'select' to shape the output into the desired tabular format. This pipeline is the standard PySpark pattern for normalizing nested JSON arrays into flat tabular data.
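The sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the question's actual JSON input and target table are not reproduced on this page, so the sample records, the column names (`id`, `name`, `phones`, `phone`), and both function names are hypothetical. The plain-Python helper models what `explode` does to each row; the second function shows the same reshaping with the PySpark calls named in the explanation.

```python
# Hypothetical sample records; the question's real JSON input is not shown here.
SAMPLE = [
    {"id": 1, "name": "Ana", "phones": ["555-0100", "555-0101"]},
    {"id": 2, "name": "Ben", "phones": ["555-0200"]},
]


def explode_records(records, array_key, new_key):
    """Plain-Python model of explode(): emit one row per array element."""
    rows = []
    for rec in records:
        for item in rec[array_key]:
            # Copy the scalar fields, then add the single array element.
            row = {k: v for k, v in rec.items() if k != array_key}
            row[new_key] = item
            rows.append(row)
    return rows


def flatten_with_spark(spark):
    """Same reshaping with the PySpark DataFrame API (needs a SparkSession)."""
    # Imported lazily so the plain-Python model above runs without Spark.
    from pyspark.sql.functions import col, explode

    # createDataFrame: build the DataFrame from the parsed records.
    data = [(r["id"], r["name"], r["phones"]) for r in SAMPLE]
    df = spark.createDataFrame(data, ["id", "name", "phones"])

    # select + explode + alias: one output row per phone number.
    exploded = df.select(
        col("id"),
        col("name"),
        explode(col("phones")).alias("phone"),
    )

    # Final select: shape the columns of the tabular output.
    return exploded.select("id", "name", "phone")


if __name__ == "__main__":
    for row in explode_records(SAMPLE, "phones", "phone"):
        print(row)
```

In a Databricks notebook, where a `spark` session is already defined, `flatten_with_spark(spark).show()` would display the flattened rows. Note that `explode` drops rows whose array is null or empty; `explode_outer` keeps them with a null value instead.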

Topics

#PySpark · #Azure Databricks · #JSON Parsing · #DataFrame Operations · #explode() function
