DP-203 Question #277: Real Exam Question with Answer & Explanation
Question
Drag and Drop Question
You use PySpark in Azure Databricks to parse the following JSON input. You need to output the data in the following tabular format. How should you complete the PySpark code?
To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation
The correct sequence is 'createDataFrame', 'select', 'explode', 'alias', and a final 'select'. 'createDataFrame' builds the DataFrame from the JSON data. The first 'select' chooses and transforms columns. 'explode' (a function in pyspark.sql.functions) flattens the nested array by expanding each array element into its own row. 'alias' (a Column method) renames the resulting exploded column to a meaningful name. The final 'select' shapes the output into the desired tabular format. This pipeline is a common PySpark pattern for normalizing nested JSON arrays into flat tabular data.
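The sequence described above can be sketched as follows. The question's actual JSON input and target table are not reproduced here, so the records, the column names (`id`, `name`, `phones`, `phone`), and the helper name `flatten_phones` are all hypothetical stand-ins chosen only to illustrate the pattern. It assumes a working PySpark installation and an existing SparkSession.

```python
# Hypothetical sample data: the question's real JSON input is not shown,
# so these records and column names are illustrative assumptions only.
records = [
    (1, "Alice", ["555-0100", "555-0101"]),
    (2, "Bob", ["555-0200"]),
]

# DDL-style schema string describing the assumed shape of each record.
SCHEMA = "id INT, name STRING, phones ARRAY<STRING>"


def flatten_phones(spark, rows):
    """Return a DataFrame with one (id, name, phone) row per array element."""
    # Imported inside the function so this file can be loaded even where
    # PySpark is not installed; only calling the function requires it.
    from pyspark.sql.functions import col, explode

    # createDataFrame: build the DataFrame from the parsed records.
    df = spark.createDataFrame(rows, schema=SCHEMA)

    # select + explode + alias: keep the scalar columns, expand the array
    # into one row per element, and name the new column "phone".
    exploded = df.select(
        col("id"),
        col("name"),
        explode(col("phones")).alias("phone"),
    )

    # Final select: fix the column order of the tabular output.
    return exploded.select("id", "name", "phone")
```

In a Databricks notebook, where a SparkSession named `spark` is already defined, `flatten_phones(spark, records).show()` would display three rows for this sample: two for the first record and one for the second. Note that `explode` drops rows whose array is null or empty; `explode_outer` keeps them with a null value instead.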