
DP-600 Question #39: Real Exam Question with Answer & Explanation

Submitted by asante_acc · Apr 18, 2026 · Prepare and serve data

Question

You are analyzing customer purchases in a Fabric notebook by using PySpark. You have the following DataFrames:

  • transactions: Contains five columns named transaction_id, customer_id, product_id, amount, and date and has 10 million rows, with each row representing a transaction.
  • customers: Contains customer details in 1,000 rows and three columns named customer_id, name, and country.

You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling.

You write the following code.

    from pyspark.sql import functions as F
    results =

Which code should you run to populate the results DataFrame?

Options

  • A. transactions.join(F.broadcast(customers), transactions.customer_id == customers.customer_id)
  • B. transactions.join(customers, transactions.customer_id == customers.customer_id).distinct()
  • C. transactions.join(customers, transactions.customer_id == customers.customer_id)
  • D. transactions.crossJoin(customers).where(transactions.customer_id == customers.customer_id)
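
For context on the shuffle requirement, here is a minimal pure-Python sketch of the idea behind a broadcast hash join, which is what the F.broadcast hint in option A requests from Spark. It is not Spark code and not the exam's official explanation: the function name broadcast_hash_join, the toy rows, and the two-partition layout are all assumptions made for illustration. The small table becomes a lookup that every partition of the large table can use locally, so no large-table row has to move between partitions.

```python
from collections import defaultdict


def broadcast_hash_join(big_partitions, small_rows, key):
    """Inner-join each partition of a large table against a full copy of a small table.

    Hypothetical helper for illustration only; Spark's real implementation differs.
    """
    # Build the lookup once from the small table. In Spark, a copy of the
    # small side would be shipped to every executor instead.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined_partitions = []
    for partition in big_partitions:
        # Each partition is joined in place: rows of the large table
        # never change partition, which is the point of the technique.
        joined = [
            {**big_row, **small_row}
            for big_row in partition
            for small_row in lookup.get(big_row[key], [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Toy stand-ins for the two DataFrames (assumed data, two partitions).
transactions_partitions = [
    [
        {"transaction_id": 1, "customer_id": 10, "amount": 5.0},
        {"transaction_id": 2, "customer_id": 20, "amount": 7.5},
    ],
    [
        {"transaction_id": 3, "customer_id": 10, "amount": 1.25},
    ],
]
customers = [
    {"customer_id": 10, "name": "Ada", "country": "UK"},
    {"customer_id": 20, "name": "Lin", "country": "SG"},
]

result = broadcast_hash_join(transactions_partitions, customers, "customer_id")
```

In a real notebook the hint alone is enough to request this strategy; Spark still chooses the physical plan, which can be inspected with results.explain().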

Topics

#PySpark · #DataFrame Join · #Broadcast Join · #Data Shuffling