DP-600 Question #39: Real Exam Question with Answer & Explanation
Question
You are analyzing customer purchases in a Fabric notebook by using PySpark. You have the following DataFrames:
- transactions: Contains five columns named transaction_id, customer_id, product_id, amount, and date, and has 10 million rows, with each row representing a transaction.
- customers: Contains customer details in 1,000 rows and three columns named customer_id, name, and country.
You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling.
You write the following code.
from pyspark.sql import functions as F
results =
Which code should you run to populate the results DataFrame?
Options
- A. transactions.join(F.broadcast(customers), transactions.customer_id == customers.customer_id)
- B. transactions.join(customers, transactions.customer_id == customers.customer_id).distinct()
- C. transactions.join(customers, transactions.customer_id == customers.customer_id)
- D. transactions.crossJoin(customers).where(transactions.customer_id == customers.customer_id)
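Option A wraps customers in F.broadcast, a hint that asks Spark to copy the small 1,000-row DataFrame to every executor so that each partition of the 10-million-row transactions DataFrame can be joined where it already sits, instead of hash-repartitioning both DataFrames on customer_id. The sketch below is a pure-Python simulation of that difference, not Spark code: the functions broadcast_hash_join and shuffle_hash_join, the two-partition layout, and the toy rows are all hypothetical, and customer_id is assumed to be unique in customers. It counts how many transaction rows would have to leave their original partition under each strategy.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Hypothetical simulation of a broadcast hash join (inner join).

    Returns (joined_rows, large_rows_moved). large_rows_moved is always 0,
    because every large-table row is joined inside the partition it started in.
    """
    # Build the lookup once; conceptually, Spark ships a copy to every executor.
    # Assumes `key` is unique in the small table.
    lookup = {row[key]: row for row in small_rows}
    joined = []
    for partition in large_partitions:
        for row in partition:  # purely local work, no exchange of large rows
            match = lookup.get(row[key])
            if match is not None:
                joined.append({**row, **match})
    return joined, 0


def shuffle_hash_join(large_partitions, small_partitions, key):
    """Hypothetical simulation of a shuffle join: BOTH sides are repartitioned
    by hash(key) before joining. Returns (joined_rows, large_rows_moved)."""
    n = len(large_partitions)
    large_by_dest = defaultdict(list)
    small_by_dest = defaultdict(list)
    moved = 0
    for src, partition in enumerate(large_partitions):
        for row in partition:
            dest = hash(row[key]) % n
            if dest != src:  # this large-table row must cross the network
                moved += 1
            large_by_dest[dest].append(row)
    for partition in small_partitions:
        for row in partition:
            small_by_dest[hash(row[key]) % n].append(row)
    joined = []
    for dest in range(n):
        lookup = {row[key]: row for row in small_by_dest[dest]}
        for row in large_by_dest[dest]:
            match = lookup.get(row[key])
            if match is not None:
                joined.append({**row, **match})
    return joined, moved


# Hypothetical toy data: customers is small, transactions is split into 2 partitions.
customers = [
    {"customer_id": 1, "name": "C1", "country": "PT"},
    {"customer_id": 2, "name": "C2", "country": "UK"},
]
transactions = [
    [{"transaction_id": 10, "customer_id": 1, "amount": 5.0},
     {"transaction_id": 11, "customer_id": 2, "amount": 7.5}],
    [{"transaction_id": 12, "customer_id": 2, "amount": 1.0},
     {"transaction_id": 13, "customer_id": 3, "amount": 9.9}],
]

b_rows, b_moved = broadcast_hash_join(transactions, customers, "customer_id")
s_rows, s_moved = shuffle_hash_join(transactions, [customers], "customer_id")
print(len(b_rows), b_moved)  # 3 0
print(len(s_rows), s_moved)  # 3 2
```

Both strategies return the same joined rows. With the broadcast strategy only the small lookup table is copied and the large side never moves; with the shuffle strategy the number of moved transaction rows scales with the size of transactions, which is the cost the question asks you to minimize.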