Databricks


CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #25: Real Exam Question with Answer & Explanation


Optimizing Spark Applications

Question

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that, in the Min, Median, and Max Durations for tasks in a particular stage, the minimum and median task durations are roughly the same, but the maximum task duration is roughly 100 times the minimum. Which situation is causing the increased duration of the overall job?

Options

  • A. Task queueing resulting from improper thread pool assignment.
  • B. Spill resulting from attached volume storage being too small.
  • C. Network latency due to some cluster nodes being in different regions from the source data.
  • D. Skew caused by more data being assigned to a subset of Spark partitions.
  • E. Credential validation errors while pulling data from an external system.
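
To make the duration pattern in the question concrete, the following is a minimal Python sketch that computes the same Min / Median / Max summary from a list of task durations and flags a stage in which one straggler task dominates. The function names, the sample durations, and the ratio threshold of 10 are all hypothetical and chosen only for illustration; they are not part of Spark or of the Spark UI, and a long maximum can have other causes, so this is a rough heuristic rather than a diagnosis.

```python
import statistics


def task_duration_summary(durations_s):
    """Return (min, median, max) of task durations in seconds,
    mirroring the summary row the question describes."""
    return min(durations_s), statistics.median(durations_s), max(durations_s)


def looks_skewed(durations_s, ratio_threshold=10.0):
    """Heuristic (assumption): a stage looks skewed when the slowest task
    takes at least `ratio_threshold` times as long as the median task."""
    _, median, longest = task_duration_summary(durations_s)
    return median > 0 and longest / median >= ratio_threshold


# Hypothetical stage: 199 tasks finish in about 2 s, one task takes 200 s.
durations = [2.0] * 199 + [200.0]

print(task_duration_summary(durations))  # (2.0, 2.0, 200.0)
print(looks_skewed(durations))           # True
```

Because a stage finishes only when its last task does, the single 200-second task in this made-up example sets the stage duration even though almost every other task completes in about 2 seconds. That is the shape one would expect when a small number of partitions hold far more data than the rest.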


Topics

#Spark Performance  #Data Skew  #Spark UI  #Troubleshooting