CERTIFIED-DATA-ENGINEER-PROFESSIONAL · Question #25
Question
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that, for the tasks in a particular stage, the Min and Median durations are roughly the same, but the Max duration is roughly 100 times as long as the Min. Which situation is causing the increased duration of the overall job?
Options
- A. Task queueing resulting from improper thread pool assignment.
- B. Spill resulting from attached volume storage being too small.
- C. Network latency due to some cluster nodes being in different regions from the source data.
- D. Skew caused by more data being assigned to a subset of Spark partitions.
- E. Credential validation errors while pulling data from an external system.
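The symptom in the question stem (Min and Median close together, Max far above both) can be expressed as a simple ratio check. The sketch below is illustrative only: the function names, the threshold of 10, and the sample durations are assumptions made for this example and are not taken from the Spark UI or any Spark API.

```python
from statistics import median


def summarize_task_durations(durations_ms):
    """Return min, median, max, and the max/median ratio for one stage's tasks."""
    if not durations_ms:
        raise ValueError("durations_ms must not be empty")
    lo = min(durations_ms)
    mid = median(durations_ms)
    hi = max(durations_ms)
    # Guard against a zero median so the ratio is always defined.
    ratio = hi / mid if mid > 0 else float("inf")
    return {"min": lo, "median": mid, "max": hi, "max_to_median": ratio}


def has_straggler(durations_ms, threshold=10.0):
    """Flag a stage whose slowest task far exceeds the typical task.

    The threshold is an arbitrary illustrative value, not a Spark default.
    """
    return summarize_task_durations(durations_ms)["max_to_median"] >= threshold


# Hypothetical task durations in milliseconds: four similar tasks and one
# task that runs roughly 100 times longer, mirroring the question stem.
durations = [1000, 1020, 990, 1010, 100000]
summary = summarize_task_durations(durations)
print(summary)
print(has_straggler(durations))
```

Because a stage finishes only when its slowest task finishes, a single task with a very high max-to-median ratio can dominate the stage's wall-clock time even when every other task is fast.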