PROFESSIONAL-DATA-ENGINEER · Question #170
PROFESSIONAL-DATA-ENGINEER Question #170: Real Exam Question with Answer & Explanation
Sign in or unlock PROFESSIONAL-DATA-ENGINEER to reveal the answer and full explanation for question #170. The question stem and answer options stay visible for context.
Question
You've migrated a Hadoop job from an on-prem cluster to dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffing operations and initial data are parquet files (on average 200-400 MB size each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost- sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload. What should you do?
Options
- AIncrease the size of your parquet files to ensure them to be 1 GB minimum.
- BSwitch to TFRecords formats (appr. 200MB per file) instead of parquet files.
- CSwitch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
- DSwitch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.
Unlock PROFESSIONAL-DATA-ENGINEER to see the answer
You've previewed enough free PROFESSIONAL-DATA-ENGINEER questions. Unlock PROFESSIONAL-DATA-ENGINEER for full answers, explanations, the timed quiz mode, progress tracking, and the master PDF. Question stem and options stay visible so you can still see what's on the exam.