PROFESSIONAL-DATA-ENGINEER · Question #144
Question
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
Options
- A. Migrate the workload to Google Cloud Dataflow
- B. Use pre-emptible virtual machines (VMs) for the cluster
- C. Use a higher-memory node so that the job runs faster
- D. Use SSDs on the worker nodes so that the job can run faster
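To put the cost question in scale, the figures in the stem (15 nodes, roughly 30 minutes per run, one run per week) can be turned into yearly node-hours. The following is a minimal sketch, not a pricing calculation: the function names are hypothetical, and `hourly_rate` is a placeholder to be filled in from current pricing rather than a real price.

```python
# Figures taken from the question stem.
NODES = 15
RUN_HOURS = 0.5      # approximately 30 minutes per run
RUNS_PER_YEAR = 52   # weekly schedule


def node_hours_per_year(nodes: int, run_hours: float, runs: int) -> float:
    """Total node-hours consumed per year if the cluster exists only while the job runs."""
    return nodes * run_hours * runs


def yearly_cost(node_hours: float, hourly_rate: float) -> float:
    """Yearly cost for a given per-node hourly rate (the rate is a placeholder, not a real price)."""
    return node_hours * hourly_rate


total = node_hours_per_year(NODES, RUN_HOURS, RUNS_PER_YEAR)
print(total)  # 390.0
```

Because the footprint is only a few hundred node-hours per year, the per-node hourly rate is the main lever on cost, and each option can be weighed by how it changes that rate or the run time.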