
Submitted by jian89 · Mar 30, 2026 · Building and operationalizing data processing systems

Question

You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

Options

C. Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the
D. Use Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery
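
To make the wording of option C concrete, here is a minimal sketch of how a load from Cloud Storage and one PySpark-style aggregation might be written as BigQuery SQL. The bucket, dataset, table, and column names are all hypothetical, and the statements are only assembled as strings; actually running them would need the BigQuery console or a client library, which this sketch deliberately leaves out.

```python
# Hypothetical names throughout: the bucket, dataset, tables, and columns
# are illustrative and do not come from the question.
RAW_URI = "gs://example-bucket/raw/orders/*.parquet"
RAW_TABLE = "example_dataset.raw_orders"
OUT_TABLE = "example_dataset.customer_totals"


def load_statement(table: str, uri: str, file_format: str = "PARQUET") -> str:
    """Build a BigQuery LOAD DATA statement that ingests files from Cloud Storage."""
    return (
        f"LOAD DATA INTO {table}\n"
        f"FROM FILES (format = '{file_format}', uris = ['{uri}']);"
    )


def transform_statement(source: str, target: str) -> str:
    """Build the SQL counterpart of a PySpark aggregation such as
    df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount")).
    """
    return (
        f"CREATE OR REPLACE TABLE {target} AS\n"
        f"SELECT customer_id, SUM(amount) AS total_amount\n"
        f"FROM {source}\n"
        f"GROUP BY customer_id;"
    )


if __name__ == "__main__":
    # Print both statements so they can be pasted into a SQL editor.
    print(load_statement(RAW_TABLE, RAW_URI))
    print(transform_statement(RAW_TABLE, OUT_TABLE))
```

Keeping the load and the transformation as two separate statements mirrors the two steps the option names: ingest first, then transform with SQL.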


Topics

#BigQuery #Cloud Storage #SQL transformation #serverless
