nerdexam
Google


PROFESSIONAL-DATA-ENGINEER Question #28: Real Exam Question with Answer & Explanation

The correct answer is D: Use a row key of the form <sensorid>#<timestamp>.

Submitted by daniela_cl · Mar 30, 2026

Question

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for new key features in the logs:

    BigQueryIO.Read
        .named("ReadLogData")
        .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read. What should you do?

Options

  • A. Use a row key of the form <timestamp>.
  • B. Use a row key of the form <sensorid>.
  • C. Use a row key of the form <timestamp>#<sensorid>.
  • D. Use a row key of the form <sensorid>#<timestamp>.

Explanation

Explanation/Reference: BigQueryIO.Read.from() reads the whole table. It runs a BigQuery export job that writes the table to temporary files in Google Cloud Storage, and Dataflow then reads those files from Cloud Storage rather than from BigQuery. This requires almost no computation, because only an export job is performed. BigQueryIO.Read.fromQuery() first executes a query and then reads the query results. It is therefore more time-consuming, since the query must run first and incurs the corresponding monetary and compute costs.

Explanation/Reference: Cloud Bigtable schema design best practices state that a row key should not consist only of a timestamp or start with a timestamp. Because timestamps increase steadily, new rows with timestamp-first keys all sort at the end of the key space, which concentrates writes on a single node (a hotspot). A row key of the form <sensorid>#<timestamp> spreads writes across sensors while keeping each sensor's readings contiguous and ordered by time.
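The row-key argument can be illustrated with a small, self-contained Java sketch. It is not Bigtable client code: a `TreeMap` stands in for a table because Bigtable also keeps rows sorted lexicographically by row key, and the class name `RowKeySketch`, its helper methods, the sensor IDs, and the timestamps are all hypothetical. The sketch assumes epoch-millisecond timestamps zero-padded to 13 digits so that string order matches numeric order. It counts how many keys in a batch of simultaneous writes sort after every existing key (and would therefore fall into the last key range), and emulates a prefix scan over one sensor's rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for a Bigtable table, since both
// keep rows sorted lexicographically by row key.
public class RowKeySketch {

    // Option D: <sensorid>#<timestamp>. Zero-padding to 13 digits (an assumption
    // for epoch millis) keeps lexicographic order equal to numeric order.
    static String sensorFirstKey(String sensorId, long tsMillis) {
        return String.format(Locale.ROOT, "%s#%013d", sensorId, tsMillis);
    }

    // Option C: <timestamp>#<sensorid>, kept for comparison.
    static String timestampFirstKey(String sensorId, long tsMillis) {
        return String.format(Locale.ROOT, "%013d#%s", tsMillis, sensorId);
    }

    // Emulates a prefix scan: every key in the range [prefix, prefix + '\uffff').
    static List<String> prefixScan(NavigableMap<String, String> table, String prefix) {
        return new ArrayList<>(
                table.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }

    // Counts keys in newBatch that sort after every key already in the table.
    // All such keys fall into the last key range, i.e. the same tablet.
    static int writesPastEnd(NavigableMap<String, String> table, List<String> newBatch) {
        String lastExisting = table.lastKey();
        int count = 0;
        for (String key : newBatch) {
            if (key.compareTo(lastExisting) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] sensors = {"sensor-a", "sensor-b", "sensor-c", "sensor-d"};
        NavigableMap<String, String> bySensor = new TreeMap<>();
        NavigableMap<String, String> byTime = new TreeMap<>();

        // One hour of hypothetical history: one reading per minute per sensor.
        long start = 1_700_000_000_000L;
        for (int minute = 0; minute < 60; minute++) {
            long ts = start + minute * 60_000L;
            for (String s : sensors) {
                bySensor.put(sensorFirstKey(s, ts), "reading");
                byTime.put(timestampFirstKey(s, ts), "reading");
            }
        }

        // The next batch of writes, all arriving at the same moment.
        long now = start + 60 * 60_000L;
        List<String> newBySensor = new ArrayList<>();
        List<String> newByTime = new ArrayList<>();
        for (String s : sensors) {
            newBySensor.add(sensorFirstKey(s, now));
            newByTime.add(timestampFirstKey(s, now));
        }

        System.out.println("timestamp-first writes past end: " + writesPastEnd(byTime, newByTime));
        System.out.println("sensor-first writes past end: " + writesPastEnd(bySensor, newBySensor));
        System.out.println("rows for sensor-b: " + prefixScan(bySensor, "sensor-b#").size());
    }
}
```

Running `main` prints 4 for the timestamp-first keys, since every new write sorts past the end of the existing key space, and 1 for the sensor-first keys, where only the last sensor's write does. The prefix scan for `sensor-b#` returns that sensor's 60 rows as one contiguous range, which is what makes per-sensor time-range reads efficient with option D.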

