nerdexam

PROFESSIONAL-DATA-ENGINEER · Question #332



Submitted by lucia.co · Mar 30, 2026 · Building and operationalizing data processing systems

Question

Your company's data platform ingests CSV file dumps of booking and user profile data from upstream sources into Cloud Storage. The data analyst team wants to join these datasets on the email field available in both datasets to perform analysis. However, personally identifiable information (PII) should not be accessible to the analysts. You need to de-identify the email field in both datasets before loading them into BigQuery for analysts. What should you do?

Options

  • A. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud Data Loss Prevention (Cloud DLP) with masking as the de-…
  • B. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud DLP with format-preserving encryption with FFX as the de-…
  • C. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
  • D. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.

Explanation

Option C is correct because BigQuery dynamic data masking applies the chosen masking rule at query time, so analysts only ever see masked values. With the Hash (SHA-256) masking rule, the masking is deterministic: the same email address always produces the same masked value in both tables. That preserves the ability to JOIN on the email column while ensuring analysts never see the raw PII.
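
A minimal Python sketch of why a deterministic hash keeps the join intact. It assumes the Hash (SHA-256) masking rule and approximates it with `hashlib.sha256(...).hexdigest()`; BigQuery's exact masked-value encoding may differ, and the sample rows and the `loyalty_tier` column are hypothetical.

```python
import hashlib


def sha256_mask(email: str) -> str:
    """Approximates a SHA-256 masking rule: identical input, identical output."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


# Hypothetical rows: (booking_id, email) and (email, loyalty_tier).
bookings = [("b1", "alice@example.com"), ("b2", "bob@example.com"), ("b3", "alice@example.com")]
profiles = [("alice@example.com", "Gold"), ("bob@example.com", "Silver")]


def join_on_email(bookings, profiles, mask):
    """Inner join of bookings to profiles after applying `mask` to both email columns."""
    tier_by_key = {mask(email): tier for email, tier in profiles}
    return [(booking_id, tier_by_key[mask(email)])
            for booking_id, email in bookings
            if mask(email) in tier_by_key]


raw = join_on_email(bookings, profiles, lambda email: email)
masked = join_on_email(bookings, profiles, sha256_mask)
print(masked == raw)  # True: the masked join returns the same rows as the raw join
```

Because the hash is computed on the exact string, `Alice@Example.com` and `alice@example.com` produce different values; normalizing case and whitespace before loading keeps such rows joinable.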

Why the distractors fail:

  • Option A (Cloud DLP + masking) replaces characters with a fixed masking character (e.g., ****@****.***). The result is deterministic but not unique: many distinct emails collapse into the same masked string, so cross-dataset joins on it are meaningless.
  • Option B (Cloud DLP + FFX format-preserving encryption) is deterministic and would technically allow joining, but FFX ciphertext is reversible by anyone who obtains the encryption key, and the approach adds operational complexity (a DLP pipeline plus key management) that BigQuery's native dynamic data masking avoids.
  • Option D (as rendered) appears identical to C; in the actual exam, D likely differs in a detail such as applying nullification or another masking rule that does not preserve the join.
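
Option B's point that FFX is deterministic yet reversible can be made concrete with a toy Feistel cipher in the style of FF1, the NIST-standardized FFX-family mode. This is a teaching sketch under stated assumptions: the alphabet, the HMAC-SHA256 round function, and the key are illustrative choices, it does not follow the FF1 specification byte-for-byte, and it is not secure for real data. Cloud DLP performs the real FFX transformation and can use a Cloud KMS-wrapped key.

```python
import hashlib
import hmac
import string

# Illustrative alphabet (an assumption of this sketch): every input character
# must belong to it, just as FFX in Cloud DLP needs an alphabet or radix.
ALPHABET = string.ascii_lowercase + string.digits + "@._-"
RADIX = len(ALPHABET)
ROUNDS = 10  # FF1 also uses 10 Feistel rounds


def _num(s: str) -> int:
    """Read s as a big-endian number in base RADIX."""
    n = 0
    for ch in s:
        n = n * RADIX + ALPHABET.index(ch)
    return n


def _str(n: int, length: int) -> str:
    """Write n as exactly `length` base-RADIX digits (inverse of _num)."""
    digits = []
    for _ in range(length):
        n, d = divmod(n, RADIX)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))


def _round_value(key: bytes, i: int, half: str) -> int:
    """Keyed round function: HMAC-SHA256 over the round index and one half."""
    mac = hmac.new(key, f"{i}|{half}".encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big")


def encrypt(key: bytes, plaintext: str) -> str:
    u = len(plaintext) // 2
    v = len(plaintext) - u
    a, b = plaintext[:u], plaintext[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # length of the half being replaced
        c = (_num(a) + _round_value(key, i, b)) % RADIX**m
        a, b = b, _str(c, m)
    return a + b


def decrypt(key: bytes, ciphertext: str) -> str:
    u = len(ciphertext) // 2
    v = len(ciphertext) - u
    a, b = ciphertext[:u], ciphertext[u:]
    for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
        m = u if i % 2 == 0 else v
        c = (_num(b) - _round_value(key, i, a)) % RADIX**m
        a, b = _str(c, m), a
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
token = encrypt(key, "alice@example.com")
print(len(token) == len("alice@example.com"))      # True: length and alphabet preserved
print(token == encrypt(key, "alice@example.com"))  # True: deterministic, so joinable
print(decrypt(key, token))                         # alice@example.com: reversible with the key
```

Anyone holding `key` can call `decrypt`, which is the reversibility the option B bullet refers to; without the key, users only see stable ciphertext that still joins.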

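Memory tip: The JOIN requirement is the key discriminator. Ask yourself two questions: "Will the same email always produce the same masked output?" and "Will different emails produce different outputs?" Only transformations that are deterministic and effectively collision-free (the SHA-256 hash rule in dynamic masking, or FFX encryption) pass both; character masking is deterministic but fails the second. Of the two that pass, BigQuery's native dynamic masking is the simpler, fully managed solution Google favors.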
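
The two memory-tip questions can be turned into a quick sample-based check. `is_join_safe`, `star_mask`, and `random_token` are hypothetical helpers written for this sketch, and `sha256_hex` only approximates the SHA-256 masking rule; passing on a sample does not prove a transformation is collision-free in general.

```python
import hashlib
import secrets
from typing import Callable, Iterable


def is_join_safe(transform: Callable[[str], str], samples: Iterable[str]) -> bool:
    """Check, on a sample only, that `transform` is deterministic and collision-free."""
    unique = list(dict.fromkeys(samples))  # drop duplicate inputs, keep order
    first = [transform(s) for s in unique]
    deterministic = all(transform(s) == out for s, out in zip(unique, first))
    collision_free = len(set(first)) == len(unique)
    return deterministic and collision_free


def sha256_hex(email: str) -> str:  # stand-in for the SHA-256 masking rule
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def star_mask(email: str) -> str:  # hypothetical full character mask keeping "@" and "."
    return "".join(c if c in "@." else "*" for c in email)


def random_token(email: str) -> str:  # hypothetical non-deterministic tokenizer
    return secrets.token_hex(8)


emails = ["ann@x.com", "bob@y.org", "alice@example.com"]
print(is_join_safe(sha256_hex, emails))    # True
print(is_join_safe(star_mask, emails))     # False: ann@x.com and bob@y.org both become ***@*.***
print(is_join_safe(random_token, emails))  # False: a fresh value on every call
```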

Topics

#Dynamic Data Masking · #BigQuery · #Data De-identification · #PII Protection
