PROFESSIONAL-DATA-ENGINEER Question #332: Real Exam Question with Answer & Explanation
The correct answer is C: 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
Question
Your company's data platform ingests CSV file dumps of booking and user profile data from upstream sources into Cloud Storage. The data analyst team wants to join these datasets on the email field available in both the datasets to perform analysis. However, personally identifiable information (PII) should not be accessible to the analysts. You need to de-identify the email field in both the datasets before loading them into BigQuery for analysts. What should you do?
Options
- A. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud Data Loss Prevention (Cloud DLP) with masking as the de-
- B. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud DLP with format-preserving encryption with FFX as the de-
- C. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
- D. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
Explanation
Option C is correct because BigQuery's dynamic data masking can apply a deterministic masking rule (such as a SHA-256 hash) at query time, so the same email address always produces the same masked value in both tables. That preserves the ability to JOIN on the email column while ensuring analysts never see the raw PII.
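The join-preserving property can be illustrated outside BigQuery with a minimal Python sketch. It assumes a plain SHA-256 hex digest as a stand-in for a hash-based masking rule; the function name `sha256_mask`, the sample rows, and the `email_key` column are hypothetical, and the output format is not claimed to match what BigQuery returns.

```python
import hashlib


def sha256_mask(email: str) -> str:
    """Illustrative stand-in for a deterministic hash-based masking rule."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


# Hypothetical sample rows for the two datasets in the question.
bookings = [
    {"email": "ana@example.com", "booking_id": 101},
    {"email": "bo@example.com", "booking_id": 102},
]
profiles = [
    {"email": "ana@example.com", "country": "PT"},
    {"email": "bo@example.com", "country": "SE"},
]

# What an analyst would see: the raw email is replaced by its hash.
masked_bookings = [
    {"email_key": sha256_mask(r["email"]), "booking_id": r["booking_id"]}
    for r in bookings
]
masked_profiles = [
    {"email_key": sha256_mask(r["email"]), "country": r["country"]}
    for r in profiles
]

# Equal inputs give equal hashes, so an inner join on the masked key still works.
profile_by_key = {p["email_key"]: p for p in masked_profiles}
joined = [
    {**b, "country": profile_by_key[b["email_key"]]["country"]}
    for b in masked_bookings
    if b["email_key"] in profile_by_key
]

for row in joined:
    print(row["booking_id"], row["country"], row["email_key"][:12])
```

Each booking still finds its matching profile, yet no row in `joined` contains a readable email address.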
Why the distractors fail:
- Option A (Cloud DLP + masking) replaces characters with fixed symbols (e.g., ****@****.***), which is not join-preserving - every email maps to the same indistinguishable output, making cross-dataset joins meaningless.
- Option B (Cloud DLP + FFX format-preserving encryption) is deterministic and would technically allow joining, but FFX produces reversible ciphertext that analysts could potentially decrypt with the key, and it adds operational complexity (key management) that dynamic data masking avoids natively in BigQuery.
- Option D (as rendered) appears identical to C - in the actual exam, D likely differs in a detail such as applying nullification or non-deterministic masking that breaks the join.
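To see why character masking defeats the join, here is a small Python sketch. It is a simplified model of character masking, not the Cloud DLP API; `char_mask` and the sample addresses are hypothetical, and the addresses were chosen to share the same shape so the collision is easy to see.

```python
import re


def char_mask(email: str) -> str:
    """Replace every letter and digit with '*', keeping '@' and '.' (simplified model)."""
    return re.sub(r"[A-Za-z0-9]", "*", email)


# Three distinct, hypothetical addresses with the same character layout.
emails = ["ana@example.com", "bob@example.org", "eve@company.net"]
masked = [char_mask(e) for e in emails]

for original, hidden in zip(emails, masked):
    print(original, "->", hidden)

# Distinct inputs collapse to one masked value, so a join on the masked
# column would match every booking with every profile.
print("distinct originals:", len(set(emails)))
print("distinct masked values:", len(set(masked)))
```

Because different addresses collapse to the same masked string, the masked column can no longer identify which booking belongs to which profile.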
Memory tip: The JOIN requirement is the key discriminator. Ask yourself: "Will the same email always produce the same masked output?" Only deterministic transformations (SHA256 hash via dynamic masking, or FFX encryption) pass this test - but BigQuery's native dynamic masking is the simpler, fully managed solution Google favors.