PROFESSIONAL-DATA-ENGINEER Question #185: Real Exam Question with Answer & Explanation
The correct answer is C: Use the BigQuery streaming API to stream changes into a daily inventory movement table, and calculate balances in a view that joins it to the historical inventory.
Question
You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balances by item and location. You have several thousand updates to inventory every hour. You want to maximize performance of the dashboard and ensure that the data is accurate. What should you do?
Options
- A. Leverage BigQuery UPDATE statements to update the inventory balances as they are changing.
- B. Partition the inventory balance table by item to reduce the amount of data scanned with each inventory update.
- C. Use the BigQuery streaming API to stream changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory.
- D. Use the BigQuery bulk loader to batch load inventory changes into a daily inventory movement table.
Explanation
Option C is correct because BigQuery's streaming insert API makes inventory changes available for querying within seconds, satisfying the near real-time requirement. Storing changes in an append-only movement table (rather than mutating balances) fits BigQuery's columnar, read-optimized architecture, and computing current balances in a view that joins the movements to the historical balances keeps the dashboard accurate without expensive in-place rewrites.
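To make the movement-plus-view pattern concrete, here is a minimal in-memory sketch in Python, not a BigQuery implementation. It assumes a hypothetical movement-table schema (`item`, `location`, `qty_delta`, `event_time`) and hypothetical sample data. `movement_row` builds the kind of JSON row you would hand to the streaming API (for example through the Python client's `insert_rows_json`, which is not called here), and `current_balances` mirrors what the view would compute: each historical balance plus the summed deltas for that item and location, behaving like a full outer join so that items first seen today still appear.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical snapshot of the historical balance table: (item, location) -> qty.
historical_balances = {
    ("SKU-1", "WH-A"): 100,
    ("SKU-2", "WH-A"): 40,
}


def movement_row(item, location, qty_delta):
    """Build one append-only movement record (hypothetical schema).

    In BigQuery, rows like this would be streamed into the daily movement
    table; existing rows are never updated in place.
    """
    return {
        "item": item,
        "location": location,
        "qty_delta": qty_delta,
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def current_balances(history, movements):
    """Mimic the balance view: historical balance + SUM(qty_delta).

    Similar in spirit to a FULL OUTER JOIN of the balance table with the
    movement table aggregated by (item, location).
    """
    totals = defaultdict(int, history)  # copy, so `history` is not mutated
    for m in movements:
        totals[(m["item"], m["location"])] += m["qty_delta"]
    return dict(totals)


movements = [
    movement_row("SKU-1", "WH-A", -5),
    movement_row("SKU-1", "WH-A", 20),
    movement_row("SKU-3", "WH-B", 7),  # no historical balance yet
]
print(current_balances(historical_balances, movements))
# -> {('SKU-1', 'WH-A'): 115, ('SKU-2', 'WH-A'): 40, ('SKU-3', 'WH-B'): 7}
```

The design choice this illustrates is that every write is a cheap append, and the arithmetic happens at read time, so the dashboard's accuracy depends only on each movement being recorded once.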
A is wrong because BigQuery is not designed for frequent row-level DML: thousands of UPDATE statements per hour would be slow and costly, since each DML statement rewrites the storage that holds the affected rows rather than modifying individual rows in place.
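B is wrong because partitioning only reduces the amount of data scanned; it does nothing to solve the real-time ingestion problem, and it still leaves every change as an in-place UPDATE, so the DML problems of option A remain. BigQuery also partitions only on a time-unit column, ingestion time, or an integer range, so an item key could not serve as a partitioning column unless it were an integer.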
D is wrong because bulk loading is inherently batch-oriented and introduces significant latency (typically minutes to hours), which directly violates the near real-time requirement.
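The latency argument against D can be made concrete with back-of-the-envelope arithmetic. Everything in this sketch is hypothetical: 3,000 updates per hour stands in for "several thousand", the one-minute load duration is a guess, and the limit of 1,500 load jobs per table per day is an assumption based on BigQuery's published quotas that should be checked against current documentation. Loading each change individually would far exceed that limit, so changes have to be batched, and the batch interval adds directly to how stale the dashboard can become.

```python
import math

# ASSUMPTION: BigQuery's quota documentation has listed a limit of 1,500
# load jobs per table per day; verify before relying on this number.
ASSUMED_LOAD_JOBS_PER_TABLE_PER_DAY = 1500


def per_change_load_jobs(updates_per_hour):
    """Load jobs per day if every inventory change got its own load job."""
    return updates_per_hour * 24


def batch_load_tradeoff(batch_interval_min, load_duration_min):
    """Return (jobs per day, worst-case staleness in minutes, within quota).

    load_duration_min is a hypothetical estimate of one load job's runtime.
    """
    jobs_per_day = math.ceil(24 * 60 / batch_interval_min)
    # A change that arrives just after a batch is cut waits a full interval,
    # then for that batch's load job to finish, before it becomes visible.
    worst_case_staleness_min = batch_interval_min + load_duration_min
    within_quota = jobs_per_day <= ASSUMED_LOAD_JOBS_PER_TABLE_PER_DAY
    return jobs_per_day, worst_case_staleness_min, within_quota


# Hypothetical workload: 3,000 updates per hour.
jobs = per_change_load_jobs(3000)
print(jobs, jobs <= ASSUMED_LOAD_JOBS_PER_TABLE_PER_DAY)  # -> 72000 False

# Batching every 5 minutes fits the assumed quota but adds staleness.
print(batch_load_tradeoff(5, 1))  # -> (288, 6, True)
```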
Memory tip: In BigQuery, "near real-time" almost always means streaming inserts, and "frequent mutations" is a red flag, because BigQuery favors append-only patterns. When you see both requirements together, think: stream the movements, then compute balances in a view.