PROFESSIONAL-DATA-ENGINEER · Question #262
The correct answer is C: Create a table that includes information about the books and authors, but nest the author fields inside the author column.
Question
You are migrating an application that tracks library books and information about each book, such as author or year published, from an on-premises data warehouse to BigQuery. In your current relational database, the author information is kept in a separate table and joined to the book information on a common key. Based on Google's recommended practice for schema design, how would you structure the data to ensure optimal speed of queries about the author of each book that has been borrowed?
Options
- A. Keep the schema the same, maintain the different tables for the book and each of the attributes, and query as you are doing today.
- B. Create a table that is wide and includes a column for each attribute, including the author's first name, last name, date of birth, etc.
- C. Create a table that includes information about the books and authors, but nest the author fields inside the author column.
- D. Keep the schema the same, create a view that joins all of the tables, and always query the view.
Explanation
Option C is correct because Google's recommended practice when denormalizing a schema for BigQuery is to use nested fields (STRUCTs), and repeated fields where needed, rather than relying on relational joins. BigQuery is a columnar, distributed system in which JOINs between large tables can be costly, because matching rows may have to be shuffled between workers. Nesting the author data as a STRUCT inside the book table stores each book together with its author, so a query can return book and author information from a single table without a JOIN.
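As a rough illustration of the nested layout, the sketch below reshapes two relational tables into rows with an embedded author record and serializes them as newline-delimited JSON, a format BigQuery can load into a table with a RECORD (STRUCT) column. The sample rows, the column names, and the helpers nest_authors and to_ndjson are hypothetical and not taken from the question; only the schema layout (a RECORD field with sub-fields) follows BigQuery's JSON schema format.

```python
import json

# Hypothetical sample rows standing in for the two on-premises tables.
authors = [
    {"author_id": 1, "first_name": "Ada", "last_name": "Example", "date_of_birth": "1970-01-01"},
    {"author_id": 2, "first_name": "Ben", "last_name": "Sample", "date_of_birth": "1980-02-02"},
]
books = [
    {"book_id": 10, "title": "Placeholder Title One", "year_published": 2001, "author_id": 1},
    {"book_id": 11, "title": "Placeholder Title Two", "year_published": 2005, "author_id": 2},
]

# Assumed target schema in BigQuery's JSON schema format:
# the author attributes live inside a single RECORD (STRUCT) column.
NESTED_SCHEMA = [
    {"name": "book_id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "title", "type": "STRING", "mode": "REQUIRED"},
    {"name": "year_published", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "author", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "first_name", "type": "STRING", "mode": "NULLABLE"},
        {"name": "last_name", "type": "STRING", "mode": "NULLABLE"},
        {"name": "date_of_birth", "type": "DATE", "mode": "NULLABLE"},
    ]},
]


def nest_authors(books, authors):
    """Do the join once, at migration time, and embed each author in its book row."""
    by_id = {a["author_id"]: a for a in authors}
    nested = []
    for book in books:
        # Drop the foreign key; the nested record replaces it.
        row = {k: v for k, v in book.items() if k != "author_id"}
        author = by_id.get(book["author_id"])
        row["author"] = (
            None if author is None
            else {k: v for k, v in author.items() if k != "author_id"}
        )
        nested.append(row)
    return nested


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r) for r in rows)


nested_rows = nest_authors(books, authors)
print(to_ndjson(nested_rows))
```

With rows in this shape, a query can read a sub-field such as author.last_name directly from the book table, with no second table involved.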
Option A is wrong because maintaining separate tables and joining them at query time replicates a relational pattern that performs poorly in BigQuery; distributed JOINs are expensive, and this approach goes against Google's denormalization guidance. Option B (a fully flat, wide table) moves in the right direction by denormalizing, but it doesn't use BigQuery's native STRUCT/ARRAY support, which provides better logical organization and handles one-to-many relationships more cleanly. Option D is wrong because a view is just a saved query: it doesn't change how the data is physically stored, so the underlying JOIN still runs, at the same cost, every time the view is queried.
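The one-to-many case mentioned for Option B can be sketched the same way. Assuming a hypothetical link table with one row per (book, author) pair, the authors of each book are grouped into a list, which corresponds to a RECORD column with mode REPEATED (an ARRAY of STRUCTs) in BigQuery's JSON schema format. The sample data and the helper nest_repeated are illustrative assumptions, not part of the question.

```python
from collections import defaultdict

# Hypothetical data: one book with two authors, linked by book_id.
book_rows = [{"book_id": 20, "title": "Placeholder Anthology"}]
book_authors = [
    {"book_id": 20, "first_name": "Ada", "last_name": "Example"},
    {"book_id": 20, "first_name": "Ben", "last_name": "Sample"},
]

# Assumed schema fragment: mode REPEATED turns the RECORD into an array of structs.
AUTHORS_FIELD = {
    "name": "authors", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "first_name", "type": "STRING", "mode": "NULLABLE"},
        {"name": "last_name", "type": "STRING", "mode": "NULLABLE"},
    ],
}


def nest_repeated(book_rows, book_authors):
    """Group the authors of each book into a list stored on the book row."""
    grouped = defaultdict(list)
    for link in book_authors:
        grouped[link["book_id"]].append(
            {"first_name": link["first_name"], "last_name": link["last_name"]}
        )
    # A book with no authors gets an empty list rather than a missing key.
    return [{**book, "authors": grouped.get(book["book_id"], [])} for book in book_rows]


repeated_rows = nest_repeated(book_rows, book_authors)
print(repeated_rows)
```

A flat wide table would need either duplicated book rows or numbered columns (author_1, author_2, and so on) to hold the same information, which is the organizational cost the explanation attributes to Option B.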
Memory tip: Think "BigQuery = NEST, not JOIN." When you see a relational schema migration question, the answer almost always involves flattening or nesting data. When there is a clear parent-child relationship (book → author), nested fields beat a flat wide table because they preserve structure while eliminating the JOIN.