PROFESSIONAL-DATA-ENGINEER Question #206: Real Exam Question with Answer & Explanation
Question
You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? (Choose two.)
Options
- A. Denormalize the data as much as possible.
- B. Preserve the structure of the data as much as possible.
- C. Use BigQuery UPDATE to further reduce the size of the dataset.
- D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
- E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file.
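To make the wording of option D concrete, here is a minimal sketch of what "appended instead of updated" means: each status change is written as a new row, and the current status is resolved at read time. This is plain Python standing in for a table, not BigQuery code; `StatusEvent`, `append_status`, `latest_status`, and the sample values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class StatusEvent:
    """One row in a hypothetical append-only status table."""
    transaction_id: str
    status: str
    updated_at: datetime


def append_status(log: List[StatusEvent], event: StatusEvent) -> None:
    """Record a status change as a new row; existing rows are never modified."""
    log.append(event)


def latest_status(log: List[StatusEvent]) -> Dict[str, str]:
    """Resolve the most recent status per transaction at read time."""
    latest: Dict[str, StatusEvent] = {}
    for event in log:
        seen = latest.get(event.transaction_id)
        if seen is None or event.updated_at > seen.updated_at:
            latest[event.transaction_id] = event
    return {tx: ev.status for tx, ev in latest.items()}


# Illustrative data only: two transactions, one of which changes status.
log: List[StatusEvent] = []
append_status(log, StatusEvent("tx-1", "PENDING", datetime(2024, 1, 1, 9)))
append_status(log, StatusEvent("tx-2", "PENDING", datetime(2024, 1, 1, 9)))
append_status(log, StatusEvent("tx-1", "SETTLED", datetime(2024, 1, 1, 10)))
current = latest_status(log)
```

In this sketch the full status history stays in `log`, while `current` holds only the newest status for each transaction.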