PROFESSIONAL-CLOUD-DEVELOPER · Question #174
The correct answer is B: Process the messages with a Dataflow streaming pipeline using Apache Beam's PubSubIO.
Question
Your team develops services that run on Google Cloud. You want to process messages sent to a Pub/Sub topic, and then store them. Each message must be processed exactly once to avoid duplicate data and data conflicts. You need to use the cheapest and simplest solution. What should you do?
Options
- A. Process the messages with a Dataproc job, and write the output to storage.
- B. Process the messages with a Dataflow streaming pipeline using Apache Beam's PubSubIO.
- C. Process the messages with a Cloud Function, and write the results to a BigQuery location where
- D. Retrieve the messages with a Dataflow streaming pipeline, store them in Cloud Bigtable, and use
Explanation
Pub/Sub assigns each message a unique message_id. When you read from Pub/Sub with the built-in Apache Beam PubSubIO, Dataflow uses that ID to deduplicate messages by default, so no custom deduplication logic or extra storage layer is needed.
Reference: https://cloud.google.com/blog/products/data-analytics/handling-duplicate-data-in-streaming-pipeline-using-pubsub-dataflow
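The idea of deduplicating on message_id can be illustrated with a minimal Python sketch. This is a conceptual illustration only, not Dataflow's actual implementation: the `Message` class, the `deduplicate` function, and the in-memory `seen_ids` set are hypothetical names introduced here, and the sample messages are made up to simulate a redelivery.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a Pub/Sub message."""
    message_id: str  # unique ID that Pub/Sub assigns at publish time
    data: bytes


def deduplicate(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield each message once, dropping redeliveries that share a message_id."""
    # Assumption: simple unbounded in-memory state, used here for illustration only.
    seen_ids = set()
    for msg in messages:
        if msg.message_id in seen_ids:
            continue  # redelivery of a message that was already emitted
        seen_ids.add(msg.message_id)
        yield msg


# Simulated at-least-once delivery: message "1" is delivered twice.
delivered = [
    Message("1", b"order-created"),
    Message("2", b"order-paid"),
    Message("1", b"order-created"),
]
unique = list(deduplicate(delivered))
print([m.message_id for m in unique])
```

With option B you do not write this logic yourself. Per the explanation above, it is what Dataflow does for you when the pipeline reads through PubSubIO, which is why that option is the simplest of the four.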