PROFESSIONAL-DATA-ENGINEER Question #169: Real Exam Question with Answer & Explanation
Question
You have an Apache Kafka cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring, to avoid deploying Kafka Connect plugins.
Options
- A. [...] Dataflow job to read from Kafka and write to GCS.
- B. Deploy a Kafka cluster on GCE VM instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read [...]
- C. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and [...]
- D. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and [...]
Explanation
Option A is correct because it uses Kafka MirrorMaker, a native Kafka tool designed for mirroring topics between clusters, to replicate the on-prem topics to a Kafka cluster on GCE. A Dataflow job then reads from that GCP-side cluster and loads the data into GCS/BigQuery. This satisfies both requirements: mirroring is the replication method, and no Kafka Connect plugins are deployed to the on-prem cluster.
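To make the mirroring step concrete, here is a minimal sketch that renders a MirrorMaker 2 properties file for one-way replication from the on-prem cluster to the GCE cluster. The cluster aliases (`onprem`, `gce`), the bootstrap addresses, and the `weblogs-.*` topic pattern are assumptions chosen for illustration; none of them come from the question. The helper `mm2_properties` is hypothetical, but the property keys it emits follow the MirrorMaker 2 configuration format.

```python
def mm2_properties(source_alias, source_bootstrap,
                   target_alias, target_bootstrap, topic_regex):
    """Render a minimal MirrorMaker 2 properties file for one-way mirroring."""
    flow = f"{source_alias}->{target_alias}"
    lines = [
        # Aliases for the two clusters taking part in replication.
        f"clusters = {source_alias}, {target_alias}",
        # How MirrorMaker reaches each cluster.
        f"{source_alias}.bootstrap.servers = {source_bootstrap}",
        f"{target_alias}.bootstrap.servers = {target_bootstrap}",
        # Enable the on-prem -> cloud flow and select the topics to mirror.
        f"{flow}.enabled = true",
        f"{flow}.topics = {topic_regex}",
    ]
    return "\n".join(lines) + "\n"


# All values below are placeholders (assumptions), not details from the question.
config = mm2_properties(
    source_alias="onprem",
    source_bootstrap="kafka-onprem.example.internal:9092",
    target_alias="gce",
    target_bootstrap="kafka-gce.example.internal:9092",
    topic_regex="weblogs-.*",
)
print(config)
```

A file like this would typically be passed to the `connect-mirror-maker.sh` script that ships with Kafka. With MirrorMaker 2's default replication policy, mirrored topics on the target are prefixed with the source alias (for example `onprem.weblogs-...`), so the downstream Dataflow job would read from the prefixed names.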
Option B is wrong because deploying the PubSub Kafka connector on GCE still deploys a Kafka Connect plugin, just on the cloud side, and it doesn't use mirroring as the replication method.
Option C is wrong on two fronts. It deploys a Kafka Connect plugin to the on-prem cluster, which violates the constraint, and it gets the connector direction wrong: a Source connector reads from an external system into Kafka, so "PubSub as Source" would pull data from PubSub into Kafka, not the other way around.
Option D gets the connector direction right (a Sink connector reads out of Kafka and writes to PubSub), but it still violates the core constraint by deploying a Kafka Connect plugin to the on-prem cluster instead of using mirroring.
Memory tip: When the question says "mirroring," think MirrorMaker. It ships with Kafka, so there is no separate connector plugin to deploy. Also remember the direction rule: Sink = data flows out of Kafka, Source = data flows into Kafka.
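The direction rule can be written down as a tiny lookup. This is only a mnemonic sketch: `kafka_connector_flow` is a hypothetical helper, not part of any Kafka or Pub/Sub library, and it simply encodes "Sink = out of Kafka, Source = into Kafka".

```python
def kafka_connector_flow(connector_type, external_system):
    """Return (origin, destination) for a Kafka Connect connector type."""
    kind = connector_type.strip().lower()
    if kind == "sink":
        # Sink: records leave Kafka and land in the external system.
        return ("Kafka", external_system)
    if kind == "source":
        # Source: records come from the external system into Kafka.
        return (external_system, "Kafka")
    raise ValueError(f"unknown connector type: {connector_type!r}")


for kind in ("Sink", "Source"):
    origin, destination = kafka_connector_flow(kind, "PubSub")
    print(f"{kind} connector: {origin} -> {destination}")
```

Applied to the options, "PubSub as Source" (Option C) moves data from PubSub into Kafka, which is the opposite of what the scenario needs, while "PubSub as Sink" (Option D) moves data from Kafka to PubSub.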