PROFESSIONAL-CLOUD-DEVOPS-ENGINEER Question #33: Real Exam Question with Answer & Explanation

The correct answer is A: Reroute the user traffic from the affected region to other regions that don't report issues.

Submitted by eva_at · Apr 18, 2026 · Managing a service incident

Question

You support a popular mobile game application deployed on Google Kubernetes Engine (GKE) across several Google Cloud regions. Each region has multiple Kubernetes clusters. You receive a report that none of the users in a specific region can connect to the application. You want to resolve the incident while following Site Reliability Engineering practices. What should you do first?

Options

  • A. Reroute the user traffic from the affected region to other regions that don't report issues.
  • B. Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected region.
  • C. Add an extra node pool that consists of high memory and high CPU machine type instances to
  • D. Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error

Explanation

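SRE practice puts mitigating user impact ahead of diagnosing root cause. Because the outage affects every user in one region of a multi-regional GKE application, the first action is to reroute that region's traffic to healthy regions. Diagnosis with monitoring and logs follows once service is restored.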
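
As a rough illustration of what rerouting could look like, the sketch below assumes traffic is split across regions by weight and drains the affected region, redistributing its share proportionally. The function name `reroute_weights`, the region names, and the weights are hypothetical; a real deployment would apply the change through its load balancer or DNS configuration rather than through this helper.

```python
def reroute_weights(weights, unhealthy):
    """Drain unhealthy regions to 0 and redistribute their share
    proportionally across the remaining healthy regions."""
    healthy = {r: w for r, w in weights.items() if r not in unhealthy}
    total = sum(healthy.values())
    if total == 0:
        # Nothing can absorb the traffic; rerouting is not an option.
        raise ValueError("no healthy region available to absorb traffic")
    return {r: (healthy[r] / total if r in healthy else 0.0) for r in weights}


# Hypothetical traffic split before the incident.
weights = {"us-central1": 0.5, "europe-west1": 0.25, "asia-east1": 0.25}

# Users in asia-east1 cannot connect, so drain that region first.
new_weights = reroute_weights(weights, {"asia-east1"})
print(new_weights)
```

The affected region ends up with a weight of zero while the healthy regions keep their relative proportions. Note that the sketch does not check whether the healthy regions have capacity for the extra load, which an operator would need to confirm.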

Common mistakes.

  • B. Checking monitoring for CPU/memory spikes is a diagnostic step, not an immediate mitigation for a complete regional connectivity outage, and would delay restoring service to users.
  • C. Adding an extra node pool is a scaling action, not an immediate incident response, and it assumes the problem is resource saturation, which is unlikely to cause a complete connectivity loss for an entire region.
  • D. Inspecting error logs is a diagnostic step for root cause analysis, which is important but secondary to immediate service restoration, and would not directly restore connectivity for affected users.

Concept tested. SRE incident response (immediate mitigation)

Reference. https://sre.google/sre-book/on-call-handbook/#addressing-incidents

Topics

#Incident Response · #Service Restoration · #Multi-region Architecture · #Site Reliability Engineering
