Submitted by daniela_cl · Apr 18, 2026 · Monitoring, optimizing, and maintaining ML solutions

Question

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

Options

A. Significantly increase the max_batch_size TensorFlow Serving parameter.
B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations.

Topics

#ML Serving Optimization  #TensorFlow Serving  #CPU Performance  #Latency Reduction