MLA-C01 Question #112: Real Exam Question with Answer & Explanation
The correct answer is C: create a SageMaker endpoint with an inference component for each model.
Question
A company runs Amazon SageMaker ML models that use accelerated instances. The models require real-time responses. Each model has different scaling requirements. The company must not allow a cold start for the models. Which solution will meet these requirements?
Options
- A. Create a SageMaker Serverless Inference endpoint for each model. Use provisioned concurrency
- B. Create a SageMaker Asynchronous Inference endpoint for each model. Create an auto scaling
- C. Create a SageMaker endpoint. Create an inference component for each model. In the inference
- D. Create an Amazon S3 bucket. Store all the model artifacts in the S3 bucket. Create a SageMaker
Explanation
Option C is correct because SageMaker inference components let you deploy multiple models behind a single real-time endpoint while configuring an independent scaling policy for each model, which directly addresses the "different scaling requirements" constraint. Setting a minimum copy count of at least one for each inference component keeps a warm copy of every model loaded at all times, which eliminates cold starts. Inference components also run on accelerated (GPU) instances and serve synchronous, real-time requests, as the question requires.
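To make the per-model scaling and warm-floor idea concrete, here is a minimal Python sketch that builds the keyword arguments for boto3's `create_inference_component` (SageMaker client) and `register_scalable_target` / `put_scaling_policy` (Application Auto Scaling client), one set per model. The endpoint, variant, and model names, the accelerator and memory sizes, and the scaling targets are all hypothetical. The predefined metric name `SageMakerInferenceComponentInvocationsPerCopy` is assumed from the AWS auto scaling documentation for inference components and is worth checking against current docs. The functions only return dictionaries, so nothing here calls AWS.

```python
"""Sketch: one inference component per model on a shared SageMaker endpoint.

Hypothetical values (not from the question): endpoint/variant/model names,
accelerator counts, memory sizes, copy limits, and scaling targets.
The functions build request dicts only; no AWS calls are made.
"""
from dataclasses import dataclass

# Scalable dimension used by Application Auto Scaling for inference components.
IC_DIMENSION = "sagemaker:inference-component:DesiredCopyCount"


@dataclass(frozen=True)
class ModelScaling:
    model_name: str            # existing SageMaker Model (hypothetical name)
    accelerators: int          # GPUs reserved per copy of this model
    memory_mb: int             # minimum memory reserved per copy
    min_copies: int            # >= 1 keeps a warm copy loaded (no cold start)
    max_copies: int            # per-model upper bound
    target_invocations: float  # target-tracking value per copy


def component_name(m: ModelScaling) -> str:
    return f"{m.model_name}-ic"


def resource_id(m: ModelScaling) -> str:
    # Application Auto Scaling addresses an inference component by this ID.
    return f"inference-component/{component_name(m)}"


def inference_component_request(endpoint: str, variant: str, m: ModelScaling) -> dict:
    """Kwargs for sagemaker_client.create_inference_component(**...)."""
    return {
        "InferenceComponentName": component_name(m),
        "EndpointName": endpoint,
        "VariantName": variant,
        "Specification": {
            "ModelName": m.model_name,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": m.accelerators,
                "MinMemoryRequiredInMb": m.memory_mb,
            },
        },
        # Start at the warm floor so the model is serving from the outset.
        "RuntimeConfig": {"CopyCount": m.min_copies},
    }


def scalable_target_request(m: ModelScaling) -> dict:
    """Kwargs for autoscaling_client.register_scalable_target(**...)."""
    if m.min_copies < 1:
        raise ValueError("min_copies must be >= 1 to avoid cold starts")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(m),
        "ScalableDimension": IC_DIMENSION,
        "MinCapacity": m.min_copies,
        "MaxCapacity": m.max_copies,
    }


def scaling_policy_request(m: ModelScaling) -> dict:
    """Kwargs for autoscaling_client.put_scaling_policy(**...)."""
    return {
        "PolicyName": f"{m.model_name}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(m),
        "ScalableDimension": IC_DIMENSION,
        "TargetTrackingScalingPolicyConfiguration": {
            "PredefinedMetricSpecification": {
                # Assumed metric name; verify against current AWS docs.
                "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy",
            },
            "TargetValue": m.target_invocations,
        },
    }


# Two hypothetical models with different scaling requirements.
MODELS = [
    ModelScaling("model-a", accelerators=1, memory_mb=8192,
                 min_copies=1, max_copies=4, target_invocations=10.0),
    ModelScaling("model-b", accelerators=2, memory_mb=16384,
                 min_copies=2, max_copies=8, target_invocations=5.0),
]

if __name__ == "__main__":
    for m in MODELS:
        print(inference_component_request("shared-gpu-endpoint", "AllTraffic", m))
        print(scalable_target_request(m))
        print(scaling_policy_request(m))
```

Each dict would be passed as keyword arguments, for example `boto3.client("sagemaker").create_inference_component(**req)`. The shared GPU endpoint must already exist and be configured to host inference components; that step is omitted here. The key design point is that each model gets its own `MinCapacity`/`MaxCapacity` and target value, while `MinCapacity >= 1` is what keeps every model warm.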
Why the distractors fail:
- A (Serverless Inference): Serverless Inference does not support accelerated (GPU) instances, which rules it out on its own. Provisioned concurrency also keeps only a fixed number of instances warm; requests beyond that level can still incur cold starts.
- B (Asynchronous Inference): Asynchronous Inference queues requests and is designed for large payloads and long processing times, not real-time responses; results are written to Amazon S3 later instead of being returned immediately.
- D (S3 bucket): Storing the model artifacts in S3 covers model storage, not serving; S3 alone provides no inference endpoint or compute, so this option does not describe a serving solution.
Memory tip: think "Inference Components = Independent Control." Each component gets its own scaling and resource configuration on one shared endpoint, and a minimum copy count keeps it warm and free of cold starts.