
MLA-C01 Question #112: Real Exam Question with Answer & Explanation

Deployment and Orchestration of ML Workflows

Question

A company runs Amazon SageMaker ML models that use accelerated instances. The models require real-time responses. Each model has different scaling requirements. The company must not allow a cold start for the models. Which solution will meet these requirements?

Options

  • A. Create a SageMaker Serverless Inference endpoint for each model. Use provisioned concurrency
  • B. Create a SageMaker Asynchronous Inference endpoint for each model. Create an auto scaling
  • C. Create a SageMaker endpoint. Create an inference component for each model. In the inference
  • D. Create an Amazon S3 bucket. Store all the model artifacts in the S3 bucket. Create a SageMaker

Explanation

Option C is correct because SageMaker inference components let you deploy multiple models behind a single endpoint while configuring an independent scaling policy for each model, which directly addresses the "different scaling requirements" constraint. Setting a minimum copy count greater than zero for each inference component keeps at least one copy of every model loaded and ready, so requests never hit a cold start. Inference components also run on accelerated (GPU) instances and serve synchronous, real-time requests, as the question requires.
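To make the minimum copy count concrete, here is a minimal Python sketch of the two requests involved, assuming boto3 and an existing endpoint and SageMaker model. The helper names, the "-ic" naming suffix, the "AllTraffic" variant name, and the memory value are illustrative assumptions, not part of the question. The helpers only build the request parameters; the `deploy` function shows where the boto3 calls would go.

```python
from typing import Any, Dict


def inference_component_request(endpoint: str, model: str,
                                accelerators: int, min_copies: int) -> Dict[str, Any]:
    """Build kwargs for the SageMaker create_inference_component call."""
    if min_copies < 1:
        # A copy count of zero would let the model scale in completely,
        # which reintroduces cold starts.
        raise ValueError("min_copies must be >= 1 to keep the model warm")
    return {
        "InferenceComponentName": f"{model}-ic",  # hypothetical naming scheme
        "EndpointName": endpoint,
        "VariantName": "AllTraffic",              # assumed variant name
        "Specification": {
            "ModelName": model,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": accelerators,
                "MinMemoryRequiredInMb": 1024,    # illustrative value
            },
        },
        "RuntimeConfig": {"CopyCount": min_copies},
    }


def scalable_target_request(model: str, min_copies: int,
                            max_copies: int) -> Dict[str, Any]:
    """Build kwargs for Application Auto Scaling register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{model}-ic",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": min_copies,  # floor that keeps copies warm
        "MaxCapacity": max_copies,  # per-model ceiling
    }


def deploy(endpoint: str, models: Dict[str, Dict[str, int]]) -> None:
    """Create one inference component per model, each with its own scaling range."""
    import boto3  # imported here so the helpers above work without AWS access

    sagemaker = boto3.client("sagemaker")
    autoscaling = boto3.client("application-autoscaling")
    for name, cfg in models.items():
        sagemaker.create_inference_component(
            **inference_component_request(endpoint, name, cfg["accelerators"], cfg["min"])
        )
        # In practice, wait until the component is InService before this call.
        autoscaling.register_scalable_target(
            **scalable_target_request(name, cfg["min"], cfg["max"])
        )
```

Each model gets its own `MinCapacity` and `MaxCapacity`, which is how one shared endpoint can still honor different scaling requirements. A scaling policy attached to each scalable target would then drive the copy count between those bounds.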

Why the distractors fail:

  • A (Serverless Inference): Serverless Inference does not support accelerated (GPU) instances, so it cannot host these models at all. Provisioned concurrency reduces cold starts but does not remove that limitation.
  • B (Asynchronous Inference): Asynchronous Inference queues incoming requests and delivers results later rather than in the same call. It is designed for large payloads and long processing times, not real-time responses.
  • D (S3 bucket): Storing artifacts in S3 alone provides no inference endpoint or compute - the answer is incomplete and describes model storage, not a serving solution.
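For contrast with the queue-based asynchronous option, a real-time call to one model on a shared endpoint is a single synchronous request that names the target inference component. The sketch below assumes boto3 and a JSON payload; the function names and the content type are illustrative assumptions.

```python
import json
from typing import Any, Dict


def realtime_invoke_request(endpoint: str, component: str,
                            payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build kwargs for the sagemaker-runtime invoke_endpoint call."""
    return {
        "EndpointName": endpoint,
        "InferenceComponentName": component,  # routes to one model on the endpoint
        "ContentType": "application/json",    # assumed payload format
        "Body": json.dumps(payload),
    }


def invoke(endpoint: str, component: str, payload: Dict[str, Any]) -> bytes:
    """Send a synchronous request and return the raw response body."""
    import boto3  # imported here so the helper above works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **realtime_invoke_request(endpoint, component, payload)
    )
    return response["Body"].read()
```

The caller blocks until the model responds, which is the behavior the "real-time responses" requirement describes.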

Memory tip: Think "Inference Components = Independent Control": each component gets its own scaling and resource configuration on one shared endpoint, with a minimum copy count that keeps model copies warm and avoids cold starts.

Topics

#SageMaker Inference Components, #Real-time Inference, #No Cold Start, #ML Model Deployment
