A company is developing a generative AI (GenAI) application that analyzes customer service calls in real time and generates suggested responses for human customer service agents. The application must process 500,000 concurrent calls during peak hours with less than 200 ms end-to-end latency for each suggestion. The company uses existing architecture to transcribe customer call audio streams. The application must not exceed a predefined monthly compute budget and must maintain auto scaling capabilities. Which solution will meet these requirements?


Accepted Answer

B. Deploy a low-latency, real-time optimized model on Amazon Bedrock. Purchase provisioned

Answer

A. Deploy a large, complex reasoning model on Amazon Bedrock. Purchase provisioned throughput

Answer

C. Deploy a large language model (LLM) on an Amazon SageMaker real-time endpoint that uses

Answer

D. Deploy a mid-sized language model on an Amazon SageMaker serverless endpoint that is

AIP-C01 Question #78: Real Exam Question with Answer & Explanation
