LLM Deployment and Cost Management

Question

A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their request volume is not high enough to justify creating their own provisioned throughput endpoint. They want to choose the most cost-effective strategy for their application. What strategy should the Generative AI Engineer use?

Options

A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with fewer parameters in order to reduce hardware constraints
D. Throttle the incoming batch of requests manually to avoid rate limiting issues

Topics

#LLM Deployment #Cost Optimization #Foundation Models #API Throughput