Databricks
GENERATIVE-AI-ENGINEER-ASSOCIATE Question #98: Real Exam Question with Answer & Explanation
The correct answer is B: Deploy the model using pay-per-token throughput as it comes with cost guarantees. Pay-per-token bills only for the tokens actually processed, so a low-volume application does not pay for dedicated capacity that sits idle.
LLM Deployment and Cost Management
Question
A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify creating their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application. What strategy should the Generative AI Engineer use?
Options
- A. Switch to using External Models instead
- B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
- C. Change to a model with fewer parameters in order to reduce hardware constraints
- D. Throttle the incoming batch of requests manually to avoid rate limiting issues
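
The cost trade-off behind this question can be illustrated with a rough break-even calculation. The sketch below is a minimal illustration, not Databricks' billing logic: the `Pricing` class, the function names, and every rate in it are hypothetical placeholders, and it assumes a simplified model in which a provisioned throughput endpoint is billed per hour it is running while pay-per-token is billed per million tokens processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical placeholder rates; NOT actual Databricks prices."""
    per_million_tokens: float    # pay-per-token rate, USD per 1M tokens
    provisioned_per_hour: float  # provisioned endpoint rate, USD per hour


def pay_per_token_cost(tokens_per_month: int, pricing: Pricing) -> float:
    # Cost scales linearly with usage: no traffic means no charge.
    return tokens_per_month / 1_000_000 * pricing.per_million_tokens


def provisioned_cost(hours_per_month: float, pricing: Pricing) -> float:
    # Assumed simplification: a flat hourly charge while the endpoint is up,
    # regardless of how many tokens it serves.
    return hours_per_month * pricing.provisioned_per_hour


def break_even_tokens(hours_per_month: float, pricing: Pricing) -> float:
    # Monthly token volume at which both modes cost the same.
    fixed = provisioned_cost(hours_per_month, pricing)
    return fixed / pricing.per_million_tokens * 1_000_000


def cheaper_mode(tokens_per_month: int, hours_per_month: float,
                 pricing: Pricing) -> str:
    ppt = pay_per_token_cost(tokens_per_month, pricing)
    prov = provisioned_cost(hours_per_month, pricing)
    return "pay-per-token" if ppt <= prov else "provisioned"


if __name__ == "__main__":
    # Illustrative numbers only: 50M tokens/month, endpoint up 720 hours.
    pricing = Pricing(per_million_tokens=2.0, provisioned_per_hour=10.0)
    print(cheaper_mode(50_000_000, 720, pricing))
    print(break_even_tokens(720, pricing))
```

Under these assumed rates, a workload far below the break-even volume is cheaper on pay-per-token, which matches the scenario described in the question; real decisions should use the current published rates for the chosen model.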
Topics
#LLM Deployment #Cost Optimization #Foundation Models #API Throughput