GENERATIVE-AI-ENGINEER-ASSOCIATE · Question #53
The correct answer is D: Configure rate limiting on the Foundation Model endpoints.
Question
A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future. What action should they take?
Options
- A. Use prompt engineering to instruct the LLM endpoints to refuse too many subsequent queries.
- B. Require that all development code which interfaces with a Foundation Model endpoint must be
- C. Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within
- D. Configure rate limiting on the Foundation Model endpoints.
Explanation
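Configuring rate limiting on the Foundation Model endpoints is the most direct way to prevent an accidental flood of inference queries. A rate limit caps the number of requests an endpoint will process within a defined time frame, so a runaway loop or forgotten job is rejected once it reaches the cap, which keeps cost and resource usage under control. Because the limit is enforced at the endpoint, it applies no matter which client code sends the requests. Option A does not help: a prompt instruction cannot stop requests from arriving, and every query the model handles, even one it refuses, is still an inference call.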
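To make the idea of "a set number of requests within a defined time frame" concrete, here is a minimal sketch of a fixed-window rate limiter in Python. It is a conceptual illustration only, not how Databricks implements endpoint rate limits. The names `FixedWindowRateLimiter`, `allow`, and `count_accepted` are hypothetical, and the injectable `clock` parameter is an assumption added so the behavior can be checked without waiting for real time to pass.

```python
import time
from typing import Callable, Optional


class FixedWindowRateLimiter:
    """Hypothetical helper: allow at most `max_calls` requests per window."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_calls < 1 or window_seconds <= 0:
            raise ValueError("max_calls must be >= 1 and window_seconds > 0")
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start: Optional[float] = None
        self._count = 0

    def allow(self) -> bool:
        """Return True if the request fits in the current window, else False."""
        now = self._clock()
        # Start a fresh window on the first call or once the old one has expired.
        if self._window_start is None or now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        if self._count < self.max_calls:
            self._count += 1
            return True
        return False


def count_accepted(limiter: FixedWindowRateLimiter, attempts: int) -> int:
    """Simulate a runaway loop and count how many requests get through."""
    return sum(1 for _ in range(attempts) if limiter.allow())


if __name__ == "__main__":
    # A frozen clock keeps every attempt inside a single window.
    limiter = FixedWindowRateLimiter(max_calls=100, window_seconds=60.0, clock=lambda: 0.0)
    accepted = count_accepted(limiter, attempts=5000)
    print(f"accepted {accepted} of 5000 attempts")
```

In the exam scenario the limit is configured on the serving endpoint itself and not in client code, so it still applies when a prototype script bypasses any client-side safeguards. The sketch only shows the counting logic that such a limit relies on.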