nerdexam
Databricks

GENERATIVE-AI-ENGINEER-ASSOCIATE Question #53: Real Exam Question with Answer & Explanation

The correct answer is D: Configure rate limiting on the Foundation Model endpoints.

Foundation Model Management and Operationalization

Question

A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future. What action should they take?

Options

  • A. Use prompt engineering to instruct the LLM endpoints to refuse too many subsequent queries.
  • B. Require that all development code which interfaces with a Foundation Model endpoint must be
  • C. Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within
  • D. Configure rate limiting on the Foundation Model endpoints.

Explanation

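Configuring rate limiting on the Foundation Model endpoints is the most direct way to prevent accidentally running excessive inference queries. A rate limit caps the number of requests an endpoint accepts within a defined time window and rejects the excess, so a runaway script or loop is cut off at the endpoint itself, regardless of what the client code does. This keeps cost and resource utilization bounded. Prompt engineering (option A) cannot achieve this: a prompt only influences the text the model generates, not how many requests the endpoint accepts.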
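
To make the idea of "a set number of requests within a defined time frame" concrete, here is a minimal Python sketch of a fixed-window rate limiter. It only illustrates the general technique and is not how Databricks implements endpoint rate limits. The names FixedWindowRateLimiter and simulate_runaway_loop, along with the injectable clock parameter, are hypothetical and exist only for this example.

```python
import time
from typing import Callable, Optional, Tuple


class FixedWindowRateLimiter:
    """Allow at most `max_calls` requests per window of `window_seconds`.

    Illustrative only: a hypothetical stand-in for the kind of check a
    serving endpoint performs when a rate limit is configured.
    """

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_calls < 1 or window_seconds <= 0:
            raise ValueError("max_calls must be >= 1 and window_seconds > 0")
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._window_start: Optional[float] = None
        self._count = 0

    def allow(self) -> bool:
        """Return True if the request fits in the current window, else False."""
        now = self._clock()
        # Start a fresh window on the first call or once the old one has expired.
        if self._window_start is None or now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        if self._count < self.max_calls:
            self._count += 1
            return True
        return False


def simulate_runaway_loop(limiter: FixedWindowRateLimiter, attempts: int) -> Tuple[int, int]:
    """Fire `attempts` requests back to back; return (accepted, rejected)."""
    accepted = sum(1 for _ in range(attempts) if limiter.allow())
    return accepted, attempts - accepted
```

With a limit of 3 calls per 60 seconds, a loop that fires 1,000 requests inside one window gets 3 accepted and 997 rejected. This is the protection the question is after: the cap is enforced where the requests arrive, so it still holds when the calling code misbehaves.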

Topics

#Rate Limiting #Foundation Models #API Management #Cost Optimization
