GENERATIVE-AI-ENGINEER-ASSOCIATE, question #70

LLM Application Development and Optimization

Question

A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from. Which option will fulfill their need?

Options

A. Context length 512: smallest model is 0.13 GB, with an embedding dimension of 384
B. Context length 514: smallest model is 0.44 GB, with an embedding dimension of 768
C. Context length 2048: smallest model is 11 GB, with an embedding dimension of 2560
D. Context length 32768: smallest model is 14 GB, with an embedding dimension of 4096
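
One way to reason about these options is as a small selection problem: keep only the models whose context length covers the largest chunk (512 tokens), then prefer the smallest model, since size and embedding dimension are rough proxies for cost and latency. The sketch below is illustrative only and is not the exam's official explanation; the `Candidate` class and the `cheapest_fitting` helper are hypothetical names introduced here, and the tie-break on embedding dimension is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str            # option letter from the question
    context_length: int   # maximum input tokens the model accepts
    size_gb: float        # size of the smallest model at this context length
    embedding_dim: int    # width of the output embedding vector


# Values copied from the four answer options.
CANDIDATES = [
    Candidate("A", 512, 0.13, 384),
    Candidate("B", 514, 0.44, 768),
    Candidate("C", 2048, 11.0, 2560),
    Candidate("D", 32768, 14.0, 4096),
]


def cheapest_fitting(candidates, max_chunk_tokens):
    """Return the smallest model whose context length covers every chunk.

    Assumption: model size (then embedding dimension) stands in for
    cost and latency, which the question ranks above quality.
    """
    fitting = [c for c in candidates if c.context_length >= max_chunk_tokens]
    if not fitting:
        raise ValueError("no candidate can hold a chunk of this size")
    return min(fitting, key=lambda c: (c.size_gb, c.embedding_dim))


if __name__ == "__main__":
    choice = cheapest_fitting(CANDIDATES, max_chunk_tokens=512)
    print(choice.label, choice.context_length, choice.size_gb, choice.embedding_dim)
```

Under these assumptions, all four options can hold a 512-token chunk, so the rule picks the smallest model among them. A larger context window would add model size and embedding width without letting the retriever use any extra tokens per chunk.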

Topics

#RAG System Design  #LLM Performance Optimization  #Cost-Latency Tradeoffs  #Context Window Management
