GENERATIVE-AI-ENGINEER-ASSOCIATE · Question #63
The correct answer is A: BLEU metric.
Question
A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLMs on this task and pick the best one. They have an evaluation set with known high-quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric. Which metric should they choose for this evaluation?
Options
- A. BLEU metric
- B. NDCG metric
- C. ROUGE metric
- D. RECALL metric
Explanation
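BLEU (Bilingual Evaluation Understudy) is a widely used metric designed specifically for evaluating machine translation. It scores a candidate translation by its n-gram overlap with one or more reference translations, combining modified n-gram precision with a brevity penalty. That fits this scenario, where the evaluation set already contains known high-quality translations to serve as references. Of the other options, ROUGE is a recall-oriented overlap metric used mainly for summarization, NDCG measures ranking quality in retrieval and recommendation, and recall on its own is a general classification and retrieval measure rather than a translation metric.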
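To make the n-gram overlap idea concrete, here is a minimal sketch of a simplified sentence-level BLEU in plain Python. It assumes whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `sentence_bleu` are illustrative and not taken from any library. For a real benchmark, an established implementation such as sacreBLEU or NLTK, scoring at corpus level, would be the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: uniform weights, no smoothing.

    Assumption: tokens are produced by plain whitespace splitting.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            # Without smoothing, a single zero precision makes BLEU zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate length.
    ref_len = min((len(r) for r in refs), key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    # Hypothetical outputs from two candidate models for the same source text.
    outputs = {
        "model_a": "the cat sat on a mat",
        "model_b": "the cat sat on the mat",
    }
    for name, text in outputs.items():
        print(name, round(sentence_bleu(text, [reference]), 3))
```

To benchmark several LLMs, the same scoring would be applied to every example in the evaluation set and aggregated per model, and the model with the highest aggregate score would be selected.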