A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLM's on this task and pick the best one.

Sign in or unlock GENERATIVE-AI-ENGINEER-ASSOCIATE to reveal the answer and full explanation for question #63. The question stem and answer options stay visible for context.

LLM Evaluation Metrics

Question

A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLM’s on this task and pick the best one. They have an evaluation set with known high quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric. Which metric should they choose for this evaluation?

Options

ABLEU metric
BNDCG metric
CROUGE metric
DRECALL metric

Unlock GENERATIVE-AI-ENGINEER-ASSOCIATE to see the answer

You've previewed enough free GENERATIVE-AI-ENGINEER-ASSOCIATE questions. Unlock GENERATIVE-AI-ENGINEER-ASSOCIATE for full answers, explanations, the timed quiz mode, progress tracking, and the master PDF. Question stem and options stay visible so you can still see what's on the exam.

Unlock GENERATIVE-AI-ENGINEER-ASSOCIATE - $49.99 / 30 days Sign in

Topics

#LLM Evaluation#Machine Translation#NLP Metrics#BLEU Score

Full GENERATIVE-AI-ENGINEER-ASSOCIATE Practice