nerdexam
Databricks

GENERATIVE-AI-ENGINEER-ASSOCIATE Question #63: Real Exam Question with Answer & Explanation

The correct answer is A: BLEU metric.

LLM Evaluation Metrics

Question

A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLMs on this task and pick the best one. They have an evaluation set with known high-quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric. Which metric should they choose for this evaluation?

Options

  • A. BLEU metric
  • B. NDCG metric
  • C. ROUGE metric
  • D. RECALL metric

Explanation

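The BLEU (Bilingual Evaluation Understudy) metric was designed specifically for evaluating machine translation. It scores a candidate translation by its n-gram overlap with one or more reference translations, using modified n-gram precision and a brevity penalty for output that is too short. That matches this scenario: an evaluation set of known high-quality translations and several LLMs to compare on it. The other options target different tasks: ROUGE is recall-oriented and used mainly for summarization, NDCG measures ranking quality in retrieval and recommendation, and recall on its own does not assess translation quality.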
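
To make the n-gram comparison concrete, here is a minimal Python sketch of corpus-level BLEU used to rank two models. It is an illustration under stated assumptions, not a reference implementation: the function names, the model names, and the example sentences are hypothetical, tokenization is plain whitespace splitting, there is one reference per sentence, and no smoothing is applied. For real benchmarking a maintained implementation such as sacreBLEU is the usual choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Simplified corpus-level BLEU with one reference per candidate.

    Assumptions of this sketch: whitespace tokenization, uniform weights
    over n-gram orders 1..max_n, and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # candidate n-gram counts, per order
    cand_len = ref_len = 0

    for cand, ref in zip(candidates, references):
        c_tok, r_tok = cand.split(), ref.split()
        cand_len += len(c_tok)
        ref_len += len(r_tok)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tok, n), ngrams(r_tok, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            totals[n - 1] += sum(c_counts.values())

    if cand_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any order with zero matches gives 0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize candidates shorter than the references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical evaluation set and model outputs.
references = ["the cat sits on the mat", "there is a book on the table"]
outputs = {
    "model_a": ["the cat sits on the mat", "there is a book on the table"],
    "model_b": ["the cat is on the mat", "there is a book on a table"],
}

scores = {name: corpus_bleu(out, references) for name, out in outputs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Each model is scored against the same references, and the highest BLEU score wins. Here the hypothetical model_a reproduces the references exactly, so it outranks model_b.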

Topics

#LLM Evaluation, #Machine Translation, #NLP Metrics, #BLEU Score
