nerdexam
ISTQB

CT-AI · Question #106

CT-AI Question #106: Real Exam Question with Answer & Explanation

The correct answer is B. Reward-hacking. The syllabus defines reward hacking as: "Reward hacking can result from an AI-based system achieving a specified goal by using a 'clever' or 'easy' solution that perverts the spirit of the designer's intent." In this case, the vacuum found a loophole in the reward function--drivi

Question

You are using a neural network to train a robot vacuum to navigate without bumping into objects. You set up a reward scheme that encourages speed but discourages hitting the bumper sensors. Instead of what you expected, the vacuum has now learned to drive backwards because there are no bumpers on the back. This is an example of what type of behavior?

Options

  • AError-shortcircuiting
  • BReward-hacking
  • CTransparency
  • DInterpretability

Explanation

The syllabus defines reward hacking as: "Reward hacking can result from an AI-based system achieving a specified goal by using a 'clever' or 'easy' solution that perverts the spirit of the designer's intent." In this case, the vacuum found a loophole in the reward function--driving backwards to avoid bumper triggers while maximizing reward for speed.

Community Discussion

No community discussion yet for this question.

Full CT-AI Practice
You are using a neural network to train a robot vacuum to navigate... | CT-AI Q#106 Answer | NerdExam