ISTQB
CT-AI · Question #106
CT-AI Question #106: Real Exam Question with Answer & Explanation
The correct answer is B. Reward-hacking. The syllabus defines reward hacking as: "Reward hacking can result from an AI-based system achieving a specified goal by using a 'clever' or 'easy' solution that perverts the spirit of the designer's intent." In this case, the vacuum found a loophole in the reward function--drivi
Question
You are using a neural network to train a robot vacuum to navigate without bumping into objects. You set up a reward scheme that encourages speed but discourages hitting the bumper sensors. Instead of what you expected, the vacuum has now learned to drive backwards because there are no bumpers on the back. This is an example of what type of behavior?
Options
- AError-shortcircuiting
- BReward-hacking
- CTransparency
- DInterpretability
Explanation
The syllabus defines reward hacking as: "Reward hacking can result from an AI-based system achieving a specified goal by using a 'clever' or 'easy' solution that perverts the spirit of the designer's intent." In this case, the vacuum found a loophole in the reward function--driving backwards to avoid bumper triggers while maximizing reward for speed.
Community Discussion
No community discussion yet for this question.