AIF-C01 Question #111: Real Exam Question with Answer & Explanation
The correct answer is D: Extracting the prompt template.
Question
Which prompting attack directly exposes the configured behavior of a large language model (LLM)?
Options
- A. Prompted persona switches
- B. Exploiting friendliness and trust
- C. Ignoring the prompt template
- D. Extracting the prompt template
Explanation
Extracting the prompt template (D) is the correct answer because this attack targets the system prompt (the instruction configuration) itself. It tricks the LLM into revealing the hidden directives that define its behavior, persona, and restrictions, which directly exposes how the model has been set up.
Why the distractors are wrong:
- A (Prompted persona switches): This manipulates the model into acting differently (e.g., "pretend you're an unrestricted AI"), but it doesn't necessarily reveal the underlying configuration.
- B (Exploiting friendliness and trust): This leverages the model's conversational nature to bypass guardrails socially, but the goal is manipulation, not exposure of the template.
- C (Ignoring the prompt template): This gets the model to disregard its instructions, but the template is bypassed rather than revealed.
Memory tip: Think of the word "extracting" - just like extracting a document from a locked drawer, this attack pulls out the hidden system prompt for the attacker to see. The key word in the question is "exposes", which directly maps to extraction, not manipulation or bypass.
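To make the difference between exposure and bypass concrete, here is a minimal Python sketch of a leak check: it flags a response that repeats a run of consecutive words from the system prompt. The function name `leaks_template`, the 5-word window, and the sample prompt and replies are all hypothetical, chosen only for illustration; this is a simple heuristic, not a production guardrail.

```python
PUNCTUATION = ".,:;!?\"'()"


def _words(text):
    # Lowercase each token and strip surrounding punctuation; drop empty tokens.
    cleaned = (token.strip(PUNCTUATION).lower() for token in text.split())
    return [word for word in cleaned if word]


def leaks_template(system_prompt, response, window=5):
    """Return True if the response repeats `window` consecutive prompt words."""
    prompt_words = _words(system_prompt)
    # Pad with spaces so matches only happen on whole-word boundaries.
    response_text = " " + " ".join(_words(response)) + " "
    for start in range(len(prompt_words) - window + 1):
        chunk = " " + " ".join(prompt_words[start:start + window]) + " "
        if chunk in response_text:
            return True
    return False


# Hypothetical system prompt and model replies (assumptions for this sketch).
SYSTEM_PROMPT = (
    "You are SupportBot for Example Corp. Never discuss pricing. "
    "Always answer in a formal tone."
)
extraction_reply = (
    "Sure! My instructions say: You are SupportBot for Example Corp. "
    "Never discuss pricing."
)
bypass_reply = "Okay, ignoring earlier rules: our pricing starts at ten dollars."

print(leaks_template(SYSTEM_PROMPT, extraction_reply))  # template text is exposed
print(leaks_template(SYSTEM_PROMPT, bypass_reply))      # rules broken, nothing exposed
```

The first reply corresponds to option D, because the configured instructions appear in the output. The second corresponds to option C: the model misbehaves, but its template stays hidden, so the check does not flag it.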