Do Prompt-Based Models Really Understand the Meaning of Their Prompts?

Albert Webson, Ellie Pavlick · 2022 · DOI 10.18653/v1/2022.naacl-main.167

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

cs.CL · 2023-05-07 · accept · novelty 7.0

Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Lessons from the Trenches on Reproducible Evaluation of Language Models

cs.CL · 2024-05-23 · accept · novelty 6.0

The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.

Automatic Chain of Thought Prompting in Large Language Models

cs.CL · 2022-10-07 · conditional · novelty 6.0

Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.

citing papers explorer

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting · cs.CL · 2023-05-07 · accept · none · ref 4
Instructions Shape Production of Language, not Processing · cs.CL · 2026-05-11 · unverdicted · none · ref 65 · 2 links
Compared to What? Baselines and Metrics for Counterfactual Prompting · cs.CL · 2026-05-01 · conditional · none · ref 163
Lessons from the Trenches on Reproducible Evaluation of Language Models · cs.CL · 2024-05-23 · accept · none · ref 140
Automatic Chain of Thought Prompting in Large Language Models · cs.CL · 2022-10-07 · conditional · none · ref 22
