Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 5representative citing papers
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.
citing papers explorer
-
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.
-
Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
-
Compared to What? Baselines and Metrics for Counterfactual Prompting
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
-
Lessons from the Trenches on Reproducible Evaluation of Language Models
The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
-
Automatic Chain of Thought Prompting in Large Language Models
Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.