LLMs exhibit domain-specific privileged knowledge in hidden states for factual correctness but not math reasoning, visible only on model disagreement subsets.
emnlp-main.243/
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
- Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
  LLMs exhibit domain-specific privileged knowledge in hidden states for factual correctness but not math reasoning, visible only on model disagreement subsets.
- Shared Lexical Task Representations Explain Behavioral Variability In LLMs
  LLMs share task-specific attention heads across prompting styles, with activation strength explaining performance differences and failures arising from competing representations.