FIDES detects token-level retrieval-memory conflicts via output, hidden, and trajectory signals to selectively apply contrastive decoding, raising context fidelity by 3-13 points over baselines across 18 settings on models up to 70B.
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7052–7063, Online and Punta Cana, Dominican Republic
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers:
- Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override
Pretrained lexical priors in language models persist despite explicit remapping rules, as shown by a Stroop paradigm where prior strength predicts interference and activation patching localizes the repair mechanism.