Unlearned language models retain low calibration error but show increased shortcut reliance on the TOFU benchmark, extending the reliability paradox to machine unlearning.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
citing papers explorer
- Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
  Unlearned language models retain low calibration error but show increased shortcut reliance on the TOFU benchmark, extending the reliability paradox to machine unlearning.
- Measuring AI Reasoning: A Guide for Researchers
  Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.