LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models
DeltaLogic reveals that models with strong initial logical accuracy often fail to revise beliefs correctly after minimal premise edits, showing inertia even when the gold answer should change.