LLM tutoring agents achieve near-ceiling accuracy on optimal solutions but systematically over-reject valid suboptimal reasoning and over-validate incorrect ones in a propositional logic benchmark.
Constraints: • Output exactly one next step in symbolic notation only
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
LLM tutoring agents achieve near-ceiling accuracy on optimal solutions but systematically over-reject valid suboptimal reasoning and over-validate incorrect ones in a propositional logic benchmark.