Authors: … Alex Gu, Benjamin Lipkin, Cedegao E. Zhang …
4 Pith papers cite this work, all from 2026.
Citing papers
- Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning
Frontier LLMs prefer to report failure rather than game formalization in unified Lean proof generation, but reveal model-specific unfaithfulness (axiom fabrication or premise mistranslation) in two-stage pipelines.
- PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking
PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.
- FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction
A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.
- UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning
A neuro-symbolic pipeline pairing 4B-parameter LLMs with a symbolic theorem prover delivers competitive accuracy and low content effects on syllogistic reasoning subtasks.
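Both FregeLogic and UFAL-CUNI pair LLMs with a symbolic reasoner for syllogistic validity. Below is a minimal Python sketch of the disagreement-gated design summarized for FregeLogic, under stated assumptions rather than either paper's actual code. The LLM verdicts arrive as plain booleans. Each syllogism is assumed to be already translated into a hypothetical `(quantifier, subject, predicate)` tuple over abstract terms `A`, `B`, `C`. In place of Z3, validity is decided by enumerating all 2^8 patterns of inhabited Venn regions, a complete check for three-term categorical syllogisms under Boolean semantics without existential import. Whether the shared task uses those semantics is itself an assumption. All names (`holds`, `is_valid`, `predict_validity`) are illustrative.

```python
from itertools import product
from typing import FrozenSet, Sequence, Tuple

# Hypothetical encoding for this sketch: ("all" | "no" | "some" | "some_not", subject, predicate).
Statement = Tuple[str, str, str]
Region = Tuple[bool, bool, bool]

TERMS = ("A", "B", "C")
# The 8 Venn regions, each identified by membership in A, B and C.
REGIONS = list(product((False, True), repeat=3))


def in_term(term: str, region: Region) -> bool:
    return region[TERMS.index(term)]


def holds(stmt: Statement, inhabited: FrozenSet[Region]) -> bool:
    """Truth of a categorical statement, given which Venn regions contain at least one element."""
    quantifier, subj, pred = stmt
    subj_regions = [r for r in inhabited if in_term(subj, r)]
    if quantifier == "all":
        return all(in_term(pred, r) for r in subj_regions)
    if quantifier == "no":
        return not any(in_term(pred, r) for r in subj_regions)
    if quantifier == "some":
        return any(in_term(pred, r) for r in subj_regions)
    if quantifier == "some_not":
        return any(not in_term(pred, r) for r in subj_regions)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


def is_valid(premises: Sequence[Statement], conclusion: Statement) -> bool:
    """Valid iff no pattern of inhabited regions makes every premise true and the conclusion false."""
    for mask in product((False, True), repeat=len(REGIONS)):
        inhabited = frozenset(r for r, on in zip(REGIONS, mask) if on)
        if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
            return False  # countermodel found
    return True


def predict_validity(
    votes: Sequence[bool], premises: Sequence[Statement], conclusion: Statement
) -> Tuple[bool, str]:
    """Trust a unanimous ensemble; on any disagreement, defer to the symbolic check."""
    if not votes:
        raise ValueError("need at least one model vote")
    if all(votes) or not any(votes):
        return votes[0], "ensemble"
    return is_valid(premises, conclusion), "symbolic"


# Usage: undistributed middle ("All A are B; all C are B; so all A are C") is invalid.
premises = [("all", "A", "B"), ("all", "C", "B")]
print(predict_validity([True, False, True], premises, ("all", "A", "C")))
# → (False, 'symbolic')
```

Gating on disagreement keeps the symbolic call off the path where the models already agree. Because the checker sees only abstract terms, its verdict cannot depend on how believable the content is, which is the content effect these systems aim to reduce. One trade-off of this particular gating: a unanimous but content-biased ensemble is never checked.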