and Gu, Alex and Lipkin, Benjamin and Zhang, Cedegao E

Olausson, Theo X · 2023 · DOI 10.18653/v1/2023.emnlp-main.313

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

Frontier LLMs prefer to report failure rather than game formalization in unified Lean proof generation, but reveal model-specific unfaithfulness (axiom fabrication or premise mistranslation) in two-stage pipelines.

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.

FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.

UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning

cs.CL · 2026-05-06 · unverdicted · novelty 4.0

A neuro-symbolic pipeline pairing 4B-parameter LLMs with a symbolic theorem prover delivers competitive accuracy and low content effects on syllogistic reasoning subtasks.

citing papers explorer

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

cs.AI · 2026-04-21 · unverdicted · none · ref 18

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · none · ref 57

FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

cs.CL · 2026-04-20 · unverdicted · none · ref 13

UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning

cs.CL · 2026-05-06 · unverdicted · none · ref 4
