Abductive Reasoning with Probabilistic Commonsense

Chiara Roverato; Didier Chetelat; Han Zhou; Joseph Cotnareanu; Mark Coates; Yingxue Zhang

arxiv: 2605.08011 · v1 · submitted 2026-05-08 · 💻 cs.AI · stat.CO

Pith reviewed 2026-05-11 03:00 UTC · model grok-4.3

keywords abductive reasoning · probabilistic commonsense · large language models · neurosymbolic AI · formal logic solvers · belief variation · majority judgment

The pith

By sampling multiple possible commonsense proofs from a language model and aggregating their conclusions, a new algorithm determines what most people would likely judge as true or false more accurately than methods assuming fixed facts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that commonsense beliefs vary across individuals, so AI systems that rely on language models to fill gaps in formal logic solvers will make mistakes if they treat supplied facts as universally accepted. It introduces a method that draws many distinct proofs from the language model, each standing in for a different person's belief set, and then aggregates the results to estimate the judgment that most people would reach. This matters because it lets reasoning systems account for real disagreement in everyday knowledge rather than forcing a single view, which improves accuracy on tasks that require finding the most plausible explanation for given observations. The approach pairs the language model's ability to generate assumptions with a formal solver's ability to check logical validity across those samples.

Core claim

PACS samples multiple proofs by prompting an LLM to supply commonsense assumptions and using a formal solver to validate each one, treats every valid sample as an observation of one possible individual's distinct belief set, and aggregates the conclusions across samples to estimate whether most people would accept a given statement as true or false.
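The sample-validate-aggregate loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a hypothetical `toy_sampler` that proposes a commonsense rule with a fixed probability, the formal solver is replaced by a toy forward-chaining check over propositional Horn rules, and the names `entails`, `pacs_estimate`, and `toy_sampler` are invented for this sketch. It also only tracks whether the query is entailed, which simplifies the true/false judgment described in the paper.

```python
import random
from typing import Callable, FrozenSet, List, Set, Tuple

# A Horn rule is (body atoms, head atom); a fact is a rule with an empty body.
Rule = Tuple[FrozenSet[str], str]


def entails(rules: List[Rule], query: str) -> bool:
    """Toy stand-in for a formal solver: forward chaining over Horn rules."""
    known: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return query in known


def pacs_estimate(
    premises: List[Rule],
    query: str,
    sample_assumptions: Callable[[random.Random], List[Rule]],
    k: int,
    rng: random.Random,
) -> float:
    """Fraction of k sampled belief sets under which the query is entailed."""
    hits = 0
    for _ in range(k):
        assumptions = sample_assumptions(rng)  # one simulated individual's beliefs
        if entails(premises + assumptions, query):
            hits += 1
    return hits / k


def toy_sampler(rng: random.Random) -> List[Rule]:
    """Hypothetical LLM stand-in: proposes the rule 70% of the time (assumed rate)."""
    rule: Rule = (frozenset({"wet_ground"}), "slippery")
    return [rule] if rng.random() < 0.7 else []


premises: List[Rule] = [
    (frozenset(), "rain"),
    (frozenset({"rain"}), "wet_ground"),
]
score = pacs_estimate(premises, "slippery", toy_sampler, k=50, rng=random.Random(0))
print("estimated acceptance:", score, "| majority says true:", score > 0.5)
```

The aggregation step is a plain average of per-sample entailment outcomes, so the majority decision reduces to comparing that average against one half.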

What carries the argument

PACS, the algorithm that samples LLM-generated proofs as observations of varied commonsense beliefs and aggregates their conclusions to approximate majority human judgment.

If this is right

PACS achieves higher performance than chain-of-thought reasoning on the tested benchmarks.
It outperforms prior neurosymbolic methods that supply fixed commonsense assumptions.
It also beats search-based approaches by explicitly modeling variation rather than seeking a single solution.
The method can be applied across multiple benchmarks without requiring new human annotations for each commonsense fact.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sampling approach could be adjusted to target specific demographic groups instead of a generic majority if human data from those groups were used to guide prompt variation.
Similar aggregation over multiple LLM outputs might apply to other subjective tasks such as preference modeling or ethical judgment where single answers are unreliable.
The framework suggests that the cost of additional samples trades off against accuracy in approximating human belief distributions, opening a path to efficiency studies.
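The sample-cost trade-off in the last point can be given a rough scale with Hoeffding's inequality. The sketch below assumes the per-sample outcomes are independent and identically distributed, which the paper's samples may not be, and the function name `samples_needed` and the tolerance parameters `epsilon` and `delta` are introduced here for illustration only.

```python
import math


def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest K with 2 * exp(-2 * K * epsilon**2) <= delta (Hoeffding bound).

    With K i.i.d. 0/1 outcomes, the sample mean is then within epsilon of the
    true acceptance rate with probability at least 1 - delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


print(samples_needed(0.10, 0.05))  # 185
print(samples_needed(0.05, 0.05))  # 738
```

Halving the tolerance roughly quadruples the number of samples, which is the kind of cost curve an efficiency study would have to weigh against solver and LLM call costs.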

Load-bearing premise

Repeated sampling from the language model produces a distribution of proofs that approximates how human commonsense beliefs actually differ, so that the aggregated outcome matches what most people would judge true or false.

What would settle it

A large-scale human survey that rates the same reasoning conclusions as true or false and shows that PACS majority votes match human majorities no better than chain-of-thought or fixed neurosymbolic baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.08011 by Chiara Roverato, Didier Chetelat, Han Zhou, Joseph Cotnareanu, Mark Coates, Yingxue Zhang.

**Figure 1.** Diagram illustrating our proposed PACS algorithm. The LLM receives a question from a user which requires abductive reasoning. The LLM translates this question into premises S and a query proposition c whose truth value is to be determined. Ascertaining that it cannot be solved directly, the LLM then attempts to add new commonsense clauses l1, l2, l3, . . . , each time calling the formal logic solver to ve…

**Figure 2.** The (normalized) score progression of LLM sampled and PACS sampled paths. On the left and middle, we generate paths exhaustively taking 3 sample next-thoughts at each node. On the left, we show the model-count-based scores for incorrect paths and in the middle for the correct ones. We find no discernible difference between correct and incorrect score behaviour, indicating unfaithful LLM reasoning. On the…

**Figure 3.** Comparison of two very similar reasoning paths with opposite answers. On the left, we see a reasoning path in which, at step 3, a step which is not necessarily false but increasing in score is introduced. This clearly “throws off” the LLM, as its next step is simply the final (wrong) answer. On the right, however, we see that step 3 pushes the score down, bringing the path closer to a logically valid final…

Original abstract

Recent efforts to improve the reasoning abilities of Large Language Models (LLMs) have focused on integrating formal logic solvers within neurosymbolic frameworks. A key challenge is that formal solvers lack commonsense world knowledge, preventing them from making reasoning steps that humans find obvious. Prior methods address this by using LLMs to supply missing commonsense assumptions, but these approaches implicitly assume universal agreement on such commonsense facts. In reality, commonsense beliefs vary across individuals. We propose a probabilistic framework for abductive commonsense reasoning that explicitly models this variation, aiming to determine whether most people would judge a statement as true or false. We introduce Probabilistic Abductive CommonSense (PACS), a novel algorithm that uses an LLM and a formal solver to sample proofs as observations of individuals' distinct commonsense beliefs, and aggregates conclusions across these samples. Empirically, PACS outperforms chain-of-thought reasoning, prior neurosymbolic methods, and search-based approaches across multiple benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PACS samples LLM proofs to model varying commonsense beliefs and aggregates for majority judgment, but the gains rest on an untested assumption that those samples track real human variation.

The main takeaway is that this paper gives a concrete algorithm for handling disagreement on commonsense facts inside neurosymbolic pipelines. Instead of treating the LLM as a single source of universal background knowledge, PACS draws multiple proof samples, treats each as one person's belief set, runs them through a formal solver, and uses the aggregate to decide what most people would accept as true or false. That step is new relative to the chain-of-thought and neurosymbolic baselines cited in the abstract. It directly targets the practical bottleneck where prior methods collapse all commonsense into one fixed set of assumptions. The setup stays simple enough to plug into existing LLM-plus-solver stacks, which is a practical plus. The description of the sampling-plus-aggregation loop is clear and avoids circular fitting loops.

The central empirical claim is that PACS beats chain-of-thought, earlier neurosymbolic methods, and search baselines on multiple benchmarks. That claim is stated plainly, but the abstract supplies no numbers, error bars, or ablation details, so the size and robustness of the improvement cannot be judged yet.

The bigger soft spot is the missing link between LLM samples and actual human belief distributions. Nothing in the provided description shows calibration against human data or checks whether the sampled proofs exhibit the right kind of diversity rather than correlated LLM artifacts. If the model over-represents certain priors, the majority vote will simply reproduce that skew. This is a real gap, not a minor one, because the whole point of the method is to approximate what most people would judge.

The work is aimed at people already building hybrid LLM-logic systems who need to move beyond uniform commonsense assumptions. Readers who want a drop-in algorithmic tweak for belief variation will find the method description useful even before the numbers are fully stress-tested. It deserves a serious referee. The idea is distinct enough and the problem is concrete enough that external review can tighten the validation without starting from scratch.
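The skew concern raised in the letter can be made concrete with a small binomial calculation. The numbers are purely illustrative assumptions, not measurements from the paper: suppose only 40% of people would accept a statement, while the LLM's sampled proofs accept it 70% of the time. The helper name `prob_majority_true` is invented for this sketch, and votes are assumed independent.

```python
from math import comb


def prob_majority_true(q: float, k: int) -> float:
    """P(strictly more than half of k i.i.d. votes are 'true'), each true w.p. q."""
    return sum(
        comb(k, i) * q ** i * (1 - q) ** (k - i)
        for i in range(k // 2 + 1, k + 1)
    )


human_rate = 0.4  # assumed share of people who accept the statement
llm_rate = 0.7    # assumed share of sampled proofs that accept it

for k in (5, 21, 101):
    print(k, round(prob_majority_true(llm_rate, k), 3))
```

Under these assumed rates the human majority rejects the statement, yet the probability that the sampled majority accepts it grows toward 1 as `k` increases. More samples sharpen the estimate of the model's own acceptance rate, not of the human one.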

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Probabilistic Abductive CommonSense (PACS), a neurosymbolic algorithm for abductive reasoning that models variation in commonsense beliefs. It uses an LLM to sample multiple proofs (treated as observations from distinct individuals' belief distributions), applies a formal solver to derive conclusions from each, and aggregates via majority vote to determine whether most people would judge a statement true or false. The paper claims this outperforms chain-of-thought reasoning, prior neurosymbolic methods, and search-based approaches across multiple benchmarks.

Significance. If the empirical results hold under proper controls and the sampling procedure can be shown to approximate human belief variation, PACS would address a genuine limitation in existing neurosymbolic systems that assume universal commonsense agreement. The probabilistic aggregation idea is a clear conceptual advance over deterministic assumption-injection methods. However, the absence of human calibration data means the practical significance remains provisional; the work is more a promising algorithmic proposal than a fully validated framework.

major comments (2)

Abstract: The claim of empirical outperformance over CoT, neurosymbolic, and search-based methods is stated without any quantitative results, error bars, benchmark names, dataset sizes, or ablation details. This makes it impossible to assess whether gains survive controls for prompt engineering, solver choice, or sampling temperature; the central empirical claim therefore cannot be evaluated from the provided information.
Method section (description of PACS algorithm): The framework treats repeated LLM-generated proof samples as draws from a distribution of human commonsense beliefs and uses majority vote to recover the modal judgment. No human calibration experiments, correlation with psychometric data on commonsense variation, or ablation comparing majority vote to single-sample or temperature-0 baselines are reported. This assumption is load-bearing for the probabilistic interpretation and the novelty claim relative to prior work that assumes universal agreement.

minor comments (1)

The paper would benefit from an explicit formal definition (e.g., as a probability distribution over possible worlds or belief sets) early in the method section to clarify how the LLM samples are aggregated.
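One way such a definition could look is sketched below. It is a hedged reconstruction built around the estimator quoted later on this page, not the paper's own statement. The sampling distribution $p(\cdot \mid S)$ over sets of commonsense clauses and the symbol $P(c \mid S)$ are notation assumed here for illustration.

```latex
% Assumed notation: L is a random set of commonsense clauses drawn from
% p(. | S); S are the premises and c the query proposition.
\[
P(c \mid S)
  \;=\; \mathbb{E}_{L \sim p(\cdot \mid S)}\bigl[\mathbb{1}[S \wedge L \vdash c]\bigr]
  \;\approx\; \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}[S \wedge L_k \vdash c],
\qquad L_1, \dots, L_K \sim p(\cdot \mid S).
\]
% Majority rule: report c as true when the estimate exceeds 1/2.
```

Read this way, each $L_k$ plays the role of one individual's belief set, and the majority judgment is a threshold on the estimated acceptance probability.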

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.

Referee: Abstract: The claim of empirical outperformance over CoT, neurosymbolic, and search-based methods is stated without any quantitative results, error bars, benchmark names, dataset sizes, or ablation details. This makes it impossible to assess whether gains survive controls for prompt engineering, solver choice, or sampling temperature; the central empirical claim therefore cannot be evaluated from the provided information.

Authors: We agree that the abstract would benefit from greater specificity. In the revised version, we will include concrete quantitative results (accuracy figures with error bars on the primary benchmarks), dataset sizes, and references to the key ablations (including controls for sampling temperature and solver variants). This will make the empirical claims directly evaluable while preserving the abstract's brevity.
revision: yes

Referee: Method section (description of PACS algorithm): The framework treats repeated LLM-generated proof samples as draws from a distribution of human commonsense beliefs and uses majority vote to recover the modal judgment. No human calibration experiments, correlation with psychometric data on commonsense variation, or ablation comparing majority vote to single-sample or temperature-0 baselines are reported. This assumption is load-bearing for the probabilistic interpretation and the novelty claim relative to prior work that assumes universal agreement.

Authors: We acknowledge that the manuscript does not contain human calibration experiments or psychometric correlations validating that LLM samples approximate human belief distributions; this remains an assumption underlying the probabilistic framing. We do, however, include ablations of majority aggregation versus single-sample inference. We will revise the method and discussion sections to state the modeling assumption more explicitly, add the requested temperature-0 baseline comparison, and insert a limitations paragraph highlighting the need for future human validation studies. These changes will clarify the distinction from deterministic neurosymbolic baselines.
revision: partial

standing simulated objections not resolved

Absence of human calibration experiments or psychometric data to empirically support the assumption that LLM-generated proof samples approximate variation in human commonsense beliefs.

Circularity Check

0 steps flagged

No circularity: algorithmic sampling and aggregation method is self-contained

The paper presents PACS as an algorithmic procedure that invokes an external LLM to generate proof samples (treated as observations of individual belief distributions) and a formal solver to evaluate them, followed by majority-vote aggregation. No equations, parameters, or derivations are defined in terms of the target output; the method does not fit any quantity to a subset of its own results and then relabel that quantity as a prediction. No load-bearing self-citations or uniqueness theorems imported from the authors' prior work appear in the provided text. The central claim therefore rests on the external behavior of the LLM and solver rather than on any internal reduction to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the unstated premise that LLM-generated proofs can serve as faithful proxies for human commonsense variation and that majority aggregation over samples yields a meaningful population-level judgment. No free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: LLM-generated proofs constitute valid observations of distinct individual commonsense belief sets.
Invoked when the method treats each sampled proof as coming from a different person.
domain assumption: Majority vote across samples approximates what most people would judge true or false.
Central modeling choice that converts per-sample conclusions into a population-level prediction.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We propose a probabilistic framework... sample proofs as observations of individuals’ distinct commonsense beliefs, and aggregates conclusions across these samples.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: $c_{\mathrm{AP}}(S, c) = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}[S \wedge L_k \vdash c]$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

[1]

Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation , author=. Proc. Conf. Association for Computational Linguistics , pages=

work page
[2]

CaDiCaL 2.0 , author=. Proc. Int. Conf. on Computer Aided Verification , pages=. 2024 , organization=

work page 2024
[3]

Large language models are zero-shot reasoners , author=. Proc. Conf. Neural Informations Processing Systems , pages=

work page
[4]

Getting closer to AI complete question answering: A set of prerequisite real tasks , author=. Proc. AAAI Conf. Artificial Intelligence , volume=

work page
[5]

Cosmos QA : Machine Reading Comprehension with Contextual Commonsense Reasoning

Huang, Lifu and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin. Cosmos QA : Machine Reading Comprehension with Contextual Commonsense Reasoning. Proc. Conf. Empirical Methods in Natural Language Processing. 2019

work page 2019
[6]

Language Models as Knowledge Bases?

Petroni, Fabio and Rockt. Language Models as Knowledge Bases?. Proc. Conf. Empirical Methods in Natural Language Processing. 2019

work page 2019
[7]

Faith and Fate: Limits of Transformers on Compositionality , volume =

Dziri, Nouha and Lu, Ximing and Sclar, Melanie and Li, Xiang (Lorraine) and Jiang, Liwei and Lin, Bill Yuchen and Welleck, Sean and West, Peter and Bhagavatula, Chandra and Le Bras, Ronan and Hwang, Jena and Sanyal, Soumya and Ren, Xiang and Ettinger, Allyson and Harchaoui, Zaid and Choi, Yejin , booktitle =. Faith and Fate: Limits of Transformers on Comp...

work page
[8]

Nilsson , abstract =

Nils J. Nilsson , abstract =. Logic and artificial intelligence , journal =. 1991 , issn =

work page 1991
[9]

Honghua Dong and Jiayuan Mao and Tian Lin and Chong Wang and Lihong Li and Denny Zhou , title =

work page
[10]

Bowman , title =

Miles Turpin and Julian Michael and Ethan Perez and Samuel R. Bowman , title =. Proc. Conf. Neural Information Processing Systems , year =

work page
[11]

FOLIO : Natural Language Reasoning with First-Order Logic

Han, Simeng and Schoelkopf, Hailey and Zhao, Yilun and Qi, Zhenting and Riddell, Martin and Zhou, Wenfei and Coady, James and Peng, David and Qiao, Yujie and Benson, Luke and Sun, Lucy and Wardle-Solano, Alexander and Szab \'o , Hannah and Zubova, Ekaterina and Burtell, Matthew and Fan, Jonathan and Liu, Yixin and Wong, Brian and Sailor, Malcolm and Ni, A...

work page 2024
[12]

Neural logic reasoning , author=. Proc. Int. Conf. Information & Knowledge Management , pages=

work page
[13]

Daniel Crevier , title =

work page
[14]

Bertrand Russell and Alfred Whitehead , title =

work page
[15]

IRE Transactions on Information Theory , year =

Allen Newell and Herbert Simon , title =. IRE Transactions on Information Theory , year =

work page
[16]

2020 , eprint=

Logical Neural Networks , author=. 2020 , eprint=

work page 2020
[17]

Camburu, Oana-Maria and Rockt\". Proc. Conf. Neural Information Processing Systems , title =

work page
[18]

2022 , author =

A comprehensive overview of knowledge graph completion , journal =. 2022 , author =

work page 2022
[19]

Embedding Uncertain Knowledge Graphs , number=. Proc. Conf. Artificial Intell. , author=. 2019 , month=

work page 2019
[20]

Automated Knowledge Base Construction , year =

Joint Reasoning for Multi-Faceted Commonsense Knowledge , author=. Automated Knowledge Base Construction , year =

work page
[21]

Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought , author=. Proc. Int. Conf. Learning Representations , year=

work page
[22]

2024 , eprint=

SymBa: Symbolic Backward Chaining for Structured Natural Language ReasoningSymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning , author=. 2024 , eprint=

work page 2024
[23]

Xi Ye and Qiaochu Chen and Isil Dillig and Greg Durrett , booktitle=. Sat

work page
[24]

Logic- LM : Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

Pan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William. Logic- LM : Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. Findings of the Association for Computational Linguistics. 2023

work page 2023
[25]

Faithful Chain-of-Thought Reasoning , author=. Proc. Conf. Natural Language Processing , year=

work page
[26]

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers , author=. Proc. Conf. Empirical Methods in Natural Language Processing , pages=

work page
[27]

Hamilton , Title =

Koustuv Sinha and Shagun Sodhani and Jin Dong and Joelle Pineau and William L. Hamilton , Title =. 2019 , booktitle =

work page 2019
[28]

Reasoning with large lan- guage models, a survey

Reasoning with large language models, a survey , author=. arXiv preprint arXiv:2407.11511 , year=

work page arXiv
[29]

Diagnosing the first-order logical reasoning ability through LogicNLI , author=. Proc. Conf. Empirical Methods in Natural Language Processing , pages=

work page
[30]

Faithful Logical Reasoning via Symbolic Chain-of-Thought , author=. Proc. Conf. Association for Computational Linguistics , pages=

work page
[31]

ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language , author=. Proc. Conf. Association for Computational Linguistics: ACL-IJCNLP , pages=

work page
[32]

Transformers as soft reasoners over language , author=. Proc. Int. Joint Conf. on Artificial Intelligence , pages=

work page
[33]

Chain-of-thought prompting elicits reasoning in large language models , author=. Proc. Conf. Neural Information Processing Systems , pages=

work page
[34]

Language models are few-shot learners , author=. Proc. Conf. Neural Information Processing Systems , pages=

work page
[35]

LogiQA: a challenge dataset for machine reading comprehension with logical reasoning , author=. Proc. Int. Joint Conf. on Artificial Intelligence , pages=

work page
[36]

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning , author=. Proc. Int. Conf. on Learning Representations , year=

work page
[37]

Graph of thoughts: Solving elaborate problems with large language models , author=. Proc. Conf. Association Advancement of Artifical Intelligence , pages=

work page
[38]

Decompose, analyze and rethink: Solving intricate problems with human-like reasoning cycle , author=. Proc. Conf. Neural Information Processing Systems , pages=

work page
[39]

Verifiable, Debuggable, and Repairable Commonsense Logical Reasoning via LLM-based Theory Resolution , author=. Proc. Conf. Empirical Methods in Natural Language Processing , pages=

work page
[40]

Tree of thoughts: Deliberate problem solving with large language models , author=. Proc. Conf. Neural Information Processing Systems , pages=

work page
[41]

Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text

Wang, Siyuan and Zhong, Wanjun and Tang, Duyu and Wei, Zhongyu and Fan, Zhihao and Jiang, Daxin and Zhou, Ming and Duan, Nan. Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text. Proc. Conf. Association for Computational Linguistics. 2022

work page 2022
[42]

Self-Consistency Improves Chain of Thought Reasoning in Language Models , author=. Proc. Int. Conf. on Learning Representations , year=

work page
[43]

2026 , eprint=

A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic , author=. 2026 , eprint=

work page 2026
[44]

LRM s are not thinking straight: Unreliability of thinking trajectories

Cuesta-Ramirez, Jhouben and Beaussant, Samuel and Mounsif, Mehdi. LRM s are not thinking straight: Unreliability of thinking trajectories. Proc. Conf. Natural Language Processing. 2025

work page 2025
[45]

Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation

Yang, Yuan and Xiong, Siheng and Payani, Ali and Shareghi, Ehsan and Fekri, Faramarz. Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation. Proc. Conf. Association for Computational Linguistics. 2024

work page 2024
[46]

Self-Evaluation Guided Beam Search for Reasoning , year =

Xie, Yuxi and Kawaguchi, Kenji and Zhao, Yiran and Zhao, James Xu and Kan, Min-Yen and He, Junxian and Xie, Michael , booktitle =. Self-Evaluation Guided Beam Search for Reasoning , year =

work page
[47]

Reasoning with Language Model is Planning with World Model

Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua and Wang, Zhen and Wang, Daisy and Hu, Zhiting. Reasoning with Language Model is Planning with World Model. Proc. Conf. Empirical Methods in Natural Language Processing. 2023

work page 2023
[48]

Stepwise Informativeness Search for Improving LLM Reasoning

Wang, Siyuan and Zhao, Enda and Ren, Xiang. Stepwise Informativeness Search for Improving LLM Reasoning. Proc. Conf. Empirical Methods in Natural Language Processing. 2025

work page 2025
[49]

arXiv:2409.17539 , archivePrefix=

Logic-of-thought: Injecting logic into contexts for full reasoning in large language models , author=. arXiv:2409.17539 , archivePrefix=

work page arXiv
[50]

LAMBADA: Backward Chaining for Automated Reasoning in Natural Language , author=. Proc. Conf. Association for Computational Linguistics , pages=

work page

[1] [1]

Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation , author=. Proc. Conf. Association for Computational Linguistics , pages=

work page

[2] [2]

CaDiCaL 2.0 , author=. Proc. Int. Conf. on Computer Aided Verification , pages=. 2024 , organization=

work page 2024

[3] [3]

Large language models are zero-shot reasoners , author=. Proc. Conf. Neural Informations Processing Systems , pages=

work page

[4] [4]

Getting closer to AI complete question answering: A set of prerequisite real tasks , author=. Proc. AAAI Conf. Artificial Intelligence , volume=

work page

[5] [5]

Cosmos QA : Machine Reading Comprehension with Contextual Commonsense Reasoning

Huang, Lifu and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin. Cosmos QA : Machine Reading Comprehension with Contextual Commonsense Reasoning. Proc. Conf. Empirical Methods in Natural Language Processing. 2019

work page 2019

[6] [6]

Language Models as Knowledge Bases?

Petroni, Fabio and Rockt. Language Models as Knowledge Bases?. Proc. Conf. Empirical Methods in Natural Language Processing. 2019

work page 2019

[7] [7]

Faith and Fate: Limits of Transformers on Compositionality , volume =

Dziri, Nouha and Lu, Ximing and Sclar, Melanie and Li, Xiang (Lorraine) and Jiang, Liwei and Lin, Bill Yuchen and Welleck, Sean and West, Peter and Bhagavatula, Chandra and Le Bras, Ronan and Hwang, Jena and Sanyal, Soumya and Ren, Xiang and Ettinger, Allyson and Harchaoui, Zaid and Choi, Yejin , booktitle =. Faith and Fate: Limits of Transformers on Comp...

work page

[8] [8]

Nilsson , abstract =

Nils J. Nilsson , abstract =. Logic and artificial intelligence , journal =. 1991 , issn =

work page 1991

[9] [9]

Honghua Dong and Jiayuan Mao and Tian Lin and Chong Wang and Lihong Li and Denny Zhou , title =

work page

[10] [10]

Bowman , title =

Miles Turpin and Julian Michael and Ethan Perez and Samuel R. Bowman , title =. Proc. Conf. Neural Information Processing Systems , year =

work page

[11] [11]

FOLIO : Natural Language Reasoning with First-Order Logic

Han, Simeng and Schoelkopf, Hailey and Zhao, Yilun and Qi, Zhenting and Riddell, Martin and Zhou, Wenfei and Coady, James and Peng, David and Qiao, Yujie and Benson, Luke and Sun, Lucy and Wardle-Solano, Alexander and Szab \'o , Hannah and Zubova, Ekaterina and Burtell, Matthew and Fan, Jonathan and Liu, Yixin and Wong, Brian and Sailor, Malcolm and Ni, A...

work page 2024

[12] [12]

Neural logic reasoning , author=. Proc. Int. Conf. Information & Knowledge Management , pages=

work page

[13] [13]

Daniel Crevier , title =

work page

[14] [14]

Bertrand Russell and Alfred Whitehead , title =

work page

[15] [15]

IRE Transactions on Information Theory , year =

Allen Newell and Herbert Simon , title =. IRE Transactions on Information Theory , year =

work page

[16] [16]

2020 , eprint=

Logical Neural Networks , author=. 2020 , eprint=

work page 2020

[17] [17]

Camburu, Oana-Maria and Rockt\". Proc. Conf. Neural Information Processing Systems , title =

work page

[18] [18]

2022 , author =

A comprehensive overview of knowledge graph completion , journal =. 2022 , author =

work page 2022

[19] [19]

Embedding Uncertain Knowledge Graphs , number=. Proc. Conf. Artificial Intell. , author=. 2019 , month=

work page 2019

[20] [20]

Automated Knowledge Base Construction , year =

Joint Reasoning for Multi-Faceted Commonsense Knowledge , author=. Automated Knowledge Base Construction , year =

work page

[21] [21]

Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought , author=. Proc. Int. Conf. Learning Representations , year=

work page

[22] [22]

2024 , eprint=

SymBa: Symbolic Backward Chaining for Structured Natural Language ReasoningSymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning , author=. 2024 , eprint=

work page 2024

[23] [23]

Xi Ye and Qiaochu Chen and Isil Dillig and Greg Durrett , booktitle=. Sat

work page

[24] [24]

Logic- LM : Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

Pan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William. Logic- LM : Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. Findings of the Association for Computational Linguistics. 2023

work page 2023

[25] [25]

Faithful Chain-of-Thought Reasoning , author=. Proc. Conf. Natural Language Processing , year=

work page

[26] [26]

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers , author=. Proc. Conf. Empirical Methods in Natural Language Processing , pages=

work page

[27] [27]

Hamilton , Title =

Koustuv Sinha and Shagun Sodhani and Jin Dong and Joelle Pineau and William L. Hamilton , Title =. 2019 , booktitle =

work page 2019

[28] [28]

Reasoning with large lan- guage models, a survey

Reasoning with large language models, a survey , author=. arXiv preprint arXiv:2407.11511 , year=

work page arXiv

[29] Diagnosing the first-order logical reasoning ability through LogicNLI. Proc. Conf. Empirical Methods in Natural Language Processing.

[30] Faithful Logical Reasoning via Symbolic Chain-of-Thought. Proc. Conf. Association for Computational Linguistics.

[31] ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language. Proc. Conf. Association for Computational Linguistics: ACL-IJCNLP.

[32] Transformers as soft reasoners over language. Proc. Int. Joint Conf. on Artificial Intelligence.

[33] Chain-of-thought prompting elicits reasoning in large language models. Proc. Conf. Neural Information Processing Systems.

[34] Language models are few-shot learners. Proc. Conf. Neural Information Processing Systems.

[35] LogiQA: a challenge dataset for machine reading comprehension with logical reasoning. Proc. Int. Joint Conf. on Artificial Intelligence.

[36] ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. Proc. Int. Conf. on Learning Representations.

[37] Graph of thoughts: Solving elaborate problems with large language models. Proc. Conf. Association Advancement of Artificial Intelligence.

[38] Decompose, analyze and rethink: Solving intricate problems with human-like reasoning cycle. Proc. Conf. Neural Information Processing Systems.

[39] Verifiable, Debuggable, and Repairable Commonsense Logical Reasoning via LLM-based Theory Resolution. Proc. Conf. Empirical Methods in Natural Language Processing.

[40] Tree of thoughts: Deliberate problem solving with large language models. Proc. Conf. Neural Information Processing Systems.

[41] Siyuan Wang, Wanjun Zhong, Duyu Tang, Zhongyu Wei, Zhihao Fan, Daxin Jiang, Ming Zhou, and Nan Duan. Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text. Proc. Conf. Association for Computational Linguistics, 2022.

[42] Self-Consistency Improves Chain of Thought Reasoning in Language Models. Proc. Int. Conf. on Learning Representations.

[43] A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic. 2026.

[44] Jhouben Cuesta-Ramirez, Samuel Beaussant, and Mehdi Mounsif. LRMs are not thinking straight: Unreliability of thinking trajectories. Proc. Conf. Natural Language Processing, 2025.

[45] Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi, and Faramarz Fekri. Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation. Proc. Conf. Association for Computational Linguistics, 2024.

[46] Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, James Xu Zhao, Min-Yen Kan, Junxian He, and Michael Xie. Self-Evaluation Guided Beam Search for Reasoning.

[47] Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reasoning with Language Model is Planning with World Model. Proc. Conf. Empirical Methods in Natural Language Processing, 2023.

[48] Siyuan Wang, Enda Zhao, and Xiang Ren. Stepwise Informativeness Search for Improving LLM Reasoning. Proc. Conf. Empirical Methods in Natural Language Processing, 2025.

[49] Logic-of-thought: Injecting logic into contexts for full reasoning in large language models. arXiv:2409.17539.

[50] LAMBADA: Backward Chaining for Automated Reasoning in Natural Language. Proc. Conf. Association for Computational Linguistics.