Semantic-Aware Logical Reasoning via a Semiotic Framework

Junqing Yu; Junxi Sheng; Wei Yang; Wenbing Li; Xinglang Zhang; Yi-Ping Phoebe Chen; Yunyao Zhang; Zikai Song

arXiv: 2509.24765 · v8 · submitted 2025-09-29 · cs.AI


Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang, Zikai Song

Pith reviewed 2026-05-18 12:54 UTC · model grok-4.3

classification: cs.AI

keywords: logical reasoning, semiotic square, large language models, multi-perspective analysis, RepublicQA, semantic complexity, automated deduction


The pith

LogicAgent combines a semiotic square for multi-perspective semantics with deduction and verification to improve logical reasoning in language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LogicAgent, a framework guided by the semiotic square that analyzes propositions from several semantic angles at once. It pairs this analysis with automated deduction steps and reflective checks to handle longer chains of reasoning. A new benchmark called RepublicQA tests these abilities with abstract, philosophically grounded statements that include contrary and contradictory forms at college-level reading difficulty. Results show consistent gains on this benchmark and on established ones like ProntoQA and FOLIO. The work matters because, as the paper argues, existing systems tend to struggle when ambiguous meaning and deep logical chains occur together.

Core claim

LogicAgent integrates the semiotic square to perform multi-perspective semantic analysis and combines it with automated deduction plus reflective verification, allowing large language models to manage logical complexity more effectively across deeper reasoning chains on tasks that mix semantic and logical difficulty.

What carries the argument

The semiotic square, which organizes semantic relations among a primary proposition S1, its contradictory ¬S1, its contrary S2, and the contrary's contradictory ¬S2 (the two negated positions stand in a subcontrary relation), enabling structured multi-perspective examination of meaning.
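The four positions and the relations between them can be written down as a small lookup structure. The sketch below is illustrative only: the class name `SemioticSquare`, the slot names, and the example sentences are hypothetical and are not taken from the LogicAgent code or from RepublicQA.

```python
from dataclasses import dataclass

# Hypothetical lookup table: relation name for each unordered pair of slots.
# Implications are directional (S1 ⇒ ¬S2, S2 ⇒ ¬S1); the pair key drops direction.
_RELATIONS = {
    frozenset(("s1", "s2")): "contrary",
    frozenset(("not_s1", "not_s2")): "subcontrary",
    frozenset(("s1", "not_s1")): "contradictory",
    frozenset(("s2", "not_s2")): "contradictory",
    frozenset(("s1", "not_s2")): "implication",  # S1 ⇒ ¬S2
    frozenset(("s2", "not_s1")): "implication",  # S2 ⇒ ¬S1
}


@dataclass(frozen=True)
class SemioticSquare:
    """The four positions of the square (illustrative, not the authors' data structure)."""
    s1: str      # primary proposition S1
    s2: str      # contrary S2
    not_s1: str  # contradictory of S1
    not_s2: str  # contradictory of the contrary

    def relation(self, a: str, b: str) -> str:
        """Return the relation between two slot names, e.g. 's1' and 's2'."""
        try:
            return _RELATIONS[frozenset((a, b))]
        except KeyError:
            raise ValueError(f"no square relation between {a!r} and {b!r}") from None

    def propositions(self) -> dict:
        """Map slot names to their natural-language propositions."""
        return {"s1": self.s1, "s2": self.s2, "not_s1": self.not_s1, "not_s2": self.not_s2}


# Hypothetical example in the spirit of RepublicQA (not taken from the benchmark).
square = SemioticSquare(
    s1="The guardians are just.",
    s2="The guardians are unjust.",
    not_s1="The guardians are not just.",
    not_s2="The guardians are not unjust.",
)
print(square.relation("s1", "s2"))      # contrary
print(square.relation("not_s2", "s1"))  # implication
```

Keying the table on unordered pairs keeps lookups symmetric; a fuller version would keep the direction of the two implications explicitly.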

If this is right

Language models gain the ability to track conflicting stances within the same reasoning task rather than collapsing them early.
Benchmarks that jointly vary semantic depth and logical length become necessary for realistic evaluation of reasoning systems.
The same semiotic structure can be reused to generate or verify reasoning traces that explicitly account for alternative meanings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could transfer to domains such as policy analysis or case law where one proposition must be examined against its logical opposites.
Future work could test whether the square structure helps models avoid common semantic pitfalls like scope ambiguity in natural-language premises.
If the integration scales, it suggests a general route for adding lightweight symbolic scaffolds to purely neural reasoning pipelines.

Load-bearing premise

The semiotic square supplies a reliable structure for breaking down semantic relations that can be usefully combined with deduction and verification inside language-model reasoning loops.

What would settle it

Run LogicAgent and a version without the semiotic-square component on a fresh set of abstract propositions with systematically varied contrary and contradictory forms; if the full system shows no measurable gain in accuracy or chain length, the central integration claim does not hold.
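One way to make "measurable gain" concrete, assuming both variants answer the same items, is an exact McNemar test on the paired per-question outcomes. The function `mcnemar_exact` and the sample data below are hypothetical; this is a standard paired test, not a procedure the paper reports.

```python
from math import comb


def mcnemar_exact(full_correct, ablated_correct):
    """Exact two-sided McNemar test on paired per-item correctness.

    Returns (b, c, p_value): b counts items only the full system answers
    correctly, c counts items only the ablated variant answers correctly.
    """
    if len(full_correct) != len(ablated_correct):
        raise ValueError("both systems must be scored on the same items")
    pairs = list(zip(full_correct, ablated_correct))
    b = sum(1 for f, a in pairs if f and not a)
    c = sum(1 for f, a in pairs if a and not f)
    n = b + c
    if n == 0:  # no discordant items: no evidence either way
        return b, c, 1.0
    # Under H0 each discordant item is a fair coin flip, so b ~ Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical per-question outcomes (True = answered correctly).
full = [True, True, True, False, True, True, True, True]
ablated = [True, False, False, False, True, False, True, False]
print(mcnemar_exact(full, ablated))  # (4, 0, 0.125)
```

Even when all four disagreements favour the full system, as here, the two-sided p-value is only 0.125, so a convincing ablation needs enough discordant items.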

Figures

Figures reproduced from arXiv: 2509.24765 by Junqing Yu, Junxi Sheng, Wei Yang, Wenbing Li, Xinglang Zhang, Yi-Ping Phoebe Chen, Yunyao Zhang, Zikai Song.

**Figure 1.** Overview of LogicAgent and the proposed RepublicQA benchmark. (Top-left) RepublicQA features abstract, philosophical propositions from Plato’s Republic with diverse contextual premises, enabling multiple semantic interpretations. (Bottom-left) LogicAgent consists of three stages. (Top-right) A multi-step reasoning process explores contraries and contradictions when S1 is indeterminate. (Bottom-right)…
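Figure 1 describes this reasoning step only at a high level. Assuming a prover reports which of the four square propositions it can derive from the premises, the fallback from an indeterminate S1 to its contrary could look like the sketch below. The `resolve_s1` interface and the True/False/Uncertain labels are assumptions for illustration; in LogicAgent these judgments come from LLM-driven deduction and reflection rather than a lookup.

```python
from enum import Enum


class Label(Enum):
    TRUE = "True"
    FALSE = "False"
    UNCERTAIN = "Uncertain"


def resolve_s1(derived: dict) -> Label:
    """Decide S1 from which square propositions could be derived.

    `derived` maps the slot names 's1', 'not_s1', 's2', 'not_s2' to True when
    that proposition follows from the premises (hypothetical interface).
    """
    s1, not_s1, s2 = (derived.get(k, False) for k in ("s1", "not_s1", "s2"))
    if s1 and (not_s1 or s2):
        raise ValueError("premises contradict the square: S1 together with ¬S1 or S2")
    if s1:
        return Label.TRUE
    if not_s1:
        return Label.FALSE
    # S1 is indeterminate on its own: consult the contrary.
    if s2:  # contrariety gives S2 ⇒ ¬S1
        return Label.FALSE
    # Deriving ¬S2 does not settle S1, because contraries can both be false.
    return Label.UNCERTAIN
```

The asymmetry is deliberate: proving the contrary S2 refutes S1, but proving ¬S2 leaves S1 open.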

**Figure 2.** Greimas’ Semiotic Square: illustrating contraries (S1 vs. S2), contradictions (S1 vs. ¬S1, S2 vs. ¬S2), and implications (S1 ⇒ ¬S2, S2 ⇒ ¬S1). Greimas’ Semiotic Square (Greimas et al., 1982) is a foundational construct in structuralist semantics that organizes conceptual contraries and contradictions into a four-element structure, enabling fine-grained reasoning over meaning…
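If contrariety is read as "S1 and S2 are not both true", the two implications in the caption follow directly. The Lean 4 sketch below checks this; the formalization is an assumption of this note rather than the paper's own statement, and the last theorem (the subcontrary relation between ¬S1 and ¬S2) additionally needs the law of excluded middle.

```lean
-- Contrariety modelled as "S1 and S2 are not both true" (an assumption).
theorem s1_imp_not_s2 (S1 S2 : Prop) (h : ¬(S1 ∧ S2)) : S1 → ¬S2 :=
  fun h1 h2 => h ⟨h1, h2⟩

theorem s2_imp_not_s1 (S1 S2 : Prop) (h : ¬(S1 ∧ S2)) : S2 → ¬S1 :=
  fun h2 h1 => h ⟨h1, h2⟩

-- Subcontrariety: at least one of ¬S1, ¬S2 holds; this step is classical.
theorem subcontrary (S1 S2 : Prop) (h : ¬(S1 ∧ S2)) : ¬S1 ∨ ¬S2 :=
  (Classical.em S1).elim
    (fun h1 => Or.inr (fun h2 => h ⟨h1, h2⟩))
    (fun n1 => Or.inl n1)
```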

**Figure 3.** Overview of the LogicAgent framework. The agent processes a natural language proposition through three stages. (1) Semantic Structuring Stage constructs a Greimas’ Semiotic Square, generating four interrelated propositions: the primary proposition S1, its contradiction ¬S1, the contrary S2, and the contradiction of the contrary ¬S2. These are verified for FOL-consistency using a CFG-based parser. (2) Log…
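The caption does not give the parser's grammar. As a rough illustration of what a CFG-based well-formedness check over FOL strings involves, here is a recursive-descent parser for a hypothetical toy grammar, with one method per nonterminal. It only checks syntax; it does not test whether the four propositions are logically consistent with each other, and the symbol set is an assumption.

```python
import re

# Hypothetical toy grammar (not the authors' CFG):
#   formula     := disjunction (("→" | "↔") formula)?
#   disjunction := conjunction ("∨" conjunction)*
#   conjunction := unary ("∧" unary)*
#   unary       := "¬" unary | ("∀" | "∃") IDENT unary | "(" formula ")" | atom
#   atom        := IDENT "(" IDENT ("," IDENT)* ")"
TOKEN_RE = re.compile(r"[∀∃¬∧∨→↔(),]|[A-Za-z_]\w*|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


class FOLParser:
    def __init__(self, text: str):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"expected {tok!r}, got {cur!r}")
        self.pos += 1
        return cur

    def ident(self):
        cur = self.peek()
        if cur is None or not IDENT_RE.fullmatch(cur):
            raise SyntaxError(f"expected identifier, got {cur!r}")
        self.pos += 1
        return cur

    def formula(self):
        left = self.disjunction()
        if self.peek() in ("→", "↔"):
            op = self.expect()
            return (op, left, self.formula())  # right-associative
        return left

    def disjunction(self):
        node = self.conjunction()
        while self.peek() == "∨":
            self.expect()
            node = ("∨", node, self.conjunction())
        return node

    def conjunction(self):
        node = self.unary()
        while self.peek() == "∧":
            self.expect()
            node = ("∧", node, self.unary())
        return node

    def unary(self):
        tok = self.peek()
        if tok == "¬":
            self.expect()
            return ("¬", self.unary())
        if tok in ("∀", "∃"):
            self.expect()
            return (tok, self.ident(), self.unary())
        if tok == "(":
            self.expect("(")
            node = self.formula()
            self.expect(")")
            return node
        return self.atom()

    def atom(self):
        pred = self.ident()
        self.expect("(")
        args = [self.ident()]
        while self.peek() == ",":
            self.expect(",")
            args.append(self.ident())
        self.expect(")")
        return (pred, *args)


def parse_fol(text: str):
    """Parse a formula into a nested tuple tree, or raise SyntaxError."""
    parser = FOLParser(text)
    tree = parser.formula()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


def is_well_formed(text: str) -> bool:
    try:
        parse_fol(text)
        return True
    except SyntaxError:
        return False


print(is_well_formed("∀x (Just(x) → ¬Unjust(x))"))  # True
print(is_well_formed("∀x (Just(x) →"))               # False
```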

**Figure 4.** Complexity metrics comparison. Red is our benchmark. Current benchmarks primarily focus on logical complexity while largely overlooking semantic complexity, resulting in limited coverage of abstraction, contextual ambiguity, and nuanced meaning. To address this gap, we construct RepublicQA, a benchmark designed to jointly capture logical depth and semantic breadth reasoning. Benchmark Construction. R…

**Figure 5.** Ablation studies: (a) input modalities and (b) reasoning efficiency.

**Figure 6.** An example CFG parse tree for the FOL rule

**Figure 7.** Answer distribution across different benchmarks.

**Figure 8.** Analysis of Philosophical Concepts: (a) frequency distribution of concepts, (b) overall

**Figure 9.** Overall and relation-specific accuracy across datasets. [Plot: Contradictory Proportion and Contrary Proportion (y-axis: Proportion, 0 to 1) for FOLIO, ProntoQA, ProofWriter, ProverQA, and RepublicQA.]

Original abstract

Logical reasoning is a fundamental capability of large language models. However, existing studies often overlook the interaction between logical complexity and semantic complexity, leading to systems that struggle with abstract propositions, ambiguous contexts, and conflicting stances that are central to human reasoning. We propose LogicAgent, a semiotic-square-guided framework that jointly addresses these two axes of difficulty. The semiotic square provides a principled structure for multi-perspective semantic analysis, and LogicAgent integrates automated deduction with reflective verification to manage logical complexity across deeper reasoning chains. To support evaluation under these conditions, we introduce RepublicQA, a benchmark that couples semantic complexity with logical depth. RepublicQA reaches college-level semantic difficulty (FKGL 11.94), contains philosophically grounded abstract propositions with systematically constructed contrary and contradictory forms, and offers a semantically rich setting for assessing logical reasoning in large language models. Experiments show that LogicAgent achieves state-of-the-art performance on RepublicQA with a 6.25 percent average improvement over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05 percent average gain. These results demonstrate the effectiveness of semiotic-grounded multi-perspective reasoning in enhancing logical performance. Code is available at https://github.com/AI4SS/Logic-Agent.
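The FKGL figure quoted in the abstract is the Flesch-Kincaid Grade Level, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below computes it with a crude vowel-group syllable counter; which tool the paper used is not stated here, so this heuristic is an assumption and will not reproduce 11.94 exactly on the benchmark text.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, dropping one for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


sample = "Justice is good. The republic is just."
print(round(fkgl(sample), 2))  # 2.63
```

Dictionary-based syllable counts will differ from this heuristic on longer, Latinate vocabulary, which is exactly where philosophical text scores high.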

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

LogicAgent layers a semiotic square onto deduction and reflection for LLM reasoning, with gains on a new abstract benchmark, but the square's specific role still needs isolation.


The main thing to know is that this paper adds a semiotic square to guide multi-perspective semantic analysis inside an LLM agent that also does automated deduction and reflective verification. They pair it with a new benchmark, RepublicQA, built around abstract philosophical propositions and their contrary and contradictory forms. The reported results show a 6.25 percent lift on that benchmark and a 7.05 percent average gain when tested on ProntoQA, ProofWriter, FOLIO, and ProverQA. Code is released, which helps.

Referee Report

1 major / 1 minor

Summary. The paper proposes LogicAgent, a semiotic-square-guided framework that combines multi-perspective semantic analysis with automated deduction and reflective verification to improve logical reasoning in LLMs under conditions of high semantic and logical complexity. It introduces the RepublicQA benchmark, which features college-level semantic difficulty (FKGL 11.94), philosophically grounded abstract propositions, and systematically constructed contrary/contradictory forms. Experiments report that LogicAgent achieves SOTA performance on RepublicQA (6.25% average improvement over strong baselines) and generalizes to ProntoQA, ProofWriter, FOLIO, and ProverQA (additional 7.05% average gain). Code is released publicly.

Significance. If the results hold after addressing isolation concerns, the work would represent a meaningful step toward integrating semiotic structures with automated reasoning pipelines in LLMs, offering a structured way to handle ambiguous and conflicting semantic contexts that current systems often overlook. The new RepublicQA benchmark and public code release are concrete strengths that could support follow-on research in semantic-aware logical reasoning.

major comments (1)

[Framework and Experiments] Framework description and experimental evaluation: The central claim attributes the 6.25% RepublicQA gain and 7.05% cross-benchmark improvement specifically to the semiotic-square-guided multi-perspective analysis. However, no ablation is reported that removes or replaces the semiotic square (contrary/contradictory forms and multi-perspective semantic analysis) while retaining the automated deduction and reflective verification steps. This leaves open whether the gains arise from the semiotic component or from reflective verification alone, which is load-bearing for the paper's attribution of effectiveness to the semiotic framework.

minor comments (1)

[Abstract] Abstract and experimental setup: More explicit details on baseline implementations, exact prompting templates, and statistical controls (e.g., number of runs, variance) would strengthen reproducibility claims, even with code release.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and constructive criticism. The concern about isolating the contribution of the semiotic square is well-taken and points to a genuine gap in the current experimental design. We address this point directly below and outline the planned revision.


Referee: [Framework and Experiments] Framework description and experimental evaluation: The central claim attributes the 6.25% RepublicQA gain and 7.05% cross-benchmark improvement specifically to the semiotic-square-guided multi-perspective analysis. However, no ablation is reported that removes or replaces the semiotic square (contrary/contradictory forms and multi-perspective semantic analysis) while retaining the automated deduction and reflective verification steps. This leaves open whether the gains arise from the semiotic component or from reflective verification alone, which is load-bearing for the paper's attribution of effectiveness to the semiotic framework.

Authors: We agree that the manuscript would be strengthened by an ablation that removes or replaces the semiotic square (including the contrary/contradictory forms and multi-perspective semantic analysis) while keeping the automated deduction and reflective verification components intact. The existing baselines compare LogicAgent against methods that lack the full pipeline, but they do not isolate the semiotic component from reflective verification in the manner described. To address this directly, we will add a targeted ablation study in the revised version. This study will evaluate a variant that retains deduction and verification but substitutes a non-semiotic multi-perspective prompt or removes the structured contrary/contradictory analysis. The new results will be reported alongside the existing experiments to clarify the specific contribution of the semiotic framework to the observed gains on RepublicQA and the other benchmarks. revision: yes

Circularity Check

0 steps flagged

No circularity: new framework and benchmark validated on external benchmarks


The paper introduces LogicAgent as a novel semiotic-square-guided framework and RepublicQA as a new benchmark with college-level semantic difficulty and philosophically grounded propositions. Performance gains (6.25% on RepublicQA, 7.05% on cross-benchmarks) are reported via direct empirical comparison to strong baselines on ProntoQA, ProofWriter, FOLIO, and ProverQA. No equations, fitted parameters, or self-referential definitions appear in the abstract or described derivation; the semiotic square is presented as an imported principled structure rather than derived from the results themselves. The central claims rest on experimental outcomes and external benchmark generalization, making the derivation self-contained against independent data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the effectiveness of a newly introduced framework and benchmark whose core structuring assumption (semiotic square utility) and evaluation setting have no independent external validation cited.

axioms (1)

Domain assumption: The semiotic square provides a principled structure for multi-perspective semantic analysis.
Invoked directly as the foundation for LogicAgent in the abstract description of the framework.

invented entities (2)

LogicAgent (no independent evidence)
purpose: Semiotic-square-guided framework integrating deduction and reflective verification
Newly proposed system whose performance gains are demonstrated only within this work.
RepublicQA (no independent evidence)
purpose: Benchmark coupling semantic complexity with logical depth via abstract propositions and contrary/contradictory forms
Newly introduced evaluation dataset whose construction and difficulty claims are internal to the paper.

pith-pipeline@v0.9.0 · 5787 in / 1452 out tokens · 47242 ms · 2026-05-18T12:54:29.783256+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

The semiotic square provides a principled structure for multi-perspective semantic analysis... S1 ⇒ ¬S2 and S2 ⇒ ¬S1 (Theorem 1)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear

LogicAgent integrates automated deduction with reflective verification... Deep Reflection using S1 ⇒ ¬S2

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval
cs.CV 2026-04 unverdicted novelty 7.0

TEMA is the first framework for multi-modification composed image retrieval, using entity mapping to improve accuracy on both new complex datasets and existing benchmarks while balancing efficiency.
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
cs.SI 2026-04 unverdicted novelty 7.0

IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...
OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction
cs.CV 2026-04 unverdicted novelty 6.0

OmniTrend predicts popularity by combining separate content attractiveness and contextual exposure predictors using cross-modal and exogenous signals.
HotComment: A Benchmark for Evaluating Popularity of Online Comments
cs.AI 2026-04 unverdicted novelty 6.0

HotComment is a new multimodal benchmark that quantifies online comment popularity via content quality assessment, interaction-based prediction, and agent-simulated user engagement, accompanied by the StyleCmt stylist...
Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner
cs.LG 2026-04 unverdicted novelty 6.0

A unified incentive-score decomposition of preference optimization reveals the disentanglement band condition and reward calibration method that enables suppressing losers while preserving winners in LLM training.
Coupling Macro Dynamics and Micro States for Long-Horizon Social Simulation
cs.SI 2026-04 unverdicted novelty 6.0

MF-MDP enables stable long-horizon social simulations by coupling micro-level individual opinion states with macro-level collective dynamics, achieving up to 40,000 interactions with 75% lower KL divergence than baselines.
Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
cs.MM 2026-04 unverdicted novelty 5.0

A new joint spatio-temporal enlargement model for micro-video popularity prediction using frame scoring for long sequences and a topology-aware memory bank for unbounded historical associations.
CurEvo: Curriculum-Guided Self-Evolution for Video Understanding
cs.CV 2026-04 unverdicted novelty 4.0

CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 8 Pith papers · 13 internal anchors

[2]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

[3]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023

[4]

Nltk: the natural language toolkit

Steven Bird. Nltk: the natural language toolkit. In Proceedings of the COLING/ACL 2006 interactive presentation sessions, pp. 69–72, 2006

[5]

Autoagents: A framework for automatic agent generation

Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F Karlsson, Jie Fu, and Yemin Shi. Autoagents: A framework for automatic agent generation. arXiv preprint arXiv:2309.17288, 2023

[6]

Asymptotically unambitious artificial general intelligence

Michael Cohen, Badri Vellambi, and Marcus Hutter. Asymptotically unambitious artificial general intelligence. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp. 2467–2476, 2020

[7]

Semcoder: Training code language models with comprehensive semantics reasoning

Yangruibo Ding, Jinjun Peng, Marcus Min, Gail Kaiser, Junfeng Yang, and Baishakhi Ray. Semcoder: Training code language models with comprehensive semantics reasoning. Advances in Neural Information Processing Systems, 37: 60275–60308, 2024

[8]

Agent AI: Surveying the Horizons of Multimodal Interaction

Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, et al. Agent ai: Surveying the horizons of multimodal interaction. arXiv preprint arXiv:2401.03568, 2024

[9]

Deep se (3)-equivariant geometric reasoning for precise placement tasks

Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jonathan Scholz, and David Held. Deep se (3)-equivariant geometric reasoning for precise placement tasks. arXiv preprint arXiv:2404.13478, 2024

[10]

Pal: Program-aided language models

Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pp. 10764–10799. PMLR, 2023

[11]

Linguistic complexity: Locality of syntactic dependencies

Edward Gibson. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1): 1–76, 1998

[12]

On meaning: Selected writings in semiotic theory

Algirdas Julien Greimas. On meaning: Selected writings in semiotic theory. 1987

[13]

Maupassant: The semiotics of text

Algirdas Julien Greimas. Maupassant: The semiotics of text. 1988

[14]

Semiotics and language: An analytical dictionary

Algirdas Julien Greimas, Joseph Courtés, Larry Crist, and Daniel Patte. Semiotics and language: An analytical dictionary. Indiana University Press Bloomington, 1982

[15]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023

[16]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

[17]

Folio: Natural language reasoning with first-order logic

Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, et al. Folio: Natural language reasoning with first-order logic. arXiv preprint arXiv:2209.00840, 2022

[18]

Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant

Gaole He, Gianluca Demartini, and Ujwal Gadiraju. Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–22, 2025

[19]

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 3(4): 6, 2023

[20]

Sf2t: Self-supervised fragment finetuning of video-llms for fine-grained understanding

Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. Sf2t: Self-supervised fragment finetuning of video-llms for fine-grained understanding. arXiv preprint arXiv:2504.07745, 2025

[21]

Towards Reasoning in Large Language Models: A Survey

Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403, 2022

[22]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024

[23]

Coupled mamba: Enhanced multi-modal fusion with coupled state space model

Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, and Wei Yang. Coupled mamba: Enhanced multi-modal fusion with coupled state space model. arXiv preprint arXiv:2405.18014, 2024

[24]

Code as policies: Language model programs for embodied control

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 9493–9500. IEEE, 2023

[25]

Taskmatrix.ai: Completing tasks by connecting foundation models with millions of apis

Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, et al. Taskmatrix.ai: Completing tasks by connecting foundation models with millions of apis. Intelligent Computing, 3: 0063, 2024

[26]

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. Gsm-symbolic: Understanding the limitations of mathematical reasoning in large language models. arXiv preprint arXiv:2410.05229, 2024

[27]

Logic-LM: Empowering Large Language Models With Symbolic Solvers for Faithful Logical Reasoning

Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv preprint arXiv:2305.12295, 2023

[28]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pp. 1–22, 2023

[29]

Multi-logieval: Towards evaluating multi-step logical reasoning ability of large language models

Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, and Chitta Baral. Multi-logieval: Towards evaluating multi-step logical reasoning ability of large language models. arXiv preprint arXiv:2406.17169, 2024

[30]

Gorilla: Large language model connected with massive apis

Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive apis. Advances in Neural Information Processing Systems, 37: 126544–126565, 2024

[31]

Critical and reflective thinking: A philosophical perspective

Richard W Paul. Critical and reflective thinking: A philosophical perspective. In Dimensions of thinking and cognitive instruction, pp. 445–494. Routledge, 2013

[32]

Large language models meet symbolic provers for logical reasoning evaluation

Chengwen Qi, Ren Ma, Bowen Li, He Du, Binyuan Hui, Jinwang Wu, Yuanjun Laili, and Conghui He. Large language models meet symbolic provers for logical reasoning evaluation. arXiv preprint arXiv:2502.06563, 2025

[33]

Divide and translate: Compositional first-order logic translation and verification for complex logical reasoning

Hyun Ryu, Gyeongman Kim, Hyemin S Lee, and Eunho Yang. Divide and translate: Compositional first-order logic translation and verification for complex logical reasoning. arXiv preprint arXiv:2410.08047, 2024

[34]

Language models are greedy reasoners: A systematic formal analysis of chain-of-thought

Abulhair Saparov and He He. Language models are greedy reasoners: A systematic formal analysis of chain-of-thought. arXiv preprint arXiv:2210.01240, 2022

[35]

An introduction to formal logic

Peter Smith. An introduction to formal logic. Cambridge University Press, 2003

[36]

Transformer tracking with cyclic shifting window attention

Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. Transformer tracking with cyclic shifting window attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8791–8800, 2022

[37]

Compact transformer tracker with correlative masked modeling

Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. Compact transformer tracker with correlative masked modeling. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pp. 2321–2329, 2023

[38]

Autogenic language embedding for coherent point tracking

Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. Autogenic language embedding for coherent point tracking. In Proceedings of the 32nd ACM International Conference on Multimedia, pp. 2021–2030, 2024

[39]

Temporal coherent object flow for multi-object tracking

Zikai Song, Run Luo, Lintao Ma, Ying Tang, Yi-Ping Phoebe Chen, Junqing Yu, and Wei Yang. Temporal coherent object flow for multi-object tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 6978–6986, 2025

[40]

Proofwriter: Generating implications, proofs, and abductive statements over natural language

Oyvind Tafjord, Bhavana Dalvi Mishra, and Peter Clark. Proofwriter: Generating implications, proofs, and abductive statements over natural language. arXiv preprint arXiv:2012.13048, 2020

[41]

Ambiguity, polysemy, and vagueness

David Tuggy. Ambiguity, polysemy, and vagueness. 1993

[42]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

[43]

Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning

Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, and Hongsheng Li. Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning. arXiv preprint arXiv:2310.03731, 2023

work page arXiv 2023
[44]

Weiqi Wang, Tianqing Fang, Chunyang Li, Haochen Shi, Wenxuan Ding, Baixuan Xu, Zhaowei Wang, Jiaxin Bai, Xin Liu, Cheng Jiayang, Chunkit Chan, and Yangqiu Song. CANDLE: Iterative conceptualization and instantiation distillation from large language models for commonsense reasoning. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Proceedings of th... doi:10.18653/v1/2024.acl-long.128, 2024
[45]

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022
[46]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022
[47]

Jundong Xu, Hao Fei, Meng Luo, Qian Liu, Liangming Pan, William Yang Wang, Preslav Nakov, Mong-Li Lee, and Wynne Hsu. Aristotle: Mastering logical reasoning with a logic-complete decompose-search-resolve framework. arXiv preprint arXiv:2412.16953, 2024a
[48]

Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, and Wynne Hsu. Faithful logical reasoning via symbolic chain-of-thought. arXiv preprint arXiv:2405.18357, 2024b
[49]

An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, et al. Qwen2.5-1M technical report. arXiv preprint arXiv:2501.15383, 2025
[50]

Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi, and Faramarz Fekri. Harnessing the power of large language models for natural language to first-order logic translation. arXiv preprint arXiv:2305.15541, 2023
[51]

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36:11809–11822, 2023
[52]

Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, and Zikai Song. MVP: Winning solution to SMP Challenge 2025 video track. arXiv preprint arXiv:2507.00950, 2025
[53]

Xiang Zhang, Juntai Cao, Jiaqi Wei, Chenyu You, and Dujian Ding. Why prompt design matters and works: A complexity analysis of prompt search space in LLMs. arXiv preprint arXiv:2503.10084, 2025a
[54]

Yunyao Zhang, Zikai Song, Hang Zhou, Wenfeng Ren, Yi-Ping Phoebe Chen, Junqing Yu, and Wei Yang. GA-S³: Comprehensive social network simulation with group agents. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Findings of the Association for Computational Linguistics: ACL 2025, pp. 8950–8970, Vienna, Austria, Ju... doi:10.18653/v1/2025.findings-acl.468
[55]

Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, and Xiang Zhou. Semantics-aware BERT for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 9628–9635, 2020
[56]

Hongyu Zhao, Kangrui Wang, Mo Yu, and Hongyuan Mei. Explicit planning helps language models in logical reasoning. arXiv preprint arXiv:2303.15714, 2023a
[57]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2), 2023b
[58]

Zi'ou Zheng, Christopher Malon, Martin Renqiang Min, and Xiaodan Zhu. Exploring the role of reasoning structures for constructing proofs in multi-step natural language reasoning with large language models. arXiv preprint arXiv:2410.08436, 2024
[59]

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022