Semantic-Aware Logical Reasoning via a Semiotic Framework
Pith reviewed 2026-05-18 12:54 UTC · model grok-4.3
The pith
LogicAgent combines a semiotic square for multi-perspective semantics with deduction and verification to improve logical reasoning in language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LogicAgent integrates the semiotic square to perform multi-perspective semantic analysis and combines it with automated deduction plus reflective verification, allowing large language models to manage logical complexity more effectively across deeper reasoning chains on tasks that mix semantic and logical difficulty.
What carries the argument
The semiotic square, which organizes semantic relations among a proposition, its contrary, its contradictory, and its subcontrary to enable structured multi-perspective examination of meaning.
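The relations named above can be made concrete with a small truth-value check. This is a minimal sketch, not the paper's implementation: it assumes the classical reading of the square, in which contraries cannot both be true, subcontraries cannot both be false, and contradictories always take opposite truth values. The class name, field names, and example propositions are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SemioticSquare:
    """Four corners of the square (field names are illustrative assumptions)."""
    s1: str      # the proposition
    s2: str      # its contrary
    not_s1: str  # contradictory of s1
    not_s2: str  # contradictory of s2; subcontrary partner of not_s1


def consistent(v_s1: bool, v_s2: bool, v_not_s1: bool, v_not_s2: bool) -> bool:
    """Check one truth assignment against the classical square relations."""
    contradictories = (v_s1 != v_not_s1) and (v_s2 != v_not_s2)
    contraries = not (v_s1 and v_s2)       # cannot both be true
    subcontraries = v_not_s1 or v_not_s2   # cannot both be false
    return contradictories and contraries and subcontraries


def consistent_assignments() -> list:
    """Enumerate every assignment (s1, s2, not_s1, not_s2) the square allows."""
    return [v for v in product([True, False], repeat=4) if consistent(*v)]


# Hypothetical example propositions, not taken from RepublicQA.
square = SemioticSquare(
    s1="The act is just",
    s2="The act is unjust",
    not_s1="The act is not just",
    not_s2="The act is not unjust",
)
```

Under these assumptions only three of the sixteen possible assignments survive, which is the kind of constraint a verification step could use to reject a reasoning trace that affirms both a proposition and its contrary.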
If this is right
- Language models gain the ability to track conflicting stances within the same reasoning task rather than collapsing them early.
- Benchmarks that jointly vary semantic depth and logical length become necessary for realistic evaluation of reasoning systems.
- The same semiotic structure can be reused to generate or verify reasoning traces that explicitly account for alternative meanings.
Where Pith is reading between the lines
- The approach could transfer to domains such as policy analysis or case law where one proposition must be examined against its logical opposites.
- Future work could test whether the square structure helps models avoid common semantic pitfalls like scope ambiguity in natural-language premises.
- If the integration scales, it suggests a general route for adding lightweight symbolic scaffolds to purely neural reasoning pipelines.
Load-bearing premise
The semiotic square supplies a reliable structure for breaking down semantic relations that can be usefully combined with deduction and verification inside language-model reasoning loops.
What would settle it
Run LogicAgent and a version without the semiotic-square component on a fresh set of abstract propositions with systematically varied contrary and contradictory forms; if the full system shows no measurable gain in accuracy or chain length, the central integration claim does not hold.
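The comparison described above could be scored as follows. This is a hedged sketch of one possible analysis, a paired bootstrap over per-item correctness, and not a procedure taken from the paper; the function names, the resample count, and the 95% interval are assumptions chosen for illustration. It covers accuracy only, not chain length.

```python
import random


def accuracy(correct):
    """Fraction of items answered correctly (items are 0/1 flags)."""
    return sum(correct) / len(correct)


def paired_bootstrap_gain(full, ablated, n_resamples=2000, seed=0):
    """Return (observed gain, lower, upper) for accuracy(full) - accuracy(ablated).

    `full` and `ablated` hold 0/1 correctness flags for the same items in the
    same order, so each resample keeps the pairing between the two systems.
    """
    if len(full) != len(ablated) or not full:
        raise ValueError("inputs must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(full)
    observed = accuracy(full) - accuracy(ablated)
    gains = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        gains.append(sum(full[i] - ablated[i] for i in idx) / n)
    gains.sort()
    lower = gains[int(0.025 * n_resamples)]
    upper = gains[int(0.975 * n_resamples) - 1]
    return observed, lower, upper
```

If the resulting interval comfortably includes zero on the fresh proposition set, the data would give no support for attributing the gain to the semiotic-square component.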
Original abstract
Logical reasoning is a fundamental capability of large language models. However, existing studies often overlook the interaction between logical complexity and semantic complexity, leading to systems that struggle with abstract propositions, ambiguous contexts, and conflicting stances that are central to human reasoning. We propose LogicAgent, a semiotic-square-guided framework that jointly addresses these two axes of difficulty. The semiotic square provides a principled structure for multi-perspective semantic analysis, and LogicAgent integrates automated deduction with reflective verification to manage logical complexity across deeper reasoning chains. To support evaluation under these conditions, we introduce RepublicQA, a benchmark that couples semantic complexity with logical depth. RepublicQA reaches college-level semantic difficulty (FKGL 11.94), contains philosophically grounded abstract propositions with systematically constructed contrary and contradictory forms, and offers a semantically rich setting for assessing logical reasoning in large language models. Experiments show that LogicAgent achieves state-of-the-art performance on RepublicQA with a 6.25 percent average improvement over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05 percent average gain. These results demonstrate the effectiveness of semiotic-grounded multi-perspective reasoning in enhancing logical performance. Code is available at https://github.com/AI4SS/Logic-Agent.
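The FKGL figure quoted in the abstract refers to the Flesch-Kincaid Grade Level, a readability score computed from average sentence length and average syllables per word. The sketch below applies the standard formula to counts supplied by the caller. How the authors tokenized sentences or counted syllables is not stated here, so the function name and its inputs are assumptions for illustration.

```python
def fkgl(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard coefficients)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
example_score = fkgl(100, 5, 150)
```

A score near 12 corresponds roughly to the end of secondary school in the US grade scale, which is consistent with the abstract's description of the benchmark text as college-level.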
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LogicAgent, a semiotic-square-guided framework that combines multi-perspective semantic analysis with automated deduction and reflective verification to improve logical reasoning in LLMs under conditions of high semantic and logical complexity. It introduces the RepublicQA benchmark, which features college-level semantic difficulty (FKGL 11.94), philosophically grounded abstract propositions, and systematically constructed contrary/contradictory forms. Experiments report that LogicAgent achieves SOTA performance on RepublicQA (6.25% average improvement over strong baselines) and generalizes to ProntoQA, ProofWriter, FOLIO, and ProverQA (additional 7.05% average gain). Code is released publicly.
Significance. If the results hold after addressing isolation concerns, the work would represent a meaningful step toward integrating semiotic structures with automated reasoning pipelines in LLMs, offering a structured way to handle ambiguous and conflicting semantic contexts that current systems often overlook. The new RepublicQA benchmark and public code release are concrete strengths that could support follow-on research in semantic-aware logical reasoning.
major comments (1)
- [Framework and Experiments] Framework description and experimental evaluation: The central claim attributes the 6.25% RepublicQA gain and 7.05% cross-benchmark improvement specifically to the semiotic-square-guided multi-perspective analysis. However, no ablation is reported that removes or replaces the semiotic square (contrary/contradictory forms and multi-perspective semantic analysis) while retaining the automated deduction and reflective verification steps. This leaves open whether the gains arise from the semiotic component or from reflective verification alone, which is load-bearing for the paper's attribution of effectiveness to the semiotic framework.
minor comments (1)
- [Abstract] Abstract and experimental setup: More explicit details on baseline implementations, exact prompting templates, and statistical controls (e.g., number of runs, variance) would strengthen reproducibility claims, even with code release.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive criticism. The concern about isolating the contribution of the semiotic square is well-taken and points to a genuine gap in the current experimental design. We address this point directly below and outline the planned revision.
Point-by-point responses
- Referee: [Framework and Experiments] Framework description and experimental evaluation: The central claim attributes the 6.25% RepublicQA gain and 7.05% cross-benchmark improvement specifically to the semiotic-square-guided multi-perspective analysis. However, no ablation is reported that removes or replaces the semiotic square (contrary/contradictory forms and multi-perspective semantic analysis) while retaining the automated deduction and reflective verification steps. This leaves open whether the gains arise from the semiotic component or from reflective verification alone, which is load-bearing for the paper's attribution of effectiveness to the semiotic framework.
  Authors: We agree that the manuscript would be strengthened by an ablation that removes or replaces the semiotic square (including the contrary/contradictory forms and multi-perspective semantic analysis) while keeping the automated deduction and reflective verification components intact. The existing baselines compare LogicAgent against methods that lack the full pipeline, but they do not isolate the semiotic component from reflective verification in the manner described. To address this directly, we will add a targeted ablation study in the revised version. This study will evaluate a variant that retains deduction and verification but substitutes a non-semiotic multi-perspective prompt or removes the structured contrary/contradictory analysis. The new results will be reported alongside the existing experiments to clarify the specific contribution of the semiotic framework to the observed gains on RepublicQA and the other benchmarks.
  Revision: yes
Circularity Check
No circularity: new framework and benchmark validated on external benchmarks
The paper introduces LogicAgent as a novel semiotic-square-guided framework and RepublicQA as a new benchmark with college-level semantic difficulty and philosophically grounded propositions. Performance gains (6.25% on RepublicQA; 7.05% across ProntoQA, ProofWriter, FOLIO, and ProverQA) are reported via direct empirical comparison with strong baselines. No equations, fitted parameters, or self-referential definitions appear in the abstract or described derivation; the semiotic square is presented as an imported principled structure rather than derived from the results themselves. The central claims rest on experimental outcomes and generalization to external benchmarks, so the supporting evidence is independent of the framework's own definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The semiotic square provides a principled structure for multi-perspective semantic analysis.
invented entities (2)
- LogicAgent: no independent evidence
- RepublicQA: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The semiotic square provides a principled structure for multi-perspective semantic analysis... S1 ⇒ ¬S2 and S2 ⇒ ¬S1 (Theorem 1)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_injective (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: LogicAgent integrates automated deduction with reflective verification... Deep Reflection using S1 ⇒ ¬S2
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 8 Pith papers
- TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval
  TEMA is the first framework for multi-modification composed image retrieval, using entity mapping to improve accuracy on both new complex datasets and existing benchmarks while balancing efficiency.
- IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
  IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...
- OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction
  OmniTrend predicts popularity by combining separate content attractiveness and contextual exposure predictors using cross-modal and exogenous signals.
- HotComment: A Benchmark for Evaluating Popularity of Online Comments
  HotComment is a new multimodal benchmark that quantifies online comment popularity via content quality assessment, interaction-based prediction, and agent-simulated user engagement, accompanied by the StyleCmt stylist...
- Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner
  A unified incentive-score decomposition of preference optimization reveals the disentanglement band condition and reward calibration method that enables suppressing losers while preserving winners in LLM training.
- Coupling Macro Dynamics and Micro States for Long-Horizon Social Simulation
  MF-MDP enables stable long-horizon social simulations by coupling micro-level individual opinion states with macro-level collective dynamics, achieving up to 40,000 interactions with 75% lower KL divergence than baselines.
- Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
  A new joint spatio-temporal enlargement model for micro-video popularity prediction using frame scoring for long sequences and a topology-aware memory bank for unbounded historical associations.
- CurEvo: Curriculum-Guided Self-Evolution for Video Understanding
  CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.