Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Shihao Qi, Jie Ma, Rui Xing, Wei Guo, Xiao Huang, Zhitao Gao, Jianhao Deng, Jun Liu, Lingling Zhang, Bifan Wei, Boqian Yang, Pinghui Wang, Jianwen Sun, Jing Tao, Yaqiang Wu, Hui Liu, Yu Yao, Tongliang Liu

arXiv: 2605.14892 · v2 · pith:4JHPVUT4new · submitted 2026-05-14 · 💻 cs.AI


Pith reviewed 2026-05-19 16:47 UTC · model grok-4.3


Keywords: LLM agents, multi-agent systems, collaboration, failure attribution, self-evolution, collective intelligence, autonomous systems


The pith

A survey organizes LLM multi-agent research into four causally linked stages that build from individual capabilities to self-evolving collective systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how LLM-based agents move beyond individual tasks to coordinated multi-agent work. It observes that existing work treats collaboration, error diagnosis, and self-improvement in isolation. By linking them as a progression, it argues that each stage sets requirements for the next while depending on the ones before it. This matters because, unless failures are traced back and turned into structural changes, systems cannot sustain improvement across complex tasks. The survey provides taxonomies for each stage and flags open problems at the stage boundaries for future work on autonomous multi-agent intelligence.

Core claim

The authors propose the LIFE progression (Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement) as a way to unify fragmented research. They characterize formal dependencies between adjacent stages, showing how each stage both depends on its predecessor and constrains its successor, and they outline a cross-stage agenda for closed-loop systems that diagnose failures and reorganize autonomously.

What carries the argument

The LIFE progression framework, which sequences four stages and formally links them through dependencies that each stage imposes on the next.
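To make the stage ordering concrete, the sketch below encodes the four LIFE stages as an ordered enumeration with a predecessor lookup and a loop step. It is a hypothetical illustration, not the survey's formalism: the names `Stage`, `prerequisite`, `next_stage`, and `trace` are invented here, and routing Evolve back to Integrate (so that a reorganized system is redeployed into collaboration) is an assumption about what "closed loop" means.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The four LIFE stages, in the order the survey presents them."""
    LAY = 0        # Lay the capability foundation
    INTEGRATE = 1  # Integrate agents through collaboration
    FIND = 2       # Find faults through attribution
    EVOLVE = 3     # Evolve through autonomous self-improvement


def prerequisite(stage: Stage) -> Optional[Stage]:
    """Return the stage this one depends on, or None for the foundation stage."""
    return Stage(stage - 1) if stage > Stage.LAY else None


def next_stage(stage: Stage) -> Stage:
    """Advance one stage. ASSUMPTION: after EVOLVE the loop re-enters INTEGRATE,
    since an improved system is redeployed rather than rebuilt from scratch."""
    if stage is Stage.EVOLVE:
        return Stage.INTEGRATE
    return Stage(stage + 1)


def trace(start: Stage, steps: int) -> List[Stage]:
    """List the stages visited when advancing `steps` times from `start`."""
    visited = [start]
    for _ in range(steps):
        visited.append(next_stage(visited[-1]))
    return visited


if __name__ == "__main__":
    print([s.name for s in trace(Stage.LAY, 5)])
```

Running the module prints `['LAY', 'INTEGRATE', 'FIND', 'EVOLVE', 'INTEGRATE', 'FIND']`: Lay runs once, then Integrate, Find, and Evolve repeat. Re-entering at Integrate rather than Lay reflects the assumption that the capability foundation persists across iterations.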

Load-bearing premise

That the four stages genuinely form a causal chain where each depends on and constrains the next, rather than being loosely related research topics.

What would settle it

A documented case of an LLM multi-agent system achieving continuous self-improvement and structural reorganization without any mechanism for attributing failures across agents.
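A mechanism for attributing failures across agents can be as simple as walking a dependency log backward from the failed step. The sketch below is a toy heuristic, not a method from the survey: it assumes each logged step records the agent, the interaction round, the earlier steps it consumed, and an `erroneous` flag produced by some separate per-step checker (hypothetical). It returns the earliest erroneous ancestor of the failure, i.e. the step where the error most plausibly entered the trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    step_id: int
    round_idx: int                    # interaction round in which the step ran
    agent: str                        # name of the agent that produced it
    depends_on: List[int] = field(default_factory=list)  # earlier steps it consumed
    erroneous: bool = False           # ASSUMED output of a separate per-step checker


def attribute_failure(log: Dict[int, Step], failed_step: int) -> Optional[Step]:
    """Return the earliest erroneous ancestor of `failed_step` (toy root-cause heuristic)."""
    seen = set()
    stack = [failed_step]
    culprits: List[Step] = []
    while stack:                      # depth-first walk over the dependency graph
        sid = stack.pop()
        if sid in seen:
            continue
        seen.add(sid)
        step = log[sid]
        if step.erroneous:
            culprits.append(step)
        stack.extend(step.depends_on)
    if not culprits:
        return None
    # Earliest by (round, step id): the error that entered the trace first.
    return min(culprits, key=lambda s: (s.round_idx, s.step_id))


if __name__ == "__main__":
    log = {
        0: Step(0, 0, "planner", erroneous=True),        # bad plan
        1: Step(1, 0, "coder", [0], erroneous=True),     # inherits the bad plan
        2: Step(2, 1, "reviewer", [1]),                  # misses the error
        3: Step(3, 1, "coder", [1, 2], erroneous=True),  # final, failing output
        4: Step(4, 1, "tester", erroneous=True),         # unrelated error
    }
    blamed = attribute_failure(log, 3)
    print(blamed.agent, blamed.step_id)
```

On this log the planner's step 0 is blamed, and the unrelated tester error (step 4) is ignored because it is not an ancestor of the failure. Real attribution is harder: reliable per-step `erroneous` labels are exactly what is difficult to obtain, which is why the sketch treats them as given.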

Original abstract

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.
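The abstract's concern that errors propagate across agents and interaction rounds suggests a simple measurement, and the referee report below asks for this kind of quantity. The sketch defines a hypothetical propagation rate over a logged dependency graph: among hand-offs whose source step is erroneous, the fraction whose receiving step is also erroneous, optionally grouped by the source step's round. The metric, the function names `propagation_rate` and `rate_by_round`, and the input format are assumptions for illustration; the survey does not define them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def propagation_rate(erroneous: Dict[int, bool],
                     edges: List[Tuple[int, int]]) -> float:
    """Fraction of hand-offs (src -> dst) leaving an erroneous step whose
    receiving step is also erroneous; 0.0 if no erroneous step hands off."""
    from_bad = [(src, dst) for src, dst in edges if erroneous[src]]
    if not from_bad:
        return 0.0
    return sum(erroneous[dst] for _, dst in from_bad) / len(from_bad)


def rate_by_round(erroneous: Dict[int, bool],
                  edges: List[Tuple[int, int]],
                  round_of: Dict[int, int]) -> Dict[int, float]:
    """Same quantity, grouped by the interaction round of the source step."""
    grouped: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for src, dst in edges:
        grouped[round_of[src]].append((src, dst))
    return {r: propagation_rate(erroneous, grouped[r]) for r in sorted(grouped)}
```

For example, with erroneous steps 0, 1, and 3, a clean step 2, and hand-offs 0->1, 1->2, 1->3, and 2->3, two of the three hand-offs leaving an erroneous step land on another erroneous step, so the rate is 2/3. A rate near zero would suggest that agents catch inherited errors; a rate near one would suggest that errors flow through unchecked.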

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys LLM-based multi-agent systems and proposes the LIFE progression framework consisting of four stages: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. It provides systematic taxonomies of existing work for each stage, formally characterizes dependencies between adjacent stages, identifies boundary challenges, and outlines a cross-stage research agenda aimed at enabling closed-loop, self-organizing multi-agent systems.

Significance. If the central claims hold, the survey would serve as a useful unifying reference that connects previously separate threads on individual agent capabilities, collaboration, fault attribution, and self-evolution in LLM-based multi-agent systems. The explicit identification of stage-boundary challenges and the proposed agenda for closed-loop systems could help orient future research toward more autonomous collective intelligence. The manuscript's strength lies in its broad literature coverage and organizational synthesis rather than in new derivations or empirical tests.

major comments (2)

[Abstract and §1] The central claim that the four stages form a 'causal progression' in which each 'both depends on and constrains the next,' enabling 'closed-loop' self-organizing systems, rests on narrative synthesis of the literature. No meta-analysis, cross-paper empirical correlations, or formal models are presented to demonstrate, for example, that mechanisms in the 'Find faults' stage measurably constrain outcomes in the 'Evolve' stage (or vice versa). This makes the progression function as an imposed organizing lens rather than an extracted causal structure, directly affecting the strength of the cross-stage agenda.
[§3 (Integrate) and §4 (Find faults)] The claimed dependency that collaboration structures constrain attribution accuracy is described qualitatively but without reference to any specific empirical studies or quantitative comparisons showing effect sizes or failure propagation rates across interaction rounds. Adding such evidence or explicit caveats would be needed to support the load-bearing assertion that bridging these boundaries produces self-organizing systems.

minor comments (2)

[Taxonomy tables] The taxonomy tables in each stage section would benefit from consistent column headers and explicit criteria used for categorizing papers to improve readability and reproducibility of the classification.
[Boundary challenges] Some citations in the boundary-challenge discussions appear to overlap with earlier stage sections; cross-referencing these explicitly would reduce redundancy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate revisions to strengthen the manuscript's claims while preserving its scope as a literature survey.

Point-by-point responses

Referee: [Abstract and §1] The central claim that the four stages form a 'causal progression' in which each 'both depends on and constrains the next,' enabling 'closed-loop' self-organizing systems, rests on narrative synthesis of the literature. No meta-analysis, cross-paper empirical correlations, or formal models are presented to demonstrate, for example, that mechanisms in the 'Find faults' stage measurably constrain outcomes in the 'Evolve' stage (or vice versa). This makes the progression function as an imposed organizing lens rather than an extracted causal structure, directly affecting the strength of the cross-stage agenda.

Authors: We acknowledge that the LIFE progression is a synthesized framework derived from systematic literature review rather than new empirical meta-analysis or formal causal models. The dependencies are characterized through logical analysis of reported interactions in existing studies, supported by specific citations to works demonstrating inter-stage effects. We will revise the abstract and §1 to clarify this as a conceptual organizing lens grounded in qualitative and selected quantitative evidence from the literature, add a dedicated limitations paragraph on the lack of formal causal inference, and include further references to empirical studies where inter-stage constraints have been observed.
Revision: partial
Referee: [§3 (Integrate) and §4 (Find faults)] The claimed dependency that collaboration structures constrain attribution accuracy is described qualitatively but without reference to any specific empirical studies or quantitative comparisons showing effect sizes or failure propagation rates across interaction rounds. Adding such evidence or explicit caveats would be needed to support the load-bearing assertion that bridging these boundaries produces self-organizing systems.

Authors: We agree that explicit empirical grounding would strengthen the dependency claims. In the revision, we will expand §§3 and 4 with additional citations to studies that provide quantitative or comparative evidence on how collaboration complexity affects attribution accuracy and error propagation. Where direct effect sizes are limited in the surveyed literature, we will insert clear caveats noting the need for future targeted experiments. This will better substantiate the boundary challenges and the proposed cross-stage agenda.
Revision: yes

Circularity Check

0 steps flagged

LIFE progression is an organizational synthesis of existing literature with no self-referential derivation

Full rationale

The paper is a survey paper that synthesizes prior work on LLM-based multi-agent systems into a four-stage framework called the LIFE progression (Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement). It claims to 'formally characterize the dependencies between adjacent stages' and reveal how each 'both depends on and constrains the next,' but this is accomplished via systematic taxonomies and review of external literature rather than any new derivations, equations, fitted parameters, or self-referential logic. No quantitative models, predictions from inputs, or load-bearing self-citations are present that would reduce the central claim to its own construction. The framework functions as a conceptual organizing lens for fragmented research threads, rendering the analysis self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The survey rests on the domain assumption that the four stages exhibit the claimed causal dependencies and that the proposed taxonomy captures the essential interactions without major omissions.

axioms (1)

Domain assumption: The four stages (Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, Evolve through autonomous self-improvement) form a causal progression in which each depends on and constrains the next.
Invoked as the central organizing principle that reveals dependencies and enables the cross-stage agenda.

invented entities (1)

LIFE progression framework (no independent evidence)
Purpose: to provide a unified causal structure linking the collaboration, failure attribution, and self-evolution stages.
New conceptual construct introduced to organize the survey and propose future research.

pith-pipeline@v0.9.0 · 5833 in / 1440 out tokens · 93634 ms · 2026-05-19T16:47:01.085479+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Passage: "We organize this lifecycle as a causally linked progression termed LIFE... formally characterize the dependencies between adjacent stages"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Passage: "failure attribution... self-evolution... closed-loop multi-agent systems"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 23 internal anchors


[1] [1]

2025 , month = aug, howpublished =

Introducing. 2025 , month = aug, howpublished =

work page 2025

[2] [2]

arXiv preprint arXiv:2412.19437 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Qwen3 Technical Report

Yang, A. and others , title =. arXiv preprint arXiv:2505.09388 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

2026 , month = feb, howpublished =

Introducing. 2026 , month = feb, howpublished =

work page 2026

[5] [5]

and Yang, D

Guo, D. and Yang, D. and Zhang, H. and others , title =. Nature , volume =

work page

[6] [6]

and Bosma, M

Wei, J. and Bosma, M. and Zhao, V. Y. and Guu, K. and Yu, A. W. and Lester, B. and Du, N. and Dai, A. M. and Le, Q. V. , title =. Proc. ICLR , year =

work page

[7] [7]

and Wang, X

Wei, J. and Wang, X. and Schuurmans, D. and Bosma, M. and Ichter, B. and Xia, F. and Chi, E. H. and Le, Q. V. and Zhou, D. , title =. Proc. NeurIPS , volume =

work page

[8] [8]

and Gu, S

Kojima, T. and Gu, S. S. and Reid, M. and Matsuo, Y. and Iwasawa, Y. , title =. Proc. NeurIPS , volume =

work page

[9] [9]

2026 , month = mar, howpublished =

Introducing. 2026 , month = mar, howpublished =

work page 2026

[10] [10]

and Ma, C

Wang, L. and Ma, C. and Feng, X. and Zhang, Z. and Yang, H. and Zhang, J. and Chen, Z. and Tang, J. and Chen, X. and Lin, Y. and Zhao, W. X. and Wei, Z. and Wen, J.-R. , title =. Frontiers of Computer Science , volume =. 2024 , note =

work page 2024

[11] [11]

and Chen, W

Xi, Z. and Chen, W. and Guo, X. and He, W. and Ding, Y. and Hong, B. and Zhang, M. and Wang, J. and Jin, S. and Zhou, E. and Zheng, R. and Fan, X. and Wang, X. and Xiong, L. and Zhou, Y. and Wang, W. and Jiang, C. and Zou, Y. and Liu, X. and Yin, Z. and Dou, S. and Weng, R. and Cheng, W. and Zhang, Q. and Qin, Y. and Zheng, Y. and Qiu, X. and Huang, X. an...

work page 2025

[12]

Jimenez, C. E. and Yang, J. and Wettig, A. and Yao, S. and Pei, K. and Press, O. and Narasimhan, K. Proc. ICLR.

[13]

Qian, Chen and Liu, Wei and Liu, Hongzhang and Chen, Nuo and Dang, Yufan and Li, Jiahao and Yang, Cheng and Chen, Weize and Su, Yusheng and Cong, Xin and Xu, Juyuan and Li, Dahai and Liu, Zhiyuan and Sun, Maosong. Proc. ACL, 2024.

[14]

Boiko, D. A. and MacKnight, R. and Kline, B. and Gomes, G. Nature.

[15]

Bran, A. M. and Cox, S. and Schilter, O. and Baldassari, C. and White, A. D. and Schwaller, P. Nature Machine Intelligence.

[16]

Zitkovich, B. and Joshi, T. J. and Irpan, A. and Ichter, B. and Hsu, J. and Herzog, A. and Hausman, K. and Gopalakrishnan, K. and Fu, C. and Florence, P. and Finn, C. and Dubey, K. A. and Driess, D. and Ding, T. and Choromanski, K. M. and Chen, X. and Chebotar, Y. and Carbajal, J. and Brown, N. and Brohan, A. and Arenas, M. G. and Han, K. Proc....

[17]

Huang, Wenlong and Xia, Fei and Xiao, Ted and Chan, Harris and Liang, Jacky and Florence, Pete and Zeng, Andy and Tompson, Jonathan and Mordatch, Igor and Chebotar, Yevgen and Sermanet, Pierre and Jackson, Tomas and Brown, Noah and Luu, Linda and Levine, Sergey and Hausman, Karol and Ichter, Brian. Proc. CoRL, 2023.

[18]

Yao, S. and Zhao, J. and Yu, D. and Du, N. and Shafran, I. and Narasimhan, K. and Cao, Y. Proc. ICLR.

[19]

Shinn, N. and Cassano, F. and Gopinath, A. and Narasimhan, K. and Yao, S. Proc. NeurIPS.

[20]

Cemri, Mert and Pan, Melissa Z. and Yang, Shuyi and Agrawal, Lakshya A. and Chopra, Bhavya and Tiwari, Rishabh and Keutzer, Kurt and Parameswaran, Aditya and Klein, Dan and Ramchandran, Kannan and Zaharia, Matei and Gonzalez, Joseph E. and Stoica, Ion. Why Do Multi-Agent LLM Systems Fail? arXiv preprint arXiv:2503.13657.

[21]

Yao, Yunzhi and Qin, Jiaxin and Zhang, Ningyu and Xu, Haoming and Zhu, Yuqi and Yu, Zeping and Wang, Mengru and Tang, Yuqi and Gu, Jiachen and Deng, Shumin and Peng, Nanyun and Chen, Huajun. TechRxiv preprint.

[22]

Hong, S. and Zhuge, M. and Chen, J. and Zheng, X. and Cheng, Y. and Wang, J. and Zhang, C. and Wang, Z. and Yau, S. K. S. and Lin, Z. and Zhou, L. and Ran, C. and Xiao, L. and Wu, C. and Schmidhuber, J. Proc. ICLR.

[23]

Guo, T. and Chen, X. and Wang, Y. and Chang, R. and Pei, S. and Chawla, N. V. and Wiest, O. and Zhang, X. Proc. IJCAI.

[24]

Wu, Q. and Bansal, G. and Zhang, J. and Wu, Y. and Li, B. and Zhu, E. and Jiang, L. and Zhang, X. and Zhang, S. and Liu, J. and Awadallah, A. H. and White, R. W. and Burger, D. and Wang, C. Proc. COLM.

[25]

Park, C. and Han, S. and Guo, X. and Ozdaglar, A. and Zhang, K. and Kim, J.-K. Proc. ACL.

[26]

Introducing the. Nov. 2024.

[27]

Announcing the. Apr. 2025.

[28]

Zhang, S. and Yin, M. and Zhang, J. and Liu, J. and Han, Z. and Zhang, J. and Li, B. and Wang, C. and Wang, H. and Chen, Y. and Wu, Q. Proc. ICML.

[29]

Deshpande, D. and Gangal, V. and Mehta, H. and Krishnan, J. and Kannappan, A. and Qian, R. arXiv preprint arXiv:2505.08638.

[30]

Ma, X. and Xie, X. and Wang, Y. and Wang, J. and Wu, B. and Li, M. and Wang, Q. arXiv preprint arXiv:2509.23735.

[31]

Gao, H.-a. and Geng, J. and Hua, W. and Hu, M. and Juan, X. and Liu, H. and Liu, S. and Qiu, J. and Qi, X. and Ren, Q. and Wu, Y. and Wang, H. and Xiao, H. and Zhou, Y. and Zhang, S. and Zhang, J. and Xiang, J. and Fang, Y. and Zhao, Q. and Liu, D. and Qian, C. and Wang, Z. and Hu, M. and Wang, H. and Wu, Q. and Ji, H. and Wang, M. A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence. arXiv prepri...

[32]

Fang, J. and Peng, Y. and Zhang, X. and Wang, Y. and Yi, X. and Zhang, G. and Xu, Y. and Wu, B. and Liu, S. and Li, Z. and Ren, Z. and Aletras, N. and Wang, X. and Zhou, H. and Meng, Z. A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems. arXiv preprint arXiv:2508.07407.

[33]

Wang, Y. and Wu, W. and Wang, J. and Wang, Q. From flat logs to causal graphs: Hierarchical failure attribution for LLM-based multi-agent systems. arXiv preprint arXiv:2602.23701, 2026.

[34]

Dang, Y. and Qian, C. and others. Proc. NeurIPS.

[35]

Zhu, K. and Liu, Z. and Li, B. and Tian, M. and Yang, Y. and Zhang, J. and Han, P. and Xie, Q. and Cui, F. and Zhang, W. and Ma, X. and Yu, X. and Ramesh, G. and Wu, J. and Liu, Z. and Lu, P. and Zou, J. and You, J. arXiv preprint arXiv:2509.25370.

[36]

Zhang, Z. and Dai, Q. and Bo, X. and Ma, C. and Li, R. and Chen, X. and others. ACM Transactions on Information Systems, 2025.

[37]

Wei, H. and Zhang, Z. and He, S. and Xia, T. and Pan, S. and Liu, F. Proc. ACL.

[39]

Li, X. and Wang, S. and Zeng, S. and Wu, Y. and Yang, Y. Vicinagearth, 2024.

[40]

Tran, K.-T. and others. Multi-Agent Collaboration Mechanisms: A Survey of LLMs. arXiv preprint arXiv:2501.06322.

[42]

Han, S. and others. LLM Multi-Agent Systems: Challenges and Open Problems. arXiv preprint arXiv:2402.03578.

[43]

OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.

[44]

Touvron, H. and Lavril, T. and Izacard, G. and Martinet, X. and Lachaux, M.-A. and Lacroix, T. and Rozière, B. and others. LLaMA: Open and efficient foundation language models.

[45]

Zhou, S. and Xu, F. F. and Zhu, H. and Zhou, X. and Lo, R. and Sridhar, A. and Cheng, X. and Ou, T. and Bisk, Y. and Fried, D. and Alon, U. and Neubig, G. Proc. ICLR.

[46]

Mathematics Into Type.

[47]

Xi, Z. and Chen, W. and Guo, X. and He, W. and Ding, Y. and Hong, B. and Zhang, M. and Wang, J. and Jin, S. and Zhou, E. and Zheng, R. and Fan, X. and Wang, X. and Xiong, L. and Zhou, Y. and Wang, W. and Jiang, C. and Zou, Y. and Liu, X. and Yin, Z. and Dou, S. and Weng, R. and Cheng, W. and Zhang, Q. and Qin, Y. and Zheng, Y. and Qiu, X. and Huang, X. and... The Rise and Potential of Large Language Model Based Agents: A Survey.

[48]

Chaundy, T. W. and Barrett, P. R. and Batey, C., 1954.

[49]

Mittelbach, F. and Goossens, M., 2004.

[50]

Grätzer, G. More Math Into LaTeX.

[51]

Letourneau, M. and Sharp, J. W.

[52]

Sira-Ramirez, H. Systems & Control Letters.

[53]

Levant, A. Proc. IEEE CDC, 2006.

[54]

Fliess, M. and Join, C. and Sira-Ramirez, H. International Journal of Modelling, Identification and Control.

[55]

Ortega, R. and Astolfi, A. and Bastin, G. and Rodriguez, H. Proc. ACC, 2000.

[56]

Jie Huang and Kevin Chen-Chuan Chang. Findings of ACL.

[57]

Fei Sun and Chaochao Chen and Shuai Li and others. From System 1 to System 2: A Survey of Reasoning Large Language Models. arXiv preprint arXiv:2502.17419.

[58]

Patrick Lewis and Ethan Perez and Aleksandra Piktus and Fabio Petroni and Vladimir Karpukhin and Naman Goyal and Heinrich Küttler and others. Retrieval-Augmented Generation for Knowledge-Intensive. Proc. NeurIPS.

[59]

Akari Asai and Zeqiu Wu and Yizhong Wang and Avirup Sil and Hannaneh Hajishirzi. Proc. ICLR.

[60]

Mufei Li and Siqi Miao and Pan Li. Proc. ICLR.

[61]

Yudi Zhang and Pei Xiao and Lu Wang and Chaoyun Zhang and Meng Fang and Yali Du and Yevgeniy Puzyrev and Randolph Yao and Si Qin and Qingwei Lin and Mykola Pechenizkiy and Dongmei Zhang and Saravanakumar Rajmohan and Qi Zhang. Proc. ICLR.

[62]

Zhuosheng Zhang and Aston Zhang and Mu Li and Hai Zhao and George Karypis and Alex Smola. Transactions on Machine Learning Research.

[63]

Yushi Hu and Weijia Shi and Xingyu Fu and Dan Roth and Mari Ostendorf and Luke Zettlemoyer and Noah A. Smith and Ranjay Krishna. Proc. NeurIPS.

[64]

Ji Qi and Ming Ding and Weihan Wang and Yushi Bai and Qingsong Lv and Wenyi Hong and Bin Xu and Lei Hou and Juanzi Li and Yuxiao Dong and Jie Tang. Proc. ICLR.

[65]

Besta, Maciej and Blach, Nils and Kubicek, Ales and Gerstenberger, Robert and Podstawski, Michal and Gianinazzi, Lukas and Gajda, Joanna and Lehmann, Tomasz and Niewiadomski, Hubert and Nyczyk, Piotr and Hoefler, Torsten. Proc. AAAI.

[66]

Snell, Charlie and Lee, Jaehoon and Xu, Kelvin and Kumar, Aviral. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. arXiv preprint arXiv:2408.03314.

[67]

Wang, Peiyi and Li, Lei and Shao, Zhihong and Xu, Runxin and Dai, Damai and Li, Yifei and Chen, Deli and Wu, Yu and Sui, Zhifang. Proc. ACL.

[68]

Kojima, Takeshi and Gu, Shixiang Shane and Reid, Machel and Matsuo, Yutaka and Iwasawa, Yusuke. Proc. NeurIPS.

[69]

Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Tom and Cao, Yuan and Narasimhan, Karthik. Proc. NeurIPS.

[70]

Wang, Xuezhi and Wei, Jason and Schuurmans, Dale and Le, Quoc and Chi, Ed and Narang, Sharan and Chowdhery, Aakanksha and Zhou, Denny. Proc. ICLR.

[71]

Madaan, Aman and Tandon, Niket and Gupta, Prakhar and Hallinan, Skyler and Gao, Luyu and Wiegreffe, Sarah and Alon, Uri and Dziri, Nouha and Prabhumoye, Shrimai and Yang, Yiming and Gupta, Shashank and Majumder, Bodhisattwa Prasad and Hermann, Katherine and Welleck, Sean and Yazdanbakhsh, Amir and Clark, Peter. Proc. NeurIPS.

[72]

Lightman, Hunter and Kosaraju, Vineet and Burda, Yuri and Edwards, Harrison and Baker, Bowen and Lee, Teddy and Leike, Jan and Schulman, John and Sutskever, Ilya and Cobbe, Karl. Proc. ICLR.

[73]

DeepSeek-R1: Incentivizing Reasoning Capability in. Nature.

[74]

Zhang, Xuan and Du, Chao and Pang, Tianyu and Liu, Qian and Gao, Wei and Lin, Min. Proc. NeurIPS.

[75]

Luo, Haipeng and Sun, Qingfeng and Xu, Can and Zhao, Pu and Lou, Jian-Guang and Tao, Chongyang and Geng, Xiubo and Lin, Qingwei and Chen, Shifeng and Tang, Yansong and Zhang, Dongmei. Proc. ICLR.

[76]

Ji, Ziwei and Lee, Nayeon and Frieske, Rita and Yu, Tiezheng and Su, Dan and Xu, Yan and Ishii, Etsuko and Bang, Yejin and Chen, Delong and Dai, Wenliang and Chan, Ho Shu and Madotto, Andrea and Fung, Pascale. ACM Computing Surveys.

[77]

Min, Sewon and Krishna, Kalpesh and Lyu, Xinxi and Lewis, Mike and Yih, Wen-tau and Koh, Pang Wei and Iyyer, Mohit and Zettlemoyer, Luke and Hajishirzi, Hannaneh. Proc. EMNLP.

[78]

Chern, I-Chun and Chern, Steffi and Chen, Shiqi and Yuan, Weizhe and Feng, Kehua and Zhou, Chunting and He, Junxian and Neubig, Graham and Liu, Pengfei. arXiv preprint arXiv:2307.13528.

[79]

Yuxia Wang and Revanth Gangi Reddy and Zain Muhammad Mujahid and Arnav Arora and Aleksandr Rubashevskii and Jiahui Geng and Osama Mohammed Afzal and Liangming Pan and Nadav Borenstein and Aditya Pillai and Isabelle Augenstein and Iryna Gurevych and Preslav Nakov. Findings of ACL.

[80]

Manakul, Potsawee and Liusie, Adian and Gales, Mark. Proc. EMNLP.

[81]

Kuhn, Lorenz and Gal, Yarin and Farquhar, Sebastian. Proc. ICLR.

[82]

Kadavath, Saurav and Conerly, Tom and Askell, Amanda and Henighan, Tom and Drain, Dawn and Perez, Ethan and Schiefer, Nicholas and Hatfield-Dodds, Zac and DasSarma, Nova and Tran-Johnson, Eli and others. Language Models (Mostly) Know What They Know. arXiv preprint arXiv:2207.05221.