
arxiv: 2605.14892 · v2 · pith:4JHPVUT4new · submitted 2026-05-14 · 💻 cs.AI

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Pith reviewed 2026-05-19 16:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords LLM agents · multi-agent systems · collaboration · failure attribution · self-evolution · collective intelligence · autonomous systems

The pith

A survey organizes LLM multi-agent research into four causally linked stages that build from individual capabilities to self-evolving collective systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how LLM-based agents move beyond individual tasks to coordinated multi-agent work. It observes that existing work treats collaboration, error diagnosis, and self-improvement in isolation. By linking them as a progression, it shows that each stage sets requirements for the next while depending on the ones before it. This matters because, without closing the loop from failures back to structural changes, systems cannot sustain improvement across complex tasks. The survey maps taxonomies and flags boundary problems for future work on autonomous multi-agent intelligence.

Core claim

The authors propose the LIFE progression—Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement—as a way to unify fragmented research. They characterize formal dependencies between adjacent stages, demonstrating that each both depends on and constrains the following one, and they outline a cross-stage agenda to create closed-loop systems that diagnose failures and reorganize autonomously.

What carries the argument

The LIFE progression framework, which sequences four stages and formally links them through dependencies that each stage imposes on the next.
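
The ordering of the four stages, and the loop from failure attribution back to structural change, can be pictured with a minimal Python sketch. This is an illustration only, not the survey's formalism: the names `Stage`, `depends_on`, `ToySystem`, and `closed_loop` are hypothetical, and the "system" is a toy in which each agent is simply flagged as faulty or not.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The four LIFE stages, in the order the survey sequences them."""
    LAY = 1        # Lay the capability foundation
    INTEGRATE = 2  # Integrate agents through collaboration
    FIND = 3       # Find faults through attribution
    EVOLVE = 4     # Evolve through autonomous self-improvement


def depends_on(stage: Stage) -> Optional[Stage]:
    """Return the stage immediately before `stage`, or None for the first one."""
    return Stage(stage - 1) if stage > Stage.LAY else None


@dataclass
class ToySystem:
    """Hypothetical stand-in for a multi-agent system: agent name -> faulty flag."""
    faulty: dict[str, bool] = field(default_factory=dict)

    def run(self) -> bool:
        # Toy success criterion (an assumption): the task succeeds only
        # when no agent is faulty.
        return not any(self.faulty.values())

    def attribute(self) -> list[str]:
        # FIND stage: name the agents responsible for the failure.
        return [name for name, bad in self.faulty.items() if bad]

    def evolve(self, culprits: list[str]) -> None:
        # EVOLVE stage: repair only the agents that attribution identified.
        for name in culprits:
            self.faulty[name] = False


def closed_loop(system: ToySystem, max_rounds: int = 3) -> int:
    """Run, attribute, evolve. Return the round of the first successful run,
    or -1 if no successful run is observed within the budget."""
    for round_no in range(1, max_rounds + 1):
        if system.run():
            return round_no
        system.evolve(system.attribute())
    return -1
```

In this toy, `evolve` can only repair what `attribute` names, which is one way to read the claim that the Find stage constrains the Evolve stage. Real attribution is noisy and real repairs are not guaranteed to work, so the sketch says nothing about how strong that constraint is in practice.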

Load-bearing premise

That the four stages genuinely form a causal chain where each depends on and constrains the next, rather than being loosely related research topics.

What would settle it

A documented case of an LLM multi-agent system achieving continuous self-improvement and structural reorganization without any mechanism for attributing failures across agents.

Original abstract

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys LLM-based multi-agent systems and proposes the LIFE progression framework consisting of four stages: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. It provides systematic taxonomies of existing work for each stage, formally characterizes dependencies between adjacent stages, identifies boundary challenges, and outlines a cross-stage research agenda aimed at enabling closed-loop, self-organizing multi-agent systems.

Significance. If the central claims hold, the survey would serve as a useful unifying reference that connects previously separate threads on individual agent capabilities, collaboration, fault attribution, and self-evolution in LLM-based multi-agent systems. The explicit identification of stage-boundary challenges and the proposed agenda for closed-loop systems could help orient future research toward more autonomous collective intelligence. The manuscript's strength lies in its broad literature coverage and organizational synthesis rather than in new derivations or empirical tests.

major comments (2)
  1. [Abstract and §1] The central claim that the four stages form a 'causal progression' in which each 'both depends on and constrains the next,' enabling 'closed-loop' self-organizing systems, rests on narrative synthesis of the literature. No meta-analysis, cross-paper empirical correlations, or formal models are presented to demonstrate, for example, that mechanisms in the 'Find faults' stage measurably constrain outcomes in the 'Evolve' stage (or vice versa). This makes the progression function as an imposed organizing lens rather than an extracted causal structure, directly affecting the strength of the cross-stage agenda.
  2. [§3 (Integrate) and §4 (Find faults)] The claimed dependency that collaboration structures constrain attribution accuracy is described qualitatively but without reference to any specific empirical studies or quantitative comparisons showing effect sizes or failure propagation rates across interaction rounds. Such evidence, or explicit caveats, is needed to support the load-bearing assertion that bridging these boundaries produces self-organizing systems.
minor comments (2)
  1. [Taxonomy tables] The taxonomy tables in each stage section would benefit from consistent column headers and explicit criteria used for categorizing papers to improve readability and reproducibility of the classification.
  2. [Boundary challenges] Some citations in the boundary-challenge discussions appear to overlap with earlier stage sections; cross-referencing these explicitly would reduce redundancy.
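
Major comment 2 asks for failure propagation rates across interaction rounds. As a rough baseline for what such a number would be compared against, the sketch below computes the chance that at least one error occurs in a run of n rounds. It assumes, purely for illustration, that every round fails independently with the same probability p. That independence is an assumption introduced here, not something the paper or the report states, and the function name `prob_any_error` is hypothetical.

```python
def prob_any_error(p: float, rounds: int) -> float:
    """Probability that at least one of `rounds` independent steps errs,
    when each step errs with probability `p` (illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Complement of "every round is error-free".
    return 1.0 - (1.0 - p) ** rounds


# Example: a 5% per-round error rate over 10 rounds.
baseline = prob_any_error(0.05, 10)  # roughly 0.40
```

Measured failure rates that sit well above this baseline would point to errors compounding across agents and rounds, which is the kind of quantitative evidence the comment requests.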

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate revisions to strengthen the manuscript's claims while preserving its scope as a literature survey.

Point-by-point responses
  1. Referee: [Abstract and §1] The central claim that the four stages form a 'causal progression' in which each 'both depends on and constrains the next,' enabling 'closed-loop' self-organizing systems, rests on narrative synthesis of the literature. No meta-analysis, cross-paper empirical correlations, or formal models are presented to demonstrate, for example, that mechanisms in the 'Find faults' stage measurably constrain outcomes in the 'Evolve' stage (or vice versa). This makes the progression function as an imposed organizing lens rather than an extracted causal structure, directly affecting the strength of the cross-stage agenda.

    Authors: We acknowledge that the LIFE progression is a synthesized framework derived from systematic literature review rather than new empirical meta-analysis or formal causal models. The dependencies are characterized through logical analysis of reported interactions in existing studies, supported by specific citations to works demonstrating inter-stage effects. We will revise the abstract and §1 to clarify this as a conceptual organizing lens grounded in qualitative and selected quantitative evidence from the literature, add a dedicated limitations paragraph on the lack of formal causal inference, and include further references to empirical studies where inter-stage constraints have been observed.
    Revision: partial

  2. Referee: [§3 (Integrate) and §4 (Find faults)] The claimed dependency that collaboration structures constrain attribution accuracy is described qualitatively but without reference to any specific empirical studies or quantitative comparisons showing effect sizes or failure propagation rates across interaction rounds. Such evidence, or explicit caveats, is needed to support the load-bearing assertion that bridging these boundaries produces self-organizing systems.

    Authors: We agree that explicit empirical grounding would strengthen the dependency claims. In the revision, we will expand §§3 and 4 with additional citations to studies that provide quantitative or comparative evidence on how collaboration complexity affects attribution accuracy and error propagation. Where direct effect sizes are limited in the surveyed literature, we will insert clear caveats noting the need for future targeted experiments. This will better substantiate the boundary challenges and the proposed cross-stage agenda.
    Revision: yes

Circularity Check

0 steps flagged

LIFE progression is an organizational synthesis of existing literature with no self-referential derivation

Full rationale

The paper is a survey that synthesizes prior work on LLM-based multi-agent systems into a four-stage framework called the LIFE progression (Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement). It claims to 'formally characterize the dependencies between adjacent stages' and reveal how each 'both depends on and constrains the next,' but it does so through systematic taxonomies and a review of external literature rather than through new derivations, equations, fitted parameters, or self-referential logic. It contains no quantitative models, predictions from inputs, or load-bearing self-citations that would reduce the central claim to its own construction. The framework functions as a conceptual organizing lens for fragmented research threads, so the analysis is self-contained and shows no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The survey rests on the domain assumption that the four stages exhibit the claimed causal dependencies and that the proposed taxonomy captures the essential interactions without major omissions.

axioms (1)
  • domain assumption: The four stages (Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, Evolve through autonomous self-improvement) form a causal progression in which each depends on and constrains the next.
    Invoked as the central organizing principle that reveals dependencies and enables the cross-stage agenda.
invented entities (1)
  • LIFE progression framework (no independent evidence)
    purpose: To provide a unified causal structure linking collaboration, failure attribution, and self-evolution stages
    New conceptual construct introduced to organize the survey and propose future research.

pith-pipeline@v0.9.0 · 5833 in / 1440 out tokens · 93634 ms · 2026-05-19T16:47:01.085479+00:00 · methodology


