MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation

Chenyu Wang; Yang Shu

arxiv: 2605.17292 · v1 · pith:2LO4NTGDnew · submitted 2026-05-17 · 💻 cs.AI · cs.MA

Pith reviewed 2026-05-20 13:43 UTC · model grok-4.3

keywords: metacognitive agents · multi-agent LLM · task delegation · self-assessment · cognitive benchmark · LLM framework

The pith

MetaCogAgent adds metacognitive self-assessment to multi-agent LLM frameworks for smarter task delegation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that multi-agent LLM systems can perform better if each agent can assess its own suitability for a given task before attempting it. Existing systems assign tasks based on fixed roles, which causes agents to overconfidently tackle things outside their expertise. MetaCogAgent introduces a self-assessment mechanism, an adaptive delegation protocol, and a learning module to refine capabilities. This leads to improved accuracy and efficiency, as shown in experiments on a new benchmark covering various cognitive skills. A sympathetic reader would care because it points toward more reliable collaborative AI that wastes less effort on mismatched tasks.

Core claim

The central claim is that by equipping each agent with a Metacognitive Self-Assessment Unit that estimates confidence through verbalized uncertainty and historical profiles, the system can adaptively delegate low-confidence tasks to better-suited agents, resulting in higher overall task accuracy and fewer API calls.

What carries the argument

The Metacognitive Self-Assessment Unit that evaluates task-capability alignment before execution by combining verbalized uncertainty with historical capability profiles.
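
A minimal sketch of how such a unit might combine the two signals. The abstract does not give the combination rule, so the linear blend, the weight `alpha`, the smoothing prior, and the `CapabilityProfile` structure are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityProfile:
    """Per-dimension success history for one agent (hypothetical structure)."""
    successes: dict = field(default_factory=dict)
    attempts: dict = field(default_factory=dict)

    def historical_rate(self, dimension: str, prior: float = 0.5) -> float:
        # Smooth toward `prior` so an unseen dimension does not give 0/0.
        # The smoothing scheme is an assumption of this sketch.
        s = self.successes.get(dimension, 0)
        n = self.attempts.get(dimension, 0)
        return (s + prior) / (n + 1)


def estimate_confidence(verbalized: float, profile: CapabilityProfile,
                        dimension: str, alpha: float = 0.5) -> float:
    """Blend the agent's stated confidence with its track record."""
    if not 0.0 <= verbalized <= 1.0:
        raise ValueError("verbalized confidence must lie in [0, 1]")
    return alpha * verbalized + (1 - alpha) * profile.historical_rate(dimension)
```

With `alpha = 1` the estimate reduces to verbalized uncertainty alone, and with `alpha = 0` to the historical profile alone, which is one way an ablation of either signal could be expressed.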

If this is right

Tasks are routed to agents with higher competence, increasing accuracy over standard routing baselines.
API calls are reduced compared to AutoGen and ensemble voting methods.
Each agent's competence model improves iteratively through feedback loops.
The framework handles tasks across multiple cognitive dimensions more effectively.
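
The routing behaviour implied by the first two points can be sketched as a threshold rule. The function name, the default threshold of 0.6, and the tie-breaking choice are hypothetical; the paper's cross-agent evaluation step is not described in the abstract and is reduced here to a lookup of each agent's confidence.

```python
def delegate(confidences: dict, owner: str, threshold: float = 0.6) -> str:
    """Return the name of the agent that should execute the task.

    confidences: agent name -> confidence in [0, 1] for this task.
    owner: the agent the task was initially assigned to.
    """
    if confidences[owner] >= threshold:
        # Confident enough: execute directly, no extra calls to peers.
        return owner
    # Otherwise hand off to the most confident agent; on a tie the
    # owner keeps the task so that no needless hand-off happens.
    return max(confidences, key=lambda a: (confidences[a], a == owner))
```

Under this rule, extra peer queries are only spent on tasks where the owner falls below the threshold, which is one plausible source of the API-call savings the paper reports.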

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This metacognitive approach could be applied to improve robustness in other AI collaboration setups.
Agents might develop better long-term strategies if they learn from past self-assessments.
Testing on dynamic, real-time tasks could reveal additional benefits or limitations.

Load-bearing premise

The self-assessment mechanism produces reliable confidence estimates that correctly identify when to delegate tasks.

What would settle it

Running the system with the self-assessment unit disabled or replaced with random confidence scores and checking if the accuracy and efficiency advantages disappear.
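
The logic of this test can be illustrated with a toy simulation. Everything here is assumed: five simulated agents, each strong on exactly one of five dimensions, with made-up competence values of 0.9 and 0.5. It says nothing about the real system; it only shows the comparison the ablation would make.

```python
import random

N = 5  # toy setting: five agents and five task dimensions


def true_competence(agent: int, dim: int) -> float:
    # Toy assumption: each agent is strong on one dimension only.
    return 0.9 if agent == dim else 0.5


def run(confidence_mode: str, n_tasks: int = 2000, seed: int = 0) -> float:
    """Simulated accuracy when routing each task to the most confident agent."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        dim = rng.randrange(N)
        if confidence_mode == "informed":
            conf = [true_competence(a, dim) for a in range(N)]
        else:  # "random": confidence carries no information
            conf = [rng.random() for _ in range(N)]
        chosen = max(range(N), key=lambda a: conf[a])
        # The task succeeds with probability equal to the chosen agent's competence.
        correct += rng.random() < true_competence(chosen, dim)
    return correct / n_tasks
```

If the real system's gains came from informative self-assessment, swapping in random scores should produce a drop of the kind `run("informed")` versus `run("random")` shows in this toy setting.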

Figures

Figures reproduced from arXiv: 2605.17292 by Chenyu Wang, Yang Shu.

Figure 2: Accuracy by cognitive dimension. MetaCogAgent shows the largest …

Figure 3: Reliability diagram. MetaCogAgent’s confidence is well-calibrated, …
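
A reliability diagram is usually summarized by the expected calibration error (ECE), the bin-size-weighted gap between stated confidence and observed accuracy. The sketch below uses equal-width bins, a common convention; whether the paper reports ECE, and with how many bins, is not stated in the material shown here.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 goes in the last bin
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece
```

A well-calibrated agent that states 0.8 confidence and is right on four of five such tasks scores close to zero; an agent that states 0.9 and is always wrong scores 0.9.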

Original abstract

Multi-agent large language model (LLM) systems have shown promise for solving complex tasks through agent collaboration. However, existing frameworks assign tasks based on predefined roles without considering whether an agent can accurately assess its own competence boundaries, leading to overconfident execution on tasks beyond its expertise. Inspired by metacognition theory from cognitive science, we propose MetaCogAgent, a multi-agent LLM framework where each agent is equipped with a Metacognitive Self-Assessment Unit that evaluates task-capability alignment before execution. The framework introduces three contributions: (1) a self-assessment mechanism that estimates per-task confidence by combining verbalized uncertainty with historical capability profiles; (2) an adaptive delegation protocol that routes low-confidence tasks to better-suited agents through cross-agent evaluation; and (3) a capability boundary learning module that iteratively refines each agent's competence model via cybernetic feedback. Experiments on our constructed MetaCog-Eval benchmark (700 tasks across 5 cognitive dimensions) demonstrate that MetaCogAgent achieves 82.4% task accuracy -- 8.7% above the best routing baseline -- while using 5% fewer API calls than AutoGen and 34% fewer than ensemble voting. Ablation studies confirm that each metacognitive component contributes to overall system performance.
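
The phrase "8.7% above the best routing baseline" can be read two ways, and the abstract does not say which is meant. The short calculation below works out the baseline accuracy implied by each reading, plus the task count that 82.4% of 700 corresponds to. Only the three constants come from the abstract; the rest is arithmetic.

```python
ACCURACY = 82.4   # percent, from the abstract
MARGIN = 8.7      # "8.7% above the best routing baseline"
N_TASKS = 700     # benchmark size, from the abstract

# Reading 1: the margin is in absolute percentage points.
baseline_points = ACCURACY - MARGIN
# Reading 2: the margin is relative to the baseline's own accuracy.
baseline_relative = ACCURACY / (1 + MARGIN / 100)
# Number of tasks that 82.4% of 700 corresponds to.
tasks_correct = ACCURACY / 100 * N_TASKS

print(f"baseline if absolute: {baseline_points:.1f}%")
print(f"baseline if relative: {baseline_relative:.1f}%")
print(f"tasks solved: {tasks_correct:.1f} of {N_TASKS}")
```

The two readings differ by about two points (73.7% versus 75.8%). The task count comes out at 576.8, which is consistent with 577 of 700 rounded to one decimal place, or with an average over several runs; the abstract does not settle which.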

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MetaCogAgent offers a fresh metacognitive approach to agent delegation but its benchmark results need more scrutiny to be convincing.

The main takeaway is that MetaCogAgent brings a metacognitive layer to multi-agent LLM frameworks by having agents assess their own task competence before acting. This includes combining verbalized uncertainty with historical profiles for confidence estimates, routing uncertain tasks through peer evaluation, and refining capability boundaries with feedback loops. The approach draws from cognitive science in a way that feels distinct from standard agent routing or ensemble methods.

What stands out is the concrete implementation of these three pieces and the reported results on the authors' MetaCog-Eval benchmark. The system hits 82.4 percent accuracy, which is 8.7 points better than the best routing baseline, while cutting API calls compared to AutoGen and voting setups. Ablation checks suggest each part adds value. If the mechanisms hold up, this could help reduce errors in complex problem-solving agent teams without extra cost.

The main concern is the experimental setup. The benchmark was built by the authors with 700 tasks across five dimensions, but the abstract gives no details on how tasks were chosen, their difficulty levels, or any controls for bias. There is also no mention of statistical significance, run variance, or whether the baseline systems used the exact same LLM backends and prompts. This leaves room for the gains to be tied to how the test was designed rather than the metacognitive features themselves. The free parameter for the confidence threshold also needs checking for sensitivity.

This paper would interest researchers focused on making multi-agent systems more robust for real applications like planning or analysis tasks. Readers looking for new ways to handle agent limitations might get ideas from the self-assessment and learning modules. It shows clear thinking on the problem and engages with existing frameworks like AutoGen.

The work deserves a serious referee to push for better documentation of the experiments and perhaps code release for reproducibility. I would recommend sending it through peer review rather than a desk reject, as the core proposal has potential even if the current evidence needs strengthening.

Referee Report

2 major / 2 minor

Summary. The paper introduces MetaCogAgent, a multi-agent LLM framework inspired by metacognition theory. Each agent includes a Metacognitive Self-Assessment Unit that combines verbalized uncertainty with historical capability profiles to estimate task alignment. An adaptive delegation protocol routes low-confidence tasks across agents, and a capability boundary learning module refines competence models via feedback. On the authors' constructed MetaCog-Eval benchmark of 700 tasks spanning 5 cognitive dimensions, the system reports 82.4% accuracy (8.7% above the best routing baseline), 5% fewer API calls than AutoGen, and 34% fewer than ensemble voting. Ablation studies attribute gains to the metacognitive components.

Significance. If the benchmark and baselines are shown to be fair and reproducible, the work could meaningfully advance multi-agent LLM systems by addressing overconfidence through explicit self-assessment and cross-agent routing. The combination of verbalized uncertainty with learned capability profiles offers a concrete mechanism that goes beyond static role assignment. However, the current presentation leaves the central performance claim only partially supported.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: The headline claims (82.4% accuracy, +8.7% over best baseline, reduced API calls) rest on the MetaCog-Eval benchmark, yet the manuscript provides no information on task sourcing, difficulty calibration, inter-rater reliability, or how the 5 cognitive dimensions were operationalized. Without these details it is impossible to determine whether the observed gains reflect the metacognitive components or properties of the task distribution.
[Abstract / Experiments] Abstract and Experiments section: No statistical significance tests, standard deviations across runs, or details on baseline re-implementations (identical LLM back-ends, prompt templates, and agent counts) are reported. This leaves open the possibility that the 8.7% margin and API-call savings are sensitive to implementation choices rather than the proposed self-assessment and delegation protocol.

minor comments (2)

[Abstract] The abstract refers to 'cybernetic feedback' without defining the term or its concrete implementation in the capability boundary learning module.
[Method] Clarify whether the free parameter (confidence threshold for delegation) was tuned on the same benchmark used for final evaluation or held out.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments correctly identify gaps in the presentation of our evaluation that affect the strength of our central claims. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

Referee: [Abstract / Experiments] Abstract and Experiments section: The headline claims (82.4% accuracy, +8.7% over best baseline, reduced API calls) rest on the MetaCog-Eval benchmark, yet the manuscript provides no information on task sourcing, difficulty calibration, inter-rater reliability, or how the 5 cognitive dimensions were operationalized. Without these details it is impossible to determine whether the observed gains reflect the metacognitive components or properties of the task distribution.

Authors: We agree that the current manuscript lacks sufficient detail on MetaCog-Eval construction. In the revised version we will add a new subsection (and expanded appendix) that explicitly describes: (1) the sourcing of the 700 tasks from public cognitive-science datasets and synthetic generation procedures; (2) the operationalization of the five cognitive dimensions drawing on established taxonomies (e.g., Bloom’s revised taxonomy and dual-process theory); (3) the difficulty-calibration protocol using pilot runs and expert rating; and (4) inter-rater reliability statistics (Cohen’s κ) for task labeling. These additions will allow readers to evaluate whether performance differences arise from the metacognitive mechanisms rather than benchmark artifacts. revision: yes
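
For reference, the inter-rater statistic named in point (4) can be computed as below. This is the standard two-rater Cohen's κ, written with the standard library only; the label values in any real use would come from the authors' annotation process, which is not available here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need equally sized, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a reader would want to see κ reported per cognitive dimension as well as overall.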
Referee: [Abstract / Experiments] Abstract and Experiments section: No statistical significance tests, standard deviations across runs, or details on baseline re-implementations (identical LLM back-ends, prompt templates, and agent counts) are reported. This leaves open the possibility that the 8.7% margin and API-call savings are sensitive to implementation choices rather than the proposed self-assessment and delegation protocol.

Authors: We acknowledge that the reported results currently omit statistical rigor and implementation specifics. The revised manuscript will include: (1) means and standard deviations computed over five independent runs with different random seeds; (2) paired t-tests (or Wilcoxon signed-rank tests where normality assumptions fail) with p-values for the 8.7% accuracy improvement and API-call reductions; and (3) a detailed reproducibility appendix listing the exact LLM back-end versions, full prompt templates for each baseline (AutoGen, ensemble voting, and routing baselines), and the precise agent counts and temperature settings used. These changes will demonstrate that the observed gains are robust and attributable to the proposed metacognitive components. revision: yes
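
The paired comparison promised in point (2) could be computed along these lines. The sketch returns only the t statistic for per-seed score pairs, using the standard library; a p-value would additionally need the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`). The function name is hypothetical, and no scores from the paper are used.

```python
import math
import statistics


def paired_t_statistic(system, baseline):
    """t statistic for paired per-seed scores (degrees of freedom = n - 1)."""
    if len(system) != len(baseline) or len(system) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [s - b for s, b in zip(system, baseline)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With only five runs the test has four degrees of freedom, so the promised standard deviations matter as much as the p-value when judging whether the margin is stable.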

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of internal definitions

The paper proposes a metacognitive multi-agent framework with self-assessment, adaptive delegation, and iterative capability learning modules, then reports direct experimental outcomes (82.4% accuracy on the author-constructed MetaCog-Eval benchmark of 700 tasks). These performance figures and ablation results are obtained by running the implemented system against external task instances rather than by algebraic reduction of equations to fitted parameters or by self-referential definitions. Capability profiles are learned from interaction data, but the final accuracy and efficiency claims do not equate to those inputs by construction; the benchmark tasks and baseline comparisons supply an independent testbed. No load-bearing derivation step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 3 invented entities

The central claim rests on the effectiveness of three newly introduced software components whose performance is demonstrated only through the reported experiments; no independent evidence for the components is supplied in the abstract.

free parameters (1)

confidence threshold for delegation
The adaptive delegation protocol must use some cutoff below which tasks are routed elsewhere; the abstract does not state how this value is chosen or tuned.
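
One way to keep this parameter from leaking into the reported numbers is to tune it on a held-out split. The sketch below assumes a caller-supplied `evaluate(threshold, ids)` function that returns accuracy; that function, the 20% dev fraction, and the candidate grid are hypothetical and not taken from the paper.

```python
import random


def split_dev_test(task_ids, dev_fraction=0.2, seed=0):
    """Shuffle task ids and split them into disjoint dev and test sets."""
    ids = list(task_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * dev_fraction)
    return ids[:cut], ids[cut:]


def tune_threshold(dev_ids, evaluate, candidates):
    """Pick the candidate threshold with the best dev-set score.

    `evaluate(threshold, ids)` is supplied by the caller and returns accuracy.
    """
    return max(candidates, key=lambda t: evaluate(t, dev_ids))
```

Reporting the test-set accuracy across the whole candidate grid, not only at the chosen value, would also answer the sensitivity question.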

axioms (1)

domain assumption: Metacognition theory from cognitive science can be directly transferred to LLM agents via verbalized uncertainty and historical profiles
The framework is explicitly inspired by metacognition theory and assumes the analogy produces useful self-assessment.

invented entities (3)

Metacognitive Self-Assessment Unit (no independent evidence)
purpose: Estimates per-task confidence by combining verbalized uncertainty with historical capability profiles
New component introduced to address overconfident execution.

adaptive delegation protocol (no independent evidence)
purpose: Routes low-confidence tasks to better-suited agents via cross-agent evaluation
New protocol for dynamic task routing.

capability boundary learning module (no independent evidence)
purpose: Iteratively refines each agent's competence model via cybernetic feedback
New module for ongoing self-model improvement.
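
The abstract leaves "cybernetic feedback" undefined. One plausible reading is an error-driven update that nudges the competence estimate toward each observed outcome. The rule below is that reading only: the function name, the update form, and the learning rate are assumptions.

```python
def update_competence(estimate: float, outcome: bool, rate: float = 0.1) -> float:
    """Move the competence estimate toward the observed outcome.

    Error-driven update: new = old + rate * (outcome - old).
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    target = 1.0 if outcome else 0.0
    return estimate + rate * (target - estimate)
```

Repeated successes pull the estimate toward 1 and repeated failures toward 0, with `rate` controlling how quickly older evidence is forgotten.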

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

self-assessment mechanism that estimates per-task confidence by combining verbalized uncertainty with historical capability profiles; adaptive delegation protocol that routes low-confidence tasks

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 4 internal anchors

[1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 1877–1901.

[2] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.

[3] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Wang, S. Zhang et al., “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” in Proceedings of the International Conference on Machine Learning (ICML), 2024.

[4] S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, C. Zhang, J. Wang, Z. Wang, S. K. S. Yau, Z. Lin et al., “MetaGPT: Meta programming for a multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352, 2023.

[5] G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “CAMEL: Communicative agents for “mind” exploration of large language model society,” in Advances in Neural Information Processing Systems (NeurIPS), 2023.

[6] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023.

[7] J. H. Flavell, “Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry,” American Psychologist, vol. 34, no. 10, pp. 906–911, 1979.

[8] T. C. Toppino and M. S. Cohen, “Metacognitive control and strategy selection: Deciding to practice retrieval during learning,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 35, no. 5, pp. 1105–1117, 2009.

[9] W. Chen, Y. Su, J. Zuo, C. Yang, C. Yuan, C.-M. Chan, H. Yu, Y. Lu, Y.-H. Hung, C. Qian et al., “AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors,” in International Conference on Learning Representations (ICLR), 2024.

[10] Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” in Proceedings of the International Conference on Machine Learning (ICML), 2024.

[11] Z. Yin, Q. Sun, C. Chang, Q. Guo, J. Dai, X. Huang, and X. Qiu, “Exchange-of-thought: Enhancing large language model capabilities through cross-model communication,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.

[12] S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield-Dodds, N. DasSarma, E. Tran-Johnson et al., “Language models (mostly) know what they know,” arXiv preprint arXiv:2207.05221, 2022.

[13] M. Xiong, Z. Hu, X. Lu, Y. Li, J. Fu, J. He, and B. Hooi, “Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs,” arXiv preprint arXiv:2306.13063, 2024.

[14] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in Proceedings of the International Conference on Machine Learning (ICML), 2017, pp. 1321–1330.

[15] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” in Advances in Neural Information Processing Systems (NeurIPS), 2023.

[16] S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” in Advances in Neural Information Processing Systems (NeurIPS), 2023.

[17] N. Wiener, Cybernetics: Or Control and Communication in the Animal and the Machine. Cambridge, MA: MIT Press, 1948.
