arxiv: 2604.18500 · v1 · submitted 2026-04-20 · 💻 cs.MA · q-fin.GN

Recognition: unknown

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

Terence Lim, Kumar Muthuraman, Michael Sury


Pith reviewed 2026-05-10 03:02 UTC · model grok-4.3


keywords multi-agent systems · quantitative finance · equity factors · empirical research · agentic frameworks · panel data · tool calling · reflection planning


The pith

A multi-agent framework with chained tool calls and reflection-based planning may support quantitative equity factor research better than dynamic code generation alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces QRAFTI, a multi-agent system intended to emulate parts of a quantitative research team doing equity factor research on large financial panel datasets. It connects a panel-data research toolkit to MCP servers that expose data access, factor construction, and custom coding as tools the agents can call. According to the authors, it can help replicate established factors, formulate and test new signals, and generate standardized reports with narrative analysis and computational traces. They suggest that chaining tool calls with reflection-based planning may give better performance and explainability on multi-step tasks than generating code on the fly.

Core claim

QRAFTI is a multi-agent framework that emulates parts of a quantitative research team for equity factor research on large financial panel datasets. It integrates MCP servers that expose data access, factor construction, and custom coding operations as callable tools. This allows replication of established factors, testing of new signals, and generation of reports with narrative analysis and computational traces. On multi-step empirical tasks, chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.

What carries the argument

The multi-agent system using chained tool calls with reflection-based planning, backed by MCP servers that expose data access, factor construction, and custom coding as tools.
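
The mechanism described above can be illustrated with a minimal sketch. It is not QRAFTI's implementation and does not use any MCP SDK: the tool names (`load_returns`, `demean`), the in-memory `DATASETS` store, and the `reflect` check are all hypothetical stand-ins. The sketch only shows the shape of the idea: each tool returns a dataset identifier that a later tool consumes, a reflection step accepts or rejects each result, and every attempt is written to a trace.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory store: dataset identifier -> values.
DATASETS: Dict[str, List[float]] = {}

def load_returns(out: str) -> str:
    """Toy data-access tool: registers a small return series under `out`."""
    DATASETS[out] = [0.02, -0.01, 0.03, 0.00]
    return out

def demean(source: str, out: str) -> str:
    """Toy factor-construction tool: subtracts the mean of `source`."""
    values = DATASETS[source]
    mu = sum(values) / len(values)
    DATASETS[out] = [v - mu for v in values]
    return out

# Hypothetical registry standing in for tools exposed by a server.
TOOLS: Dict[str, Callable[..., str]] = {"load_returns": load_returns, "demean": demean}

@dataclass
class Step:
    tool: str
    args: Dict[str, str]

def reflect(result_id: str) -> bool:
    """Stand-in for an LLM reflection step: accept only non-empty outputs."""
    return len(DATASETS.get(result_id, [])) > 0

def run_plan(plan: List[Step], max_attempts: int = 2) -> List[dict]:
    """Run steps in order, log every attempt, stop if a step keeps failing."""
    trace: List[dict] = []
    for step in plan:
        accepted = False
        for attempt in range(1, max_attempts + 1):
            try:
                result = TOOLS[step.tool](**step.args)
                accepted = reflect(result)
            except Exception as exc:  # a failed call is logged, then retried
                result = f"error: {exc!r}"
            trace.append({"tool": step.tool, "args": step.args,
                          "attempt": attempt, "result": result,
                          "accepted": accepted})
            if accepted:
                break
        if not accepted:
            break  # later steps depend on this one, so abandon the plan
    return trace

plan = [
    Step("load_returns", {"out": "returns"}),
    Step("demean", {"source": "returns", "out": "returns_demeaned"}),
]
trace = run_plan(plan)
for entry in trace:
    print(entry)
```

In a real system the reflection step would presumably be a model call that can also revise the plan; here it is a deterministic check so the control flow stays visible.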

If this is right

It automates replication of established equity factors on large panels with full traces.
It supports systematic formulation and testing of new factor signals.
It produces standardized reports that include both narrative analysis and computational details.
Multi-step empirical workflows gain structure and traceability without manual intervention at each step.
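
To make "replication of established equity factors" concrete, here is a deliberately simplified sketch of a high-minus-low sort. It is an assumption-laden illustration, not the paper's workflow: it uses a one-way, equal-weighted sort on a single characteristic, whereas the Fama-French HML factor is built from two-way size and book-to-market sorts with value-weighted portfolios. The function name `high_minus_low`, the 30% cutoff default, and all data values are hypothetical.

```python
from statistics import mean
from typing import Dict

def high_minus_low(characteristic: Dict[str, float],
                   next_return: Dict[str, float],
                   quantile: float = 0.3) -> float:
    """Equal-weighted return of the top group minus the bottom group."""
    ranked = sorted(characteristic, key=characteristic.get)  # ascending
    n = max(1, int(len(ranked) * quantile))  # stocks per leg
    low, high = ranked[:n], ranked[-n:]
    return mean(next_return[t] for t in high) - mean(next_return[t] for t in low)

# Hypothetical cross-section: book-to-market ratios and next-period returns.
book_to_market = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.5, "E": 0.8,
                  "F": 0.9, "G": 1.1, "H": 1.4, "I": 1.8, "J": 2.5}
returns = {"A": 0.01, "B": 0.00, "C": -0.01, "D": 0.01, "E": 0.02,
           "F": 0.00, "G": 0.01, "H": 0.03, "I": 0.02, "J": 0.04}

spread = high_minus_low(book_to_market, returns)
print(f"one-period high-minus-low spread: {spread:.4f}")
```

Repeating this sort each period over a panel would yield a factor return series; a replication check would then compare that series against a published reference.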

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could extend to non-finance empirical domains by swapping the MCP server tools for domain-specific ones.
Logged traces from the system might support audit or regulatory review of quantitative findings.
Pairing the planning layer with larger models could handle longer research chains or noisier data.
Widespread use might shift quant teams toward verifying agent outputs rather than writing all code themselves.

Load-bearing premise

Integrating MCP servers for data access, factor construction, and custom coding will enable the multi-agent system to reliably emulate parts of a quantitative research team on large financial panels.

What would settle it

A controlled test on identical multi-step financial panel tasks where dynamic code generation alone matches or exceeds the performance and explainability of the chained tool calls with reflection planning.

Figures

Figures reproduced from arXiv: 2604.18500 by Kumar Muthuraman, Michael Sury, Terence Lim.

**Figure 2.** Fama-French HML-workflow user queries. Adjacent paper text captured with the caption: …nentially weighted stock-volatility characteristic, QRAFTI delegates the task to a code-writing and execution agent. This agent writes Python code against the Panel API, executes it, and returns both the code and the resulting artifact (that is, the identifier of a newly created Panel dataset), which can subsequently be used in later tool calls. Reviewers may also in…

**Figure 5.** Listing of custom code generated. Adjacent paper text captured with the caption: …swer every query five times, which allows us to evaluate both accuracy and consistency across repeated trials. For each task, we report Sim@k for k ∈ {1, 2, 5}, defined as the expected maximum cosine similarity between the reference panel and a set of k generated outputs sampled from the n attempts. This metric is similar in spirit to Pass@k (Kulal et al., 2019), but repla…

**Figure 3.** Price momentum JKP-workflow user query

**Figure 4.** Scatter plots of constructed panels against…

**Figure 6.** UI Demo: Conversation history

**Figure 7.** UI Demo: Computation graph and standardized research report with narrative analysis
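
The Sim@k metric quoted in the Figure 5 excerpt can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it assumes the k outputs are drawn uniformly without replacement from the n attempts (the convention used by the usual unbiased Pass@k estimator), and the helper names `cosine_similarity` and `sim_at_k` and the toy panels are hypothetical. Under that assumption, the expected maximum has a closed form: after sorting the n scores in ascending order, the score at zero-based position i is the maximum of exactly C(i, k-1) of the C(n, k) possible subsets.

```python
from math import comb, sqrt
from typing import List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length flattened panels."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sim_at_k(scores: Sequence[float], k: int) -> float:
    """Expected maximum score over a random k-subset of the n scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(i, k - 1) counts the k-subsets whose maximum is ordered[i];
    # it is zero for i < k - 1, so those terms drop out automatically.
    weighted = sum(comb(i, k - 1) * s for i, s in enumerate(ordered))
    return weighted / comb(n, k)

# Hypothetical reference panel and five generated attempts (flattened).
reference = [1.0, 0.0, -1.0]
attempts: List[List[float]] = [
    [1.0, 0.0, -1.0],
    [1.0, 1.0, -1.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, -2.0],
    [-1.0, 0.0, 1.0],
]
scores = [cosine_similarity(reference, a) for a in attempts]

for k in (1, 2, 5):
    print(f"Sim@{k} = {sim_at_k(scores, k):.4f}")
```

With k = 1 the metric reduces to the mean similarity across attempts, and with k = n it is the best similarity achieved, so the gap between the two gives a rough sense of run-to-run consistency.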

Original abstract

We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable tools. It can help replicate established factors, formulate and test new signals, and generate standardized research reports accompanied by narrative analysis and computational traces. On multi-step empirical tasks, using chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

QRAFTI names a multi-agent framework for financial factor research but supplies no tests or comparisons to support its claims about chained tool calls.


QRAFTI is a proposed multi-agent system for equity factor work on large financial panels. It uses MCP servers to turn data access, factor building, and custom coding into callable tools, then layers on planning and reflection so agents can chain steps and produce reports with traces. The main suggestion is that this setup may handle multi-step tasks better and more explainably than just generating code on demand. That is the one thing a colleague needs to know up front: the paper is still at the architecture-description stage with no supporting runs or metrics attached.

Referee Report

2 major / 2 minor

Summary. The paper introduces QRAFTI, a multi-agent framework for empirical quantitative finance research on large equity panel datasets. It integrates MCP servers exposing data access, factor construction, and custom coding as tools, combined with reflection-based planning and chained tool calls, to emulate parts of a quant research team. The system is intended to support factor replication, new signal testing, and generation of standardized reports with narrative analysis and computational traces. The abstract posits that this architecture may yield better performance and explainability than dynamic code generation alone on multi-step empirical tasks.

Significance. If empirically validated, the framework could contribute to reproducible automation of routine quant research workflows, particularly for panel-data factor work. The explicit separation of tool interfaces (MCP) from planning and reflection layers is a clear architectural choice that could improve traceability. At present, however, the manuscript offers only a descriptive proposal without any implemented evaluation, so its significance remains prospective rather than demonstrated.

major comments (2)

[Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.
[Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.

minor comments (2)

[Abstract] The abstract and introduction should explicitly label the performance statement as a hypothesis rather than a suggested advantage, to avoid implying empirical support.
[System architecture] Notation for MCP servers and agent roles is introduced without a dedicated diagram or table summarizing the tool interfaces; adding one would improve clarity for readers unfamiliar with the MCP protocol.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that QRAFTI is presented as an architectural framework proposal without empirical evaluations or results sections at this stage. We address each major comment below and will make targeted revisions to clarify the scope and strengthen the presentation.


Referee: [Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.

Authors: We acknowledge that the claim, even with the qualifier 'may', lacks empirical backing in the current manuscript. The statement was intended as a design hypothesis rather than a demonstrated result. We will revise the abstract to remove the comparative claim entirely and instead describe the framework's intended use for multi-step tasks, supported only by the architectural rationale. This revision will be made in the next version.

Revision: yes

Referee: [Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.

Authors: The manuscript is a framework introduction focused on the system architecture, tool interfaces via MCP servers, and planning mechanisms. We agree that reliable emulation of quant research behavior is an untested assumption. In the revised manuscript we will add a dedicated 'Illustrative Examples' section containing concrete case studies of factor replication workflows, including sample tool-call sequences, reflection steps, and generated report traces. These will serve as demonstrations of operation rather than quantitative benchmarks; comprehensive evaluations with metrics are planned for follow-on work.

Revision: partial

Circularity Check

0 steps flagged

No circularity: descriptive framework with no derivations or fitted inputs


The paper is a system-description proposal for a multi-agent framework (QRAFTI) that integrates MCP servers for data, factors, and coding. It contains no equations, no parameter fitting, no derivation chain, and no self-citations used to justify a mathematical result. The sole performance statement is explicitly hedged as a hypothesis ('may offer better performance... than dynamic code generation alone') rather than a claim derived from prior steps or data within the paper. All other content is architectural description. This satisfies the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are specified in the abstract.

pith-pipeline@v0.9.0 · 5384 in / 1044 out tokens · 44554 ms · 2026-05-10T03:02:14.962511+00:00 · methodology


Reference graph

Works this paper leans on

90 extracted references · 49 canonical work pages · 13 internal anchors

[1] Matt Robinson. March 2025.
[2] Presidential Address: Discount Rates. The Journal of Finance, 2011.
[3] Andrew Ang. 2014.
[4] … & Rahwan, T. Self-Reflection Makes Large Language Models Safer, Less Biased, and Ideologically Neutral. arXiv preprint arXiv:2406.10400. doi:10.48550/arXiv.2406.10400
[5] Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI. 2026. doi:10.48550/arXiv.2603.14288
[6] Feng, Guanhao and Giglio, Stefano and Xiu, Dacheng. The Journal of Finance.
[7] Kenneth R. French. March 2022.
[8] Jamil Baz. October 2024.
[9] Bessembinder, Hendrik. March 2026.
[10] Tyler Shumway. The Journal of Finance.
[11] Tyler Shumway and Vincent A. Warther. The Journal of Finance.
[12] LLMs Get Lost In Multi-Turn Conversation. 2025.
[13] SPoC: Search-based Pseudocode to Code. 2019.
[14] McLean, R. David and Pontiff, Jeffrey. The Journal of Finance. doi:10.1111/jofi.12365
[15] Lo, Andrew W. and MacKinlay, A. Craig. The Review of Financial Studies, July 1990.
[16] Hasler, Mathias. 2021. doi:10.2139/ssrn.3886984
[17] Evaluation and Benchmarking Suite for Financial Large Language Models and Agents. 2026.
[18] Mishra, Shibi and Singh, Shveta and Misra, Alok Misra. International Journal of Emerging Markets, 2024. doi:10.1108/IJOEM-02-2024-0279
[19] Batra, Devesh and Hamill, Conor and Hartley, John and Okhrati, Ramin and Seddon, Dale and Miller, Harvey and Khraishi, Raad and Cowan, Greig. SSRN Electronic Journal. doi:10.2139/ssrn.5381584
[20] Introducing the Model Context Protocol. 2024.
[21] Model Context Protocol Specification. 2025.
[22] ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629.
[23] Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv preprint arXiv:2302.04761.
[24] Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions. arXiv preprint arXiv:2306.02224.
[25] AgentBench: Evaluating LLMs as Agents. arXiv preprint arXiv:2308.03688.
[26] A Multi-Agent Framework for Quantitative Finance: An Application to Portfolio Management Analytics. Proceedings of EMNLP 2025 Industry Track, 2025.
[27] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. arXiv preprint arXiv:2403.02691.
[28] Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv preprint arXiv:2503.23278.
[29] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv preprint arXiv:2005.11401.
[30] Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734.
[31] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of EMNLP-IJCNLP.
[32] Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 1993.
[33] A five-factor asset pricing model. Journal of Financial Economics, 2015.
[34] Risk, Return, and Equilibrium: Empirical Tests. Journal of Political Economy, 1973.
[35] … and the Cross-Section of Expected Returns. The Review of Financial Studies, 2016.
[36] Replicating Anomalies. The Review of Financial Studies, 2020.
[37] PydanticAI: GenAI Agent Framework, the Pydantic way. 2026.
[38] Pydantic Logfire Documentation. 2026.
[40] Sapkota, Ranjan and Roumeliotis, Konstantinos I. and Karkee, Manoj. AI Agents vs. Agentic AI: A Conceptual taxonomy, applications and challenges. 2025. doi:10.1016/j.inffus.2025.103599
[41] MemGPT: Towards LLMs as Operating Systems. 2024.
[42] Assaying Anomalies. SSRN Electronic Journal. doi:10.2139/ssrn.4723712
[43] Factor Timing with Cross-Sectional and Time-Series Predictors. The Journal of Portfolio Management, 2017.
[44] … Boyd and Enzo Busseti and Steven Diamond and Ronald N… Multi-Period Trading via Convex Optimization. Foundations and Trends in Optimization, 2016. doi:10.1561/2400000023
[45] AI-Powered (Finance) Scholarship. 2025. doi:10.3386/w33363
[46] Comparing Factor Models with Price-Impact Costs. Journal of Financial Economics, 2024.
[47] Financial Statement Analysis with Large Language Models. arXiv preprint arXiv:2407.17866.
[48] AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. 2025.
[49] Querying Databases with Function Calling. 2025.
[50] Empowering biomedical discovery with AI agents. Cell, 2024. doi:10.1016/j.cell.2024.09.022
[51] Alford, Andrew and Jones, Robert and Lim, Terence. In: The Theory and Practice of Investment Management: Asset Allocation, Valuation, Portfolio Construction, and Strategies, 2011. doi:10.1002/9781118267028.ch11
[52] Jensen, Theis Ingerslev and Kelly, Bryan and Pedersen, Lasse Heje. The Journal of Finance.
[53] Trading Costs and Returns for U.S. Equities: Estimating Effective Costs from Daily Data. The Journal of Finance, 2009. doi:10.1111/j.1540-6261.2009.01469.x
[54] Zeroing In on the Expected Returns of Anomalies. Journal of Financial and Quantitative Analysis, 2023.
[55] Model Comparison with Transaction Costs. Journal of Finance, 2023. doi:10.1111/jofi.13225
[56] Assaying Anomalies. SSRN Electronic Journal. doi:10.2139/ssrn.4338007
[57] Swanson, et al. The Virtual Lab of AI Agents Designs New SARS-CoV-2 Nanobodies. Nature 646 (2025) 716–723. doi:10.1038/s41586-025-09442-9
[58] A Multifactor Perspective on Volatility-Managed Portfolios. Journal of Finance, 2024. doi:10.1111/jofi.13395
[59] Show Me the Money: The Monetary Policy Risk Premium. Journal of Financial Economics, 2020. doi:10.1016/j.jfineco.2019.06.012
[60] BloombergGPT: A Large Language Model for Finance. arXiv preprint arXiv:2303.17564.
[61] Temporal Data Meets LLM – Explainable Financial Time Series Forecasting. arXiv preprint arXiv:2306.11025.
[62] Large Language Models in Finance: A Survey. Proceedings of the 4th ACM International Conference on AI in Finance (ICAIF '23), 2023. doi:10.1145/3604237.3626869
[63] Revolutionizing Finance with LLMs: An Overview of Applications and Insights. arXiv preprint arXiv:2401.11641.
[64] A Survey of Large Language Models in Finance (FinLLMs). arXiv preprint arXiv:2402.02315.
[65] Large Language Model Agent in Financial Trading: A Survey. arXiv preprint arXiv:2408.06361.
[66] Biancotti, Claudia and Camassa, Carolina and Coletta, Andrea and Giudice, Oliver and Glielmo, Aldo. Chat Bankman-Fried: an Exploration of… 2025.
[67] INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent. arXiv preprint arXiv:2412.18174.
[68] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents. arXiv preprint arXiv:2502.00415.
[69] Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications. arXiv preprint arXiv:2502.15865.
[70] Chen, Andrew Y. and Tom Zimmermann. Critical Finance Review, 2022. doi:10.1561/104.00000112
[71] Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 2020.
[72] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 2022.
[73] Reflexion: Language Agents with Verbal Reinforcement Learning. 2023. doi:10.48550/arXiv.2303.11366
[74] Self-Refine: Iterative Refinement with Self-Feedback. 2023. doi:10.48550/arXiv.2303.17651
[75] Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. 2023. doi:10.48550/arXiv.2305.04091
[76] ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models. 2023. doi:10.48550/arXiv.2305.18323
[77] RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation. The Twelfth International Conference on Learning Representations (ICLR).
[78] Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics.
[79] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. 2023. doi:10.48550/arXiv.2308.08155
[80] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. 2024. doi:10.48550/arXiv.2308.00352
[81] AgentBench: Evaluating LLMs as Agents. 2023. doi:10.48550/arXiv.2308.03688

Showing first 80 references.