pith. machine review for the scientific record. sign in

arxiv: 2604.18500 · v1 · submitted 2026-04-20 · 💻 cs.MA · q-fin.GN

Recognition: unknown

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

Authors on Pith no claims yet

Pith reviewed 2026-05-10 03:02 UTC · model grok-4.3

classification 💻 cs.MA q-fin.GN
keywords multi-agent systemsquantitative financeequity factorsempirical researchagentic frameworkspanel datatool callingreflection planning
0
0 comments X

The pith

A multi-agent framework with chained tool calls and reflection planning supports quantitative equity factor research better than dynamic code generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces QRAFTI, a multi-agent system meant to act like parts of a quantitative research team handling equity factor work on big financial datasets. It connects a panel data toolkit to MCP servers that turn data access, factor building, and coding into tools the agents can use. The authors show it can copy known factors, create and check new ones, and write reports with explanations and logs. They argue that chaining tool calls with reflection-based planning gives better results and clearer reasoning on complex tasks than just generating code on the fly.

Core claim

QRAFTI is a multi-agent framework that emulates parts of a quantitative research team for equity factor research on large financial panel datasets by integrating MCP servers exposing data access, factor construction, and custom coding operations as callable tools, allowing replication of factors, testing of new signals, and generation of reports with narrative and traces; on multi-step empirical tasks, chained tool calls and reflection-based planning offer better performance and explainability than dynamic code generation alone.

What carries the argument

The multi-agent system using chained tool calls with reflection-based planning, backed by MCP servers that expose data access, factor construction, and custom coding as tools.

If this is right

  • It automates replication of established equity factors on large panels with full traces.
  • It supports systematic formulation and testing of new factor signals.
  • It produces standardized reports that include both narrative analysis and computational details.
  • Multi-step empirical workflows gain structure and traceability without manual intervention at each step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend to non-finance empirical domains by swapping the MCP server tools for domain-specific ones.
  • Logged traces from the system might support audit or regulatory review of quantitative findings.
  • Pairing the planning layer with larger models could handle longer research chains or noisier data.
  • Widespread use might shift quant teams toward verifying agent outputs rather than writing all code themselves.

Load-bearing premise

Integrating MCP servers for data access, factor construction, and custom coding will enable the multi-agent system to reliably emulate parts of a quantitative research team on large financial panels.

What would settle it

A controlled test on identical multi-step financial panel tasks where dynamic code generation alone matches or exceeds the performance and explainability of the chained tool calls with reflection planning.

Figures

Figures reproduced from arXiv: 2604.18500 by Kumar Muthuraman, Michael Sury, Terence Lim.

Figure 1
Figure 1. Figure 1: Agentic framework for empirical research in quantitative finance [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Fama-French HML-workflow user queries nentially weighted stock-volatility characteristic— QRAFTI delegates the task to a code-writing and execution agent. This agent writes Python code against the Panel API, executes it, and returns both the code and the resulting artifact (that is, the iden￾tifier of a newly created Panel dataset), which can subsequently be used in later tool calls. Review￾ers may also in… view at source ↗
Figure 5
Figure 5. Figure 5: Listing of custom code generated swer every query five times, which allows us to evaluate both accuracy and consistency across re￾peated trials. For each task, we report Sim@k for k ∈ {1, 2, 5}, defined as the expected maximum cosine similarity between the reference panel and a set of k generated outputs sampled from the n attempts. This metric is similar in spirit to Pass@k (Kulal et al., 2019), but repla… view at source ↗
Figure 3
Figure 3. Figure 3: Price momentum JKP-workflow user query [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Scatter plots of constructed panels against [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: UI Demo: Conversation history 12 [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: UI Demo: Computation graph and standardized research report with narrative analysis [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
read the original abstract

We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable tools. It can help replicate established factors, formulate and test new signals, and generate standardized research reports accompanied by narrative analysis and computational traces. On multi-step empirical tasks, using chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces QRAFTI, a multi-agent framework for empirical quantitative finance research on large equity panel datasets. It integrates MCP servers exposing data access, factor construction, and custom coding as tools, combined with reflection-based planning and chained tool calls, to emulate parts of a quant research team. The system is intended to support factor replication, new signal testing, and generation of standardized reports with narrative analysis and computational traces. The abstract posits that this architecture may yield better performance and explainability than dynamic code generation alone on multi-step empirical tasks.

Significance. If empirically validated, the framework could contribute to reproducible automation of routine quant research workflows, particularly for panel-data factor work. The explicit separation of tool interfaces (MCP) from planning and reflection layers is a clear architectural choice that could improve traceability. At present, however, the manuscript offers only a descriptive proposal without any implemented evaluation, so its significance remains prospective rather than demonstrated.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.
  2. [Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly label the performance statement as a hypothesis rather than a suggested advantage, to avoid implying empirical support.
  2. [System architecture] Notation for MCP servers and agent roles is introduced without a dedicated diagram or table summarizing the tool interfaces; adding one would improve clarity for readers unfamiliar with the MCP protocol.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that QRAFTI is presented as an architectural framework proposal without empirical evaluations or results sections at this stage. We address each major comment below and will make targeted revisions to clarify the scope and strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.

    Authors: We acknowledge that the claim, even with the qualifier 'may', lacks empirical backing in the current manuscript. The statement was intended as a design hypothesis rather than a demonstrated result. We will revise the abstract to remove the comparative claim entirely and instead describe the framework's intended use for multi-step tasks, supported only by the architectural rationale. This revision will be made in the next version. revision: yes

  2. Referee: [Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.

    Authors: The manuscript is a framework introduction focused on the system architecture, tool interfaces via MCP servers, and planning mechanisms. We agree that reliable emulation of quant research behavior is an untested assumption. In the revised manuscript we will add a dedicated 'Illustrative Examples' section containing concrete case studies of factor replication workflows, including sample tool-call sequences, reflection steps, and generated report traces. These will serve as demonstrations of operation rather than quantitative benchmarks; comprehensive evaluations with metrics are planned for follow-on work. revision: partial

Circularity Check

0 steps flagged

No circularity: descriptive framework with no derivations or fitted inputs

full rationale

The paper is a system-description proposal for a multi-agent framework (QRAFTI) that integrates MCP servers for data, factors, and coding. It contains no equations, no parameter fitting, no derivation chain, and no self-citations used to justify a mathematical result. The sole performance statement is explicitly hedged as a hypothesis ('may offer better performance... than dynamic code generation alone') rather than a claim derived from prior steps or data within the paper. All other content is architectural description. This satisfies the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are specified in the abstract.

pith-pipeline@v0.9.0 · 5384 in / 1044 out tokens · 44554 ms · 2026-05-10T03:02:14.962511+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

90 extracted references · 49 canonical work pages · 13 internal anchors

  1. [1]

    2025 , month = mar, day =

    Matt Robinson , title =. 2025 , month = mar, day =

  2. [2]

    The Journal of Finance , volume=

    Presidential Address: Discount Rates , author=. The Journal of Finance , volume=. 2011 , doi=

  3. [3]

    2014 , publisher =

    Andrew Ang , title =. 2014 , publisher =

  4. [4]

    & Rahwan, T

    Self-Reflection Makes Large Language Models Safer, Less Biased, and Ideologically Neutral , author =. arXiv preprint arXiv:2406.10400 , year =. doi:10.48550/arXiv.2406.10400 , url =

  5. [5]

    Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI

    Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI , author =. 2026 , journal =. doi:10.48550/arXiv.2603.14288 , url =

  6. [6]

    The Journal of Finance , year =

    Feng, Guanhao and Giglio, Stefano and Xiu, Dacheng , title =. The Journal of Finance , year =

  7. [7]

    French , title =

    Kenneth R. French , title =. 2022 , month = mar, day =

  8. [8]

    2024 , month = oct, day =

    Jamil Baz , title =. 2024 , month = oct, day =

  9. [9]

    2026 , month = mar, day =

    Bessembinder, Hendrik , title =. 2026 , month = mar, day =

  10. [10]

    The Journal of Finance , year =

    Tyler Shumway , title =. The Journal of Finance , year =

  11. [11]

    Warther , title =

    Tyler Shumway and Vincent A. Warther , title =. The Journal of Finance , year =

  12. [12]

    2025 , eprint=

    LLMs Get Lost In Multi-Turn Conversation , author=. 2025 , eprint=

  13. [13]

    2019 , eprint=

    SPoC: Search-based Pseudocode to Code , author=. 2019 , eprint=

  14. [14]

    David and Pontiff, Jeffrey , title =

    McLean, R. David and Pontiff, Jeffrey , title =. The Journal of Finance , volume =. doi:https://doi.org/10.1111/jofi.12365 , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/jofi.12365 , abstract =

  15. [15]

    and MacKinlay, A

    Lo, Andrew W. and MacKinlay, A. Craig , title =. The Review of Financial Studies , volume =. 1990 , month = jul, doi =

  16. [16]

    2021 , institution =

    Hasler, Mathias , title =. 2021 , institution =. doi:10.2139/ssrn.3886984 , note =

  17. [17]

    2026 , eprint=

    Evaluation and Benchmarking Suite for Financial Large Language Models and Agents , author=. 2026 , eprint=

  18. [18]

    International Journal of Emerging Markets , year =

    Mishra, Shibi and Singh, Shveta and Misra, Alok Misra , title =. International Journal of Emerging Markets , year =. doi:10.1108/IJOEM-02-2024-0279 , url =

  19. [19]

    SSRN Electronic Journal , year =

    Batra, Devesh and Hamill, Conor and Hartley, John and Okhrati, Ramin and Seddon, Dale and Miller, Harvey and Khraishi, Raad and Cowan, Greig , title =. SSRN Electronic Journal , year =. doi:10.2139/ssrn.5381584 , url =

  20. [20]

    2024 , howpublished =

    Introducing the Model Context Protocol , author =. 2024 , howpublished =

  21. [21]

    2025 , howpublished =

    Model Context Protocol Specification , author =. 2025 , howpublished =

  22. [22]

    ReAct: Synergizing Reasoning and Acting in Language Models

    ReAct: Synergizing Reasoning and Acting in Language Models , author =. arXiv preprint arXiv:2210.03629 , year =

  23. [23]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Toolformer: Language Models Can Teach Themselves to Use Tools , author =. arXiv preprint arXiv:2302.04761 , year =

  24. [24]

    Auto-gpt for online decision making: Benchmarks and additional opinions.arXiv preprint arXiv:2306.02224,

    Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions , author =. arXiv preprint arXiv:2306.02224 , year =

  25. [25]

    AgentBench: Evaluating LLMs as Agents

    AgentBench: Evaluating LLMs as Agents , author =. arXiv preprint arXiv:2308.03688 , year =

  26. [26]

    Proceedings of EMNLP 2025 Industry Track , year =

    A Multi-Agent Framework for Quantitative Finance: An Application to Portfolio Management Analytics , author =. Proceedings of EMNLP 2025 Industry Track , year =

  27. [27]

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents , author =. arXiv preprint arXiv:2403.02691 , year =

  28. [28]

    Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

    Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions , author =. arXiv preprint arXiv:2503.23278 , year =

  29. [29]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , author =. arXiv preprint arXiv:2005.11401 , year =

  30. [30]

    Billion-scale similarity search with GPUs

    Billion-scale similarity search with GPUs , author =. arXiv preprint arXiv:1702.08734 , year =

  31. [31]

    Proceedings of EMNLP-IJCNLP , year =

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks , author =. Proceedings of EMNLP-IJCNLP , year =

  32. [32]

    Journal of Financial Economics , volume =

    Common risk factors in the returns on stocks and bonds , author =. Journal of Financial Economics , volume =. 1993 , doi =

  33. [33]

    Journal of Financial Economics , volume =

    A five-factor asset pricing model , author =. Journal of Financial Economics , volume =. 2015 , doi =

  34. [34]

    Journal of Political Economy , volume =

    Risk, Return, and Equilibrium: Empirical Tests , author =. Journal of Political Economy , volume =. 1973 , doi =

  35. [35]

    The Review of Financial Studies , volume =

    and the Cross-Section of Expected Returns , author =. The Review of Financial Studies , volume =. 2016 , doi =

  36. [36]

    The Review of Financial Studies , volume =

    Replicating Anomalies , author =. The Review of Financial Studies , volume =. 2020 , doi =

  37. [37]

    2026 , howpublished =

    PydanticAI: GenAI Agent Framework, the Pydantic way , author =. 2026 , howpublished =

  38. [38]

    2026 , howpublished =

    Pydantic Logfire Documentation , author =. 2026 , howpublished =

  39. [40]

    and Karkee, Manoj , year=

    Sapkota, Ranjan and Roumeliotis, Konstantinos I. and Karkee, Manoj , year=. AI Agents vs. Agentic AI: A Conceptual taxonomy, applications and challenges , volume=. doi:10.1016/j.inffus.2025.103599 , journal=

  40. [41]

    2024 , eprint=

    MemGPT: Towards LLMs as Operating Systems , author=. 2024 , eprint=

  41. [42]

    SSRN Electronic Journal , year =

    Assaying Anomalies , author =. SSRN Electronic Journal , year =. doi:10.2139/ssrn.4723712 , note =

  42. [43]

    The Journal of Portfolio Management , volume =

    Factor Timing with Cross-Sectional and Time-Series Predictors , author =. The Journal of Portfolio Management , volume =. 2017 , doi =

  43. [44]

    Boyd and Enzo Busseti and Steven Diamond and Ronald N

    Multi-Period Trading via Convex Optimization , author =. Foundations and Trends in Optimization , volume =. 2016 , publisher =. doi:10.1561/2400000023 , url =

  44. [45]

    2025 , month =

    AI-Powered (Finance) Scholarship , author =. 2025 , month =. doi:10.3386/w33363 , url =

  45. [46]

    Journal of Financial Economics , volume =

    Comparing Factor Models with Price-Impact Costs , author =. Journal of Financial Economics , volume =. 2024 , doi =

  46. [47]

    arXiv preprint arXiv:2407.17866 , year =

    Financial Statement Analysis with Large Language Models , author =. arXiv preprint arXiv:2407.17866 , year =

  47. [48]

    Agentic AI: A Conceptual Taxonomy, Applications and Challenges , author=

    AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges , author=. 2025 , eprint=

  48. [49]

    2025 , eprint=

    Querying Databases with Function Calling , author=. 2025 , eprint=

  49. [50]

    doi:10.1016/j.cell.2024.09.022

    Empowering biomedical discovery with AI agents , author =. Cell , volume =. 2024 , month =. doi:10.1016/j.cell.2024.09.022 , url =

  50. [51]

    The Theory and Practice of Investment Management: Asset Allocation, Valuation, Portfolio Construction, and Strategies , editor =

    Alford, Andrew and Jones, Robert and Lim, Terence , title =. The Theory and Practice of Investment Management: Asset Allocation, Valuation, Portfolio Construction, and Strategies , editor =. 2011 , chapter =. doi:10.1002/9781118267028.ch11 , url =

  51. [52]

    The Journal of Finance , year =

    Jensen, Theis Ingerslev and Kelly, Bryan and Pedersen, Lasse Heje , title =. The Journal of Finance , year =

  52. [53]

    Information

    Trading Costs and Returns for U.S. Equities: Estimating Effective Costs from Daily Data , author =. The Journal of Finance , volume =. 2009 , month =. doi:10.1111/j.1540-6261.2009.01469.x , url =

  53. [54]

    Journal of Financial and Quantitative Analysis , volume =

    Zeroing In on the Expected Returns of Anomalies , author =. Journal of Financial and Quantitative Analysis , volume =. 2023 , doi =

  54. [55]

    Journal of Finance , volume =

    Model Comparison with Transaction Costs , author =. Journal of Finance , volume =. 2023 , month =. doi:10.1111/jofi.13225 , url =

  55. [56]

    SSRN Electronic Journal , year =

    Assaying Anomalies , author =. SSRN Electronic Journal , year =. doi:10.2139/ssrn.4338007 , url =

  56. [57]

    Swanson, et al., The Virtual Lab of AI Agents Designs New SARS-CoV-2 Nanobodies, Nature 646 (2025) 716–723

    The Virtual Lab of AI Agents Designs New SARS-CoV-2 Nanobodies , author =. Nature , year =. doi:10.1038/s41586-025-09442-9 , url =

  57. [58]

    Journal of Finance , volume =

    A Multifactor Perspective on Volatility-Managed Portfolios , author =. Journal of Finance , volume =. 2024 , month =. doi:10.1111/jofi.13395 , url =

  58. [59]

    Journal of Financial Economics , volume =

    Show Me the Money: The Monetary Policy Risk Premium , author =. Journal of Financial Economics , volume =. 2020 , month =. doi:10.1016/j.jfineco.2019.06.012 , url =

  59. [60]

    BloombergGPT: A Large Language Model for Finance

    BloombergGPT: A Large Language Model for Finance , author =. arXiv preprint arXiv:2303.17564 , year =

  60. [61]

    arXiv preprint arXiv:2306.11025 , year =

    Temporal Data Meets LLM – Explainable Financial Time Series Forecasting , author =. arXiv preprint arXiv:2306.11025 , year =

  61. [62]

    URL https: //doi.org/10.1145/3604237.3626869

    Large Language Models in Finance: A Survey , author =. Proceedings of the 4th ACM International Conference on AI in Finance (ICAIF '23) , pages =. 2023 , publisher =. doi:10.1145/3604237.3626869 , url =

  62. [63]

    Revolutionizing finance with llms: An overview of applications and insights.arXiv preprint arXiv:2401.11641,

    Revolutionizing Finance with LLMs: An Overview of Applications and Insights , author =. arXiv preprint arXiv:2401.11641 , year =

  63. [64]

    A Survey of Large Language Models in Finance (FinLLMs),

    A Survey of Large Language Models in Finance (FinLLMs) , author =. arXiv preprint arXiv:2402.02315 , year =

  64. [65]

    Xiaoning Dong, Wenbo Hu, Wei Xu, and Tianxing He

    Large Language Model Agent in Financial Trading: A Survey , author =. arXiv preprint arXiv:2408.06361 , year =

  65. [66]

    Chat Bankman-Fried: an Exploration of

    Biancotti, Claudia and Camassa, Carolina and Coletta, Andrea and Giudice, Oliver and Glielmo, Aldo , booktitle =. Chat Bankman-Fried: an Exploration of. 2025 , address =

  66. [67]

    Investor- bench: A benchmark for financial decision-making tasks with llm-based agent.arXiv preprint arXiv:2412.18174, 2024

    INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent , author =. arXiv preprint arXiv:2412.18174 , year =

  67. [68]

    Mar- ketsenseai 2.0: Enhancing stock analysis through llm agents,

    MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents , author =. arXiv preprint arXiv:2502.00415 , year =

  68. [69]

    Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E

    Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications , author =. arXiv preprint arXiv:2502.15865 , year =

  69. [70]

    and Tom Zimmermann , title =

    Chen, Andrew Y. and Tom Zimmermann , title =. Critical Finance Review , volume =. 2022 , month =. doi:10.1561/104.00000112 , url =

  70. [71]

    Advances in Neural Information Processing Systems , volume=

    Language Models are Few-Shot Learners , author=. Advances in Neural Information Processing Systems , volume=. 2020 , url=

  71. [72]

    Advances in Neural Information Processing Systems , volume=

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author=. Advances in Neural Information Processing Systems , volume=. 2022 , url=

  72. [73]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Reflexion: Language Agents with Verbal Reinforcement Learning , author=. 2023 , eprint=. doi:10.48550/arXiv.2303.11366 , url=

  73. [74]

    Self-Refine: Iterative Refinement with Self-Feedback

    Self-Refine: Iterative Refinement with Self-Feedback , author=. 2023 , eprint=. doi:10.48550/arXiv.2303.17651 , url=

  74. [75]

    Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models.arXiv preprint arXiv:2305.04091,

    Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models , author=. 2023 , eprint=. doi:10.48550/arXiv.2305.04091 , url=

  75. [76]

    Re- woo: Decoupling reasoning from observations for ef- ficient augmented language models

    ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models , author=. 2023 , eprint=. doi:10.48550/arXiv.2305.18323 , url=

  76. [77]

    The Twelfth International Conference on Learning Representations (ICLR) , year=

    RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation , author=. The Twelfth International Conference on Learning Representations (ICLR) , year=

  77. [78]

    Transactions of the Association for Computational Linguistics , year=

    Lost in the Middle: How Language Models Use Long Contexts , author=. Transactions of the Association for Computational Linguistics , year=

  78. [79]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation , author=. 2023 , eprint=. doi:10.48550/arXiv.2308.08155 , url=

  79. [80]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework , author=. 2024 , eprint=. doi:10.48550/arXiv.2308.00352 , url=

  80. [81]

    AgentBench: Evaluating LLMs as Agents

    AgentBench: Evaluating LLMs as Agents , author=. 2023 , eprint=. doi:10.48550/arXiv.2308.03688 , url=

Showing first 80 references.