QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
Pith reviewed 2026-05-10 03:02 UTC · model grok-4.3
The pith
A multi-agent framework with chained tool calls and reflection-based planning is proposed for quantitative equity factor research, with the hypothesis that it may outperform dynamic code generation alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
QRAFTI is a multi-agent framework that emulates parts of a quantitative research team for equity factor research on large financial panel datasets. It integrates MCP servers that expose data access, factor construction, and custom coding operations as callable tools, so that agents can replicate factors, test new signals, and generate reports with narrative and computational traces. The paper suggests that, on multi-step empirical tasks, chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.
What carries the argument
The multi-agent system using chained tool calls with reflection-based planning, backed by MCP servers that expose data access, factor construction, and custom coding as tools.
If this is right
- It automates replication of established equity factors on large panels with full traces.
- It supports systematic formulation and testing of new factor signals.
- It produces standardized reports that include both narrative analysis and computational details.
- Multi-step empirical workflows gain structure and traceability without manual intervention at each step.
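As an illustration of what a factor construction step with a trace could look like, the sketch below sorts a toy cross-section on a signal and computes an equal-weighted long-short return. The function name `long_short_return`, the tercile grouping, and all numbers are assumptions made for illustration; the paper does not specify its factor construction tools at this level.

```python
def long_short_return(signal: dict[str, float],
                      next_ret: dict[str, float],
                      n_groups: int = 3):
    """Equal-weighted return of the top signal group minus the bottom group.

    Returns the spread and a small trace describing what was done.
    """
    # Keep only assets that have both a signal and a next-period return.
    names = sorted((n for n in signal if n in next_ret), key=lambda n: signal[n])
    k = max(1, len(names) // n_groups)
    short_leg, long_leg = names[:k], names[-k:]

    def leg_mean(leg):
        return sum(next_ret[n] for n in leg) / len(leg)

    trace = {"n_assets": len(names), "long": long_leg, "short": short_leg}
    return leg_mean(long_leg) - leg_mean(short_leg), trace

# Toy inputs, invented for this example only.
signal = {"A": 0.1, "B": 0.5, "C": 0.9, "D": 0.3, "E": 0.7, "F": 0.2}
next_ret = {"A": -0.01, "B": 0.00, "C": 0.03, "D": 0.01, "E": 0.02, "F": -0.02}

spread, factor_trace = long_short_return(signal, next_ret)
print(round(spread, 4), factor_trace)
```

Returning the leg membership alongside the number is a simple way to keep a computational trace that a report generator could later cite.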
Where Pith is reading between the lines
- The approach could extend to non-finance empirical domains by swapping the MCP server tools for domain-specific ones.
- Logged traces from the system might support audit or regulatory review of quantitative findings.
- Pairing the planning layer with larger models could let the system handle longer research chains or noisier data.
- Widespread use might shift quant teams toward verifying agent outputs rather than writing all code themselves.
Load-bearing premise
Integrating MCP servers for data access, factor construction, and custom coding will enable the multi-agent system to reliably emulate parts of a quantitative research team on large financial panels.
What would settle it
A controlled test on identical multi-step financial panel tasks where dynamic code generation alone matches or exceeds the performance and explainability of the chained tool calls with reflection planning.
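One way such a paired comparison might be scored is sketched below, assuming each task yields a pass/fail outcome for both methods. It uses an exact McNemar (sign) test on the tasks where the two methods disagree. The outcome lists are placeholders invented for the example, not results from the paper, and this only covers task success, not explainability.

```python
from math import comb

def mcnemar_exact(a_success: list[int], b_success: list[int]):
    """Exact two-sided McNemar test for paired pass/fail outcomes.

    a_success[i] and b_success[i] are outcomes of methods A and B on task i.
    Returns (tasks only A solved, tasks only B solved, p-value).
    """
    if len(a_success) != len(b_success):
        raise ValueError("both methods must be run on the same tasks")
    a_only = sum(1 for a, b in zip(a_success, b_success) if a and not b)
    b_only = sum(1 for a, b in zip(a_success, b_success) if b and not a)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0
    k = min(a_only, b_only)
    # Under the null, discordant tasks split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)

# Placeholder outcomes (1 = task completed correctly), illustrative only.
chained_tools = [1, 1, 1, 1, 0, 1, 1, 0]
dynamic_code = [1, 0, 0, 1, 0, 0, 1, 0]

a_only, b_only, p_value = mcnemar_exact(chained_tools, dynamic_code)
print(a_only, b_only, p_value)
```

With only three discordant tasks the test cannot reject equality, which shows why a convincing comparison would need a reasonably large task set.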
Original abstract
We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable tools. It can help replicate established factors, formulate and test new signals, and generate standardized research reports accompanied by narrative analysis and computational traces. On multi-step empirical tasks, using chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces QRAFTI, a multi-agent framework for empirical quantitative finance research on large equity panel datasets. It integrates MCP servers exposing data access, factor construction, and custom coding as tools, combined with reflection-based planning and chained tool calls, to emulate parts of a quant research team. The system is intended to support factor replication, new signal testing, and generation of standardized reports with narrative analysis and computational traces. The abstract posits that this architecture may yield better performance and explainability than dynamic code generation alone on multi-step empirical tasks.
Significance. If empirically validated, the framework could contribute to reproducible automation of routine quant research workflows, particularly for panel-data factor work. The explicit separation of tool interfaces (MCP) from planning and reflection layers is a clear architectural choice that could improve traceability. At present, however, the manuscript offers only a descriptive proposal without any implemented evaluation, so its significance remains prospective rather than demonstrated.
major comments (2)
- [Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.
- [Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly label the performance statement as a hypothesis rather than a suggested advantage, to avoid implying empirical support.
- [System architecture] Notation for MCP servers and agent roles is introduced without a dedicated diagram or table summarizing the tool interfaces; adding one would improve clarity for readers unfamiliar with the MCP protocol.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We agree that QRAFTI is presented as an architectural framework proposal without empirical evaluations or results sections at this stage. We address each major comment below and will make targeted revisions to clarify the scope and strengthen the presentation.
- Referee: [Abstract] Abstract: the central claim that 'chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone' is stated without any supporting experiments, ablation studies, success-rate metrics, error analysis, or baseline comparisons on concrete financial-panel tasks such as factor replication or signal testing.
  Authors: We acknowledge that the claim, even with the qualifier 'may', lacks empirical backing in the current manuscript. The statement was intended as a design hypothesis rather than a demonstrated result. We will revise the abstract to remove the comparative claim entirely and instead describe the framework's intended use for multi-step tasks, supported only by the architectural rationale. This revision will be made in the next version.
  Revision: yes
- Referee: [Framework description (throughout)] The manuscript provides no evaluation section, results, or case studies demonstrating that the MCP-server integration and multi-agent planning reliably emulate quant-research-team behavior on large panels; the weakest assumption (reliable emulation) therefore remains untested.
  Authors: The manuscript is a framework introduction focused on the system architecture, tool interfaces via MCP servers, and planning mechanisms. We agree that reliable emulation of quant research behavior is an untested assumption. In the revised manuscript we will add a dedicated 'Illustrative Examples' section containing concrete case studies of factor replication workflows, including sample tool-call sequences, reflection steps, and generated report traces. These will serve as demonstrations of operation rather than quantitative benchmarks; comprehensive evaluations with metrics are planned for follow-on work.
  Revision: partial
Circularity Check
No circularity: descriptive framework with no derivations or fitted inputs
The paper is a system-description proposal for a multi-agent framework (QRAFTI) that integrates MCP servers for data, factors, and coding. It contains no equations, no parameter fitting, no derivation chain, and no self-citations used to justify a mathematical result. The sole performance statement is explicitly hedged as a hypothesis ('may offer better performance... than dynamic code generation alone') rather than a claim derived from prior steps or data within the paper. All other content is architectural description. This satisfies the default expectation of no significant circularity.