Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents

Jiawei Du; Joey Tianyi Zhou; Liang Xie; Tianmi Ma; Wenjie Wang; Wenxin Huang; Xian Zhong

arxiv: 2502.17967 · v2 · pith:QW5MHWYKnew · submitted 2025-02-25 · cs.LG · cs.AI · cs.CL · cs.MA · q-fin.ST

Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou

classification: cs.LG, cs.AI, cs.CL, cs.MA, q-fin.ST

keywords: trading, agents, data, numerical, agent, arena, language, llm-based

Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks, yet their performance in dynamic, real-world financial environments remains underexplored. Existing approaches are limited to historical backtesting, where trading actions cannot influence market prices and agents train only on static data. To address this limitation, we present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading and directly impact price dynamics. By simulating realistic bid-ask interactions, our platform enables training in scenarios that closely mirror live markets, thereby narrowing the gap between training and evaluation. Experiments reveal that LLMs struggle with numerical reasoning when given plain-text data, often overfitting to local patterns and recent values. In contrast, chart-based visualizations significantly enhance both numerical reasoning and trading performance. Furthermore, incorporating a reflection module yields additional improvements, especially with visual inputs. Evaluations on NASDAQ and CSI datasets demonstrate the superiority of our method, particularly under high volatility. All code and data are available at https://github.com/wekjsdvnm/Agent-Trading-Arena.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence
cs.CE 2026-05 accept novelty 5.0

Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
cs.AI 2025-04 accept novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.