StockBench: Can LLM agents trade stocks profitably in real-world markets?

Yanxu Chen, Zijun Yao, Yantao Liu, Amy Xin, Jin Ye, Jianing Yu, Lei Hou, Juanzi Li · 2025 · arXiv 2510.02209

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection

cs.CE · 2026-05-09 · unverdicted · novelty 7.0

AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.

Auditing AI Investment Recommendations as Executable Actions

cs.LO · 2026-06-25 · unverdicted · novelty 6.0

Introduces a protocol scoring AI investment advisors on validity under constraints, stability, and agreement with a deterministic baseline, showing agreement often masks invalid actions.

SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

cs.CL · 2026-02-10 · conditional · novelty 6.0

EcoGym is a new open benchmark with three economic environments that reveals no leading LLM dominates at sustained plan-and-execute decision making across scenarios.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

cs.CE · 2026-05-16 · accept · novelty 5.0

Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Strat-LLM demonstrates that LLM trading performance varies by reasoning mode and model scale, with strict alignment reducing drawdowns in downtrends and deep reasoning avoiding small-gain traps.

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

cs.CL · 2026-05-18

citing papers explorer

AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection

cs.CE · 2026-05-09 · unverdicted · none · ref 3

Auditing AI Investment Recommendations as Executable Actions

cs.LO · 2026-06-25 · unverdicted · none · ref 6

SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

cs.SE · 2026-04-06 · unverdicted · none · ref 28

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · none · ref 160

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals

cs.AI · 2026-05-07 · unverdicted · none · ref 4
