StockBench: Can LLM agents trade stocks profitably in real-world markets?

StockBench: Can LLM agents trade stocks profitably in real-world markets? · 2025 · arXiv 2510.02209

12 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 3

representative citing papers

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

cs.CL · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Introduces BacktestBench benchmark with 18k QA pairs across four backtesting tasks and evaluates 23 LLMs via the AutoBacktest multi-agent system.

Herculean: An Agentic Benchmark for Financial Intelligence

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

Herculean benchmark shows frontier agents handle trading and market insights better than hedging and auditing workflows that demand state consistency and structured verification.

AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection

cs.CE · 2026-05-09 · unverdicted · novelty 7.0

AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.

Auditing AI Investment Recommendations as Executable Actions

cs.LO · 2026-06-25 · unverdicted · novelty 6.0

Introduces a protocol scoring AI investment advisors on validity under constraints, stability, and agreement with a deterministic baseline, showing agreement often masks invalid actions.

Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

LLMs produce human-like finite bids in the St. Petersburg game but shift toward rational behavior under controlled prompt changes, indicating surface-level outcome resemblance without mechanism-level alignment.

SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

cs.CL · 2026-02-10 · conditional · novelty 6.0

EcoGym is a new open benchmark with three economic environments that reveals no leading LLM dominates at sustained plan-and-execute decision making across scenarios.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

cs.AI · 2026-05-27 · unverdicted · novelty 5.0

Empirical study of DeFi AI agents finds limited autonomous execution, negative median user returns, high gain inequality, and valuations disconnected from treasury fundamentals, with a proposed maturity framework.

The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

cs.CE · 2026-05-16 · accept · novelty 5.0

Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Strat-LLM demonstrates that LLM trading performance varies by reasoning mode and model scale, with strict alignment reducing drawdowns in downtrends and deep reasoning avoiding small-gain traps.

citing papers explorer

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · none · ref 1

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

Herculean: An Agentic Benchmark for Financial Intelligence

cs.AI · 2026-05-14 · unverdicted · none · ref 45

Herculean benchmark shows frontier agents handle trading and market insights better than hedging and auditing workflows that demand state consistency and structured verification.

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

cs.AI · 2026-05-27 · unverdicted · none · ref 8

Empirical study of DeFi AI agents finds limited autonomous execution, negative median user returns, high gain inequality, and valuations disconnected from treasury fundamentals, with a proposed maturity framework.

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals

cs.AI · 2026-05-07 · unverdicted · none · ref 4

Strat-LLM demonstrates that LLM trading performance varies by reasoning mode and model scale, with strict alignment reducing drawdowns in downtrends and deep reasoning avoiding small-gain traps.
