CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.
Suchow, and Khaldoun Khashanah
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (8)
roles: background (1)
polarities: background (1)
representative citing papers
The multi-agent LLM system Agora, under Sealed Joint Search conditions, produces a +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline's +1.334 under a favorable seed.
FinCAD mitigates parametric look-ahead bias in LLM financial backtesting via learned adversarial prompts and per-entity-date adaptive CAD penalties, cutting memorised-date returns up to 67% while preserving out-of-sample results and raising in/out-of-sample Spearman correlation from 0.779 to 0.846.
SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.
The InKH architecture absorbs complexity into financial LLM agents, cutting latency by 83%, token cost by 82%, and stale knowledge by 97% while raising task quality by 0.108 on a 46k-episode synthetic benchmark versus baselines.
RMATS achieves 9.62% maximum drawdown over 561 trading days on 24 assets, outperforming MVO and FinBERT in 3 of 5 geopolitical stress scenarios while underperforming in bull markets.
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
Reproducibility audit of 30 LLM trading papers shows execution assumptions under-reported relative to agent architectures, illustrated by a 10-equity example where frictions compress returns.
citing papers explorer
- AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution
The multi-agent LLM system Agora, under Sealed Joint Search conditions, produces a +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline's +1.334 under a favorable seed.