CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.
Suchow, and Khaldoun Khashanah
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (7)
roles: background (1)
polarities: background (1)
representative citing papers
The multi-agent LLM system Agora, under Sealed Joint Search conditions, produces a holdout Sharpe of +1.87 on CSI 1000 over a 91-day sealed period, exceeding the best baseline at +1.334 under a favorable seed.
FinCAD mitigates parametric look-ahead bias in LLM financial backtesting via learned adversarial prompts and per-entity-date adaptive CAD penalties, cutting memorised-date returns up to 67% while preserving out-of-sample results and raising in/out-of-sample Spearman correlation from 0.779 to 0.846.
SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.
The InKH architecture absorbs complexity into financial LLM agents, cutting latency by 83%, token cost by 82%, and stale knowledge by 97% while raising task quality by 0.108 on a 46k-episode synthetic benchmark versus baselines.
RMATS achieves 9.62% maximum drawdown over 561 trading days on 24 assets, outperforming MVO and FinBERT in 3 of 5 geopolitical stress scenarios while underperforming in bull markets.
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
citing papers explorer
- The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence