pith. machine review for the scientific record.

arxiv: 2605.06822 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: no theorem link

SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 00:47 UTC · model grok-4.3

classification 💻 cs.LG
keywords SHARP · LLM trading agents · neuro-symbolic policies · self-evolving rubrics · human-auditable rules · credit assignment problem · walk-forward validation · financial trading

The pith

SHARP confines LLM trading agents to explicit condition-action rule rubrics and uses cross-sample attribution to isolate and fix failures, yielding 10-20 point gains for compact models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that unconstrained prompt optimization in LLMs leads to policy drift in low-signal financial markets because stochastic noise prevents reliable credit assignment for delayed P&L rewards. By replacing free-form text mutations with a bounded, human-readable rubric of condition-action rules, SHARP allows an attribution agent to apply cross-sample reasoning across trades and identify exactly which rules caused sub-optimal outcomes. Targeted atomic edits to those rules are then validated through strict walk-forward procedures to prevent overfitting. This structured approach converts generic initial heuristics into robust strategies while preserving the transparency required for institutional oversight. The empirical results across equity sectors show consistent performance lifts for smaller backbones such as GPT-4o-mini.

Core claim

SHARP replaces unbounded free-form prompt optimization with structured symbolic policy optimization that confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent employs cross-sample reasoning across multiple samples to isolate specific rule failures, enabling targeted atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, this process consistently transforms generic initial heuristics into highly robust strategies.
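This backtest-attribute-edit-validate loop can be sketched in a few lines. The sketch below is illustrative only: `backtest`, `attribute`, and `propose_edit` are assumed stand-in interfaces, not the paper's API.

```python
def evolve(rubric, windows, backtest, attribute, propose_edit):
    """One round per walk-forward window: backtest the current rubric,
    attribute losses to a single rule, apply one atomic edit, and keep
    the edit only if it also improves the held-out validation window."""
    for train, valid in windows:
        _, trades = backtest(rubric, train)
        bad_rule = attribute(trades)          # cross-sample credit assignment
        if bad_rule is None:                  # nothing to blame this round
            continue
        candidate = propose_edit(rubric, bad_rule)
        if backtest(candidate, valid)[0] > backtest(rubric, valid)[0]:
            rubric = candidate                # accept only validated edits
    return rubric
```

The walk-forward guard is what carries the regularization: an edit that flatters the training window but not the next window is discarded.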

What carries the argument

The SHARP neuro-symbolic framework: a bounded rubric of explicit condition-action rules combined with an attribution agent that performs cross-sample reasoning to isolate individual rule failures for atomic edits.
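Concretely, such a rubric can be represented as a list of explicit if-then pairs. The schema below is an assumed illustration, not the paper's actual format:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    rule_id: str
    condition: str   # e.g. "5-day sector momentum > 2% and no negative news"
    action: str      # e.g. "enter long at 1.0x base weight"

@dataclass
class Rubric:
    rules: list = field(default_factory=list)

    def render(self) -> str:
        """Human-auditable listing: one IF/THEN line per rule."""
        return "\n".join(f"[{r.rule_id}] IF {r.condition} THEN {r.action}"
                         for r in self.rules)

    def atomic_edit(self, rule_id: str, new_rule: Rule) -> "Rubric":
        """Replace exactly one rule, leaving all others untouched."""
        return Rubric([new_rule if r.rule_id == rule_id else r
                       for r in self.rules])
```

Because every change is a single-rule replacement, two rubric versions can be diffed line by line, which is the auditability property the framework leans on.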

If this is right

  • Compact LLMs such as GPT-4o-mini achieve 10-20 percentage point average gains in empirical trading performance.
  • Policies remain structurally transparent and human-auditable, meeting institutional finance requirements.
  • Targeted edits reduce policy drift compared with unstructured optimization in non-stationary markets.
  • The framework operates consistently across multiple equity sectors and LLM backbones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The cross-sample attribution step could be tested in other delayed-reward domains such as robotic control or sequential game strategies to check transferability.
  • Combining the rubric structure with existing symbolic verification tools might further strengthen guarantees against unintended rule interactions.
  • The emphasis on walk-forward validation suggests a general template for safe self-modification in any agent that receives noisy scalar feedback.

Load-bearing premise

That an attribution agent using cross-sample reasoning can reliably isolate specific rule failures in low signal-to-noise market data without introducing new selection biases or missing interactions among rules.

What would settle it

A controlled test in which known rule defects are injected into simulated trading histories and the attribution agent is measured on whether it correctly identifies and isolates only those defective rules without false positives across varying noise levels.
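One way to run that test, as a hedged sketch: here `attribute` is a simple mean-P&L statistic standing in for the paper's LLM attribution agent, and the defect model (a fixed negative drift on the defective rule) is an assumption.

```python
import random

def simulate(rubric, defective, n_trades, noise, rng):
    """Each simulated trade fires one random rule; the injected defective
    rule adds a negative drift to otherwise zero-mean noisy P&L."""
    trades = []
    for _ in range(n_trades):
        rule = rng.choice(rubric)
        pnl = rng.gauss(0.0, noise) - (1.0 if rule == defective else 0.0)
        trades.append((rule, pnl))
    return trades

def attribute(trades, threshold=-0.5):
    """Flag every rule whose mean P&L across samples falls below threshold."""
    sums, counts = {}, {}
    for rule, pnl in trades:
        sums[rule] = sums.get(rule, 0.0) + pnl
        counts[rule] = counts.get(rule, 0) + 1
    return {r for r in sums if sums[r] / counts[r] < threshold}

def precision_recall(flagged, truth):
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall
```

Sweeping `noise` upward then traces out exactly the false-positive and false-negative behavior the proposed test asks for.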

Figures

Figures reproduced from arXiv: 2605.06822 by Huayu Li, Kashif Rasul, Songzhu Zheng, Wenhui Zhu, Xiwen Chen, Yueyue Deng.

Figure 1
Figure 1. Overcoming the Credit Assignment Problem in LLM-Based Trading. (Left) Standard self-improving agents rely on unbounded free-text optimization. When facing noisy P&L feedback, they cannot isolate logical errors, leading to untargeted mutations and policy degeneration. (Right) SHARP introduces a structured, human-auditable rubric. Losses are symbolically attributed to specific rules, enabling targeted, at… view at source ↗
Figure 2
Figure 2. SHARP pipeline. Top: training-phase evolution loop. Each round backtests the current rubric, attributes losses to rules, proposes mutations, and validates before accepting. Bottom: at inference, the frozen rubric R∗ and daily market data feed into the LLM analyst to produce trading signals. view at source ↗
Figure 3
Figure 3. Cumulative OOS returns, AI Tech. Evolved rubrics… view at source ↗
Figure 4
Figure 4. AI Tech diffs. Evolution discovers momen… view at source ↗
Figure 5
Figure 5. Full sector-specific rubric diffs: initial rules (… view at source ↗
Figure 6
Figure 6. Free-form reflection (A2) on AI Tech, GPT-4.1-mini, window 0. The v0 prompt is fair… view at source ↗
Figure 7
Figure 7. Illustrative example of the attribution agent pipeline. view at source ↗
original abstract

Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from stochastic market variance, inevitably leading to policy drift. To overcome this bottleneck, we introduce the Self-Evolving Human-Auditable Rubric Policy (SHARP), a neuro-symbolic framework that replaces unconstrained text mutation with structured, symbolic policy optimization. SHARP confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent employs cross-sample reasoning across multiple samples to isolate specific rule failures. This enables targeted, atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, SHARP consistently transforms generic initial heuristics into highly robust strategies, lifting the empirical performance of compact models by 10 to 20 percentage points on average (e.g., GPT-4o-mini). Ultimately, SHARP demonstrates that LLMs can achieve dynamic and efficient adaptation while significantly enhancing the structural transparency and auditability demanded by institutional finance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SHARP, a neuro-symbolic framework for self-evolving LLM-based financial trading agents. It replaces unconstrained prompt optimization with a bounded human-auditable rubric of explicit condition-action rules. An attribution agent applies cross-sample reasoning over multiple trajectories to isolate specific rule failures, enabling targeted atomic edits that are regularized via strict walk-forward validation. Evaluated on three equity sectors and four LLM backbones, the approach is claimed to convert generic initial heuristics into robust strategies, yielding average performance lifts of 10-20 percentage points for compact models such as GPT-4o-mini.

Significance. If the attribution mechanism can be shown to reliably isolate rule failures amid market noise, SHARP would provide a concrete advance in transparent, auditable self-improving trading systems. The structured symbolic policy and walk-forward regularization directly target the credit-assignment problem in delayed, low-SNR reward settings, offering a more controllable alternative to free-form LLM optimization while satisfying institutional demands for human oversight.

major comments (2)
  1. [§3.2] Attribution Agent: The central empirical claim of 10-20 pp gains rests on the attribution step correctly diagnosing which condition-action rule caused suboptimal trades. No controlled test of attribution precision (e.g., synthetic failure injection or precision/recall metrics on known rule defects) is reported, leaving open the possibility that observed improvements arise from spurious correlations rather than accurate edits.
  2. [§4] Evaluation: Walk-forward validation is presented as sufficient regularization, yet it only checks downstream performance and does not audit whether the preceding attribution correctly identified the responsible rule. An incorrect edit can still pass validation if it improves the subsequent window by chance, undermining the causal interpretation of the reported lifts.
minor comments (2)
  1. [Abstract and §4] The abstract and §4 should explicitly state the primary performance metric (e.g., total P&L, Sharpe ratio, or win rate) and the exact baseline definitions used for the 10-20 pp comparison.
  2. [§3] Notation for the rubric (condition-action pairs) and the cross-sample attribution procedure could be formalized with a short pseudocode block or equation to improve reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which highlight important aspects of validating the attribution mechanism. We address each major comment below and will incorporate revisions to provide stronger empirical support for the causal role of the attribution step.

point-by-point responses
  1. Referee: [§3.2] Attribution Agent: The central empirical claim of 10-20 pp gains rests on the attribution step correctly diagnosing which condition-action rule caused suboptimal trades. No controlled test of attribution precision (e.g., synthetic failure injection or precision/recall metrics on known rule defects) is reported, leaving open the possibility that observed improvements arise from spurious correlations rather than accurate edits.

    Authors: We acknowledge that the manuscript does not include a direct controlled test of attribution precision, such as synthetic failure injection with precision/recall metrics. The current evidence relies on consistent end-to-end performance lifts across sectors and backbones, combined with the design of cross-sample reasoning to reduce noise sensitivity. To strengthen the causal interpretation, we will add a new controlled experiment in the revised §3.2. This will inject known rule defects into synthetic trajectories and evaluate the attribution agent's ability to correctly identify them, reporting precision, recall, and accuracy metrics across varying noise levels. revision: yes

  2. Referee: [§4] Evaluation: Walk-forward validation is presented as sufficient regularization, yet it only checks downstream performance and does not audit whether the preceding attribution correctly identified the responsible rule. An incorrect edit can still pass validation if it improves the subsequent window by chance, undermining the causal interpretation of the reported lifts.

    Authors: We agree that walk-forward validation alone does not directly audit attribution correctness and that chance improvements remain possible. The validation serves to regularize against non-generalizing edits but leaves the attribution step's accuracy as an implicit assumption. In the revision, we will extend §4 to incorporate the synthetic attribution test described above and add a qualitative audit of a random sample of real attribution decisions, documenting whether the isolated rule aligns with observed trade failures. This combined approach will provide direct evidence that performance gains stem from accurate edits rather than spurious correlations. revision: yes

Circularity Check

0 steps flagged

No circularity: the SHARP framework is an independent engineering construction with empirical claims.

full rationale

The paper introduces SHARP as a neuro-symbolic method that structures LLM trading policies into explicit rubrics, uses an attribution agent for targeted edits, and applies walk-forward validation. No derivation chain, equations, or self-citations are present that reduce the claimed 10-20 pp performance lift to a fitted parameter, self-definition, or renamed input. The performance gains are presented as outcomes of the full pipeline evaluated on external equity data and multiple LLM backbones, without any step that is tautological by construction or that imports uniqueness from prior author work. The attribution mechanism is described as an independent cross-sample reasoning process rather than a reparameterization of the final metric.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard domain assumptions about market non-stationarity and LLM reasoning capacity plus the novel engineering components introduced by the authors.

axioms (2)
  • domain assumption Financial markets are noisy, non-stationary environments with delayed scalar rewards (P&L).
    Explicitly stated in the opening paragraph of the abstract as the setting that creates the credit-assignment problem.
  • domain assumption Unbounded free-form prompt optimization exacerbates policy drift in low signal-to-noise regimes.
    Presented as the motivation for replacing text mutation with structured rules.
invented entities (2)
  • Attribution agent no independent evidence
    purpose: Cross-sample reasoning to isolate specific rule failures for targeted edits
    New component introduced by the framework; no external falsifiable prediction supplied in the abstract.
  • Human-auditable rubric of condition-action rules no independent evidence
    purpose: Bounded symbolic policy representation that replaces free-form text
    Core representational invention of SHARP; evidence of utility is the reported performance lift.

pith-pipeline@v0.9.0 · 5562 in / 1555 out tokens · 76409 ms · 2026-05-11T00:47:07.492064+00:00 · methodology

