From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery
Pith reviewed 2026-05-19 14:39 UTC · model grok-4.3
The pith
Reinforcement fine-tuning converts quantitative evaluations into policy updates so an LLM internalizes alpha factor optimization experience instead of accumulating prompt feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
QuantEvolver is a self-evolving framework that constructs high-quality seed factors, builds diverse seed-time-window training tasks, generates executable Factor DSL expressions, evaluates them through Regime Backtest, and optimizes the Miner LLM with Diversity-Complementarity Reward. High-quality factors are continuously accumulated in a Mined Factor Database that serves as the final discovered factor library. By converting quantitative evaluation results into reinforcement policy updates rather than appending feedback to prompts, the Miner LLM internalizes historical optimization experience through parameter learning.
What carries the argument
Reinforcement fine-tuning that converts executable quantitative evaluation results into policy updates for the Miner LLM.
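The mechanism can be illustrated with a deliberately tiny sketch. It is not the paper's training procedure: the page does not say which reinforcement algorithm QuantEvolver uses, so the block below assumes a plain softmax policy over three made-up candidate expressions and applies the exact expected policy-gradient step. The expression strings, the reward values, and the names `softmax` and `policy_gradient_step` are all hypothetical stand-ins; in the real system the reward would come from Regime Backtest and the policy would be the Miner LLM's parameters.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, rewards, lr=1.0):
    """One exact gradient-ascent step on expected reward J = sum_i p_i * r_i.

    For a softmax policy, dJ/dlogit_k = p_k * (r_k - J), so candidates that
    score above the current expectation gain probability mass.
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [t + lr * p * (r - expected) for t, p, r in zip(logits, probs, rewards)]

# Hypothetical candidates and evaluation scores (placeholders, not real DSL or results).
candidates = ["rank(close)", "ts_mean(volume, 5)", "ts_delta(close, 1)"]
rewards = [0.01, 0.03, 0.05]

logits = [0.0, 0.0, 0.0]          # start from a uniform policy
for _ in range(500):
    logits = policy_gradient_step(logits, rewards)

probs = softmax(logits)
for name, p in zip(candidates, probs):
    print(f"{name}: {p:.3f}")
```

The point of the sketch is only that the evaluation signal ends up stored in the policy parameters (`logits`) rather than in an ever-growing prompt, which is the contrast the paper draws.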
If this is right
- Consistently improves the primary evaluation metric of each task over existing LLM-based alpha factor discovery baselines.
- Produces higher-quality and more complementary factor pools.
- Avoids context explosion, increased inference cost, and feedback drift that arise from long prompt-level loops.
- Enables continuous accumulation of usable factors in the Mined Factor Database during training.
Where Pith is reading between the lines
- Smaller LLMs may become viable for factor discovery once they learn stable preferences through reinforcement updates rather than depending on the generation stability of very large models.
- The same conversion of quantifiable feedback into policy updates could apply to other automated discovery problems where evaluation metrics exist.
- Diverse regime-based training tasks may improve robustness when deployed on market conditions that differ from those seen in backtests.
- The mined factor library could serve as a reusable asset for downstream portfolio construction or risk modeling.
Load-bearing premise
Converting executable quantitative evaluation results into reinforcement policy updates allows the Miner LLM to internalize historical optimization experience without introducing new biases or failing to generalize beyond the regime backtests used during training.
What would settle it
Evaluate the trained Miner LLM on out-of-sample market data from regimes absent from the seed-time-window training tasks and check whether alpha factor quality or complementarity falls below prompt-based baselines.
Original abstract
Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation-evaluation-feedback loops for iterative optimization. As the loop becomes longer, repeatedly appended historical candidates and feedback can cause context explosion, increase inference cost, dilute useful information, and introduce feedback drift. Moreover, these methods often depend on very large LLMs whose stable generation preferences may lead to structurally similar expressions, redundant candidates, and search stagnation. To address these limitations, we propose QuantEvolver, a self-evolving alpha factor discovery framework based on reinforcement fine-tuning. Instead of accumulating feedback in the prompt, QuantEvolver converts executable quantitative evaluation into policy updates, enabling a Miner LLM to internalize historical optimization experience through parameter learning. Specifically, QuantEvolver constructs high-quality seed factors, builds diverse seed-time-window training tasks, generates executable Factor DSL expressions, evaluates them through Regime Backtest, and optimizes the Miner LLM with Diversity-Complementarity Reward. During training, high-quality factors are continuously accumulated in a Mined Factor Database, which serves as the final discovered factor library. Extensive experiments on three realistic market benchmarks demonstrate the effectiveness of QuantEvolver, which consistently improves the primary evaluation metric of each task over existing LLM-based alpha factor discovery baselines and produces higher-quality and more complementary factor pools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes QuantEvolver, a self-evolving alpha factor discovery framework that replaces prompt-based feedback loops with reinforcement fine-tuning of a Miner LLM. Executable regime backtest results are converted into policy updates using a Diversity-Complementarity Reward on seed–time-window training tasks; high-quality factors are accumulated in a Mined Factor Database that serves as the final library. The central empirical claim is that this approach yields consistent improvements in the primary evaluation metric of each task over existing LLM-based baselines on three realistic market benchmarks while producing higher-quality and more complementary factor pools.
Significance. If the empirical results prove robust, the work could meaningfully advance automated quantitative factor discovery by mitigating context explosion, inference cost, and search stagnation that arise in long prompt-based loops. The shift from accumulating historical feedback in context to parameter-level internalization via RL is a conceptually clean idea that, if validated, would improve scalability and diversity in LLM-driven alpha generation.
major comments (2)
- [Method (training task construction and reward optimization)] The central claim that policy updates from regime backtests enable the Miner LLM to internalize transferable optimization experience (rather than memorizing historical patterns) is load-bearing yet lacks any described safeguard such as adversarial regime construction, causal regularization, or strict forward-chaining validation. When training tasks are built from specific seed–time-window pairs on historical data, overlap or statistical similarity with the three evaluation benchmarks could produce the reported metric gains through distribution matching rather than genuine discovery.
- [Experiments and results] The experimental claim of consistent primary-metric improvements and higher-quality complementary pools is unsupported by visible details on the exact metrics, chosen baselines, statistical significance tests, data-split protocols, or explicit overfitting controls. Without these, it is impossible to determine whether observed gains exceed what would be expected from database exploitation or regime-specific fitting.
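The "strict forward-chaining validation" the referee asks for can be made concrete with a small sketch. This is a generic expanding-window splitter written for illustration, not the paper's task-construction protocol; the function name `forward_chaining_splits` and the parameters `min_train`, `test_size`, and `gap` are assumptions introduced here. Each split trains only on periods that end before the test window begins, with an optional gap to reduce leakage from overlapping return horizons.

```python
def forward_chaining_splits(n_periods, min_train, test_size, gap=0):
    """Return (train_range, test_range) pairs over period indices 0..n_periods-1.

    The training window always starts at 0 and expands forward; the test
    window starts `gap` periods after training ends, so no test period is
    ever earlier than, or equal to, a training period.
    """
    splits = []
    train_end = min_train  # exclusive end of the training window
    while train_end + gap + test_size <= n_periods:
        test_start = train_end + gap
        splits.append((range(0, train_end),
                       range(test_start, test_start + test_size)))
        train_end += test_size  # roll forward by one test window
    return splits

splits = forward_chaining_splits(n_periods=10, min_train=4, test_size=2, gap=1)
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

A check of this kind, applied to the seed-time-window tasks and the three benchmark evaluation periods, is what would rule out the temporal-overlap explanation raised in the first major comment.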
minor comments (2)
- [Abstract] The abstract refers to “three realistic market benchmarks” and “the primary evaluation metric of each task” without naming either; adding these specifics would immediately improve readability and allow readers to assess relevance.
- [Method] Notation for the Factor DSL and the precise definition of the Diversity-Complementarity Reward would benefit from an explicit equation or pseudocode block to avoid ambiguity when readers attempt to reproduce the training objective.
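To show what kind of pseudocode block the last comment has in mind, here is a hedged sketch of a reward that trades predictive strength against redundancy with an existing pool. It is not the paper's Diversity-Complementarity Reward, whose definition does not appear on this page; the form (absolute correlation with forward returns minus a penalty on the largest absolute correlation with any pooled factor), the `penalty` weight, and the names `pearson` and `redundancy_penalized_reward` are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def redundancy_penalized_reward(candidate, forward_returns, pool, penalty=0.5):
    """Illustrative reward: predictive strength minus a redundancy penalty.

    predictive = |corr(candidate, forward_returns)|
    redundancy = max over pooled factors of |corr(candidate, factor)|
    """
    predictive = abs(pearson(candidate, forward_returns))
    redundancy = max((abs(pearson(candidate, f)) for f in pool), default=0.0)
    return predictive - penalty * redundancy

# Toy values chosen only to exercise the function.
forward_returns = [0.01, -0.02, 0.03, 0.00, -0.01]
candidate = [1.0, -2.0, 3.0, 0.0, -1.0]

reward_empty_pool = redundancy_penalized_reward(candidate, forward_returns, pool=[])
reward_duplicate = redundancy_penalized_reward(
    candidate, forward_returns, pool=[[2.0 * v for v in candidate]]
)
print(reward_empty_pool, reward_duplicate)
```

In this toy case the same candidate earns less once a rescaled copy of it is already in the pool, which is the qualitative behavior a complementarity term is meant to produce.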
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below in a point-by-point manner and indicate the revisions we will make to improve clarity and address the raised concerns.
- Referee: [Method (training task construction and reward optimization)] The central claim that policy updates from regime backtests enable the Miner LLM to internalize transferable optimization experience (rather than memorizing historical patterns) is load-bearing yet lacks any described safeguard such as adversarial regime construction, causal regularization, or strict forward-chaining validation. When training tasks are built from specific seed–time-window pairs on historical data, overlap or statistical similarity with the three evaluation benchmarks could produce the reported metric gains through distribution matching rather than genuine discovery.
  Authors: We appreciate the referee's emphasis on ensuring that observed improvements reflect genuine policy learning rather than data leakage. The manuscript constructs training tasks from diverse seed–time-window pairs explicitly chosen to span distinct market regimes, with the Regime Backtest evaluating executable expressions on forward periods. The Diversity-Complementarity Reward is designed to promote exploration of novel factor structures. However, we acknowledge that the current description of safeguards could be more explicit. In the revised manuscript, we will add a subsection in the Method section detailing the temporal partitioning protocol, including how seed–time-window pairs are selected to avoid overlap with evaluation benchmarks, along with forward-chaining validation steps and regime diversity metrics used during task construction.
  Revision: yes
- Referee: [Experiments and results] The experimental claim of consistent primary-metric improvements and higher-quality complementary pools is unsupported by visible details on the exact metrics, chosen baselines, statistical significance tests, data-split protocols, or explicit overfitting controls. Without these, it is impossible to determine whether observed gains exceed what would be expected from database exploitation or regime-specific fitting.
  Authors: We agree that greater transparency on experimental protocols is essential for validating the claims. The primary metrics are the Information Coefficient (IC) and Sharpe ratio, with baselines consisting of prompt-based LLM methods (e.g., AlphaGen-style loops) and non-LLM approaches such as genetic programming. Statistical significance is evaluated using paired t-tests and bootstrap resampling across multiple random seeds. Data splits follow a strict temporal protocol with training tasks drawn from earlier periods and evaluation on later out-of-sample windows across the three benchmarks, and overfitting is mitigated via validation-set monitoring of the diversity reward and factor novelty. We will expand the Experiments section with these details, including explicit tables for p-values, ablation studies on the reward function, and descriptions of the data-split and control procedures.
  Revision: yes
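The simulated rebuttal mentions bootstrap resampling as one of the significance checks. A minimal sketch of a paired percentile bootstrap on per-period scores (for example, per-period IC for two methods evaluated on the same windows) might look like the following. The function name `paired_bootstrap_ci`, its parameters, and the score values are hypothetical and are not results from the paper; the block only shows the shape of the comparison.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference a - b.

    Scores must be aligned: element i of each list comes from the same
    evaluation period, so resampling is done over the paired differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lower, upper

# Made-up per-period scores, used only to exercise the function.
method_scores = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06]
baseline_scores = [0.03, 0.04, 0.03, 0.05, 0.04, 0.03]

mean_diff, ci_lower, ci_upper = paired_bootstrap_ci(method_scores, baseline_scores)
print(f"mean difference {mean_diff:.4f}, 95% interval [{ci_lower:.4f}, {ci_upper:.4f}]")
```

An interval that excludes zero is the kind of evidence the second major comment asks for. Because financial scores are often autocorrelated across adjacent periods, a block bootstrap may be more appropriate in practice than the independent resampling assumed here.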
Circularity Check
No circularity: claims rest on external benchmark experiments
The paper's central claims are framed as empirical outcomes from experiments on three realistic market benchmarks, where QuantEvolver improves primary metrics over LLM-based baselines and yields higher-quality complementary factors. The abstract describes converting evaluation results into policy updates for the Miner LLM and using a Diversity-Complementarity Reward, but presents no equations, derivations, or self-referential definitions that reduce these improvements to fitted parameters or inputs by construction. Training tasks are built from seed-time-window pairs and factors are accumulated in a database, yet the reported gains are positioned as results of external comparative evaluation rather than tautological renaming or self-citation chains. This structure keeps the derivation self-contained against benchmarks, consistent with a non-circular finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Regime backtests provide reliable signals for training an LLM to generate generalizable alpha factors.
invented entities (1)
- Miner LLM: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Instead of accumulating feedback in the prompt, QuantEvolver converts executable quantitative evaluation into policy updates, enabling a Miner LLM to internalize historical optimization experience through parameter learning... optimizes the Miner LLM with Diversity-Complementarity Reward.
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: DiCo Reward... encourages the policy to generate factors that are not only predictive, but also structurally diverse, behaviorally distinct, and complementary to existing candidates.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
T. Zhang, Y . Li, Y . Jin, and J. Li, “Autoalpha: an efficient hierarchical evolutionary algorithm for mining alpha factors in quantitative invest- ment,” arXiv preprint arXiv:2002.08245, 2020
-
[2]
Alpha mining and enhancing via warm start genetic programming for quantitative investment,
W. Ren, Y . Qin, and Y . Li, “Alpha mining and enhancing via warm start genetic programming for quantitative investment,” arXiv preprint arXiv:2412.00896, 2024
-
[3]
Z. Kakushadze, “101 formulaic alphas,” Wilmott, vol. 2016, no. 84, pp. 72–81, 2016
work page 2016
-
[4]
Multiple regression genetic programming,
I. Arnaldo, K. Krawiec, and U.-M. O’Reilly, “Multiple regression genetic programming,” in Proceedings of the 2014 annual conference on genetic and evolutionary computation, 2014, pp. 879–886
work page 2014
-
[5]
Alpha discovery via grammar-guided learning and search,
H. Yang, D. Hao, Z. Wang, Q. Shi, and X. Li, “Alpha discovery via grammar-guided learning and search,” arXiv preprint arXiv:2601.22119, 2026
-
[6]
Riskminer: Discovering formulaic alphas via risk seeking monte carlo tree search,
T. Ren, R. Zhou, J. Jiang, J. Liang, Q. Wang, and Y . Peng, “Riskminer: Discovering formulaic alphas via risk seeking monte carlo tree search,” in Proceedings of the 5th ACM International Conference on AI in Finance, 2024, pp. 752–760
work page 2024
-
[7]
Generating synergistic formulaic alpha collections via reinforcement learning,
S. Yu, H. Xue, X. Ao, F. Pan, J. He, D. Tu, and Q. He, “Generating synergistic formulaic alpha collections via reinforcement learning,” in Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining, 2023, pp. 5476–5486
work page 2023
-
[8]
F. Xu, Y . Yin, X. Zhang, T. Liu, S. Jiang, and Z. Zhang, “Alpha2: Discovering logical formulaic alphas using deep reinforcement learning,” arXiv preprint arXiv:2406.16505, 2024
-
[9]
Alphaqcm: Alpha discovery in finance with distribu- tional reinforcement learning,
Z. Zhu and K. Zhu, “Alphaqcm: Alpha discovery in finance with distribu- tional reinforcement learning,” in Forty-second International Conference on Machine Learning, 2025
work page 2025
-
[10]
Alphaforge: A framework to mine and dynamically combine formulaic alpha factors,
H. Shi, W. Song, X. Zhang, J. Shi, C. Luo, X. Ao, H. Arian, and L. A. Seco, “Alphaforge: A framework to mine and dynamically combine formulaic alpha factors,” in Proceedings of the AAAI conference on artificial intelligence, vol. 39, no. 12, 2025, pp. 12 524–12 532
work page 2025
-
[11]
Alphasage: Structure-aware alpha mining via gflownets for robust exploration,
B. Chen, H. Ding, N. Shen, J. Huang, T. Guo, L. Liu, and M. Zhang, “Alphasage: Structure-aware alpha mining via gflownets for robust exploration,” arXiv preprint arXiv:2509.25055, 2025
work page internal anchor Pith review arXiv 2025
-
[12]
A survey of aiops in the era of large language models,
L. Zhang, T. Jia, M. Jia, Y . Wu, A. Liu, Y . Yang, Z. Wu, X. Hu, P. Yu, and Y . Li, “A survey of aiops in the era of large language models,”ACM Computing Surveys, 2025
work page 2025
-
[13]
E-log: Fine-grained elastic log-based anomaly detection and diagnosis for databases,
L. Zhang, T. Jia, X. Tan, X. Huang, M. Jia, H. Liu, Z. Wu, and Y . Li, “E-log: Fine-grained elastic log-based anomaly detection and diagnosis for databases,” IEEE Transactions on Services Computing, 2025
work page 2025
-
[14]
L. Zhang, T. Jia, M. Jia, H. Liu, Y . Yang, Z. Wu, and Y . Li, “Towards close-to-zero runtime collection overhead: Raft-based anomaly diagnosis on system faults for distributed storage system,” IEEE Transactions on Services Computing, 2024
work page 2024
-
[15]
Multivariate log- based anomaly detection for distributed database,
L. Zhang, T. Jia, M. Jia, Y . Li, Y . Yang, and Z. Wu, “Multivariate log- based anomaly detection for distributed database,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 4256–4267
work page 2024
-
[16]
Reducing events to augment log-based anomaly detection models: An empirical study,
L. Zhang, T. Jia, K. Wang, M. Jia, Y . Yang, and Y . Li, “Reducing events to augment log-based anomaly detection models: An empirical study,” in Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2024, pp. 538– 548
work page 2024
-
[17]
Scalalog: Scalable log-based failure diagnosis using llm,
L. Zhang, T. Jia, M. Jia, Y . Wu, H. Liu, and Y . Li, “Scalalog: Scalable log-based failure diagnosis using llm,” in ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5
work page 2025
-
[18]
Agentfm: Role-aware failure management for distributed databases with llm-driven multi-agents,
L. Zhang, Y . Zhai, T. Jia, X. Huang, C. Duan, and Y . Li, “Agentfm: Role-aware failure management for distributed databases with llm-driven multi-agents,” arXiv preprint arXiv:2504.06614, 2025
-
[19]
L. Zhang, Y . Zhai, T. Jia, C. Duan, S. Yu, J. Gao, B. Ding, Z. Wu, and Y . Li, “Thinkfl: Self-refining failure localization for microservice sys- tems via reinforcement fine-tuning,” arXiv preprint arXiv:2504.18776, 2025
-
[20]
Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices,
L. Zhang, T. Jia, Y . Zhai, L. Pan, C. Duan, M. He, M. Jia, and Y . Li, “Agentic memory enhanced recursive reasoning for root cause localization in microservices,” arXiv preprint arXiv:2601.02732, 2026
-
[21]
Logdb: Multivariate log-based failure diagnosis for distributed databases (extended from multilog),
L. Zhang, T. Jia, M. Jia, and Y . Li, “Logdb: Multivariate log-based failure diagnosis for distributed databases (extended from multilog),” arXiv preprint arXiv:2505.01676, 2025
-
[22]
L. Zhang, T. Jia, M. Jia, Y . Wu, H. Liu, and Y . Li, “Xraglog: A resource- efficient and context-aware log-based anomaly detection method using retrieval-augmented generation,” inAAAI 2025 Workshop on Preventing and Detecting LLM Misinformation (PDLM), 2025
work page 2025
-
[23]
L. Zhang, L. Fang, C. Duan, M. He, L. Pan, P. Xiao, S. Huang, Y . Zhai, X. Hu, P. S. Yu et al., “A survey on parallel text generation: From parallel decoding to diffusion language models,” arXiv preprint arXiv:2508.08712, 2025
-
[24]
Time-tired compaction: An elastic compaction scheme for lsm-tree based time-series database,
L.-Z. Zhang, X.-D. Huang, Y .-K. Wang, J.-L. Qiao, S.-X. Song, and J.- M. Wang, “Time-tired compaction: An elastic compaction scheme for lsm-tree based time-series database,”Advanced Engineering Informatics, vol. 59, p. 102224, 2024
work page 2024
-
[25]
Separation or not: On handing out-of-order time-series data in leveled lsm-tree,
Y . Kang, X. Huang, S. Song, L. Zhang, J. Qiao, C. Wang, J. Wang, and J. Feinauer, “Separation or not: On handing out-of-order time-series data in leveled lsm-tree,” in2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022, pp. 3340–3352
work page 2022
-
[26]
L. Zhang, T. Jia, K. Wang, W. Hong, C. Duan, M. He, and Y . Li, “Adaptive root cause localization for microservice systems with multi- agent recursion-of-thought,” arXiv preprint arXiv:2508.20370, 2025
-
[27]
H. Liu, Y . Ma, X. Huang, L. Zhang, T. Jia, and Y . Li, “Ora: Job runtime prediction for high-performance computing platforms using the online retrieval-augmented language model,” in Proceedings of the 39th ACM International Conference on Supercomputing, 2025, pp. 884–894
work page 2025
-
[28]
Microremed: Benchmarking llms in microservices remediation,
L. Zhang, Y . Zhai, T. Jia, C. Duan, M. He, L. Pan, Z. Liu, B. Ding, and Y . Li, “Microremed: Benchmarking llms in microservices remediation,” arXiv preprint arXiv:2511.01166, 2025
-
[29]
arXiv preprint arXiv:2508.07173 , year=
L. Pan, Z. Fu, Y . Zhai, S. Tao, S. Guan, S. Huang, L. Zhang, Z. Liu, B. Ding, F. Henry et al., “Omni-safetybench: A benchmark for safety evaluation of audio-visual large language models,” arXiv preprint arXiv:2508.07173, 2025
-
[30]
Walk the talk: Is your log-based software reliability maintenance system really reliable?
M. He, T. Jia, C. Duan, P. Xiao, L. Zhang, K. Wang, Y . Wu, Y . Li, and G. Huang, “Walk the talk: Is your log-based software reliability maintenance system really reliable?” arXiv preprint arXiv:2509.24352, 2025
-
[31]
d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models
L. Pan, S. Tao, Y . Zhai, Z. Fu, L. Fang, M. He, L. Zhang, Z. Liu, B. Ding, A. Liu et al., “d-treerpo: Towards more reliable policy optimization for diffusion language models,” arXiv preprint arXiv:2512.09675, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[32]
Cslparser: A collaborative framework using small and large language models for log parsing,
W. Hong, Y . Wu, L. Zhang, C. Duan, P. Xiao, M. He, X. Yang, and Y . Li, “Cslparser: A collaborative framework using small and large language models for log parsing,” in 2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2025, pp. 61–72
work page 2025
-
[33]
United we stand: Towards end-to-end log- based fault diagnosis via interactive multi-task learning,
M. He, C. Duan, P. Xiao, T. Jia, S. Yu, L. Zhang, W. Hong, J. Han, Y . Wu, Y . Li et al., “United we stand: Towards end-to-end log- based fault diagnosis via interactive multi-task learning,” arXiv preprint arXiv:2509.24364, 2025. 13
-
[34]
L. Zhang, T. Jia, Y . Zhai, L. Pan, C. Duan, M. He, P. Xiao, and Y . Li, “Hypothesize-then-verify: Speculative root cause analysis for microser- vices with pathwise parallelism,” arXiv preprint arXiv:2601.02736, 2026
-
[35]
X. Huang, H. Liu, Y . Wu, L. Zhang, T. Jia, Y . Li, and Z. Wu, “Uda-rcl: Unsupervised domain adaptation for microservice root cause localization utilizing multimodal data,” IEEE Transactions on Services Computing, 2025
work page 2025
-
[36]
Aaad: Asynchronous inter-variable relationship-aware anomaly detection for multivariate time series,
H. Liu, X. Huang, M. Jia, L. Zhang, T. Jia, Z. Wu, and Y . Li, “Aaad: Asynchronous inter-variable relationship-aware anomaly detection for multivariate time series,” in 2025 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2025, pp. 1–6
work page 2025
-
[37]
Logaction: Consistent cross-system anomaly detection through logs via active domain adaptation,
C. Duan, M. He, P. Xiao, T. Jia, X. Zhang, Z. Zhong, X. Luo, Y . Niu, L. Zhang, S. Yu et al., “Logaction: Consistent cross-system anomaly detection through logs via active domain adaptation,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 700–712
work page 2025
-
[38]
Runtimeslicer: Towards generalizable unified runtime state representation for failure management,
L. Zhang, T. Jia, W. Hong, M. Wang, C. Duan, M. He, R. Wang, X. Peng, M. Wang, G. Zhang et al., “Runtimeslicer: Towards generalizable unified runtime state representation for failure management,” arXiv preprint arXiv:2603.21495, 2026
-
[39]
Efficient failure management for multi-agent systems with reasoning trace representation,
L. Zhang, T. Jia, M. Wang, W. Hong, C. Duan, M. He, R. Wang, X. Peng, M. Wang, G. Zhang et al., “Efficient failure management for multi-agent systems with reasoning trace representation,” arXiv preprint arXiv:2603.21522, 2026
-
[40]
L. Zhang, Y . Zhai, T. Jia, M. He, C. Duan, Z. Liu, B. Ding, and Y . Li, “E2e-reme: Towards end-to-end microservices auto-remediation via experience-simulation reinforcement fine-tuning,” arXiv preprint arXiv:2604.11094, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[41]
P. Xiao, C. Duan, M. He, T. Jia, Y . Wu, J. Xu, G. Gao, L. Zhang, W. Hong, Y . Li et al., “Coorlog: Efficient-generalizable log anomaly detection via adaptive coordinator in software evolution,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 1119–1131
work page 2025
-
[42]
Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
L. Zhang, T. Jia, Y . Zhai, L. Fang, K. Zheng, H. Liu, X. Huang, P. S. Yu, and Y . Li, “Towards robust llm post-training: Automatic failure manage- ment for reinforcement fine-tuning,” arXiv preprint arXiv:2605.04431, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[43]
Alpha- gpt: Human-ai interactive alpha mining for quantitative investment,
S. Wang, H. Yuan, L. Zhou, L. Ni, H. Y . Shum, and J. Guo, “Alpha- gpt: Human-ai interactive alpha mining for quantitative investment,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2025, pp. 196–206
work page 2025
-
[44]
Z. Li, R. Song, C. Sun, W. Xu, Z. Yu, and J.-R. Wen, “Can large language models mine interpretable financial factors more effectively? a neural-symbolic factor mining agent model,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 3891– 3902
work page 2024
-
[45]
Quantagent: Seeking holy grail in trading by self-improving large language model,
S. Wang, H. Yuan, L. M. Ni, and J. Guo, “Quantagent: Seeking holy grail in trading by self-improving large language model,” arXiv preprint arXiv:2402.03755, 2024
-
[46]
Al- phabench: Benchmarking large language models in formulaic alpha factor mining,
H. Luo, H. T. Ko, J. Chen, D. Sun, Y . Zhang, and C. Liu, “Al- phabench: Benchmarking large language models in formulaic alpha factor mining,” in The Fourteenth International Conference on Learning Representations
-
[47]
Alphaagent: Llm-driven alpha mining with regularized exploration to counteract alpha decay,
Z. Tang, Z. Chen, J. Yang, J. Mai, Y . Zheng, K. Wang, J. Chen, and L. Lin, “Alphaagent: Llm-driven alpha mining with regularized exploration to counteract alpha decay,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V .2, 2025, pp. 2813–2822
work page 2025
-
[48]
Navigating the alpha jungle: An llm-powered mcts framework for formulaic alpha factor mining,
Y . Shi, Y . Duan, and J. Li, “Navigating the alpha jungle: An llm-powered mcts framework for formulaic alpha factor mining,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 2, 2026, pp. 997–1005
work page 2026
-
[49]
R&d-agent- quant: a multi-agent framework for data-centric factors and model joint optimization,
Y . Li, X. Yang, X. Yang, X. Wang, W. Liu, and J. Bian, “R&d-agent- quant: a multi-agent framework for data-centric factors and model joint optimization,” Advances in Neural Information Processing Systems, vol. 38, 2026
work page 2026
-
[50]
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
J. Han, S. Zhang, W. Li, Z. Yang, Y . Dong, T. Hu, J. Yuan, X. Yu, Y . Zhu, F. Lou et al., “Quantaalpha: An evolutionary framework for llm-driven alpha mining,” arXiv preprint arXiv:2602.07085, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[51]
Deep reinforcement learning from human preferences,
P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in neural information processing systems, vol. 30, 2017
work page 2017
-
[52]
Fine-Tuning Language Models from Human Preferences
D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Radford, D. Amodei, P. Christiano, and G. Irving, “Fine-tuning language models from human preferences,” arXiv preprint arXiv:1909.08593, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1909
-
[53]
Proximal Policy Optimization Algorithms
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[54]
Direct preference optimization: Your language model is secretly a reward model,
R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,” Advances in Neural Information Processing Systems, vol. 36, pp. 53 728–53 741, 2023
work page 2023
-
[55]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Li, Y . Wuet al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[56]
Y . Dong, F. Wu, K. Zhang, Y . Dai, S. Zhang, W. Ye, S. Chen, and Z.-Q. Cheng, “Large language model agents in finance: A survey bridging research, practice, and real-world deployment,” Findings of the Association for Computational Linguistics: EMNLP, vol. 2025, pp. 17 889–17 907, 2025
work page 2025
-
[57]
Ectsum: A new benchmark dataset for bullet point summarization of long earnings call transcripts,
R. Mukherjee, A. Bohra, A. Banerjee, S. Sharma, M. Hegde, A. Shaikh, S. Shrivastava, K. Dasgupta, N. Ganguly, S. Ghosh et al., “Ectsum: A new benchmark dataset for bullet point summarization of long earnings call transcripts,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 10 893–10 906
work page 2022
-
[58]
H. Li, Q. Peng, X. Mou, Y . Wang, Z. Zeng, and M. F. Bashir, “Ab- stractive financial news summarization via transformer-bilstm encoder and graph attention-based decoder,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3190–3205, 2023
work page 2023
-
[59]
Finred: A dataset for relation extraction in financial domain,
S. Sharma, T. Nayak, A. Bose, A. K. Meena, K. Dasgupta, N. Ganguly, and P. Goyal, “Finred: A dataset for relation extraction in financial domain,” in Companion Proceedings of the Web Conference 2022, 2022, pp. 595–597
work page 2022
-
[60]
Finbert: A pre-trained financial language representation model for financial text mining,
Z. Liu, D. Huang, K. Huang, Z. Li, and J. Zhao, “Finbert: A pre-trained financial language representation model for financial text mining,” in Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, 2021, pp. 4513–4519
work page 2021
-
[61]
BloombergGPT: A Large Language Model for Finance
S. Wu, O. Irsoy, S. Lu, V . Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “Bloomberggpt: A large language model for finance,” arXiv preprint arXiv:2303.17564, 2023
-
[62]
Fingpt: Open-source financial large language models,
H. Yang, X.-Y. Liu, and C. D. Wang, “Fingpt: Open-source financial large language models,” arXiv preprint arXiv:2306.06031, 2023
-
[63]
Pixiu: a large language model, instruction data and evaluation benchmark for finance,
Q. Xie, W. Han, X. Zhang, Y. Lai, M. Peng, A. Lopez-Lira, and J. Huang, “Pixiu: a large language model, instruction data and evaluation benchmark for finance,” in Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023, pp. 33 469–33 484
-
[64]
Y. Yang, Y. Tang, and K. Y. Tam, “Investlm: A large language model for investment using financial domain instruction tuning,” arXiv preprint arXiv:2309.13064, 2023
-
[65]
Fintral: A family of gpt-4 level multimodal financial large language models,
G. Bhatia, H. Cavusoglu, M. Abdul-Mageed et al., “Fintral: A family of gpt-4 level multimodal financial large language models,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 13 064–13 087
-
[66]
G. Hu, K. Qin, C. Yuan, M. Peng, A. Lopez-Lira, B. Wang, S. Ananiadou, J. Huang, and Q. Xie, “No language is an island: Unifying chinese and english in financial large language models, instruction data, and benchmarks,” arXiv preprint arXiv:2403.06249, 2024
-
[67]
Fednlp: an interpretable nlp system to decode federal reserve communications,
J. Lee, H. L. Youn, N. Stevens, J. Poon, and S. C. Han, “Fednlp: an interpretable nlp system to decode federal reserve communications,” in Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, 2021, pp. 2560–2564
-
[68]
Trillion dollar words: A new financial dataset, task & market analysis,
A. Shah, S. Paturi, and S. Chava, “Trillion dollar words: A new financial dataset, task & market analysis,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 6664–6679
-
[69]
Impact of news on the commodity market: Dataset and results,
A. Sinha and T. Khandait, “Impact of news on the commodity market: Dataset and results,” in Future of Information and Communication Conference. Springer, 2021, pp. 589–601
-
[70]
Harnessing llms for temporal data-a study on explainable financial time series forecasting,
X. Yu, Z. Chen, and Y. Lu, “Harnessing llms for temporal data-a study on explainable financial time series forecasting,” in Proceedings of the 2023 conference on empirical methods in natural language processing: industry track, 2023, pp. 739–753
-
[71]
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting
Y. Hu, Y. Li, P. Liu, Y. Zhu, N. Li, T. Dai, S.-t. Xia, D. Cheng, and C. Jiang, “Fintsb: A comprehensive and practical benchmark for financial time series forecasting,” arXiv preprint arXiv:2502.18834, 2025
-
[72]
U. Gupta, “Gpt-investar: Enhancing stock investment strategies through annual report analysis with large language models,” arXiv preprint arXiv:2309.03079, 2023
-
[73]
Finben: A holistic financial benchmark for large language models,
Q. Xie, W. Han, Z. Chen, R. Xiang, X. Zhang, Y. He, M. Xiao, D. Li, Y. Dai, D. Feng et al., “Finben: A holistic financial benchmark for large language models,” Advances in Neural Information Processing Systems, vol. 37, pp. 95 716–95 743, 2024
-
[74]
Investorbench: A benchmark for financial decision-making tasks with llm-based agent,
H. Li, Y. Cao, Y. Yu, S. R. Javaji, Z. Deng, Y. He, Y. Jiang, Z. Zhu, K. Subbalakshmi, J. Huang et al., “Investorbench: A benchmark for financial decision-making tasks with llm-based agent,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 2509–2525
-
[75]
Strux: An llm for decision-making with structured explanations,
Y. Lu, Y. Hu, H. Foroosh, W. Jin, and F. Liu, “Strux: An llm for decision-making with structured explanations,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), 2025, pp. 131–141
-
[76]
Finmem: A performance-enhanced llm trading agent with layered memory and character design,
Y. Yu, H. Li, Z. Chen, Y. Jiang, Y. Li, J. W. Suchow, D. Zhang, and K. Khashanah, “Finmem: A performance-enhanced llm trading agent with layered memory and character design,” IEEE Transactions on Big Data, 2025
-
[77]
Cfgpt: Chinese financial assistant with large language model,
J. Li, Y. Bian, G. Wang, Y. Lei, D. Cheng, Z. Ding, and C. Jiang, “Cfgpt: Chinese financial assistant with large language model,” arXiv preprint arXiv:2309.10654, 2023
-
[78]
C. Zhang, X. Liu, Z. Zhang, M. Jin, L. Li, Z. Wang, W. Hua, D. Shu, S. Zhu, X. Jin et al., “When ai meets finance (stockagent): Large language model-based stock trading in simulated real-world environments,” arXiv preprint arXiv:2407.18957, 2024
-
[79]
Tradingagents: Multi-agents llm financial trading framework,
Y. Xiao, E. Sun, D. Luo, and W. Wang, “Tradingagents: Multi-agents llm financial trading framework,” in The First MARW: Multi-Agent AI in the Real World Workshop at AAAI 2025
-
[80]
Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering,
Z. Chen, S. Li, C. Smiley, Z. Ma, S. Shah, and W. Y. Wang, “Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering,” in Proceedings of the 2022 conference on empirical methods in natural language processing, 2022, pp. 6279–6292