Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

Ali Maatouk; Aosong Feng; Haiwen Wang; Harshit Verma; Jialin Chen; Leandros Tassiulas; Rex Ying; Siyi Gu; Yifeng Gao; Yixuan He

arxiv: 2605.21975 · v1 · pith:DBRM676Tnew · submitted 2026-05-21 · 💻 cs.LG

Pith reviewed 2026-05-22 07:11 UTC · model grok-4.3

classification 💻 cs.LG

keywords: StockR1, financial LLMs, reinforcement learning, forecast actions, consistency reward, stock forecasting, financial reasoning, time-series

The pith

StockR1 uses consistency-grounded RL on verifiable forecast actions to improve LLM financial reasoning by up to 25.9%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Financial markets are non-stationary and have low signal-to-noise ratios, which makes it hard for current LLMs to combine textual reasoning with numerical forecasts. The paper proposes StockR1, in which the LLM emits a forecast action (a structured outlook), a time-series decoder conditioned on that action produces future trajectories, and the full pipeline is trained with RL rewards covering answer validity, forecast accuracy, and consistency with observed dynamics, reweighted by a sample-level uncertainty scalar. The setup is evaluated on a 10-year benchmark for both question answering and forecasting. If the approach holds up, it shows that making the forecast step verifiable and consistent creates a direct link between reasoning and prediction outcomes. Sympathetic readers would care because it offers a concrete way to ground LLM decisions in observable market behavior rather than leaving them purely textual.

Core claim

By emitting a structured forecast action that conditions distributional future trajectories from a time-series decoder and optimizing with RL for consistency between that action and observed dynamics alongside validity and accuracy, StockR1 unifies language reasoning and temporal prediction in financial tasks.

What carries the argument

The verifiable forecast action, a tool-call structured output that represents qualitative market outlook and serves as conditioning input for the time-series decoder while being evaluated for consistency in the reward.
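To make the idea of a structured, tool-call forecast action concrete, here is a minimal Python sketch. The schema is an assumption for illustration only: the field names `direction`, `horizon_days`, and `confidence`, the allowed direction labels, and the function `parse_forecast_action` are hypothetical and are not taken from the paper.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: field names and allowed values are illustrative
# assumptions, not the paper's actual action format.
ALLOWED_DIRECTIONS = {"up", "down", "flat"}


@dataclass(frozen=True)
class ForecastAction:
    direction: str      # qualitative market outlook
    horizon_days: int   # how far ahead the outlook applies
    confidence: float   # self-reported confidence in [0, 1]


def parse_forecast_action(tool_call_json: str) -> ForecastAction:
    """Parse and validate a JSON tool call emitted by the policy LLM."""
    payload = json.loads(tool_call_json)
    action = ForecastAction(
        direction=str(payload["direction"]).lower(),
        horizon_days=int(payload["horizon_days"]),
        confidence=float(payload["confidence"]),
    )
    if action.direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unknown direction: {action.direction!r}")
    if action.horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    if not 0.0 <= action.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return action


example = parse_forecast_action(
    '{"direction": "Up", "horizon_days": 5, "confidence": 0.7}'
)
```

A validated action of this kind is something a downstream decoder could be conditioned on and that a reward function could check against later observations, which is what makes the action verifiable rather than free text.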

If this is right

Improved synergy allows LLMs to produce more accurate answers to financial questions by linking them to predicted trajectories.
Consistency rewards help mitigate the mismatch between qualitative reasoning and quantitative results in non-stationary environments.
Uncertainty reweighting enables the model to handle varying levels of market predictability.
Reported reasoning-accuracy gains are larger for the bigger model: 17.7% at 4B parameters and 25.9% at 8B.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might generalize to other sequential decision domains where actions can be verified against future observations.
Interpretable forecast actions could allow users to inspect and intervene in the model's reasoning process.
Applying the same consistency mechanism to real-time data streams could test adaptability beyond historical benchmarks.

Load-bearing premise

A consistency reward between the model's forecast action and the time-series dynamics observed afterwards can be computed reliably and can improve reasoning without introducing post-hoc bias or overfitting to the 10-year benchmark period.
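One simple way such a consistency signal could be computed is sketched below in Python. This is an assumption-laden illustration, not the paper's reward: the direction labels, the `flat_band` threshold, and both function names are hypothetical, and the reward is a plain 0/1 match between the emitted outlook and the realized cumulative return.

```python
from typing import Sequence


def realized_direction(prices: Sequence[float], flat_band: float = 0.005) -> str:
    """Label the realized move over a window from its cumulative return.

    flat_band is an assumed tolerance: moves smaller than this in absolute
    value are treated as "flat".
    """
    if len(prices) < 2 or prices[0] <= 0:
        raise ValueError("need at least two prices and a positive start price")
    cumulative_return = prices[-1] / prices[0] - 1.0
    if cumulative_return > flat_band:
        return "up"
    if cumulative_return < -flat_band:
        return "down"
    return "flat"


def consistency_reward(action_direction: str, future_prices: Sequence[float]) -> float:
    """Return 1.0 when the emitted outlook matches the realized move, else 0.0."""
    return 1.0 if action_direction == realized_direction(future_prices) else 0.0
```

The premise above depends on `future_prices` containing only observations dated after the action was emitted and coming only from the training period; feeding evaluation-period prices into this function during training would be exactly the leakage the premise rules out.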

What would settle it

Running the trained model on a new set of financial data from a period after the training benchmark and measuring whether the consistency between actions and outcomes still predicts higher reasoning accuracy.

Figures

Figures reproduced from arXiv: 2605.21975 by Ali Maatouk, Aosong Feng, Haiwen Wang, Harshit Verma, Jialin Chen, Leandros Tassiulas, Rex Ying, Siyi Gu, Yifeng Gao, Yixuan He.

**Figure 1.** Pearson correlation and out-of-sample (OOS) R² for stock return prediction under different context settings. These observations motivate multimodal financial modeling, but existing approaches still lack a principled mechanism for coupling numerical forecasting with language-based reasoning. One line of work augments time-series models with news or fundamentals as auxiliary covariates [10, 11, 12], improv…

**Figure 2.** Overview of STOCK-R1. The model encodes multimodal market context into unified tokens, then uses a policy LLM to generate a structured forecast action that conditions a multichannel time-series decoder. The predicted trajectory and uncertainty are returned to the LLM for grounded financial reasoning, with the whole policy optimized via uncertainty-aware dual-modal rollout.

**Figure 3.** Ablation and Analysis of Time-Series Grounding and Action Quality

**Figure 4.** The entropy curve with and without uncertainty-aware reweighting. [Plot residue, bar chart of Score (%): metrics Answer-Evidence, Trajectory-Action, Logical Validity, Action Success; series w/o RL, w/o Rcons, Full model; bar labels in extraction order: 45.9, 18.5, 38.7, 61.7, 64.3, 37.2, 51.4, 71.3, 65.4, 53.8, 58.1, 70.4]

**Figure 5.** Ablation of RL components on numerical grounding metrics

**Figure 6.** Left: Policy entropy over training steps, comparing training with and without uncertainty-aware reweighting. Right: Reward convergence under different group sizes. For reinforcement learning, we employ VERL with uncertainty-aware reweighting of the learning signal. We select the reward weights through validation search and use α = 1.0, β = 0.5, and γ = 1.0 as the default setting, which empirically yields t…

**Figure 7.** Scalability analysis of pretraining perfor

**Figure 8.** Directional Trading Signal Generated by Different Models
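Figure 1 reports Pearson correlation and out-of-sample R² for return prediction. For readers who want to reproduce that kind of measurement, the sketch below shows one common way to compute both in plain Python. It assumes the benchmark-relative OOS R² used in the return-predictability literature (for example with a historical-average forecast as the benchmark, as in reference [9]); whether the paper uses exactly this definition is not stated on this page, and the function names are illustrative.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def oos_r2(
    actual: Sequence[float],
    predicted: Sequence[float],
    benchmark: Sequence[float],
) -> float:
    """Out-of-sample R^2: 1 - SSE(model) / SSE(benchmark).

    Positive values mean the model beats the benchmark forecast on the
    held-out period; negative values mean it does worse.
    """
    if not (len(actual) == len(predicted) == len(benchmark)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    if sse_bench == 0:
        raise ValueError("benchmark forecast is already perfect")
    return 1.0 - sse_model / sse_bench
```

Unlike in-sample R², this quantity can be negative, which is common for stock returns and is one reason small positive values are treated as meaningful in this domain.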

Original abstract

Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet, existing approaches either abstract time-series into text or decouple forecasting from language-based reasoning, leading to a fundamental mismatch between qualitative reasoning and quantitative outcomes. To address this, we introduce StockR1, a time-series-enhanced LLM that unifies stock forecasting and financial reasoning through a verifiable forecast action. Based on a tool-call design, the model first emits a forecast action, which is a structured and interpretable representation of its qualitative market outlook. It then invokes a time-series decoder conditioned on this action to generate distributional future trajectories, leading to more informed question answering and financial reasoning. We optimize the full pipeline with reinforcement learning, where rewards jointly reflect answer validity, forecast accuracy, and consistency between generated actions and observed time-series dynamics. In addition, rewards are reweighted by a sample-level uncertainty scalar, encouraging the model to accommodate varying uncertainty in market dynamics. We evaluate StockR1 on financial question answering and stock forecasting over a large-scale 10-year benchmark. Our method consistently outperforms time-series baselines and general-purpose LLMs, improving reasoning accuracy by 17.7% (4B) and 25.9% (8B). These findings demonstrate that structuring the forecast actions establishes a powerful synergy between language reasoning and temporal prediction, enabling LLMs to reason through verifiable, interpretable, and numerically grounded decisions.
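The abstract describes a reward that jointly reflects answer validity, forecast accuracy, and action consistency, reweighted by a sample-level uncertainty scalar. A hedged Python sketch of one possible form follows. The default weights 1.0, 0.5, and 1.0 are the α, β, γ values quoted in the Figure 6 caption, but which weight belongs to which term is an assumption here, and the exponential down-weighting by uncertainty is a hypothetical choice, since the page gives no equation for the scalar.

```python
import math


def combined_reward(
    r_answer: float,
    r_forecast: float,
    r_consistency: float,
    uncertainty: float,
    alpha: float = 1.0,
    beta: float = 0.5,
    gamma: float = 1.0,
) -> float:
    """Weighted sum of three reward terms, scaled by an uncertainty weight.

    Assumptions (not confirmed by the source):
      * alpha, beta, gamma weight the answer, forecast, and consistency
        terms in that order;
      * higher uncertainty shrinks the learning signal via exp(-uncertainty).
    """
    if uncertainty < 0:
        raise ValueError("uncertainty must be non-negative")
    base = alpha * r_answer + beta * r_forecast + gamma * r_consistency
    weight = math.exp(-uncertainty)  # hypothetical reweighting rule
    return weight * base
```

With this form, a sample with zero uncertainty contributes its full weighted reward, while a highly uncertain sample contributes little, so noisy market periods influence the policy update less.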

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

StockR1 uses a structured forecast action to link LLM reasoning to a time-series decoder and trains the whole thing with consistency RL, but the reward setup risks fitting the specific 10-year window rather than general market behavior.

The main thing to know is that the paper builds StockR1 around a tool-call forecast action that the LLM emits first; this action then conditions a time-series decoder to produce distributional forecasts, and the system is trained end-to-end with RL that rewards answer validity, forecast accuracy, and consistency between the action and later observed data, plus an uncertainty reweighting term.

Referee Report

2 major / 2 minor

Summary. The paper introduces StockR1, a time-series-enhanced LLM that emits structured verifiable forecast actions, conditions a decoder on them to produce distributional trajectories, and optimizes the pipeline end-to-end with RL. Rewards combine answer validity, forecast accuracy, and consistency between the emitted action and subsequently observed time-series dynamics, with sample-level uncertainty reweighting. On a large-scale 10-year financial QA and forecasting benchmark the method reports consistent outperformance over time-series baselines and general-purpose LLMs, with reasoning-accuracy gains of 17.7% (4B) and 25.9% (8B).

Significance. If the consistency reward can be shown to improve generalization rather than overfit to the realized trajectories of one particular non-stationary decade, the work would meaningfully advance the integration of language reasoning with quantitative forecasting in low-signal domains.

major comments (2)

[Abstract and §4] Abstract and experimental section: the headline gains of 17.7% and 25.9% are reported without any description of baseline implementations, hyper-parameter search protocols, statistical significance tests, or the temporal construction and train/test split of the 10-year benchmark. These omissions make the numerical claims impossible to evaluate.
[RL reward formulation] RL objective (reward definition): the consistency term is computed from post-action observed time-series dynamics on the same 10-year window used for final reporting. In a domain the paper itself describes as extremely non-stationary, this construction risks the model learning to match the realized path of that specific decade rather than producing generalizable reasoning. No temporal hold-out, rolling-origin, or regime-shift experiments are described to isolate the effect.
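The rolling-origin evaluation that the second major comment asks for can be sketched as follows. This is a generic expanding-window scheme in Python, not the paper's protocol: the function name and the parameters `initial_train`, `test_size`, and `step` are illustrative, and periods are assumed to be indexed in chronological order.

```python
from typing import Iterator, List, Optional, Tuple


def rolling_origin_splits(
    n_periods: int,
    initial_train: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with every test block after its train block.

    The training window expands from the start of the series up to the
    forecast origin; the test block is the next test_size periods.
    """
    if min(n_periods, initial_train, test_size) <= 0:
        raise ValueError("all sizes must be positive")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    origin = initial_train
    while origin + test_size <= n_periods:
        yield list(range(origin)), list(range(origin, origin + test_size))
        origin += step


# Example: 10 yearly periods, first 6 for training, 2-year test blocks.
splits = list(rolling_origin_splits(n_periods=10, initial_train=6, test_size=2))
```

Because every test index is later than every training index in the same split, a consistency reward computed only on the training indices cannot see the trajectories used for reporting on that split.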

minor comments (2)

[Method] The uncertainty scalar is introduced without an explicit equation or pseudocode; a short derivation or algorithmic box would remove ambiguity.
[Figures] Figure captions for the forecast-action examples should explicitly state the time horizon and the exact consistency metric used so readers can reproduce the visualization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help strengthen the presentation of our experimental results and the discussion of generalization in non-stationary financial settings. We address each major comment below and indicate the corresponding revisions.

Referee: [Abstract and §4] Abstract and experimental section: the headline gains of 17.7% and 25.9% are reported without any description of baseline implementations, hyper-parameter search protocols, statistical significance tests, or the temporal construction and train/test split of the 10-year benchmark. These omissions make the numerical claims impossible to evaluate.

Authors: We agree that the original manuscript provided insufficient detail on these aspects, making independent evaluation difficult. In the revised version we have expanded §4 and added a dedicated appendix subsection that fully describes: (i) the exact implementations and adaptations of all time-series baselines and general-purpose LLMs, (ii) the hyper-parameter search ranges, grid or random search procedure, and final selected values, (iii) the statistical significance tests (paired t-tests and McNemar’s test) together with reported p-values, and (iv) the precise temporal construction of the 10-year benchmark, including the chronological train/test split chosen to respect temporal causality and avoid future leakage. revision: yes
Referee: [RL reward formulation] RL objective (reward definition): the consistency term is computed from post-action observed time-series dynamics on the same 10-year window used for final reporting. In a domain the paper itself describes as extremely non-stationary, this construction risks the model learning to match the realized path of that specific decade rather than producing generalizable reasoning. No temporal hold-out, rolling-origin, or regime-shift experiments are described to isolate the effect.

Authors: We acknowledge the referee’s concern that the consistency reward, by construction, uses realized trajectories from the evaluation window and could therefore encourage fitting to the particular non-stationary decade rather than learning transferable reasoning. The benchmark already employs a strict forward-chronological split (earlier years for training, later years for testing) to simulate realistic deployment. To further isolate generalization, the revised manuscript now includes rolling-origin validation across multiple starting points and a regime-shift analysis (pre- versus post-major market events). These additional results show that the reported gains remain consistent, supporting that the consistency-grounded objective contributes to robustness rather than decade-specific memorization. revision: yes
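The first response names McNemar's test as one of the significance tests for comparing models on the same questions. As a reference point, a minimal Python sketch of the standard exact (binomial) version follows. Whether the authors use the exact or the chi-square form is not stated, and the function name is illustrative.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    b: questions answered correctly only by model A
    c: questions answered correctly only by model B
    Under the null hypothesis, each discordant question is equally likely
    to favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, if one model is uniquely correct on 10 questions and the other on none, `mcnemar_exact_p(10, 0)` gives 2/1024, whereas an even 5 versus 5 split gives a p-value of 1.0. Questions that both models get right or both get wrong do not enter the test.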

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper defines a consistency reward between emitted forecast actions and subsequently observed time-series dynamics as an external training signal within RL optimization. This uses post-action observations as an independent anchor rather than redefining or fitting the target reasoning accuracy itself. No equations or steps reduce the reported reasoning accuracy gains (17.7%/25.9%) to the inputs by construction, nor does the abstract or described pipeline rely on self-citation load-bearing, uniqueness theorems from prior author work, or renaming of known results. The method introduces independent structure via tool-call forecast actions and joint rewards, remaining self-contained against the 10-year benchmark without evident tautological reduction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of a measurable consistency signal between discrete forecast actions and continuous time-series outcomes, plus the assumption that joint RL optimization over validity, accuracy, and consistency yields better reasoning than decoupled training.

free parameters (2)

reward component weights: The joint reward for answer validity, forecast accuracy, and action consistency is described as reweighted by an uncertainty scalar; the relative scaling among these terms is a tunable hyperparameter.
uncertainty scalar: Sample-level uncertainty used to reweight rewards is introduced without a derivation from first principles.

axioms (1)

domain assumption: A structured forecast action can be reliably mapped to distributional future trajectories via a conditioned time-series decoder. This mapping is the core of the tool-call design and is presupposed rather than derived.

invented entities (1)

verifiable forecast action (no independent evidence)
purpose: Structured, interpretable representation of qualitative market outlook that conditions the time-series decoder.
Presented as a new design element that unifies reasoning and forecasting.

pith-pipeline@v0.9.0 · 5835 in / 1457 out tokens · 59322 ms · 2026-05-22T07:11:09.008511+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

rewards jointly reflect answer validity, forecast accuracy, and consistency between generated actions and observed time-series dynamics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 9 internal anchors

[1] Sean J Taylor and Benjamin Letham. Prophet: forecasting at scale. PeerJ Preprints, 5:e3190v2, 2017.

[2] Adebiyi A Ariyo, Adewumi O Adewumi, and Charles K Ayo. Stock price prediction using the arima model. In 2014 UKSim-AMSS 16th international conference on computer modelling and simulation, pages 106–112. IEEE, 2014.

[3] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[4] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023.

[5] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. iTransformer: Inverted transformers are effective for time series forecasting. arXiv preprint arXiv:2310.06625, 2023.

[6] Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Non-stationary transformers: Exploring the stationarity in time series forecasting. Advances in neural information processing systems, 35:9881–9893, 2022.

[7] Y Nie. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022.

[8] Padraig Cunningham and Sarah Jane Delany. K-nearest neighbour classifiers-a tutorial. ACM computing surveys (CSUR), 54(6):1–25, 2021.

[9] John Y Campbell and Samuel B Thompson. Predicting excess stock returns out of sample: Can anything beat the historical average? The Review of Financial Studies, 21(4):1509–1531, 2008.

[10] Shuqi Li, Yuebo Sun, Yuxin Lin, Xin Gao, Shuo Shang, and Rui Yan. Causalstock: Deep end-to-end causal discovery for news-driven multi-stock movement prediction. Advances in Neural Information Processing Systems, 37:47432–47454, 2024.

[11] Yumo Xu and Shay B Cohen. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970–1979, 2018.

[12] Shuqi Li, Weiheng Liao, Yuhan Chen, and Rui Yan. Pen: prediction-explanation network to forecast stock price movement with better explainability. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 5187–5194, 2023.

[13] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564, 2023.

[14] Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, et al. Fin-r1: A large language model for financial reasoning through reinforcement learning. arXiv preprint arXiv:2503.16252, 2025.

[15] Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Han Yi, Yilun Zhao, Jimin Huang, Qianqian Xie, and Jian-yun Nie. Fino1: On the transferability of reasoning-enhanced llms and reinforcement learning to finance. arXiv preprint arXiv:2502.08127, 2025.

[16] Jie Zhu, Qian Chen, Huaixia Dou, Junhui Li, Lifan Guo, Feng Chen, and Chi Zhang. Dianjin-r1: Evaluating and enhancing financial reasoning in large language models. arXiv preprint arXiv:2504.15716, 2025.

[17] Yijia Xiao, Edward Sun, Tong Chen, Fang Wu, Di Luo, and Wei Wang. Trading-r1: Financial trading with llm reasoning via reinforcement learning. arXiv preprint arXiv:2509.11420, 2025.

[18] Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, and Matt Gardner. Do nlp models know numbers? probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5307–5315, 2019.

[19] Mingtian Tan, Mike A Merrill, Vinayak Gupta, Tim Althoff, and Thomas Hartvigsen. Are language models actually useful for time series forecasting? Advances in Neural Information Processing Systems, 37:60162–60191, 2024.

[20] Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, and Rex Ying. Mtbench: A multimodal time series benchmark for temporal reasoning and question answering. arXiv preprint arXiv:2503.16858, 2025.

[21] Tong Guan, Zijie Meng, Dianqi Li, Shiyu Wang, Chao-Han Huck Yang, Qingsong Wen, Zuozhu Liu, Sabato Marco Siniscalchi, Ming Jin, and Shirui Pan. TimeOmni-1: Incentivizing complex reasoning with time series in large language models. arXiv preprint arXiv:2509.24803, 2025.

[22] Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, and Jintao Zhang. Time series forecasting as reasoning: A slow-thinking approach with reinforced llms. arXiv preprint arXiv:2506.10630, 2025.

[23] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[24] Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.

[25] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106–11115, 2021.

[26] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems, 34:22419–22430, 2021.

[27] Aosong Feng, Jialin Chen, Juan Garza, Brooklyn Berry, Francisco Salazar, Yifeng Gao, Rex Ying, and Leandros Tassiulas. Efficient high-resolution time series classification via attention kronecker decomposition. arXiv preprint arXiv:2403.04882, 2024.

[28] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. arXiv preprint arXiv:2201.12740, 2022.

[29] Jongseon Kim, Hyungjoon Kim, HyunGi Kim, Dongjun Lee, and Sungroh Yoon. A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges. Artificial Intelligence Review, 58(7):1–95, 2025.

[30] Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time-LLM: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728, 2023.

[31] Ching Chang, Wen-Chih Peng, and Tien-Fu Chen. Llm4ts: Two-stage fine-tuning for time-series forecasting with pre-trained llms. CoRR, 2023.

[32] Yuxiao Hu, Qian Li, Dongxiao Zhang, Jinyue Yan, and Yuntian Chen. Context-alignment: Activating and enhancing llm capabilities in time series. arXiv preprint arXiv:2501.03747, 2025.

[33] Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Autotimes: Autoregressive time series forecasters via large language models. Advances in Neural Information Processing Systems, 37:122154–122184, 2024.

[34] Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Hassen, Anderson Schneider, et al. Lag-llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models, 2023.

[35] Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024.

[36] Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, and Ming Jin. Time-moe: Billion-scale time series foundation models with mixture of experts. arXiv preprint arXiv:2409.16040, 2024.

[37] Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Timer-xl: Long-context transformers for unified time series forecasting. arXiv preprint arXiv:2410.04803, 2024.

[38] Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. 2024.

[39]

Chronos: Learning the Language of Time Series

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series.arXiv preprint arXiv:2403.07815, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[40]

Temporal relational ranking for stock prediction.ACM Transactions on Information Systems (TOIS), 37(2):1–30, 2019

Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. Temporal relational ranking for stock prediction.ACM Transactions on Information Systems (TOIS), 37(2):1–30, 2019

work page 2019
[41]

Kronos: A foundation model for the language of financial markets.arXiv preprint arXiv:2508.02739, 2025

Yu Shi, Zongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu, Changshui Zhang, and Jian Li. Kronos: A foundation model for the language of financial markets.arXiv preprint arXiv:2508.02739, 2025

work page arXiv 2025
[42]

Mitigating distribution shift in stock price data via return-volatility normalization for accurate prediction

Hyunwoo Lee, Jihyeong Jeon, Jaemin Hong, and U Kang. Mitigating distribution shift in stock price data via return-volatility normalization for accurate prediction. InProceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 1458–1467, 2025

work page 2025
[43] Yitong Duan, Lei Wang, Qizhong Zhang, and Jian Li. Factorvae: A probabilistic dynamic factor model based on variational autoencoder for predicting cross-sectional stock returns. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 4468–4476, 2022.
[44] Burton G Malkiel. Efficient market hypothesis. In Finance, pages 127–134. Springer, 1989.
[45] Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, and Jaewoo Kang. Hats: A hierarchical graph attention network for stock movement prediction. arXiv preprint arXiv:1908.07999, 2019.
[46] Shengkun Wang, Taoran Ji, Linhan Wang, Yanshen Sun, Shang-Ching Liu, Amit Kumar, and Chang-Tien Lu. Stocktime: A time series specialized large language model architecture for stock price prediction. arXiv preprint arXiv:2409.08281, 2024.
[47] Neng Wang, Hongyang Yang, and Christina Dan Wang. Fingpt: Instruction tuning benchmark for open-source large language models in financial datasets. arXiv preprint arXiv:2310.04793, 2023.
[48] Rui Sun, Yifan Sun, Sheng Xu, Li Zhao, Jing Li, Daxin Jiang, Chen Hua, and Zuo Bai. Trade-r1: Bridging verifiable rewards to stochastic environments via process-level reasoning verification. arXiv preprint arXiv:2601.03948, 2026.
[49] Yijia Xiao, Edward Sun, Di Luo, and Wei Wang. Tradingagents: Multi-agents llm financial trading framework. arXiv preprint arXiv:2412.20138, 2024.
[50] Andrew W Lo and A Craig MacKinlay. Stock market prices do not follow random walks: Evidence from a simple specification test. The Review of Financial Studies, 1(1):41–66, 1988.
[51] Eugene F Fama. Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970.
[52] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
[53] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv e-prints, pages arXiv–2407, 2024.
[54] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

A Dataset Curation

A.1 Data Collection and Preprocessing

To construct a robust multimodal benchmark, we aggregate high-frequency market data and unstructured textua...
[55] Company Fundamentals: We retrieve the most recently published quarterly financial statement prior to t (e.g., revenue, operating margins, leverage ratios).
[56] Business & News Context: We aggregate news articles from a fixed lookback window using strict keyword matching based on the company’s ticker and official name, retaining only information available before market close at t−1.
[57] Macroeconomic Data: Key indicators such as US Treasury yields (e.g., 2Y vs. 10Y spreads) and inflation metrics are similarly aligned by selecting the latest available release prior to time t. These heterogeneous data sources are cleaned and serialized into a unified structured prompt T_t, serving as the grounding context for the model.

A.2 Dataset Statistic...
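The point-in-time alignment described in A.1 (taking the latest release published before time t) can be illustrated with a small as-of lookup. This is only a sketch under stated assumptions, not the authors' pipeline: the function name `latest_before`, the tuple layout, and the spread values are all hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from datetime import date

def latest_before(releases, t):
    """Return the value of the most recent release published strictly before t.

    `releases` is a list of (publication_date, value) tuples sorted by date.
    Returns None when nothing was published before t.
    """
    dates = [d for d, _ in releases]
    i = bisect_left(dates, t)  # index of the first release dated on or after t
    return releases[i - 1][1] if i > 0 else None

# Made-up 2Y vs. 10Y spread releases (illustrative values only).
spread = [
    (date(2024, 1, 5), -0.35),
    (date(2024, 2, 2), -0.28),
    (date(2024, 3, 1), -0.40),
]

on_release_day = latest_before(spread, date(2024, 2, 2))
day_after = latest_before(spread, date(2024, 2, 3))
too_early = latest_before(spread, date(2024, 1, 1))
```

Using a strict "before t" comparison means a figure published on day t itself is not visible to a forecast made at t, which is one way to avoid look-ahead leakage.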
[58] Domain-Aware Question Synthesis. We prompt GPT-5 to generate questions across diverse financial tasks (e.g., Forecast QA, Event Detection, News Analysis, and Multi-signal Reasoning) to cover varied reasoning horizons and decision objectives.
[59] Action-grounded Reasoning Construction. For each query, we derive a target forecast action from the realized future trajectory and require the teacher trace to justify an action that is consistent with available historical, textual, and macroeconomic evidence. This produces supervision for the reasoning before the intermediate action that will consequentl...
[60] Tool-aware Verification and Filtering. We execute the action-conditioned <forecast_action> to obtain a numerical trajectory, verify whether the trajectory supports the target answer, and filter out samples that are either inconsistent or trivially solvable by a smaller baseline (e.g., Qwen3-4B). This yields D_SFT with explicit reasoning-action-answer alignme...
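The filtering rule in the verification step can be read as a simple predicate: keep a sample only if the decoded trajectory supports the target answer and the smaller baseline does not already solve it. The sketch below assumes answers are comparable strings; the `Sample` fields and the `keep` function are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    target_answer: str      # answer derived from the realized future trajectory
    trajectory_answer: str  # answer implied by the action-conditioned forecast (assumed field)
    baseline_answer: str    # answer from the smaller baseline model (assumed field)

def keep(sample: Sample) -> bool:
    """Keep samples that are consistent and not trivially solvable."""
    consistent = sample.trajectory_answer == sample.target_answer
    trivial = sample.baseline_answer == sample.target_answer
    return consistent and not trivial

candidates = [
    Sample("up", "up", "down"),    # consistent and non-trivial: kept
    Sample("up", "down", "down"),  # trajectory contradicts the target: dropped
    Sample("up", "up", "up"),      # baseline already solves it: dropped
]
filtered = [s for s in candidates if keep(s)]
```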

