Realised Volatility Forecasting: Machine Learning via Financial Word Embedding
Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3
The pith
News text embeddings improve realized volatility forecasts beyond standard benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
News text can be turned into embedding-based representations that, when used in machine learning models, supply forward-looking information about future realized volatility. This information is incremental to what is contained in historical volatility measures, shows stronger effects for stock-specific news and high-volatility days, and produces consistent statistical and economic improvements when added to a leading benchmark model.
What carries the argument
Embedding-based representations of news text, which convert unstructured articles into dense vectors that serve as input features for volatility forecasting models.
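As a rough illustration of how an article might become a dense feature vector, the sketch below averages per-word vectors. The summary does not say how the paper aggregates word embeddings, so the averaging step, the toy vocabulary `TOY_VECTORS`, and the function name `embed_text` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors; real embeddings would be
# learned from a large news corpus and have many more dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "profit": [1.0, 0.0, 0.0],
    "warning": [0.0, 1.0, 0.0],
    "lawsuit": [0.0, 1.0, 1.0],
}


def embed_text(text: str, vectors: Dict[str, List[float]], dim: int = 3) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# "issued" is out of vocabulary and is skipped.
doc_vector = embed_text("Profit warning issued", TOY_VECTORS)
```

The resulting fixed-length vector can be fed to any regression model, which is what makes this kind of pipeline operationally simple.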
If this is right
- Standalone models that use only news embeddings exhibit out-of-sample predictive ability for realized volatility.
- The predictive contribution is larger when the news content is directly related to the individual stock rather than general market events.
- Forecast improvements are more noticeable on days when realized volatility is already high.
- Blending the news-based signal with a standard realized-volatility benchmark produces both better statistical metrics and economically meaningful gains.
- Explainability analysis isolates the specific news themes that drive the volatility predictions.
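The blending described in the list above can be sketched as follows, assuming the benchmark is a HAR-style model built on daily, weekly (5-day), and monthly (22-day) averages of realized volatility. The summary only says "a leading benchmark", so the HAR choice, the convex-weight combination, and the names `har_features` and `blend` are illustrative assumptions rather than the paper's specification.

```python
from typing import List, Tuple


def har_features(rv: List[float], t: int) -> Tuple[float, float, float]:
    """Daily, weekly (5-day) and monthly (22-day) average RV using data up to day t."""
    if t < 21:
        raise ValueError("need at least 22 observations up to and including day t")
    daily = rv[t]
    weekly = sum(rv[t - 4 : t + 1]) / 5
    monthly = sum(rv[t - 21 : t + 1]) / 22
    return daily, weekly, monthly


def blend(benchmark_forecast: float, news_forecast: float, weight: float) -> float:
    """Convex combination; `weight` is the share given to the news-based forecast."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * benchmark_forecast + weight * news_forecast


# Toy series 1.0, 2.0, ..., 22.0 so the averages are easy to check by hand.
rv_series = [float(i) for i in range(1, 23)]
features = har_features(rv_series, t=21)
combined = blend(benchmark_forecast=0.02, news_forecast=0.04, weight=0.25)
```

In practice the weight would be chosen on a validation window that ends before the out-of-sample period begins.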
Where Pith is reading between the lines
- The same embedding approach could be tested on other quantities such as trading volume or bid-ask spreads if news text contains analogous forward-looking content.
- Deploying the method in live trading would require careful handling of publication lags and real-time embedding updates.
- Extending the cross-section to include international stocks or other asset classes would test whether the documented effects generalize.
- Combining the news embeddings with high-frequency intraday data might further sharpen the timing of volatility signals.
Load-bearing premise
The embedding vectors extracted from news text capture genuine forward-looking signals about volatility rather than noise, selection bias, or information that is already reflected in past prices.
What would settle it
If the signal is real, re-running the out-of-sample tests after randomly shuffling the alignment between news publication dates and subsequent volatility realizations should eliminate the reported performance edge. An edge that survives the shuffle would instead point to leakage or a pipeline artifact.
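A minimal sketch of such a placebo test is shown below. It shuffles a fixed forecast series against the realized series and reports how often a shuffled alignment does at least as well as the true one. A full version would re-estimate the model on each shuffled dataset and would probably shuffle in blocks to respect volatility persistence; the function names and the toy data are assumptions.

```python
import random
from typing import List


def mse(pred: List[float], actual: List[float]) -> float:
    """Mean squared error between two equal-length series."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def placebo_p_value(
    forecast: List[float], realized: List[float], n_shuffles: int = 200, seed: int = 0
) -> float:
    """Share of shuffled alignments whose MSE is at least as low as the true alignment."""
    rng = random.Random(seed)
    true_loss = mse(forecast, realized)
    shuffled = list(forecast)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mse(shuffled, realized) <= true_loss:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_shuffles + 1)


# Toy case: a forecast that is perfectly aligned with the realized series.
realized = [0.01 * (i + 1) for i in range(20)]
forecast = list(realized)
p_value = placebo_p_value(forecast, realized)
```

A small p-value says the true date alignment matters; a large one says the apparent edge does not depend on timing.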
Original abstract
We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and forecasts are evaluated both as a standalone, news-only model and as a complement to standard realised volatility benchmarks. In out-of-sample tests on a cross-section of stocks, news contains useful predictive information, with stronger effects for stock-related content and during high volatility days. Combining the news-based signal with a leading benchmark yields consistent improvements in statistical performance and economically meaningful gains, while explainability analysis highlights the news themes most relevant for volatility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines whether news text, transformed via embedding-based representations, can improve realised volatility forecasts. It evaluates standalone news-only models and combinations with standard RV benchmarks in out-of-sample tests across a cross-section of stocks, claiming that news contains useful predictive information (stronger for stock-related content and high-volatility days), that combinations yield statistical and economic gains, and that explainability analysis identifies relevant news themes.
Significance. If the empirical results hold under rigorous validation, the work demonstrates a practical, operationally simple NLP pipeline for incorporating textual signals into volatility forecasting. Strengths include the focus on out-of-sample evaluation, differential effects by news type and market regime, and the addition of economic significance metrics alongside statistical ones.
major comments (2)
- [Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.
- [§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review, that embeddings capture forward-looking information without substantial leakage, directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.
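The timestamp check requested in the second comment could look roughly like the sketch below, which keeps only news published strictly before the volatility measurement window opens. The 09:30 window open, the one-day lookback, and the name `news_for_forecast` are hypothetical; the sketch also ignores time zones, which a real pipeline would need to handle.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

NewsItem = Tuple[datetime, str]  # (publication timestamp, article text)


def news_for_forecast(
    items: List[NewsItem],
    window_open: datetime,
    lookback: timedelta = timedelta(days=1),
) -> List[str]:
    """Keep items with window_open - lookback <= timestamp < window_open."""
    start = window_open - lookback
    return [text for ts, text in items if start <= ts < window_open]


window_open = datetime(2024, 1, 2, 9, 30)  # assumed start of the measurement window
items = [
    (datetime(2024, 1, 2, 8, 0), "a"),   # before the open: kept
    (datetime(2024, 1, 2, 9, 30), "b"),  # at the open: excluded (not strictly before)
    (datetime(2024, 1, 1, 9, 0), "c"),   # older than the lookback: excluded
    (datetime(2024, 1, 1, 9, 30), "d"),  # exactly at the lookback start: kept
]
usable = news_for_forecast(items, window_open)
```

Making the upper bound strict is the design choice that matters here: an article stamped at or after the open can already reflect price moves inside the target window.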
minor comments (3)
- [§3] Clarify the precise word-embedding architecture, vocabulary construction, and any domain-specific fine-tuning steps; the current description is high-level.
- [§5] Add error bars or bootstrap confidence intervals to the reported performance differentials and state the exact cross-validation or rolling-window scheme used for the OOS tests.
- [§6] The explainability analysis would benefit from a table listing the top news themes by importance score together with their associated volatility impact signs.
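The quantitative evidence the referee asks for (percentage MSE reduction and Diebold-Mariano statistics) can be computed along the lines of the sketch below. It assumes one-step-ahead forecasts and squared-error loss, uses the plain sample variance of the loss differential with no autocovariance correction, and takes the p-value from a normal approximation. The error series are made-up toy numbers, not results from the paper.

```python
import math
from typing import List, Tuple


def mse_reduction(err_bench: List[float], err_model: List[float]) -> float:
    """Fractional reduction in MSE of the model relative to the benchmark."""
    mse_b = sum(e * e for e in err_bench) / len(err_bench)
    mse_m = sum(e * e for e in err_model) / len(err_model)
    return 1.0 - mse_m / mse_b


def diebold_mariano(err_bench: List[float], err_model: List[float]) -> Tuple[float, float]:
    """DM statistic (positive favors the model) and two-sided normal p-value."""
    d = [b * b - m * m for b, m in zip(err_bench, err_model)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Toy forecast errors for a benchmark and a benchmark-plus-news model.
err_bench = [1.0, -1.2, 0.8, -1.1, 0.9, 1.3]
err_model = [0.5, -0.4, 0.3, -0.6, 0.2, 0.7]
reduction = mse_reduction(err_bench, err_model)
dm_stat, dm_p = diebold_mariano(err_bench, err_model)
```

For multi-step horizons the variance term would need a long-run (HAC) estimate, and bootstrap intervals would be a natural complement to the normal approximation.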
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment below and will incorporate the suggested changes into the revised manuscript.
- Referee: [Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.
  Authors: We agree that the abstract and introduction would benefit from including key quantitative results to better convey the magnitude of the improvements. While the full manuscript reports these metrics in detail (e.g., MSE reductions, DM test statistics, and economic significance measures in Sections 5–6), the abstract remains qualitative. We will revise the abstract to incorporate concise quantitative highlights, such as average out-of-sample MSE improvements and significance levels, along with clearer baseline definitions.
  Revision: yes
- Referee: [§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review, that embeddings capture forward-looking information without substantial leakage, directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.
  Authors: We agree that an explicit discussion of look-ahead bias is necessary for transparency. The current manuscript describes the use of publication timestamps to align news with the forecast horizon, but we will add a dedicated robustness subsection (likely in §4 or §5) that details timestamp alignment procedures, any selection filters applied to the news corpus, and additional checks (such as restricting to news published strictly before the volatility measurement window) to confirm the absence of leakage.
  Revision: yes
Circularity Check
No significant circularity identified
The paper's central claim rests on out-of-sample forecasting performance of embedding-based news representations for realised volatility, evaluated both standalone and as an additive signal to standard benchmarks. No equations or steps are described that reduce a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise rely on self-citation chains or imported uniqueness theorems. The methodology follows standard NLP embedding plus regression pipelines whose predictive content is independently testable against external benchmarks; the derivation chain therefore remains self-contained.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  We develop FinText, a financial word embedding... NLP model supported by different word embeddings improves realised volatility forecasts on high volatility days... SHAP... volatility movers
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  HAR-family... CHAR model... ensemble models... LOB data and news sentiments
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.