
arXiv: 2108.00480 · v6 · submitted 2021-08-01 · 💱 q-fin.CP · cs.CL · cs.LG

Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3

keywords: realized volatility forecasting · news embeddings · machine learning · financial NLP · volatility prediction · stock-specific news · out-of-sample tests · text embeddings in finance

The pith

News text embeddings improve realized volatility forecasts beyond standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether converting news articles into numerical embedding vectors lets machine learning models forecast realized volatility more accurately than benchmarks that rely only on past price data. It reports that news adds incremental predictive content in out-of-sample tests across stocks, with larger effects when the articles mention the specific stock and when volatility is already elevated. Forecasts that blend the news signal with a leading benchmark show better statistical accuracy and produce economically relevant gains. An explainability step identifies the news themes that matter most for the predictions. Volatility forecasts matter for option pricing, risk management, and portfolio allocation, so any timely extra signal from public news text is potentially valuable.

Core claim

News text can be turned into embedding-based representations that, when used in machine learning models, supply forward-looking information about future realized volatility. This information is incremental to what is contained in historical volatility measures, shows stronger effects for stock-specific news and high-volatility days, and produces consistent statistical and economic improvements when added to a leading benchmark model.

What carries the argument

Embedding-based representations of news text, which convert unstructured articles into dense vectors that serve as input features for volatility forecasting models.
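
As a rough illustration of what "converting text into dense vectors" means, the sketch below averages word vectors over a headline. This is a deliberately simpler stand-in, not the paper's model (the figure captions describe a convolutional network over a padded sentence matrix). The embedding table `TOY_EMBEDDINGS`, its 3 dimensions, and the function name `embed_headline` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table; values are made up for illustration.
# Real word-embedding tables are far larger (the captions mention 300 dimensions).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "profit": [0.2, 0.1, -0.3],
    "warning": [-0.5, 0.4, 0.1],
    "issued": [0.0, 0.2, 0.2],
}
DIM = 3


def embed_headline(headline: str,
                   table: Dict[str, List[float]] = TOY_EMBEDDINGS,
                   dim: int = DIM) -> List[float]:
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    tokens = headline.lower().split()
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# "today" is not in the toy table, so only three tokens contribute.
features = embed_headline("Profit warning issued today")
```

The resulting fixed-length vector can be fed to any regression model as input features. Averaging discards word order, which is one reason a convolutional architecture may be preferred.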

If this is right

  • Standalone models that use only news embeddings exhibit out-of-sample predictive ability for realized volatility.
  • The predictive contribution is larger when the news content is directly related to the individual stock rather than general market events.
  • Forecast improvements are more noticeable on days when realized volatility is already high.
  • Blending the news-based signal with a standard realized-volatility benchmark produces both better statistical metrics and economically meaningful gains.
  • Explainability analysis isolates the specific news themes that drive the volatility predictions.
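
The blending idea in the list above can be sketched as follows. The summary does not name the benchmark, so this sketch assumes a HAR-style benchmark (Corsi, 2009, appears in the reference list) built from daily, weekly, and monthly averages of past realized volatility. The fixed blending weight and the function names `har_features` and `blend` are hypothetical, and the paper's actual combination scheme may differ.

```python
from typing import Sequence, Tuple


def har_features(rv: Sequence[float], t: int) -> Tuple[float, float, float]:
    """Daily, weekly (5-day) and monthly (22-day) average RV ending at day t."""
    if t < 21 or t >= len(rv):
        raise ValueError("need 22 observations ending at index t")
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5
    monthly = sum(rv[t - 21:t + 1]) / 22
    return daily, weekly, monthly


def blend(benchmark_forecast: float, news_forecast: float, weight: float) -> float:
    """Convex combination of two forecasts; weight is the share given to news."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * benchmark_forecast + weight * news_forecast


# Toy RV series 1.0, 2.0, ..., 22.0 (made-up numbers, not data from the paper).
rv_series = [float(i) for i in range(1, 23)]
daily, weekly, monthly = har_features(rv_series, t=21)
combined = blend(benchmark_forecast=2.0, news_forecast=4.0, weight=0.25)
```

In practice the three HAR features would enter a regression fitted on the training window, and the blending weight would itself be chosen out of sample rather than fixed by hand.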

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same embedding approach could be tested on other quantities such as trading volume or bid-ask spreads if news text contains analogous forward-looking content.
  • Deploying the method in live trading would require careful handling of publication lags and real-time embedding updates.
  • Extending the cross-section to include international stocks or other asset classes would test whether the documented effects generalize.
  • Combining the news embeddings with high-frequency intraday data might further sharpen the timing of volatility signals.

Load-bearing premise

The embedding vectors extracted from news text capture genuine forward-looking signals about volatility rather than noise, selection bias, or information that is already reflected in past prices.

What would settle it

Re-run the out-of-sample tests after randomly shuffling the alignment between news publication dates and subsequent volatility realizations. If the signal is real, the shuffle should remove the reported performance edge; an edge that survives the shuffle would point to something other than news content.
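
A minimal sketch of such a shuffle check is given below. It substitutes a simple correlation between a scalar news signal and next-day RV for the full out-of-sample forecasting exercise, so it illustrates the logic only. The function names, the number of shuffles, and the toy data are assumptions.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both series have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def shuffle_test(signal: Sequence[float], next_day_rv: Sequence[float],
                 n_shuffles: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Return the observed |correlation| and a permutation p-value."""
    rng = random.Random(seed)
    observed = abs(correlation(signal, next_day_rv))
    shuffled: List[float] = list(signal)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the date alignment
        if abs(correlation(shuffled, next_day_rv)) >= observed:
            hits += 1
    # The +1 terms count the unshuffled alignment itself, keeping p above zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy data in which RV depends linearly on the signal (made up for illustration).
toy_signal = [float(i) for i in range(20)]
toy_rv = [0.5 * s + 1.0 for s in toy_signal]
observed_corr, p_value = shuffle_test(toy_signal, toy_rv)
```

A small p-value says the aligned signal does better than almost all misaligned versions. For serially correlated series, shuffling in blocks rather than day by day would be a more conservative choice.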

Figures

Figures reproduced from arXiv: 2108.00480 by Eghbal Rahimikia, Ser-Huang Poon, Stefan Zohren.

Figure 1: Number of words in corpus. Notes: This figure shows the total number of words (billion) in the corpus used for training Google Word2Vec (Mikolov et al., 2018), WikiNews (Joulin et al., 2016) and FinText word embedding. Google Word2Vec, WikiNews, and FinText corpora contain 100 billion, 16 billion, and 4.32 billion words, respectively.
Figure 3: An abstract representation of model. Notes: {X(t,1), X(t,2), ..., X(t,k_t)} consists of news headlines of day t and X(t,k_t) is the k-th token of input t. Also, RV_{t+1} is the RV of day t + 1 (next day RV). Padding with a maximum length of 500 is adopted to ensure that all inputs of the neural network have the same length. The word embedding block consists of two different word embeddings. To capture days witho…
Figure 4: A detailed representation of model. Notes: The sentence matrix is a 500×300 matrix with a maximum length of padding of 500 and word embedding dimensions of 300. In this matrix, each token is defined by a vector of 300 values. This structure contains three filters of different sizes. The filters with the size of 1, 2, and 3 generate feature maps with the size of 500, 499, and 498, respectively. Global max po…
Figure 5: Distribution of daily tokens. Notes: The number of daily tokens is calculated, and their distributions are plotted after putting daily stock-related news (left plot) and general hot news (right plot) headlines together (train data - 2046 days). The vertical line is the chosen maximum length of the padding. … the set S, N contains all model inputs, |S|! shows the number of different ways the chosen set of toke…
Figure 6: Out-of-sample word cloud. Notes: (a) Word cloud of stock-related headlines for all 23 stocks together over the out-of-sample period. (b) Word cloud of general hot headlines over the out-of-sample period. For each stock, we obtained the SHAP values for the constituent n-grams of all the textual information used to forecast RV for that stock. In order to identify the volatility movers for the entire sample o…
Figure 7: Yearly RV forecasting performance (Stock-related news)
Figure 8: Monthly average percentage of OOV (in stock-related news) over the out-of-sample period
Figure 9: Yearly RV forecasting performance (General hot news)
Figure 10: SHAP values for the top negative LM words
Figure 11: Robustness checks RC results (Stock-related news)
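
The feature-map sizes quoted in the Figure 4 caption (500, 499, and 498 for filters of size 1, 2, and 3 over 500 tokens) are what a stride-1 convolution without extra padding along the token axis would give, namely length minus filter size plus one. The sketch below checks that arithmetic and shows global max pooling on a toy matrix. The stride and padding settings are inferred from the reported sizes rather than stated in the caption, bias and activation are omitted, and all names and numbers in the toy example are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def feature_map_length(sequence_length: int, filter_size: int) -> int:
    """Output length of a stride-1 convolution with no extra padding."""
    return sequence_length - filter_size + 1


def conv_feature_map(sentence_matrix: Matrix, kernel: Matrix) -> List[float]:
    """Slide a (filter_size x embedding_dim) kernel down the token axis."""
    k = len(kernel)
    out = []
    for start in range(len(sentence_matrix) - k + 1):
        total = 0.0
        for offset in range(k):
            row = sentence_matrix[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        out.append(total)
    return out


# Sizes from the caption: 500 tokens, filters spanning 1, 2 and 3 tokens.
lengths = [feature_map_length(500, k) for k in (1, 2, 3)]

# Toy 4-token sentence with 2-dimensional embeddings (made-up numbers).
toy_sentence = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]]
toy_kernel = [[1.0, 1.0], [1.0, 1.0]]  # filter of size 2
feature_map = conv_feature_map(toy_sentence, toy_kernel)
pooled = max(feature_map)  # global max pooling keeps one value per filter
```

Global max pooling reduces each feature map to a single number, so the downstream layers see a fixed-size input regardless of the filter size.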
Original abstract

We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and forecasts are evaluated both as a standalone, news-only model and as a complement to standard realised volatility benchmarks. In out-of-sample tests on a cross-section of stocks, news contains useful predictive information, with stronger effects for stock-related content and during high volatility days. Combining the news-based signal with a leading benchmark yields consistent improvements in statistical performance and economically meaningful gains, while explainability analysis highlights the news themes most relevant for volatility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper examines whether news text, transformed via embedding-based representations, can improve realised volatility forecasts. It evaluates standalone news-only models and combinations with standard RV benchmarks in out-of-sample tests across a cross-section of stocks, claiming that news contains useful predictive information (stronger for stock-related content and high-volatility days), that combinations yield statistical and economic gains, and that explainability analysis identifies relevant news themes.

Significance. If the empirical results hold under rigorous validation, the work demonstrates a practical, operationally simple NLP pipeline for incorporating textual signals into volatility forecasting. Strengths include the focus on out-of-sample evaluation, differential effects by news type and market regime, and the addition of economic significance metrics alongside statistical ones.

major comments (2)
  1. [Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.
  2. [§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review—that embeddings capture forward-looking information without substantial leakage—directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.
minor comments (3)
  1. [§3] Clarify the precise word-embedding architecture, vocabulary construction, and any domain-specific fine-tuning steps; the current description is high-level.
  2. [§5] Add error bars or bootstrap confidence intervals to the reported performance differentials and state the exact cross-validation or rolling-window scheme used for the OOS tests.
  3. [§6] The explainability analysis would benefit from a table listing the top news themes by importance score together with their associated volatility impact signs.
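
The quantities requested in the first major comment can be sketched as follows: a percentage MSE reduction and a Diebold-Mariano-style statistic on squared-error loss differentials. This is a simplified version for one-step-ahead forecasts that uses the plain sample variance, with no autocorrelation-robust correction. It is not taken from the paper, and the error series are made-up numbers.

```python
import math
from statistics import mean, variance
from typing import Sequence


def mse(errors: Sequence[float]) -> float:
    """Mean squared forecast error."""
    return mean(e * e for e in errors)


def dm_statistic(errors_benchmark: Sequence[float],
                 errors_candidate: Sequence[float]) -> float:
    """Mean loss differential divided by its standard error.

    Positive values mean the candidate has the lower squared-error loss.
    Assumes one-step-ahead forecasts and serially uncorrelated differentials.
    """
    d = [a * a - b * b for a, b in zip(errors_benchmark, errors_candidate)]
    return mean(d) / math.sqrt(variance(d) / len(d))


# Hypothetical forecast errors for a benchmark and a news-augmented model.
benchmark_errors = [1.0, -2.0, 1.5, -1.0]
candidate_errors = [0.5, -1.0, 1.0, -0.5]

mse_reduction = 1.0 - mse(candidate_errors) / mse(benchmark_errors)
dm = dm_statistic(benchmark_errors, candidate_errors)
```

Under the usual assumptions the statistic is compared with standard normal critical values. With multi-step forecasts or autocorrelated differentials, a long-run variance estimator would replace the sample variance.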

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment below and will incorporate the suggested changes into the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.

    Authors: We agree that the abstract and introduction would benefit from including key quantitative results to better convey the magnitude of the improvements. While the full manuscript reports these metrics in detail (e.g., MSE reductions, DM test statistics, and economic significance measures in Sections 5–6), the abstract remains qualitative. We will revise the abstract to incorporate concise quantitative highlights, such as average out-of-sample MSE improvements and significance levels, along with clearer baseline definitions.

    Revision: yes

  2. Referee: [§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review—that embeddings capture forward-looking information without substantial leakage—directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.

    Authors: We agree that an explicit discussion of look-ahead bias is necessary for transparency. The current manuscript describes the use of publication timestamps to align news with the forecast horizon, but we will add a dedicated robustness subsection (likely in §4 or §5) that details timestamp alignment procedures, any selection filters applied to the news corpus, and additional checks (such as restricting to news published strictly before the volatility measurement window) to confirm the absence of leakage.

    Revision: yes
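
    The check mentioned in the response, keeping only news published strictly before the volatility measurement window, could look roughly like the sketch below. The 09:30 window start, the function names, and the example headlines are all hypothetical. Timestamps are assumed to be naive datetimes in the exchange's local time, and weekends and holidays are ignored.

```python
from datetime import date, datetime, time
from typing import List, Tuple

# Hypothetical start of the RV measurement window on the target day.
WINDOW_OPEN = time(9, 30)


def published_before_window(published: datetime, target_day: date,
                            window_open: time = WINDOW_OPEN) -> bool:
    """True if the headline predates the target day's measurement window."""
    return published < datetime.combine(target_day, window_open)


def admissible_headlines(headlines: List[Tuple[datetime, str]],
                         target_day: date) -> List[str]:
    """Keep only headlines that could be known before RV is measured."""
    return [text for published, text in headlines
            if published_before_window(published, target_day)]


# Made-up headlines and timestamps, for illustration only.
example_headlines = [
    (datetime(2020, 3, 2, 15, 45), "Earnings beat expectations"),
    (datetime(2020, 3, 2, 18, 10), "CEO resigns after close"),
    (datetime(2020, 3, 3, 10, 5), "Regulator opens probe"),
]
kept = admissible_headlines(example_headlines, target_day=date(2020, 3, 3))
```

    The third headline falls inside the target day's window and is dropped, since using it would leak information into the forecast of that day's RV.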

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's central claim rests on out-of-sample forecasting performance of embedding-based news representations for realised volatility, evaluated both standalone and as an additive signal to standard benchmarks. No equations or steps are described that reduce a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise rely on self-citation chains or imported uniqueness theorems. The methodology follows standard NLP embedding plus regression pipelines whose predictive content is independently testable against external benchmarks; the derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are identifiable from the provided text.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 6 internal anchors

  1. [1]

    Adämmer, P. and R. A. Schüssler (2020). Forecasting the equity premium: mind the news! Review of Finance 24(6), 1313–1355

  2. [2]

    Agirre, E., E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa (2009). A study on similarity and relatedness using distributional and wordnet-based approaches

  3. [3]

    Ban, G.-Y., N. El Karoui, and A. E. Lim (2018). Machine learning and portfolio optimization. Management Science 64(3), 1136–1154

  4. [4]

    Bianchi, D., M. Büchner, and A. Tamoni (2021). Bond risk premiums with machine learning. The Review of Financial Studies 34(2), 1046–1089

  5. [5]

    Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146

  6. [6]

    Bollerslev, T., A. J. Patton, and R. Quaedvlieg (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics 192(1), 1–18

  7. [7]

    Bubna, A., S. R. Das, and N. Prabhala (2020). Venture capital communities. Journal of Financial and Quantitative Analysis 55(2), 621–651

  8. [8]

    Bybee, L., B. T. Kelly, A. Manela, and D. Xiu (2020). The structure of economic news. Technical report, National Bureau of Economic Research

  9. [9]

    Chen, L., M. Pelger, and J. Zhu (2020). Deep learning in asset pricing. Available at SSRN 3350138

  10. [10]

    Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537

  11. [11]

    Conrad, C. and R. F. Engle (2021). Modelling volatility cycles: the (MF)^2 GARCH model. Available at SSRN

  12. [12]

    Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics 7(2), 174–196

  13. [13]

    Corsi, F. and R. Reno (2009). HAR volatility modelling with heterogeneous leverage and jumps. Available at SSRN 1316953

  14. [14]

    Engle, R. F. and S. Martins (2020). Measuring and hedging geopolitical risk

  15. [15]

    Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. The Journal of Finance 48(5), 1749–1778

  16. [16]

    Gentzkow, M., B. Kelly, and M. Taddy (2019). Text as data. Journal of Economic Literature 57(3), 535–74

  17. [17]

    Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies 33(5), 2223–2273

  18. [18]

    Gu, S., B. Kelly, and D. Xiu (2021). Autoencoder asset pricing models. Journal of Econometrics 222(1), 429–450

  19. [19]

    Hill, F., R. Reichart, and A. Korhonen (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41(4), 665–695

  20. [20]

    Jiang, J., B. T. Kelly, and D. Xiu (2020). (Re-)Imag(in)ing price trends. Chicago Booth Research Paper (21-01)

  21. [21]

    Joulin, A., E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov (2016). FastText.zip: Compressing text classification models

  22. [22]

    Kalay, A., O. Sade, and A. Wohl (2004). Measuring stock illiquidity: An investigation of the demand and supply schedules at the TASE. Journal of Financial Economics 74(3), 461–486

  23. [23]

    Ke, Z. T., B. T. Kelly, and D. Xiu (2019). Predicting returns with text data. Technical report, National Bureau of Economic Research

  24. [24]

    Kim, Y. (2014). Convolutional neural networks for sentence classification. CoRR abs/1408.5882

  25. [25]

    Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  26. [26]

    Li, K., X. Liu, F. Mai, and T. Zhang (2020). The role of corporate culture in bad times: Evidence from the COVID-19 pandemic. Journal of Financial and Quantitative Analysis, 1–68

  27. [27]

    Lim, B., S. Zohren, and S. Roberts (2019). Enhancing time-series momentum strategies using deep neural networks. The Journal of Financial Data Science 1(4), 19–38

  28. [28]

    Loughran, T. and B. McDonald (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66(1), 35–65

  29. [29]

    Loughran, T. and B. McDonald (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research 54(4), 1187–1230

  30. [30]

    Lundberg, S. and S.-I. Lee (2017). A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874

  31. [31]

    Mikolov, T., K. Chen, G. Corrado, and J. Dean (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  32. [32]

    Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2018). Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)

  33. [33]

    Mikolov, T., I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546

  34. [34]

    Morin, F. and Y. Bengio (2005). Hierarchical probabilistic neural network language model. In AISTATS, Volume 5, pp. 246–252. Citeseer

  35. [35]

    Næs, R. and J. A. Skjeltorp (2006). Order book characteristics and the volume–volatility relation: Empirical evidence from a limit order market. Journal of Financial Markets 9(4), 408–432

  36. [36]

    Obaid, K. and K. Pukthuanthong (2021). A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news. Journal of Financial Economics

  37. [37]

    Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160(1), 246–256

  38. [38]

    Patton, A. J. and K. Sheppard (2015). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics 97(3), 683–697

  39. [39]

    Poh, D., B. Lim, S. Zohren, and S. Roberts (2021). Building cross-sectional systematic strategies by learning to rank. The Journal of Financial Data Science 3(2), 70–86

  40. [40]

    Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303–1313

  41. [41]

    Rahimikia, E. and S.-H. Poon (2020a). Big data approach to realised volatility forecasting using HAR model augmented with limit order book and news. Available at SSRN 3684040

  42. [42]

    Rahimikia, E. and S.-H. Poon (2020b). Machine learning for realised volatility forecasting. Available at SSRN 3707796

  43. [43]

    Shapiro, A. H., M. Sudhof, and D. J. Wilson (2020). Measuring news sentiment. Journal of Econometrics

  44. [44]

    Shrikumar, A., P. Greenside, and A. Kundaje (2017). Learning important features through propagating activation differences. In International Conference on Machine Learning, pp. 3145–3153. PMLR

  45. [45]

    Sirignano, J. and R. Cont (2019). Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance 19(9), 1449–1459

  46. [46]

    Sundararajan, M., A. Taly, and Q. Yan (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. PMLR

  47. [47]

    Wood, K., S. Roberts, and S. Zohren (2021). Slow momentum with fast reversion: A trading strategy using deep learning and changepoint detection. arXiv preprint arXiv:2105.13727

  48. [48]

    Wu, W., J. Chen, Z. Yang, and M. L. Tindall (2020). A cross-sectional machine learning approach for hedge fund return prediction and selection. Management Science

  49. [49]

    Zhang, Z. and S. Zohren (2021). Multi-horizon forecasting for limit order books: Novel deep learning approaches and hardware acceleration using intelligent processing units. arXiv preprint arXiv:2105.10430

  50. [50]

    Zhang, Z., S. Zohren, and S. Roberts (2018). BDLOB: Bayesian deep convolutional neural networks for limit order books. arXiv preprint arXiv:1811.10041

  51. [51]

    Zhang, Z., S. Zohren, and S. Roberts (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science 2(2), 25–40