Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Eghbal Rahimikia; Ser-Huang Poon; Stefan Zohren

arXiv: 2108.00480 · v6 · submitted 2021-08-01 · 💱 q-fin.CP · cs.CL · cs.LG

Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3

classification: 💱 q-fin.CP · cs.CL · cs.LG

keywords: realized volatility forecasting · news embeddings · machine learning · financial NLP · volatility prediction · stock-specific news · out-of-sample tests · text embeddings in finance


The pith

News text embeddings improve realized volatility forecasts beyond standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether converting news articles into numerical embedding vectors lets machine learning models forecast realized volatility more accurately than benchmarks that rely only on past price data. It reports that news adds incremental predictive content in out-of-sample tests across stocks, with larger effects when the articles mention the specific stock and when volatility is already elevated. Forecasts that blend the news signal with a leading benchmark show better statistical accuracy and produce economically relevant gains. An explainability step identifies the news themes that matter most for the predictions. Volatility forecasts matter for option pricing, risk management, and portfolio allocation, so any timely extra signal from public news text is potentially valuable.

Core claim

News text can be turned into embedding-based representations that, when used in machine learning models, supply forward-looking information about future realized volatility. This information is incremental to what is contained in historical volatility measures, shows stronger effects for stock-specific news and high-volatility days, and produces consistent statistical and economic improvements when added to a leading benchmark model.
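As a concrete reference point for "historical volatility measures", the widely used HAR-RV model of Corsi (2009) forecasts next-day RV from daily, weekly (5-day) and monthly (22-day) averages of past RV. The sketch below only builds those regressors and applies given coefficients. That the paper's "leading benchmark" is exactly this specification is an assumption, and the function names are hypothetical; in practice the coefficients would be estimated by OLS on a training window.

```python
def har_features(rv, t):
    """HAR-RV regressors available at the end of day t.

    rv: daily realised volatility values, oldest first.
    t:  index of the last observed day (needs t >= 21, i.e. 22 days of history).
    Returns the (daily, weekly, monthly) terms used to forecast RV on day t + 1.
    """
    if t < 21:
        raise ValueError("need at least 22 observations up to day t")
    rv_d = rv[t]                         # most recent day
    rv_w = sum(rv[t - 4:t + 1]) / 5      # average of the last 5 trading days
    rv_m = sum(rv[t - 21:t + 1]) / 22    # average of the last 22 trading days
    return rv_d, rv_w, rv_m


def har_forecast(rv, t, beta):
    """Forecast RV on day t + 1 from given coefficients beta = (b0, b_d, b_w, b_m)."""
    b0, b_d, b_w, b_m = beta
    rv_d, rv_w, rv_m = har_features(rv, t)
    return b0 + b_d * rv_d + b_w * rv_w + b_m * rv_m


# Example with a made-up RV series 1, 2, ..., 30
rv = [float(i) for i in range(1, 31)]
print(har_features(rv, 29))   # (30.0, 28.0, 19.5)
```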

What carries the argument

Embedding-based representations of news text, which convert unstructured articles into dense vectors that serve as input features for volatility forecasting models.
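A minimal sketch of this conversion step, assuming a simple regex tokenizer and a pre-trained lookup table `emb` that maps each token to a fixed-length vector. The toy vectors and the choice to map out-of-vocabulary tokens to zero vectors are hypothetical, not the paper's FinText configuration.

```python
import re


def tokenize(headline):
    """Lower-case a headline and keep runs of letters and digits (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def embed_tokens(tokens, emb, dim):
    """Stack one vector per token; out-of-vocabulary tokens become zero vectors."""
    zero = [0.0] * dim
    return [list(emb.get(tok, zero)) for tok in tokens]


# Hypothetical 2-dimensional embedding table
emb = {"profit": [0.1, 0.2], "warning": [-0.3, 0.5]}
matrix = embed_tokens(tokenize("Profit warning issued"), emb, dim=2)
print(matrix)   # [[0.1, 0.2], [-0.3, 0.5], [0.0, 0.0]]
```

Each row of `matrix` is one token, so a day's headlines become a tokens-by-dimensions matrix that a downstream model can consume.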

If this is right

Standalone models that use only news embeddings exhibit out-of-sample predictive ability for realized volatility.
The predictive contribution is larger when the news content is directly related to the individual stock rather than general market events.
Forecast improvements are more noticeable on days when realized volatility is already high.
Blending the news-based signal with a standard realized-volatility benchmark produces both better statistical metrics and economically meaningful gains.
Explainability analysis isolates the specific news themes that drive the volatility predictions.
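One simple way to blend two forecasts is a convex combination whose weight is chosen on a validation window. The sketch below picks the weight that minimises validation MSE over a grid. This is an illustrative assumption: the summary does not state how the paper combines the news-based and benchmark forecasts, and the function names are hypothetical.

```python
def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def blend(news_fc, bench_fc, w):
    """Convex combination: w * news forecast + (1 - w) * benchmark forecast."""
    return [w * n + (1 - w) * b for n, b in zip(news_fc, bench_fc)]


def best_weight(news_fc, bench_fc, actual, grid_size=11):
    """Pick w on an evenly spaced grid in [0, 1] that minimises validation MSE."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda w: mse(blend(news_fc, bench_fc, w), actual))


# Toy validation window: the news forecast is biased up, the benchmark down
actual = [1.0, 2.0, 3.0]
news_fc = [2.0, 3.0, 4.0]
bench_fc = [0.0, 1.0, 2.0]
print(best_weight(news_fc, bench_fc, actual))   # 0.5
```

Choosing the weight on data that precede the test period keeps the combination itself out of sample.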

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding approach could be tested on other quantities such as trading volume or bid-ask spreads if news text contains analogous forward-looking content.
Deploying the method in live trading would require careful handling of publication lags and real-time embedding updates.
Extending the cross-section to include international stocks or other asset classes would test whether the documented effects generalize.
Combining the news embeddings with high-frequency intraday data might further sharpen the timing of volatility signals.

Load-bearing premise

The embedding vectors extracted from news text capture genuine forward-looking signals about volatility rather than noise, selection bias, or information that is already reflected in past prices.

What would settle it

Re-running the out-of-sample tests after randomly shuffling the alignment between news publication dates and subsequent volatility realizations would remove the reported performance edge if the signal is real.
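A minimal sketch of this placebo check, assuming a vector of already-produced news-based forecasts. For cheapness it permutes those forecasts across dates instead of re-training the model on shuffled news, so it is a proxy for the test described above rather than the full procedure; the function names, number of permutations and seed are arbitrary choices.

```python
import random


def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def shuffle_placebo(pred, actual, n_perm=200, seed=0):
    """Compare the true MSE with MSEs after randomly permuting the date alignment.

    Returns (true MSE, permutation p-value). A small p-value means the correctly
    dated forecasts beat almost all shuffled alignments.
    """
    rng = random.Random(seed)
    true_mse = mse(pred, actual)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = pred[:]              # copy, then permute the date alignment
        rng.shuffle(shuffled)
        if mse(shuffled, actual) <= true_mse:
            at_least_as_good += 1
    # Add-one correction so the p-value is never exactly zero
    p_value = (1 + at_least_as_good) / (1 + n_perm)
    return true_mse, p_value


# Made-up example: informative forecasts with a small constant bias
actual = [float(i % 7) for i in range(50)]
pred = [a + 0.1 for a in actual]
true_mse, p = shuffle_placebo(pred, actual)
```

If the news signal were spurious, the shuffled alignments would perform about as well as the true one and the p-value would not be small.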

Figures

Figures reproduced from arXiv: 2108.00480 by Eghbal Rahimikia, Ser-Huang Poon, Stefan Zohren.

**Figure 3.** An abstract representation of the model. Notes: {X(t,1), X(t,2), ..., X(t,kt)} consists of the news headlines of day t, and X(t,kt) is the kth token of input t. RVt+1 is the RV of day t + 1 (next-day RV). Padding with a maximum length of 500 is adopted so that all inputs to the neural network have the same length. The word embedding block consists of two different word embeddings. To capture days witho…
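The padding step in the caption can be sketched as below: tokens are mapped to integer ids, sequences longer than 500 are truncated, and shorter ones are padded so that every day yields an input of the same length. The reserved ids `PAD_ID` and `UNK_ID`, right-side padding, and truncation from the end are assumptions; the caption does not say which side is padded or truncated.

```python
PAD_ID = 0   # hypothetical id reserved for padding positions
UNK_ID = 1   # hypothetical id for out-of-vocabulary tokens


def to_padded_ids(tokens, vocab, max_len=500):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


vocab = {"shares": 2, "fall": 3}   # toy vocabulary
print(to_padded_ids(["shares", "fall", "sharply"], vocab, max_len=5))
# [2, 3, 1, 0, 0]
```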

**Figure 4.** A detailed representation of the model. Notes: The sentence matrix is 500×300: the maximum padding length is 500 and the word embedding dimension is 300, so each token is represented by a vector of 300 values. The structure contains three filters of different sizes; filters of size 1, 2, and 3 generate feature maps of size 500, 499, and 498, respectively. Global max po…
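The feature-map sizes in the caption follow from a "valid" convolution: a filter spanning h consecutive token vectors fits at L − h + 1 positions, so L = 500 gives 500, 499 and 498 for h = 1, 2 and 3. Below is a pure-Python sketch of such filters followed by global max pooling, in the style of Kim (2014). The ReLU activation and the absence of padding inside the convolution are assumptions, and a real model would use a deep-learning library with many filters per size.

```python
def conv1d_valid(sentence, filt, bias=0.0):
    """Slide a filter of height h over an L x d sentence matrix without padding.

    Returns a feature map of length L - h + 1 after a ReLU non-linearity.
    """
    h = len(filt)
    fmap = []
    for i in range(len(sentence) - h + 1):
        s = bias
        for j in range(h):                     # dot product over the h-row window
            s += sum(w * x for w, x in zip(filt[j], sentence[i + j]))
        fmap.append(max(0.0, s))               # ReLU
    return fmap


def cnn_features(sentence, filters):
    """Global max pooling: one feature per filter, concatenated."""
    return [max(conv1d_valid(sentence, f)) for f in filters]


# Toy 5 x 2 sentence matrix and filters of height 1, 2 and 3
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
f1 = [[1.0, 0.0]]
f2 = [[1.0, 0.0], [0.0, 1.0]]
f3 = [[1.0, 1.0]] * 3
print(cnn_features(sentence, [f1, f2, f3]))   # [2.0, 2.0, 4.0]
```

The pooled vector has one entry per filter regardless of input length, which is what lets the output layer map it to a single RV forecast.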

**Figure 5.** Distribution of daily tokens. Notes: The number of daily tokens is calculated, and its distribution is plotted after combining each day's stock-related news headlines (left plot) and general hot news headlines (right plot) (training data, 2,046 days). The vertical line marks the chosen maximum padding length.
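One way to read the vertical line in Figure 5 is through the share of days whose concatenated headlines fit within the padding length without truncation. The sketch assumes whitespace tokenization and that all of a day's headlines are joined; the helper names are hypothetical, and the caption does not say how the length of 500 was chosen.

```python
def daily_token_counts(headlines_by_day):
    """Whitespace-token count per day after joining all of that day's headlines."""
    return {day: sum(len(h.split()) for h in heads)
            for day, heads in headlines_by_day.items()}


def coverage(counts, max_len):
    """Share of days whose token count fits within max_len (no truncation)."""
    values = list(counts.values())
    return sum(c <= max_len for c in values) / len(values)


# Toy example with three days of headlines
headlines = {"d1": ["a b c", "d e"], "d2": ["a"], "d3": ["a b c d e f g"]}
counts = daily_token_counts(headlines)
print(counts, coverage(counts, max_len=5))
```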

**Figure 6.** Out-of-sample word cloud. Notes: (a) Word cloud of stock-related headlines for all 23 stocks together over the out-of-sample period. (b) Word cloud of general hot headlines over the out-of-sample period. For each stock, we obtained the SHAP values for the constituent n-grams of all the textual information used to forecast RV for that stock. In order to identify the volatility movers for the entire sample o…
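SHAP (Lundberg and Lee, 2017) attributes a prediction to its inputs through Shapley values: for input i in the full input set N, φ_i = Σ over S ⊆ N \ {i} of |S|! (|N| − |S| − 1)! / |N|! · [f(S ∪ {i}) − f(S)]. The exact enumeration below is only feasible for a handful of n-grams, which is why SHAP relies on approximations in practice. The toy value function standing in for the model's RV forecast is made up for illustration, as are the n-gram names.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; value(frozenset) -> float. Cost grows as 2**n."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):                                   # k = |S|
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                coalition = frozenset(members)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


def toy_value(S):
    """Made-up 'forecast' for a subset of n-grams: two main effects plus an interaction."""
    return (2.0 * ("crash" in S) + 1.0 * ("probe" in S)
            + 3.0 * ("crash" in S and "probe" in S))


phi = shapley_values(["crash", "probe", "earnings"], toy_value)
```

The interaction term is split equally between "crash" and "probe" (3.5 and 2.5), "earnings" gets zero, and the attributions sum to f(N) − f(∅), the efficiency property that makes Shapley values attractive for explaining forecasts.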

**Figure 7.** Yearly RV forecasting performance (stock-related news).

**Figure 8.** Monthly average percentage of out-of-vocabulary (OOV) tokens in stock-related news over the out-of-sample period.
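A sketch of the quantity in Figure 8, assuming it is computed per day as the share of tokens missing from the embedding vocabulary and then averaged within each calendar month (pooling all of a month's tokens would be an equally plausible reading). Days without news are skipped; the function name is hypothetical.

```python
from collections import defaultdict


def monthly_oov_pct(daily_tokens, vocab):
    """Average daily OOV percentage per month.

    daily_tokens: dict mapping 'YYYY-MM-DD' -> list of tokens for that day.
    vocab: set of tokens covered by the embedding.
    """
    by_month = defaultdict(list)
    for day, tokens in daily_tokens.items():
        if not tokens:
            continue                                   # skip days without news
        oov = sum(tok not in vocab for tok in tokens)
        by_month[day[:7]].append(100.0 * oov / len(tokens))
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


daily = {
    "2020-01-02": ["a", "b", "x", "y"],   # 50% OOV
    "2020-01-03": ["a", "a"],             # 0% OOV
    "2020-02-03": ["z"],                  # 100% OOV
    "2020-03-02": [],                     # no news: ignored
}
print(monthly_oov_pct(daily, vocab={"a", "b"}))
```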

**Figure 9.** Yearly RV forecasting performance (general hot news).

**Figure 10.** SHAP values for the top negative Loughran-McDonald (LM) words.

**Figure 11.** Robustness checks: RC results (stock-related news).

Original abstract

We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and forecasts are evaluated both as a standalone, news-only model and as a complement to standard realised volatility benchmarks. In out-of-sample tests on a cross-section of stocks, news contains useful predictive information, with stronger effects for stock-related content and during high volatility days. Combining the news-based signal with a leading benchmark yields consistent improvements in statistical performance and economically meaningful gains, while explainability analysis highlights the news themes most relevant for volatility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

News embeddings from financial text add a usable but modest signal to realized volatility forecasts, mainly as a complement rather than a replacement.


The core finding is that turning news into embedding vectors produces out-of-sample gains for realized volatility, stronger when the news is stock-specific and on high-volatility days, and that adding the signal to a standard benchmark improves both statistical accuracy and economic metrics. The work is new in applying off-the-shelf financial embeddings to this exact forecasting problem in a clean standalone-plus-complement setup, rather than being just another neural net on returns. It does the basics right: it keeps the NLP step simple, reports both statistical and economic results, and includes an explainability pass on which themes drive the signal. That combination is useful for anyone already running volatility models who wants to test a text add-on without heavy engineering. The main soft spot is the lack of concrete numbers in the abstract (no RMSE differences, no t-statistics, no details on how many stocks were used or how the out-of-sample window was split), so the size of the improvement and its robustness are still unclear. The obvious risk is leakage or selection bias in the news processing, and without seeing the exact timing rules and baseline specifications it is hard to judge how cleanly that was handled. Minor issues include the usual questions of whether the gains survive transaction costs or alternative embeddings, but those are normal for this type of paper. The paper is aimed at applied researchers and practitioners in risk management and derivatives who already work with realized volatility; a reader looking for a practical text signal to bolt onto existing models will get the most out of it. The claims are testable and the design is transparent enough that it deserves a full referee review rather than a desk reject.

Referee Report

2 major / 3 minor

Summary. The paper examines whether news text, transformed via embedding-based representations, can improve realised volatility forecasts. It evaluates standalone news-only models and combinations with standard RV benchmarks in out-of-sample tests across a cross-section of stocks, claiming that news contains useful predictive information (stronger for stock-related content and high-volatility days), that combinations yield statistical and economic gains, and that explainability analysis identifies relevant news themes.

Significance. If the empirical results hold under rigorous validation, the work demonstrates a practical, operationally simple NLP pipeline for incorporating textual signals into volatility forecasting. Strengths include the focus on out-of-sample evaluation, differential effects by news type and market regime, and the addition of economic significance metrics alongside statistical ones.

major comments (2)

[Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.
[§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review—that embeddings capture forward-looking information without substantial leakage—directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.
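For reference, a basic Diebold-Mariano statistic for one-step-ahead forecasts under squared-error loss can be computed as below. It uses the lag-0 variance of the loss differential and a normal approximation; a multi-step or serially correlated setting would need a HAC variance. The function name and the error series are hypothetical.

```python
from math import erf, sqrt


def diebold_mariano(e1, e2):
    """Diebold-Mariano test of equal squared-error loss for one-step forecasts.

    e1, e2: forecast errors of model 1 and model 2 on the same dates.
    Returns (statistic, two-sided p-value); a negative statistic favours model 1.
    Uses the lag-0 variance of the loss differential (no HAC correction).
    """
    d = [a * a - b * b for a, b in zip(e1, e2)]   # loss differential per date
    T = len(d)
    mean_d = sum(d) / T
    var_d = sum((x - mean_d) ** 2 for x in d) / T
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / sqrt(var_d / T)
    normal_cdf = 0.5 * (1 + erf(abs(stat) / sqrt(2)))
    return stat, 2 * (1 - normal_cdf)


e_news = [0.1, -0.2, 0.1, 0.0, -0.1]    # made-up errors, news-augmented model
e_bench = [0.5, -0.6, 0.4, 0.5, -0.3]   # made-up errors, benchmark
stat, p = diebold_mariano(e_news, e_bench)
```

With only five made-up observations the normal approximation is rough; the example is meant to show the mechanics, not a realistic sample size.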

minor comments (3)

[§3] Clarify the precise word-embedding architecture, vocabulary construction, and any domain-specific fine-tuning steps; the current description is high-level.
[§5] Add error bars or bootstrap confidence intervals to the reported performance differentials and state the exact cross-validation or rolling-window scheme used for the OOS tests.
[§6] The explainability analysis would benefit from a table listing the top news themes by importance score together with their associated volatility impact signs.
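The bootstrap intervals requested in the §5 comment could, for example, use the stationary bootstrap of Politis and Romano (1994), which resamples blocks of random (geometric) length so that serial dependence in the loss differential is partly preserved. The mean block length of 5, the percentile interval and the function names below are assumptions for illustration.

```python
import random


def stationary_bootstrap_indices(T, mean_block, rng):
    """One stationary-bootstrap resample of the indices 0..T-1.

    Blocks start at uniformly random positions, have geometric lengths with
    mean `mean_block`, and wrap around the end of the sample.
    """
    p = 1.0 / mean_block
    idx = [rng.randrange(T)]
    while len(idx) < T:
        if rng.random() < p:
            idx.append(rng.randrange(T))      # start a new block
        else:
            idx.append((idx[-1] + 1) % T)     # continue the block, wrapping
    return idx


def bootstrap_ci_mean(d, mean_block=5.0, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of a loss-differential series d."""
    rng = random.Random(seed)
    T = len(d)
    means = []
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(T, mean_block, rng)
        means.append(sum(d[i] for i in idx) / T)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up loss differential that always favours the first model
d = [-0.2 + 0.01 * (i % 5) for i in range(100)]
lo, hi = bootstrap_ci_mean(d)
```

An interval that excludes zero would indicate that the performance differential is unlikely to be a sampling artefact, under the assumptions of the bootstrap.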

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment below and will incorporate the suggested changes into the revised manuscript.


Referee: [Abstract / §1] The abstract and introduction assert consistent OOS improvements and economic gains from combining the news signal with a leading benchmark, but the provided text supplies no quantitative values (e.g., percentage reduction in MSE, Diebold-Mariano statistics, or Sharpe-ratio differentials). Without these numbers or the precise baseline definitions, the magnitude and robustness of the central claim cannot be assessed from the summary alone.

Authors: We agree that the abstract and introduction would benefit from including key quantitative results to better convey the magnitude of the improvements. While the full manuscript reports these metrics in detail (e.g., MSE reductions, DM test statistics, and economic significance measures in Sections 5–6), the abstract remains qualitative. We will revise the abstract to incorporate concise quantitative highlights, such as average out-of-sample MSE improvements and significance levels, along with clearer baseline definitions. revision: yes
Referee: [§4 (Empirical Design)] The evaluation protocol must explicitly rule out look-ahead bias in the news corpus (e.g., publication timestamps relative to the volatility measurement window and any filtering that could introduce selection effects). The weakest assumption noted in the review—that embeddings capture forward-looking information without substantial leakage—directly affects the validity of the reported OOS gains and requires a dedicated robustness subsection.

Authors: We agree that an explicit discussion of look-ahead bias is necessary for transparency. The current manuscript describes the use of publication timestamps to align news with the forecast horizon, but we will add a dedicated robustness subsection (likely in §4 or §5) that details timestamp alignment procedures, any selection filters applied to the news corpus, and additional checks (such as restricting to news published strictly before the volatility measurement window) to confirm the absence of leakage. revision: yes
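The "strictly before the volatility measurement window" rule mentioned in the response can be made concrete as below: each headline is assigned to the first trading day whose session opens after the headline's timestamp. The 09:30 open (US regular trading hours, in exchange-local time) and the function name are assumptions; the actual cutoff used in the paper is not stated here.

```python
from datetime import date, datetime, time
from typing import List, Optional

SESSION_OPEN = time(9, 30)   # assumed session start, exchange-local time


def target_day(ts: datetime, trading_days: List[date]) -> Optional[date]:
    """First trading day whose session opens strictly after the news timestamp.

    News published during or after a session is pushed to the next trading
    day, so it never overlaps the RV window it is used to forecast.
    trading_days must be sorted in ascending order.
    """
    for d in trading_days:
        if datetime.combine(d, SESSION_OPEN) > ts:
            return d
    return None   # no eligible trading day in the supplied calendar


days = [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 6)]
print(target_day(datetime(2020, 1, 2, 8, 0), days))    # 2020-01-02 (pre-open)
print(target_day(datetime(2020, 1, 2, 10, 0), days))   # 2020-01-03 (intraday)
print(target_day(datetime(2020, 1, 3, 17, 0), days))   # 2020-01-06 (Friday evening)
```

Using an explicit exchange calendar rather than calendar days handles weekends and holidays without special cases.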

Circularity Check

0 steps flagged

No significant circularity identified


The paper's central claim rests on out-of-sample forecasting performance of embedding-based news representations for realised volatility, evaluated both standalone and as an additive signal to standard benchmarks. No equations or steps are described that reduce a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise rely on self-citation chains or imported uniqueness theorems. The methodology follows standard NLP embedding plus regression pipelines whose predictive content is independently testable against external benchmarks; the derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5627 in / 976 out tokens · 22248 ms · 2026-05-24T12:22:09.862246+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We develop FinText, a financial word embedding... NLP model supported by different word embeddings improves realised volatility forecasts on high volatility days... SHAP... volatility movers
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

HAR-family... CHAR model... ensemble models... LOB data and news sentiments

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 6 internal anchors

[1] Adämmer, P. and R. A. Schüssler (2020). Forecasting the equity premium: mind the news! Review of Finance 24(6), 1313–1355.
[2] Agirre, E., E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa (2009). A study on similarity and relatedness using distributional and WordNet-based approaches.
[3] Ban, G.-Y., N. El Karoui, and A. E. Lim (2018). Machine learning and portfolio optimization. Management Science 64(3), 1136–1154.
[4] Bianchi, D., M. Büchner, and A. Tamoni (2021). Bond risk premiums with machine learning. The Review of Financial Studies 34(2), 1046–1089.
[5] Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146.
[6] Bollerslev, T., A. J. Patton, and R. Quaedvlieg (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics 192(1), 1–18.
[7] Bubna, A., S. R. Das, and N. Prabhala (2020). Venture capital communities. Journal of Financial and Quantitative Analysis 55(2), 621–651.
[8] Bybee, L., B. T. Kelly, A. Manela, and D. Xiu (2020). The structure of economic news. Technical report, National Bureau of Economic Research.
[9] Chen, L., M. Pelger, and J. Zhu (2020). Deep learning in asset pricing. Available at SSRN 3350138.
[10] Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537.
[11] Conrad, C. and R. F. Engle (2021). Modelling volatility cycles: the (MF)² GARCH model. Available at SSRN.
[12] Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics 7(2), 174–196.
[13] Corsi, F. and R. Reno (2009). HAR volatility modelling with heterogeneous leverage and jumps. Available at SSRN 1316953.
[14] Engle, R. F. and S. Martins (2020). Measuring and hedging geopolitical risk.
[15] Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. The Journal of Finance 48(5), 1749–1778.
[16] Gentzkow, M., B. Kelly, and M. Taddy (2019). Text as data. Journal of Economic Literature 57(3), 535–74.
[17] Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies 33(5), 2223–2273.
[18] Gu, S., B. Kelly, and D. Xiu (2021). Autoencoder asset pricing models. Journal of Econometrics 222(1), 429–450.
[19] Hill, F., R. Reichart, and A. Korhonen (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41(4), 665–695.
[20] Jiang, J., B. T. Kelly, and D. Xiu (2020). (Re-)Imag(in)ing price trends. Chicago Booth Research Paper (21-01).
[21] Joulin, A., E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov (2016). FastText.zip: Compressing text classification models.
[22] Kalay, A., O. Sade, and A. Wohl (2004). Measuring stock illiquidity: An investigation of the demand and supply schedules at the TASE. Journal of Financial Economics 74(3), 461–486.
[23] Ke, Z. T., B. T. Kelly, and D. Xiu (2019). Predicting returns with text data. Technical report, National Bureau of Economic Research.
[24] Kim, Y. (2014). Convolutional neural networks for sentence classification. CoRR abs/1408.5882.
[25] Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[26] Li, K., X. Liu, F. Mai, and T. Zhang (2020). The role of corporate culture in bad times: Evidence from the COVID-19 pandemic. Journal of Financial and Quantitative Analysis, 1–68.
[27] Lim, B., S. Zohren, and S. Roberts (2019). Enhancing time-series momentum strategies using deep neural networks. The Journal of Financial Data Science 1(4), 19–38.
[28] Loughran, T. and B. McDonald (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66(1), 35–65.
[29] Loughran, T. and B. McDonald (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research 54(4), 1187–1230.
[30] Lundberg, S. and S.-I. Lee (2017). A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874.
[31] Mikolov, T., K. Chen, G. Corrado, and J. Dean (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[32] Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2018). Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
[33] Mikolov, T., I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546.
[34] Morin, F. and Y. Bengio (2005). Hierarchical probabilistic neural network language model. In AISTATS, Volume 5, pp. 246–252. Citeseer.
[35] Næs, R. and J. A. Skjeltorp (2006). Order book characteristics and the volume-volatility relation: Empirical evidence from a limit order market. Journal of Financial Markets 9(4), 408–432.
[36] Obaid, K. and K. Pukthuanthong (2021). A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news. Journal of Financial Economics.
[37] Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160(1), 246–256.
[38] Patton, A. J. and K. Sheppard (2015). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics 97(3), 683–697.
[39] Poh, D., B. Lim, S. Zohren, and S. Roberts (2021). Building cross-sectional systematic strategies by learning to rank. The Journal of Financial Data Science 3(2), 70–86.
[40] Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303–1313.
[41] Rahimikia, E. and S.-H. Poon (2020a). Big data approach to realised volatility forecasting using HAR model augmented with limit order book and news. Available at SSRN 3684040.
[42] Rahimikia, E. and S.-H. Poon (2020b). Machine learning for realised volatility forecasting. Available at SSRN 3707796.
[43] Shapiro, A. H., M. Sudhof, and D. J. Wilson (2020). Measuring news sentiment. Journal of Econometrics.
[44] Shrikumar, A., P. Greenside, and A. Kundaje (2017). Learning important features through propagating activation differences. In International Conference on Machine Learning, pp. 3145–3153. PMLR.
[45] Sirignano, J. and R. Cont (2019). Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance 19(9), 1449–1459.
[46] Sundararajan, M., A. Taly, and Q. Yan (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. PMLR.
[47] Wood, K., S. Roberts, and S. Zohren (2021). Slow momentum with fast reversion: A trading strategy using deep learning and changepoint detection. arXiv preprint arXiv:2105.13727.
[48] Wu, W., J. Chen, Z. Yang, and M. L. Tindall (2020). A cross-sectional machine learning approach for hedge fund return prediction and selection. Management Science.
[49] Zhang, Z. and S. Zohren (2021). Multi-horizon forecasting for limit order books: Novel deep learning approaches and hardware acceleration using intelligent processing units. arXiv preprint arXiv:2105.10430.
[50] Zhang, Z., S. Zohren, and S. Roberts (2018). BDLOB: Bayesian deep convolutional neural networks for limit order books. arXiv preprint arXiv:1811.10041.
[51] Zhang, Z., S. Zohren, and S. Roberts (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science 2(2), 25–40.

[1] [1]

a mmer, P. and R. A. Sch \

Ad \"a mmer, P. and R. A. Sch \"u ssler (2020). Forecasting the equity premium: mind the news! Review of Finance\/ 24\/ (6), 1313--1355

work page 2020

[2] [2]

Alfonseca, K

Agirre, E., E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa (2009). A study on similarity and relatedness using distributional and wordnet-based approaches

work page 2009

[3] [3]

El Karoui, and A

Ban, G.-Y., N. El Karoui, and A. E. Lim (2018). Machine learning and portfolio optimization. Management Science\/ 64\/ (3), 1136--1154

work page 2018

[4] [4]

B \"u chner, and A

Bianchi, D., M. B \"u chner, and A. Tamoni (2021). Bond risk premiums with machine learning. The Review of Financial Studies\/ 34\/ (2), 1046--1089

work page 2021

[5] [5]

Grave, A

Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics\/ 5 , 135--146

work page 2017

[6] [6]

Bollerslev, T., A. J. Patton, and R. Quaedvlieg (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics\/ 192\/ (1), 1--18

work page 2016

[7] [7]

Bubna, A., S. R. Das, and N. Prabhala (2020). Venture capital communities. Journal of Financial and Quantitative Analysis\/ 55\/ (2), 621--651

work page 2020

[8] [8]

Bybee, L., B. T. Kelly, A. Manela, and D. Xiu (2020). The structure of economic news. Technical report, National Bureau of Economic Research

work page 2020

[9] [9]

Pelger, and J

Chen, L., M. Pelger, and J. Zhu (2020). Deep learning in asset pricing. Available at SSRN 3350138\/

work page 2020

[10] [10]

Weston, L

Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011). Natural language processing (almost) from scratch. Journal of machine learning research\/ 12\/ (ARTICLE), 2493--2537

work page 2011

[11] [11]

Conrad, C. and R. F. Engle (2021). Modelling volatility cycles: the (mf)\^ 2 garch model. Available at SSRN\/

work page 2021

[12] [12]

Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics\/ 7\/ (2), 174--196

work page 2009

[13] [13]

Corsi, F. and R. Reno (2009). Har volatility modelling with heterogeneous leverage and jumps. Available at SSRN 1316953\/

work page 2009

[14] [14]

Engle, R. F. and S. Martins (2020). Measuring and hedging geopolitical risk

work page 2020

[15] [15]

Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. The journal of finance\/ 48\/ (5), 1749--1778

work page 1993

[16] [16]

Kelly, and M

Gentzkow, M., B. Kelly, and M. Taddy (2019). Text as data. Journal of Economic Literature\/ 57\/ (3), 535--74

work page 2019

[17] [17]

Kelly, and D

Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies\/ 33\/ (5), 2223--2273

work page 2020

[18] [18]

Kelly, and D

Gu, S., B. Kelly, and D. Xiu (2021). Autoencoder asset pricing models. Journal of Econometrics\/ 222\/ (1), 429--450

work page 2021

[19] [19]

Reichart, and A

Hill, F., R. Reichart, and A. Korhonen (2015). Simlex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics\/ 41\/ (4), 665--695

work page 2015

[20] [20]

Jiang, J., B. T. Kelly, and D. Xiu (2020). (re-) imag (in) ing price trends. Chicago Booth Research Paper\/ (21-01)

work page 2020

[21] [21]

Grave, P

Joulin, A., E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov (2016). Fasttext.zip: Compressing text classification models

work page 2016

[22] [22]

Sade, and A

Kalay, A., O. Sade, and A. Wohl (2004). Measuring stock illiquidity: An investigation of the demand and supply schedules at the tase. Journal of Financial Economics\/ 74\/ (3), 461--486

work page 2004

[23] [23]

Ke, Z. T., B. T. Kelly, and D. Xiu (2019). Predicting returns with text data. Technical report, National Bureau of Economic Research

work page 2019

[24] [24]

Kim, Y. (2014). Convolutional neural networks for sentence classification. CoRR\/ abs/1408.5882

work page internal anchor Pith review Pith/arXiv arXiv 2014

[25] [25]

Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980\/

work page internal anchor Pith review Pith/arXiv arXiv 2014

[26] [26]

Li, K., X. Liu, F. Mai, and T. Zhang (2020). The role of corporate culture in bad times: Evidence from the covid-19 pandemic. Journal of Financial and Quantitative Analysis\/ , 1--68

work page 2020

[27] [27]

Zohren, and S

Lim, B., S. Zohren, and S. Roberts (2019). Enhancing time-series momentum strategies using deep neural networks. The Journal of Financial Data Science\/ 1\/ (4), 19--38

work page 2019

[28] [28]

Loughran, T. and B. McDonald (2011). When is a liability not a liability? textual analysis, dictionaries, and 10-ks. The Journal of Finance\/ 66\/ (1), 35--65

work page 2011

[29] [29]

Loughran, T. and B. McDonald (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research\/ 54\/ (4), 1187--1230

work page 2016

[30] [30]

A Unified Approach to Interpreting Model Predictions

Lundberg, S. and S.-I. Lee (2017). A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874\/

work page internal anchor Pith review Pith/arXiv arXiv 2017

[31] Mikolov, T., K. Chen, G. Corrado, and J. Dean (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

[32] Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2018). Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

[33] Mikolov, T., I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546.

[34] Morin, F. and Y. Bengio (2005). Hierarchical probabilistic neural network language model. In AISTATS, Volume 5, pp. 246–252. Citeseer.

[35] Næs, R. and J. A. Skjeltorp (2006). Order book characteristics and the volume–volatility relation: Empirical evidence from a limit order market. Journal of Financial Markets 9(4), 408–432.

[36] Obaid, K. and K. Pukthuanthong (2021). A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news. Journal of Financial Economics.

[37] Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160(1), 246–256.

[38] Patton, A. J. and K. Sheppard (2015). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics 97(3), 683–697.

[39] Poh, D., B. Lim, S. Zohren, and S. Roberts (2021). Building cross-sectional systematic strategies by learning to rank. The Journal of Financial Data Science 3(2), 70–86.

[40] Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303–1313.

[41] Rahimikia, E. and S.-H. Poon (2020a). Big data approach to realised volatility forecasting using HAR model augmented with limit order book and news. Available at SSRN 3684040.

[42] Rahimikia, E. and S.-H. Poon (2020b). Machine learning for realised volatility forecasting. Available at SSRN 3707796.

[43] Shapiro, A. H., M. Sudhof, and D. J. Wilson (2020). Measuring news sentiment. Journal of Econometrics.

[44] Shrikumar, A., P. Greenside, and A. Kundaje (2017). Learning important features through propagating activation differences. In International Conference on Machine Learning, pp. 3145–3153. PMLR.

[45] Sirignano, J. and R. Cont (2019). Universal features of price formation in financial markets: Perspectives from deep learning. Quantitative Finance 19(9), 1449–1459.

[46] Sundararajan, M., A. Taly, and Q. Yan (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. PMLR.

[47] Wood, K., S. Roberts, and S. Zohren (2021). Slow momentum with fast reversion: A trading strategy using deep learning and changepoint detection. arXiv preprint arXiv:2105.13727.

[48] Wu, W., J. Chen, Z. Yang, and M. L. Tindall (2020). A cross-sectional machine learning approach for hedge fund return prediction and selection. Management Science.

[49] Zhang, Z. and S. Zohren (2021). Multi-horizon forecasting for limit order books: Novel deep learning approaches and hardware acceleration using intelligent processing units. arXiv preprint arXiv:2105.10430.

[50] Zhang, Z., S. Zohren, and S. Roberts (2018). BDLOB: Bayesian deep convolutional neural networks for limit order books. arXiv preprint arXiv:1811.10041.

[51] Zhang, Z., S. Zohren, and S. Roberts (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science 2(2), 25–40.