ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall

Ke Zhu; Yichi Zhang; Zhoufan Zhu

arxiv: 2606.04576 · v1 · pith:3PM74RQHnew · submitted 2026-06-03 · stat.ML · cs.LG · econ.EM · q-fin.RM

Pith reviewed 2026-06-28 04:10 UTC · model grok-4.3

classification: stat.ML · cs.LG · econ.EM · q-fin.RM

keywords: Value-at-Risk · Expected Shortfall · Autoencoder · Tail risk modeling · Financial machine learning · Scaling analysis · Portfolio construction

The pith

ReSGA forecasts Value-at-Risk and Expected Shortfall more accurately than prior methods by using a large neural network on asset characteristics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the retrieval-enhanced self-grouping autoencoder (ReSGA) as a high-parameter model for joint VaR and ES estimation. It claims that limited-parameter approaches suffer from misspecification when handling big data with many firm characteristics, while ReSGA captures cross-sectional and temporal patterns. Tested on monthly US equity returns from 1926 to 2023 using 153 characteristics, the model shows lower out-of-sample losses and stronger backtesting results against twelve competitors. Forecast gains also produce economic profits via a size-enhanced left-side momentum portfolio strategy. Scaling experiments indicate that performance gains stem mainly from data complexity, and group-importance plus transfer-learning checks support interpretability and generalizability.

Core claim

ReSGA is a large tail risk model with millions of parameters built to exploit rich cross-sectional dependence and long-term temporal dynamics of assets via their characteristics. On monthly US equity returns 1926-2023 with 153 firm characteristics, it beats twelve econometric and machine learning baselines on out-of-sample loss and statistical backtests. The forecast edge produces sizable economic gains in long-short decile portfolios formed by a new size-enhanced left-side momentum strategy. Scaling analysis shows joint VaR-ES improvements arise primarily from data complexity, not model complexity. Group-importance and transfer-learning results confirm interpretability and cross-market appl

What carries the argument

The retrieval-enhanced self-grouping autoencoder (ReSGA), a neural network that retrieves similar assets and groups them to model tail risks with millions of parameters.

If this is right

Joint VaR-ES forecasts improve when models scale to millions of parameters on characteristic-rich data.
Economic profits arise from portfolios that sort assets by the model's tail-risk signals using a size-enhanced left-side momentum rule.
Scaling experiments identify data complexity, rather than model size, as the main driver of better VaR-ES accuracy.
Group-importance analysis reveals which characteristics matter most for the forecasts.
Transfer-learning results show the model applies across different markets without full retraining.
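The portfolio item in the list above refers to sorting assets on a forecast-based signal. A minimal sketch of a generic equal-weighted long-short decile sort follows. It is not the paper's size-enhanced left-side momentum rule, whose construction is not described on this page. The function name `long_short_return`, the sign convention (higher signal is held long), and the equal weighting are all assumptions made for illustration.

```python
def long_short_return(signals, next_returns, n_bins=10):
    """Return of a long-short portfolio formed on one cross-section.

    Assets are ranked by `signals`; the top bin is held long and the
    bottom bin is held short, both equal-weighted (an assumption, not
    the paper's rule). `next_returns` are the realized returns over the
    following period.
    """
    if len(signals) != len(next_returns):
        raise ValueError("signals and next_returns must have the same length")
    if n_bins < 2 or len(signals) < n_bins:
        raise ValueError("need at least two bins and one asset per bin")

    # Indices of assets ordered from lowest to highest signal.
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    bin_size = len(order) // n_bins

    bottom = order[:bin_size]   # lowest-signal bin (short leg)
    top = order[-bin_size:]     # highest-signal bin (long leg)

    long_leg = sum(next_returns[i] for i in top) / bin_size
    short_leg = sum(next_returns[i] for i in bottom) / bin_size
    return long_leg - short_leg
```

In practice the signal would be recomputed each month from out-of-sample forecasts only, so that the sort never uses information from the return it is evaluated on.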

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Larger historical or cross-market datasets could yield further accuracy gains without increasing model size.
The retrieval and grouping steps might extend to other tail measures such as expected shortfall at multiple horizons.
If data complexity dominates, practitioners could prioritize collecting more characteristics over tuning network depth.

Load-bearing premise

The out-of-sample gains on the 1926-2023 US equity data with 153 characteristics are not artifacts of model selection or data snooping.

What would settle it

ReSGA failing to show lower out-of-sample loss than the twelve baselines on a fresh post-2023 equity dataset or a non-US market would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.04576 by Ke Zhu, Yichi Zhang, Zhoufan Zhu.

**Figure 2.** Group importance from ReSGA across months during the out-of-sample period.

Original abstract

Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is important for managing financial risks effectively. Existing approaches with limited parameters are vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which is designed with millions of parameters to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets using their characteristics. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in terms of out-of-sample loss and statistical backtesting. In addition, its forecast advantages can translate into significant economic gains from long-short decile portfolios that are constructed by a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we further conduct a systematic scaling analysis and demonstrate that improvements in joint VaR-ES forecasting are primarily driven by data complexity rather than model complexity. Finally, our analyses of group-importance and transfer-learning exhibit the interpretability and cross-market generalizability of ReSGA.
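The abstract refers to an out-of-sample loss for joint VaR-ES forecasts without naming it. One widely used strictly consistent scoring function for the pair is the FZ0 loss of Patton, Ziegel and Chen (2019), and the sketch below shows how such a loss is evaluated. Whether the paper trains or evaluates with this exact loss is an assumption; the function names `fz0_loss` and `mean_fz0_loss` are hypothetical.

```python
import math


def fz0_loss(y, var, es, alpha):
    """FZ0 loss for one return `y` given VaR and ES forecasts at level alpha.

    Forecasts are on the return scale (left tail), so the loss requires
    es < 0 and es <= var.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if es >= 0.0 or es > var:
        raise ValueError("FZ0 requires es < 0 and es <= var")

    hit = 1.0 if y <= var else 0.0  # indicator of a VaR violation
    return -hit * (var - y) / (alpha * es) + var / es + math.log(-es) - 1.0


def mean_fz0_loss(returns, var_forecasts, es_forecasts, alpha):
    """Average FZ0 loss over a sample; lower is better."""
    losses = [
        fz0_loss(y, v, e, alpha)
        for y, v, e in zip(returns, var_forecasts, es_forecasts)
    ]
    return sum(losses) / len(losses)
```

Comparing models then amounts to comparing their average losses on the same out-of-sample returns, which is the kind of number a reader would need in order to check the outperformance claim.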

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper introduces ReSGA, a large autoencoder for joint VaR-ES forecasting on a long equity sample, but the scaling claims and missing implementation details need checking.

ReSGA is a retrieval-enhanced self-grouping autoencoder with millions of parameters for joint VaR and ES forecasting, plus a size-enhanced left-side momentum strategy for portfolios. On monthly US equity returns 1926-2023 with 153 characteristics it claims to beat twelve econometric and machine learning baselines on out-of-sample loss and backtests, with the gains showing up in economic long-short sorts.

The paper does a few things cleanly. It takes the large-model route seriously for tail risk instead of staying with small parametric forms, and it uses a genuinely long historical sample. The group-importance and transfer-learning sections give some interpretability and a check on cross-market use that is often absent. The scaling analysis is a reasonable attempt to ask whether gains come from data complexity rather than just adding parameters.

The soft spots are mostly about missing pieces rather than outright contradictions. The abstract states outperformance without any loss values, backtest statistics, or description of the training objective. There is no mention of regularization or other steps against overfitting with millions of parameters. On the scaling result, the claim that data complexity drives the improvement rests on the complexity measures being independent of the fitted model; if they are computed from post-fit quantities the separation could be circular, exactly as the stress-test note flags. The abstract alone does not let a reader verify this.
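To make concrete what a reported backtest statistic could look like, here is a minimal sketch of an unconditional coverage test in the style of Kupiec's proportion-of-failures test. It is a standard check, named here plainly as a stand-in; it is not necessarily one of the backtests used in the paper. The function name `coverage_test` is hypothetical.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def coverage_test(returns, var_forecasts, alpha):
    """Likelihood-ratio test that the VaR violation rate equals alpha.

    Returns (number of violations, LR statistic, p-value). Under the
    null the statistic is asymptotically chi-square with 1 degree of
    freedom.
    """
    n = len(returns)
    if n == 0 or n != len(var_forecasts):
        raise ValueError("inputs must be non-empty and of equal length")

    # A violation occurs when the realized return falls below the VaR forecast.
    hits = sum(1 for y, v in zip(returns, var_forecasts) if y < v)
    rate = hits / n

    log_lik_null = _xlogy(hits, alpha) + _xlogy(n - hits, 1.0 - alpha)
    log_lik_alt = _xlogy(hits, rate) + _xlogy(n - hits, 1.0 - rate)
    lr = max(0.0, -2.0 * (log_lik_null - log_lik_alt))

    # Survival function of a chi-square(1) variable: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return hits, lr, p_value
```

A small p-value indicates that the observed violation frequency is inconsistent with the nominal level. The test says nothing about clustering of violations or about ES, which need separate backtests.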

This paper is for researchers who build or evaluate large models for financial risk or who work with characteristic-sorted equity data. A reader already running neural nets on tails or studying scaling in finance will get the most from the architecture and the empirical setup. It deserves a serious referee because the core idea is coherent, the dataset is substantial, and the questions it raises are practical, even though the current version will need more technical detail on the loss, regularization, and scaling metrics.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes the retrieval-enhanced self-grouping autoencoder (ReSGA), a neural network with millions of parameters, for joint estimation of Value-at-Risk (VaR) and Expected Shortfall (ES) on monthly US equity returns 1926-2023 using 153 firm characteristics. It claims superior out-of-sample performance and backtest results versus twelve econometric and machine-learning baselines, economic gains from a new size-enhanced left-side momentum long-short strategy, and that a scaling analysis attributes gains primarily to data complexity rather than model complexity, while also showing interpretability via group importance and cross-market transferability.

Significance. If the outperformance, economic gains, and non-circular scaling conclusions hold after verification, the work would be significant for demonstrating the viability of high-capacity models in tail-risk forecasting with rich characteristic data, potentially shifting practice toward scalable neural approaches in financial risk management.

major comments (3)

[Abstract] The claim of outperformance over twelve competitors in out-of-sample loss and statistical backtesting supplies no numerical loss values, no description of the training objective, and no mention of regularization against overfitting with millions of parameters; these omissions are load-bearing for assessing whether the reported superiority is robust.
[Scaling analysis, as described in the abstract] The assertion that improvements are driven by data complexity rather than model complexity cannot be verified without the explicit equations or complexity metrics; if these metrics are computed from post-fit quantities such as realized losses or selected hyperparameters, the separation is circular and does not rule out that apparent gains are simply better-tuned large models.
[Abstract] The claim that forecast advantages translate into significant economic gains via a new size-enhanced left-side momentum strategy provides no details on portfolio construction, turnover, or statistical significance of the gains, all of which are central to the economic-value assertion.
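The second comment turns on whether the complexity measures are fixed before any model is fitted. A minimal sketch of a crossed design that satisfies this requirement is given below: both axes are set in advance, every combination is evaluated, and gains are averaged along each axis. The names `crossed_grid` and `marginal_gains`, the `evaluate` callback, and the toy loss used in the usage lines are all hypothetical; none of them comes from the paper.

```python
from itertools import product


def crossed_grid(param_counts, n_characteristics, evaluate):
    """Evaluate every (model size, data size) pair chosen before fitting.

    `evaluate(p, k)` is a user-supplied function that trains a model with
    p parameters on k characteristics and returns an out-of-sample loss.
    """
    return {
        (p, k): evaluate(p, k)
        for p, k in product(param_counts, n_characteristics)
    }


def marginal_gains(grid, param_counts, n_characteristics):
    """Average loss reduction from the smallest to the largest setting.

    Returns (gain from scaling the model, gain from scaling the data).
    """
    p_lo, p_hi = min(param_counts), max(param_counts)
    k_lo, k_hi = min(n_characteristics), max(n_characteristics)

    # Hold data size fixed at each k and compare small vs large model.
    model_gain = sum(
        grid[(p_lo, k)] - grid[(p_hi, k)] for k in n_characteristics
    ) / len(n_characteristics)

    # Hold model size fixed at each p and compare few vs many characteristics.
    data_gain = sum(
        grid[(p, k_lo)] - grid[(p, k_hi)] for p in param_counts
    ) / len(param_counts)
    return model_gain, data_gain


# Toy usage with a made-up loss surface (illustrative numbers only).
toy = crossed_grid(
    [1_000, 1_000_000],
    [10, 150],
    lambda p, k: 10.0 - 0.01 * k - 0.000001 * p,
)
model_gain, data_gain = marginal_gains(toy, [1_000, 1_000_000], [10, 150])
```

Because the grid axes are inputs rather than quantities derived from fitted losses or tuned hyperparameters, a comparison of the two gains cannot be circular in the sense the comment describes.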

minor comments (1)

The manuscript should add a dedicated methods subsection detailing the precise loss function, optimization procedure, and any regularization or early-stopping rules used to train the millions of parameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful comments on the manuscript. We provide point-by-point responses below and indicate where revisions will be made to improve the presentation.

Referee: [Abstract] The claim of outperformance over twelve competitors in out-of-sample loss and statistical backtesting supplies no numerical loss values, no description of the training objective, and no mention of regularization against overfitting with millions of parameters; these omissions are load-bearing for assessing whether the reported superiority is robust.

Authors: While the abstract is necessarily concise, the out-of-sample loss values are detailed in Table 3, the joint training objective for VaR and ES is specified in Equation (4) of Section 3, and regularization methods to mitigate overfitting are discussed in Section 4.1. We will revise the abstract to incorporate key numerical results and a brief reference to the objective and regularization approach.
revision: partial

Referee: [Scaling analysis, as described in the abstract] The assertion that improvements are driven by data complexity rather than model complexity cannot be verified without the explicit equations or complexity metrics; if these metrics are computed from post-fit quantities such as realized losses or selected hyperparameters, the separation is circular and does not rule out that apparent gains are simply better-tuned large models.

Authors: Section 5 presents the scaling analysis with explicit definitions: model complexity is measured by the number of parameters in the autoencoder, and data complexity by the number of firm characteristics and observations. These are pre-determined quantities, not derived from post-fit losses or hyperparameters, thus avoiding circularity. The analysis varies data size while fixing model size and vice versa. We will add the explicit equations to the main text for clarity if they are not sufficiently prominent.
revision: partial

Referee: [Abstract] The claim that forecast advantages translate into significant economic gains via a new size-enhanced left-side momentum strategy provides no details on portfolio construction, turnover, or statistical significance of the gains, which is central to the economic-value assertion.

Authors: The size-enhanced left-side momentum strategy is described in detail in Section 6, including how portfolios are formed based on ReSGA forecasts. Turnover rates are reported in Table 6, and the statistical significance of the returns is evaluated using t-statistics adjusted for autocorrelation. We will update the abstract to include concise information on these aspects.
revision: partial
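The last response mentions t-statistics adjusted for autocorrelation. A common choice for such an adjustment is the Newey-West estimator with Bartlett weights, sketched below for the mean of a return series. Whether the paper uses this exact estimator or a particular lag length is an assumption; the function name `newey_west_tstat` is hypothetical.

```python
import math


def newey_west_tstat(x, lags):
    """t-statistic for the mean of `x` with a Newey-West (Bartlett) variance.

    With lags=0 this reduces to the usual t-statistic based on the
    (1/n-normalized) sample variance.
    """
    n = len(x)
    if n < 2 or lags < 0 or lags >= n:
        raise ValueError("need n >= 2 and 0 <= lags < n")

    mean = sum(x) / n
    dev = [v - mean for v in x]

    def autocov(j):
        # Sample autocovariance at lag j, normalized by n.
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    # Long-run variance: lag-0 term plus Bartlett-weighted autocovariances.
    long_run_var = autocov(0)
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)
        long_run_var += 2.0 * weight * autocov(j)

    if long_run_var <= 0.0:
        raise ValueError("long-run variance is not positive")
    return mean / math.sqrt(long_run_var / n)
```

For positively autocorrelated portfolio returns the adjusted statistic is smaller in magnitude than the unadjusted one, which is why the choice of lag length matters when judging the significance of long-short gains.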

Circularity Check

0 steps flagged

No circularity in derivation or scaling analysis

The paper introduces ReSGA as a large-parameter autoencoder for joint VaR-ES forecasting on equity returns with characteristics, reports out-of-sample superiority over competitors, and performs a scaling analysis attributing gains to data complexity. No quoted equations or sections exhibit self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claims to tautology. The scaling analysis is presented as an empirical demonstration separating data and model complexity; absent explicit reduction of its metrics to post-fit quantities within the same optimization, the derivation chain remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that a high-parameter autoencoder can capture cross-sectional dependence and temporal dynamics in equity returns without severe overfitting, and that the scaling experiment isolates data complexity from model complexity.

free parameters (1)

millions of parameters in ReSGA
Model size is chosen to exploit rich dependence; exact count and regularization details are not given in abstract.

axioms (1)

domain assumption: Large models with retrieval and self-grouping can exploit cross-sectional dependence and long-term temporal dynamics of assets using firm characteristics.
Invoked to justify the design of ReSGA.

invented entities (1)

ReSGA (retrieval-enhanced self-grouping autoencoder): no independent evidence
purpose: Joint forecasting of VaR and ES
New model introduced in the paper; no independent evidence outside the reported experiments.

pith-pipeline@v0.9.1-grok · 5746 in / 1521 out tokens · 42370 ms · 2026-06-28T04:10:46.746682+00:00 · methodology

Reference graph

Works this paper leans on

123 extracted references · 25 canonical work pages

[1]

Szekely , Balazs B

barticle [author] Acerbi , Carlo C. Szekely , Balazs B. ( 2014 ). Back-testing Expected Shortfall . Risk 27 76--81 . barticle

2014
[2]

Tasche , Dirk D

barticle [author] Acerbi , Carlo C. Tasche , Dirk D. ( 2002 ). On the Coherence of Expected Shortfall . Journal of Banking & Finance 26 1487--1503 . barticle

2002
[3]

barticle [author] Akaike , H. H. ( 1974 ). A New Look at the Statistical Model Identification . IEEE Transactions on Automatic Control 19 716--723 . barticle

1974
[4]

, Delbaen , Freddy F

barticle [author] Artzner , Philippe P. , Delbaen , Freddy F. , Eber , Jean-Marc J.-M. Heath , David D. ( 1999 ). Coherent Measures of Risk . Mathematical Finance 9 203--228 . 10.1111/1467-9965.00068 barticle

work page doi:10.1111/1467-9965.00068 1999
[5]

, Bali , Turan G

barticle [author] Atilgan , Yigit Y. , Bali , Turan G. T. G. , Demirtas , K. Ozgur K. O. Gunaydin , A. Doruk A. D. ( 2020 ). Left-Tail Momentum: Underreaction to Bad News, Costly Arbitrage and Equity Returns . Journal of Financial Economics 135 725--753 . 10.1016/j.jfineco.2019.07.006 barticle

work page doi:10.1016/j.jfineco.2019.07.006 2020
[6]

Dimitriadis , Timo T

barticle [author] Bayer , Sebastian S. Dimitriadis , Timo T. ( 2022 ). Regression-Based Expected Shortfall Backtesting . Journal of Financial Econometrics 20 437--471 . 10.1093/jjfinec/nbaa013 barticle

work page doi:10.1093/jjfinec/nbaa013 2022
[7]

btechreport [author] Berk , Jonathan B. J. B. ( 2023 ). Comment on ``The Virtue of Complexity in Return Prediction'' SSRN Working Paper No. 4410125 , SSRN . btechreport

2023
[8]

( 2025 )

btechreport [author] Buncic , Daniel D. ( 2025 ). Simplified: A Closer Look at the Virtue of Complexity in Return Prediction SSRN Working Paper No. 5239006 , SSRN . btechreport

2025
[9]

, Jin , Qi Q

btechreport [author] Cartea , \'A lvaro \'A . , Jin , Qi Q. Shi , Yuantao Y. ( 2025 ). The Limited Virtue of Complexity in a Noisy World SSRN Working Paper No. 5202064 , SSRN . btechreport

2025
[10]

barticle [author] Christoffersen , Peter F. P. F. ( 1998 ). Evaluating Interval Forecasts . International Economic Review 39 841--862 . barticle

1998
[11]

, Raftapostolos , Aristeidis A

barticle [author] Chronopoulos , Ilias I. , Raftapostolos , Aristeidis A. Kapetanios , George G. ( 2024 ). Forecasting Value-at-Risk Using Deep Neural Network Quantile Regression . Journal of Financial Econometrics 22 636--669 . barticle

2024
[12]

, Gulcehre , Caglar C

barticle [author] Chung , Junyoung J. , Gulcehre , Caglar C. , Cho , KyungHyun K. Bengio , Yoshua Y. ( 2014 ). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling . ArXiv preprint ``arXiv:1412.3555'' . barticle

Pith/arXiv arXiv 2014
[13]

, Kong , Weihao W

binproceedings [author] Das , Abhimanyu A. , Kong , Weihao W. , Sen , Rajat R. Zhou , Yichen Y. ( 2024 ). A Decoder-Only Foundation Model for Time-Series Forecasting . In Proceedings of the 41st International Conference on Machine Learning 235 10148--10167 . binproceedings

2024
[14]

, Ke , Shikun (Barry) S

btechreport [author] Didisheim , Antoine A. , Ke , Shikun (Barry) S. B. , Kelly , Bryan B. Malamud , Semyon S. ( 2024 ). APT or ``AIPT''? The Surprising Dominance of Large Factor Models Working Paper No. 33012 , National Bureau of Economic Research . 10.3386/w33012 btechreport

work page doi:10.3386/w33012 2024
[15]

barticle [author] Diebold , Francis X. F. X. Mariano , Roberto S. R. S. ( 1995 ). Comparing Predictive Accuracy . Journal of Business & Economic Statistics 13 253--263 . 10.1080/07350015.1995.10524599 barticle

work page doi:10.1080/07350015.1995.10524599 1995
[16]

, Xu , Shuang S

binproceedings [author] Dong , Linhao L. , Xu , Shuang S. Xu , Bo B. ( 2018 ). Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition . In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5884--5888 . binproceedings

2018
[17]

, Beyer , Lucas L

binproceedings [author] Dosovitskiy , Alexey A. , Beyer , Lucas L. , Kolesnikov , Alexander A. , Weissenborn , Dirk D. , Zhai , Xiaohua X. , Unterthiner , Thomas T. , Dehghani , Mostafa M. , Minderer , Matthias M. , Heigold , Georg G. , Gelly , Sylvain S. , Uszkoreit , Jakob J. Houlsby , Neil N. ( 2021 ). An Image is Worth 16x16 Words: Transformers for Im...

2021
[18]

Escanciano , Juan Carlos J

barticle [author] Du , Zaichao Z. Escanciano , Juan Carlos J. C. ( 2017 ). Backtesting Expected Shortfall: Accounting for Tail Risk . Management Science 63 940--958 . barticle

2017
[19]

, Bengio , Yoshua Y

binproceedings [author] Dugas , Charles C. , Bengio , Yoshua Y. , B\' e lisle , Fran c ois F. , Nadeau , Claude C. Garcia , Ren\' e R. ( 2000 ). Incorporating Second-Order Functional Knowledge for Better Option Pricing . In Advances in Neural Information Processing Systems 13 472--478 . binproceedings

2000
[20]

barticle [author] Fama , Eugene F. E. F. French , Kenneth R. K. R. ( 2015 ). A Five-Factor Asset Pricing Model . Journal of Financial Economics 116 1--22 . 10.1016/j.jfineco.2014.10.010 barticle

work page doi:10.1016/j.jfineco.2014.10.010 2015
[21]

, Huang , Long L

binproceedings [author] Feng , Cheng C. , Huang , Long L. Krompass , Denis D. ( 2024 ). General Time Transformer: An Encoder-Only Foundation Model for Zero-Shot Multivariate Time Series Forecasting . In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management 3757--3761 . 10.1145/3627673.3679931 binproceedings

work page doi:10.1145/3627673.3679931 2024
[22]

Ziegel , Johanna F

barticle [author] Fissler , Tobias T. Ziegel , Johanna F. J. F. ( 2016 ). Higher Order Elicitability and Osband’s Principle . The Annals of Statistics 44 1680--1707 . 10.1214/16-AOS1439 barticle

work page doi:10.1214/16-aos1439 2016
[23]

( 2011 )

barticle [author] Gneiting , Tilmann T. ( 2011 ). Making and Evaluating Point Forecasts . Journal of the American Statistical Association 106 746--762 . 10.1198/jasa.2011.r10138 barticle

work page doi:10.1198/jasa.2011.r10138 2011
[24]

, Bengio , Yoshua Y

bbook [author] Goodfellow , Ian I. , Bengio , Yoshua Y. Courville , Aaron A. ( 2016 ). Deep Learning . MIT press . bbook

2016
[25]

, Kelly , Bryan B

barticle [author] Gu , Shihao S. , Kelly , Bryan B. Xiu , Dacheng D. ( 2020 ). Empirical Asset Pricing via Machine Learning . The Review of Financial Studies 33 2223--2273 . 10.1093/rfs/hhaa009 barticle

work page doi:10.1093/rfs/hhaa009 2020
[26]

, Kelly , Bryan B

barticle [author] Gu , Shihao S. , Kelly , Bryan B. Xiu , Dacheng D. ( 2021 ). Autoencoder Asset Pricing Models . Journal of Econometrics 222 429--450 . 10.1016/j.jeconom.2020.07.009 barticle

work page doi:10.1016/j.jeconom.2020.07.009 2021
[27]

barticle [author] Hansen , Peter R. P. R. , Lunde , Asger A. Nason , James M. J. M. ( 2011 ). The Model Confidence Set . Econometrica 79 453--497 . 10.3982/ECTA5771 barticle

work page doi:10.3982/ecta5771 2011
[28]

Long short-term memory.Neural Comput., 9(8): 1735–1780, November 1997

barticle [author] Hochreiter , Sepp S. Schmidhuber , J \"u rgen J. ( 1997 ). Long Short-Term Memory . Neural Computation 9 1735--1780 . 10.1162/neco.1997.9.8.1735 barticle

work page doi:10.1162/neco.1997.9.8.1735 1997
[29]

Szegedy , Christian C

binproceedings [author] Ioffe , Sergey S. Szegedy , Christian C. ( 2015 ). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . In Proceedings of the 32nd International Conference on Machine Learning 37 448--456 . binproceedings

2015
[30]

barticle [author] Jensen , Theis Ingerslev T. I. , Kelly , Bryan B. Pedersen , Lasse Heje L. H. ( 2023 ). Is There a Replication Crisis in Finance? The Journal of Finance 78 2465--2518 . 10.1111/jofi.13249 barticle

work page doi:10.1111/jofi.13249 2023
[31]

, McCandlish , Sam S

barticle [author] Kaplan , Jared J. , McCandlish , Sam S. , Henighan , Tom T. , Brown , Tom B. T. B. , Chess , Benjamin B. , Child , Rewon R. , Gray , Scott S. , Radford , Alec A. , Wu , Jeffrey J. Amodei , Dario D. ( 2020 ). Scaling Laws for Neural Language Models . ArXiv preprint ``arXiv:2001.08361'' . barticle

Pith/arXiv arXiv 2020
[32]

, Malamud , Semyon S

barticle [author] Kelly , Bryan B. , Malamud , Semyon S. Zhou , Kangying K. ( 2024 ). The Virtue of Complexity in Return Prediction . The Journal of Finance 79 459--503 . 10.1111/jofi.13298 barticle

work page doi:10.1111/jofi.13298 2024
[33]

, Kuznetsov , Boris B

btechreport [author] Kelly , Bryan B. , Kuznetsov , Boris B. , Malamud , Semyon S. Xu , Teng Andrea T. A. ( 2025 ). Artificial Intelligence Asset Pricing Models Working Paper No. 33351 , National Bureau of Economic Research . 10.3386/w33351 btechreport

work page doi:10.3386/w33351 2025
[34]

binproceedings [author] Kingma , Diederik P. D. P. Ba , Jimmy J. ( 2015 ). Adam: A Method for Stochastic Optimization . In International Conference on Learning Representations . binproceedings

2015
[35]

Bassett , Gilbert G

barticle [author] Koenker , Roger R. Bassett , Gilbert G. ( 1978 ). Regression Quantiles . Econometrica 46 33--50 . barticle

1978
[36]

barticle [author] Li , Sophia Zhengzi S. Z. Tang , Yushan Y. ( 2025 ). Automated Volatility Forecasting . Management Science 71 6248--6274 . 10.1287/mnsc.2023.01520 barticle

work page doi:10.1287/mnsc.2023.01520 2025
[37]

Wang , Ruodu R

barticle [author] Li , Hengxin H. Wang , Ruodu R. ( 2023 ). PELVE: Probability Equivalent Level of VaR and ES . Journal of Econometrics 234 353--370 . barticle

2023
[38]

( 1993 )

bbook [author] Masters , Timothy T. ( 1993 ). Practical Neural Network Recipes in C++ . Academic Press . bbook

1993
[39]

, Petrella , Lea L

barticle [author] Merlo , Luca L. , Petrella , Lea L. Raponi , Valentina V. ( 2021 ). Forecasting VaR and ES Using a Joint Quantile Regression and Its Implications in Portfolio Allocation . Journal of Banking & Finance 133 106248 . 10.1016/j.jbankfin.2021.106248 barticle

work page doi:10.1016/j.jbankfin.2021.106248 2021
[40]

( 2025 )

btechreport [author] Nagel , Stefan S. ( 2025 ). Seemingly Virtuous Complexity in Return Prediction Working Paper No. 34104 , National Bureau of Economic Research . 10.3386/w34104 btechreport

work page doi:10.3386/w34104 2025
[41]

barticle [author] Newey , Whitney K. W. K. West , Kenneth D. K. D. ( 1987 ). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix . Econometrica 55 703--708 . 10.2307/1913610 barticle

work page doi:10.2307/1913610 1987
[42]

Minimum Capital Requirements for Market Risk Basel Committee on Banking Supervision Publication , Bank for International Settlements

btechreport [author] Basel Committee on Banking Supervision ( 2019 ). Minimum Capital Requirements for Market Risk Basel Committee on Banking Supervision Publication , Bank for International Settlements . btechreport

2019
[43]

binproceedings [author] Oreshkin , Boris N. B. N. , Carpov , Dmitri D. , Chapados , Nicolas N. Bengio , Yoshua Y. ( 2020 ). N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting . In International Conference on Learning Representations . binproceedings

2020
[44]

barticle [author] Patton , Andrew J. A. J. , Ziegel , Johanna F. J. F. Chen , Rui R. ( 2019 ). Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk) . Journal of Econometrics 211 388--413 . 10.1016/j.jeconom.2018.10.008 barticle

work page doi:10.1016/j.jeconom.2018.10.008 2019
[45]

barticle [author] Rockafellar , R Tyrrell R. T. Uryasev , Stanislav S. ( 2002 ). Conditional Value-at-Risk for General Loss Distributions . Journal of Banking & Finance 26 1443--1471 . 10.1016/S0378-4266(02)00271-6 barticle

work page doi:10.1016/s0378-4266(02)00271-6 2002
[46]

barticle [author] Taylor , James W. J. W. ( 2019 ). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution . Journal of Business & Economic Statistics 37 121--133 . 10.1080/07350015.2017.1281815 barticle

work page doi:10.1080/07350015.2017.1281815 2019
[47]

, Shazeer , Noam N

binproceedings [author] Vaswani , Ashish A. , Shazeer , Noam N. , Parmar , Niki N. , Uszkoreit , Jakob J. , Jones , Llion L. , Gomez , Aidan N A. N. , Kaiser , ukasz . Polosukhin , Illia I. ( 2017 ). Attention is All you Need . In Advances in Neural Information Processing Systems 30 5998--6008 . binproceedings

2017
[48]

, Cucurull , Guillem G

binproceedings [author] Veli c kovi \'c , Petar P. , Cucurull , Guillem G. , Casanova , Arantxa A. , Romero , Adriana A. , Li \`o , Pietro P. Bengio , Yoshua Y. ( 2018 ). Graph Attention Networks . In International Conference on Learning Representations . binproceedings

2018
[49]

, Wang , Ruodu R

barticle [author] Wang , Qiuqi Q. , Wang , Ruodu R. Ziegel , Johanna J. ( 2025 ). E-Backtesting . forthcoming in Management Science . 10.1287/mnsc.2023.01659 barticle

work page doi:10.1287/mnsc.2023.01659 2025
[50]

, Zhu , Zhoufan Z

barticle [author] Yang , Xuanling X. , Zhu , Zhoufan Z. , Li , Dong D. Zhu , Ke K. ( 2024 ). Asset Pricing via the Conditional Quantile Variational Autoencoder . Journal of Business & Economic Statistics 42 681--694 . barticle

2024
[51]

, Yang , Chao-Han Huck C.-H

binproceedings [author] Yao , Qingren Q. , Yang , Chao-Han Huck C.-H. H. , Jiang , Renhe R. , Liang , Yuxuan Y. , Jin , Ming M. Pan , Shirui S. ( 2025 ). Towards Neural Scaling Laws for Time Series Foundation Models . In International Conference on Learning Representations . binproceedings

2025
[52]

, Chen , Muxi M

binproceedings [author] Zeng , Ailing A. , Chen , Muxi M. , Zhang , Lei L. Xu , Qiang Q. ( 2023 ). Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence 37 11121--11128 . binproceedings

2023
[53]

, Zhang , Shanghang S

binproceedings [author] Zhou , Haoyi H. , Zhang , Shanghang S. , Peng , Jieqi J. , Zhang , Shuai S. , Li , Jianxin J. , Xiong , Hui H. Zhang , Wancai W. ( 2021 ). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting . In Proceedings of the AAAI Conference on Artificial Intelligence 35 11106--11115 . binproceedings

2021
[54]

, Zhang , Ningning N

barticle [author] Zhu , Zhoufan Z. , Zhang , Ningning N. Zhu , Ke K. ( 2024 ). Big Portfolio Selection by Graph-Based Conditional Moments Method . Journal of Empirical Finance 78 101533 . 10.1016/j.jempfin.2024.101533 barticle

work page doi:10.1016/j.jempfin.2024.101533 2024
[55]

Zhu , Ke K

barticle [author] Zhu , Zhoufan Z. Zhu , Ke K. ( 2025 ). Machine Learning Vast Dynamic Conditional Covariance Matrices: the Spirit of ``Divide and Conquer'' . Minor revision for Management Science . barticle

2025
[56]

The Review of Financial Studies , volume =

Empirical Asset Pricing via Machine Learning , author =. The Review of Financial Studies , volume =. 2020 , doi =

2020
[57]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Are Transformers Effective for Time Series Forecasting? , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =
[58]

Neural Computation , volume =

Long Short-Term Memory , author =. Neural Computation , volume =. 1997 , doi =

1997
[59]

2014 , journal =

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , author =. 2014 , journal =

2014
[60]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =
[61]

International Conference on Learning Representations , year =

Towards Neural Scaling Laws for Time Series Foundation Models , author =. International Conference on Learning Representations , year =
[62]

Advances in Neural Information Processing Systems , volume =

Incorporating Second-Order Functional Knowledge for Better Option Pricing , author =. Advances in Neural Information Processing Systems , volume =
[63]

Practical Neural Network Recipes in

Masters, Timothy , isbn =. Practical Neural Network Recipes in. 1993 , publisher =

1993
[64]

Proceedings of the 32nd International Conference on Machine Learning , volume =

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , author =. Proceedings of the 32nd International Conference on Machine Learning , volume =
[65]

2016 , isbn =

Deep Learning , author =. 2016 , isbn =

2016
[66]

Basel III: A Global Regulatory Framework for More Resilient Banks and Banking Systems , institution =
[67]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. and Polosukhin, I. (2017). Attention is All you Need. In Advances in Neural Information Processing Systems 30, 5998–6008.
[68]

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J. and Houlsby, N. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.
[69]

Dong, L., Xu, S. and Xu, B. (2018). Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5884–5888.
[70]

Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999). Coherent Measures of Risk. Mathematical Finance 9, 203–228. doi:10.1111/1467-9965.00068
[71]

Fissler, T. and Ziegel, J. F. (2016). Higher Order Elicitability and Osband's Principle. The Annals of Statistics 44, 1680–1707. doi:10.1214/16-AOS1439
[72]

Patton, A. J., Ziegel, J. F. and Chen, R. (2019). Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk). Journal of Econometrics 211, 388–413. doi:10.1016/j.jeconom.2018.10.008
[73]

Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics, 2013.
[74]

Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 1986.
[75]

Taylor, J. W. (2019). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution. Journal of Business & Economic Statistics 37, 121–133. doi:10.1080/07350015.2017.1281815
[76]

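Merlo, L., Petrella, L. and Raponi, V. (2021). Forecasting VaR and ES Using a Joint Quantile Regression and Its Implications in Portfolio Allocation. Journal of Banking & Finance 133, 106248. doi:10.1016/j.jbankfin.2021.106248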
[77]

Li, S. Z. and Tang, Y. (2025). Automated Volatility Forecasting. Management Science 71, 6248–6274. doi:10.1287/mnsc.2023.01520
[78]

Jensen, T. I., Kelly, B. and Pedersen, L. H. (2023). Is There a Replication Crisis in Finance? The Journal of Finance 78, 2465–2518. doi:10.1111/jofi.13249
[79]

Retrieval Augmented Time Series Forecasting. In The 29th International Conference on Artificial Intelligence and Statistics.
[80]

Diebold, F. X. and Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics 13, 253–263. doi:10.1080/07350015.1995.10524599
