pith. sign in

arxiv: 2606.04576 · v1 · pith:3PM74RQHnew · submitted 2026-06-03 · 📊 stat.ML · cs.LG· econ.EM· q-fin.RM

ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall

Pith reviewed 2026-06-28 04:10 UTC · model grok-4.3

classification 📊 stat.ML cs.LGecon.EMq-fin.RM
keywords Value-at-RiskExpected ShortfallAutoencoderTail risk modelingFinancial machine learningScaling analysisPortfolio construction
0
0 comments X

The pith

ReSGA forecasts Value-at-Risk and Expected Shortfall more accurately than prior methods by using a large neural network on asset characteristics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the retrieval-enhanced self-grouping autoencoder (ReSGA) as a high-parameter model for joint VaR and ES estimation. It claims that limited-parameter approaches suffer from misspecification when handling big data with many firm characteristics, while ReSGA captures cross-sectional and temporal patterns. Tested on monthly US equity returns from 1926 to 2023 using 153 characteristics, the model shows lower out-of-sample losses and stronger backtesting results against twelve competitors. Forecast gains also produce economic profits via a size-enhanced left-side momentum portfolio strategy. Scaling experiments indicate that performance gains stem mainly from data complexity, and group-importance plus transfer-learning checks support interpretability and generalizability.

Core claim

ReSGA is a large tail risk model with millions of parameters built to exploit rich cross-sectional dependence and long-term temporal dynamics of assets via their characteristics. On monthly US equity returns 1926-2023 with 153 firm characteristics, it beats twelve econometric and machine learning baselines on out-of-sample loss and statistical backtests. The forecast edge produces sizable economic gains in long-short decile portfolios formed by a new size-enhanced left-side momentum strategy. Scaling analysis shows joint VaR-ES improvements arise primarily from data complexity, not model complexity. Group-importance and transfer-learning results confirm interpretability and cross-market appl

What carries the argument

The retrieval-enhanced self-grouping autoencoder (ReSGA), a neural network that retrieves similar assets and groups them to model tail risks with millions of parameters.

If this is right

  • Joint VaR-ES forecasts improve when models scale to millions of parameters on characteristic-rich data.
  • Economic profits arise from portfolios that sort assets by the model's tail-risk signals using a size-enhanced left-side momentum rule.
  • Scaling experiments separate data complexity as the main driver of better VaR-ES accuracy over model size.
  • Group-importance analysis reveals which characteristics matter most for the forecasts.
  • Transfer-learning results show the model applies across different markets without full retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger historical or cross-market datasets could yield further accuracy gains without increasing model size.
  • The retrieval and grouping steps might extend to other tail measures such as expected shortfall at multiple horizons.
  • If data complexity dominates, practitioners could prioritize collecting more characteristics over tuning network depth.

Load-bearing premise

The out-of-sample gains on the 1926-2023 US equity data with 153 characteristics are not artifacts of model selection or data snooping.

What would settle it

ReSGA failing to show lower out-of-sample loss than the twelve baselines on a fresh post-2023 equity dataset or a non-US market would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.04576 by Ke Zhu, Yichi Zhang, Zhoufan Zhu.

Figure 1
Figure 1. Figure 1: Group importance from ReSGA during the out-of-sample period. [PITH_FULL_IMAGE:figures/full_fig_p032_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Group importance from ReSGA across months during the out-of-sample period. [PITH_FULL_IMAGE:figures/full_fig_p033_2.png] view at source ↗
read the original abstract

Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is important for managing financial risks effectively. Existing approaches with limited parameters are vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which is designed with millions of parameters to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets using their characteristics. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in terms of out-of-sample loss and statistical backtesting. In addition, its forecast advantages can translate into significant economic gains from long-short decile portfolios that are constructed by a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we further conduct a systematic scaling analysis and demonstrate that improvements in joint VaR-ES forecasting are primarily driven by data complexity rather than model complexity. Finally, our analyses of group-importance and transfer-learning exhibit the interpretability and cross-market generalizability of ReSGA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes the retrieval-enhanced self-grouping autoencoder (ReSGA), a neural network with millions of parameters, for joint estimation of Value-at-Risk (VaR) and Expected Shortfall (ES) on monthly US equity returns 1926-2023 using 153 firm characteristics. It claims superior out-of-sample performance and backtest results versus twelve econometric and machine-learning baselines, economic gains from a new size-enhanced left-side momentum long-short strategy, and that a scaling analysis attributes gains primarily to data complexity rather than model complexity, while also showing interpretability via group importance and cross-market transferability.

Significance. If the outperformance, economic gains, and non-circular scaling conclusions hold after verification, the work would be significant for demonstrating the viability of high-capacity models in tail-risk forecasting with rich characteristic data, potentially shifting practice toward scalable neural approaches in financial risk management.

major comments (3)
  1. [Abstract] Abstract: the claim of outperformance over twelve competitors in out-of-sample loss and statistical backtesting supplies no numerical loss values, no description of the training objective, and no mention of regularization against overfitting with millions of parameters; these omissions are load-bearing for assessing whether the reported superiority is robust.
  2. [Scaling analysis] Scaling analysis (described in abstract): the assertion that improvements are driven by data complexity rather than model complexity cannot be verified without the explicit equations or complexity metrics; if these metrics are computed from post-fit quantities such as realized losses or selected hyperparameters, the separation is circular and does not rule out that apparent gains are simply better-tuned large models.
  3. [Abstract] Abstract: the claim that forecast advantages translate into significant economic gains via a new size-enhanced left-side momentum strategy provides no details on portfolio construction, turnover, or statistical significance of the gains, which is central to the economic-value assertion.
minor comments (1)
  1. The manuscript should add a dedicated methods subsection detailing the precise loss function, optimization procedure, and any regularization or early-stopping rules used to train the millions of parameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful comments on the manuscript. We provide point-by-point responses below and indicate where revisions will be made to improve the presentation.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of outperformance over twelve competitors in out-of-sample loss and statistical backtesting supplies no numerical loss values, no description of the training objective, and no mention of regularization against overfitting with millions of parameters; these omissions are load-bearing for assessing whether the reported superiority is robust.

    Authors: While the abstract is necessarily concise, the out-of-sample loss values are detailed in Table 3, the joint training objective for VaR and ES is specified in Equation (4) of Section 3, and regularization methods to mitigate overfitting are discussed in Section 4.1. We will revise the abstract to incorporate key numerical results and a brief reference to the objective and regularization approach. revision: partial

  2. Referee: [Scaling analysis] Scaling analysis (described in abstract): the assertion that improvements are driven by data complexity rather than model complexity cannot be verified without the explicit equations or complexity metrics; if these metrics are computed from post-fit quantities such as realized losses or selected hyperparameters, the separation is circular and does not rule out that apparent gains are simply better-tuned large models.

    Authors: Section 5 presents the scaling analysis with explicit definitions: model complexity is measured by the number of parameters in the autoencoder, and data complexity by the number of firm characteristics and observations. These are pre-determined quantities, not derived from post-fit losses or hyperparameters, thus avoiding circularity. The analysis varies data size while fixing model size and vice versa. We will add the explicit equations to the main text for clarity if they are not sufficiently prominent. revision: partial

  3. Referee: [Abstract] Abstract: the claim that forecast advantages translate into significant economic gains via a new size-enhanced left-side momentum strategy provides no details on portfolio construction, turnover, or statistical significance of the gains, which is central to the economic-value assertion.

    Authors: The size-enhanced left-side momentum strategy is described in detail in Section 6, including how portfolios are formed based on ReSGA forecasts. Turnover rates are reported in Table 6, and the statistical significance of the returns is evaluated using t-statistics adjusted for autocorrelation. We will update the abstract to include concise information on these aspects. revision: partial

Circularity Check

0 steps flagged

No circularity in derivation or scaling analysis

full rationale

The paper introduces ReSGA as a large-parameter autoencoder for joint VaR-ES forecasting on equity returns with characteristics, reports out-of-sample superiority over competitors, and performs a scaling analysis attributing gains to data complexity. No quoted equations or sections exhibit self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claims to tautology. The scaling analysis is presented as an empirical demonstration separating data and model complexity; absent explicit reduction of its metrics to post-fit quantities within the same optimization, the derivation chain remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that a high-parameter autoencoder can capture cross-sectional dependence and temporal dynamics in equity returns without severe overfitting, and that the scaling experiment isolates data complexity from model complexity.

free parameters (1)
  • millions of parameters in ReSGA
    Model size is chosen to exploit rich dependence; exact count and regularization details are not given in abstract.
axioms (1)
  • domain assumption Large models with retrieval and self-grouping can exploit cross-sectional dependence and long-term temporal dynamics of assets using firm characteristics
    Invoked to justify the design of ReSGA.
invented entities (1)
  • ReSGA (retrieval-enhanced self-grouping autoencoder) no independent evidence
    purpose: Joint forecasting of VaR and ES
    New model introduced in the paper; no independent evidence outside the reported experiments.

pith-pipeline@v0.9.1-grok · 5746 in / 1521 out tokens · 42370 ms · 2026-06-28T04:10:46.746682+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

123 extracted references · 25 canonical work pages

  1. [1]

    Szekely , Balazs B

    barticle [author] Acerbi , Carlo C. Szekely , Balazs B. ( 2014 ). Back-testing Expected Shortfall . Risk 27 76--81 . barticle

  2. [2]

    Tasche , Dirk D

    barticle [author] Acerbi , Carlo C. Tasche , Dirk D. ( 2002 ). On the Coherence of Expected Shortfall . Journal of Banking & Finance 26 1487--1503 . barticle

  3. [3]

    barticle [author] Akaike , H. H. ( 1974 ). A New Look at the Statistical Model Identification . IEEE Transactions on Automatic Control 19 716--723 . barticle

  4. [4]

    , Delbaen , Freddy F

    barticle [author] Artzner , Philippe P. , Delbaen , Freddy F. , Eber , Jean-Marc J.-M. Heath , David D. ( 1999 ). Coherent Measures of Risk . Mathematical Finance 9 203--228 . 10.1111/1467-9965.00068 barticle

  5. [5]

    , Bali , Turan G

    barticle [author] Atilgan , Yigit Y. , Bali , Turan G. T. G. , Demirtas , K. Ozgur K. O. Gunaydin , A. Doruk A. D. ( 2020 ). Left-Tail Momentum: Underreaction to Bad News, Costly Arbitrage and Equity Returns . Journal of Financial Economics 135 725--753 . 10.1016/j.jfineco.2019.07.006 barticle

  6. [6]

    Dimitriadis , Timo T

    barticle [author] Bayer , Sebastian S. Dimitriadis , Timo T. ( 2022 ). Regression-Based Expected Shortfall Backtesting . Journal of Financial Econometrics 20 437--471 . 10.1093/jjfinec/nbaa013 barticle

  7. [7]

    btechreport [author] Berk , Jonathan B. J. B. ( 2023 ). Comment on ``The Virtue of Complexity in Return Prediction'' SSRN Working Paper No. 4410125 , SSRN . btechreport

  8. [8]

    ( 2025 )

    btechreport [author] Buncic , Daniel D. ( 2025 ). Simplified: A Closer Look at the Virtue of Complexity in Return Prediction SSRN Working Paper No. 5239006 , SSRN . btechreport

  9. [9]

    , Jin , Qi Q

    btechreport [author] Cartea , \'A lvaro \'A . , Jin , Qi Q. Shi , Yuantao Y. ( 2025 ). The Limited Virtue of Complexity in a Noisy World SSRN Working Paper No. 5202064 , SSRN . btechreport

  10. [10]

    barticle [author] Christoffersen , Peter F. P. F. ( 1998 ). Evaluating Interval Forecasts . International Economic Review 39 841--862 . barticle

  11. [11]

    , Raftapostolos , Aristeidis A

    barticle [author] Chronopoulos , Ilias I. , Raftapostolos , Aristeidis A. Kapetanios , George G. ( 2024 ). Forecasting Value-at-Risk Using Deep Neural Network Quantile Regression . Journal of Financial Econometrics 22 636--669 . barticle

  12. [12]

    , Gulcehre , Caglar C

    barticle [author] Chung , Junyoung J. , Gulcehre , Caglar C. , Cho , KyungHyun K. Bengio , Yoshua Y. ( 2014 ). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling . ArXiv preprint ``arXiv:1412.3555'' . barticle

  13. [13]

    , Kong , Weihao W

    binproceedings [author] Das , Abhimanyu A. , Kong , Weihao W. , Sen , Rajat R. Zhou , Yichen Y. ( 2024 ). A Decoder-Only Foundation Model for Time-Series Forecasting . In Proceedings of the 41st International Conference on Machine Learning 235 10148--10167 . binproceedings

  14. [14]

    , Ke , Shikun (Barry) S

    btechreport [author] Didisheim , Antoine A. , Ke , Shikun (Barry) S. B. , Kelly , Bryan B. Malamud , Semyon S. ( 2024 ). APT or ``AIPT''? The Surprising Dominance of Large Factor Models Working Paper No. 33012 , National Bureau of Economic Research . 10.3386/w33012 btechreport

  15. [15]

    barticle [author] Diebold , Francis X. F. X. Mariano , Roberto S. R. S. ( 1995 ). Comparing Predictive Accuracy . Journal of Business & Economic Statistics 13 253--263 . 10.1080/07350015.1995.10524599 barticle

  16. [16]

    , Xu , Shuang S

    binproceedings [author] Dong , Linhao L. , Xu , Shuang S. Xu , Bo B. ( 2018 ). Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition . In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5884--5888 . binproceedings

  17. [17]

    , Beyer , Lucas L

    binproceedings [author] Dosovitskiy , Alexey A. , Beyer , Lucas L. , Kolesnikov , Alexander A. , Weissenborn , Dirk D. , Zhai , Xiaohua X. , Unterthiner , Thomas T. , Dehghani , Mostafa M. , Minderer , Matthias M. , Heigold , Georg G. , Gelly , Sylvain S. , Uszkoreit , Jakob J. Houlsby , Neil N. ( 2021 ). An Image is Worth 16x16 Words: Transformers for Im...

  18. [18]

    Escanciano , Juan Carlos J

    barticle [author] Du , Zaichao Z. Escanciano , Juan Carlos J. C. ( 2017 ). Backtesting Expected Shortfall: Accounting for Tail Risk . Management Science 63 940--958 . barticle

  19. [19]

    , Bengio , Yoshua Y

    binproceedings [author] Dugas , Charles C. , Bengio , Yoshua Y. , B\' e lisle , Fran c ois F. , Nadeau , Claude C. Garcia , Ren\' e R. ( 2000 ). Incorporating Second-Order Functional Knowledge for Better Option Pricing . In Advances in Neural Information Processing Systems 13 472--478 . binproceedings

  20. [20]

    barticle [author] Fama , Eugene F. E. F. French , Kenneth R. K. R. ( 2015 ). A Five-Factor Asset Pricing Model . Journal of Financial Economics 116 1--22 . 10.1016/j.jfineco.2014.10.010 barticle

  21. [21]

    , Huang , Long L

    binproceedings [author] Feng , Cheng C. , Huang , Long L. Krompass , Denis D. ( 2024 ). General Time Transformer: An Encoder-Only Foundation Model for Zero-Shot Multivariate Time Series Forecasting . In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management 3757--3761 . 10.1145/3627673.3679931 binproceedings

  22. [22]

    Ziegel , Johanna F

    barticle [author] Fissler , Tobias T. Ziegel , Johanna F. J. F. ( 2016 ). Higher Order Elicitability and Osband’s Principle . The Annals of Statistics 44 1680--1707 . 10.1214/16-AOS1439 barticle

  23. [23]

    ( 2011 )

    barticle [author] Gneiting , Tilmann T. ( 2011 ). Making and Evaluating Point Forecasts . Journal of the American Statistical Association 106 746--762 . 10.1198/jasa.2011.r10138 barticle

  24. [24]

    , Bengio , Yoshua Y

    bbook [author] Goodfellow , Ian I. , Bengio , Yoshua Y. Courville , Aaron A. ( 2016 ). Deep Learning . MIT press . bbook

  25. [25]

    , Kelly , Bryan B

    barticle [author] Gu , Shihao S. , Kelly , Bryan B. Xiu , Dacheng D. ( 2020 ). Empirical Asset Pricing via Machine Learning . The Review of Financial Studies 33 2223--2273 . 10.1093/rfs/hhaa009 barticle

  26. [26]

    , Kelly , Bryan B

    barticle [author] Gu , Shihao S. , Kelly , Bryan B. Xiu , Dacheng D. ( 2021 ). Autoencoder Asset Pricing Models . Journal of Econometrics 222 429--450 . 10.1016/j.jeconom.2020.07.009 barticle

  27. [27]

    barticle [author] Hansen , Peter R. P. R. , Lunde , Asger A. Nason , James M. J. M. ( 2011 ). The Model Confidence Set . Econometrica 79 453--497 . 10.3982/ECTA5771 barticle

  28. [28]

    Long short-term memory.Neural Comput., 9(8): 1735–1780, November 1997

    barticle [author] Hochreiter , Sepp S. Schmidhuber , J \"u rgen J. ( 1997 ). Long Short-Term Memory . Neural Computation 9 1735--1780 . 10.1162/neco.1997.9.8.1735 barticle

  29. [29]

    Szegedy , Christian C

    binproceedings [author] Ioffe , Sergey S. Szegedy , Christian C. ( 2015 ). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . In Proceedings of the 32nd International Conference on Machine Learning 37 448--456 . binproceedings

  30. [30]

    barticle [author] Jensen , Theis Ingerslev T. I. , Kelly , Bryan B. Pedersen , Lasse Heje L. H. ( 2023 ). Is There a Replication Crisis in Finance? The Journal of Finance 78 2465--2518 . 10.1111/jofi.13249 barticle

  31. [31]

    , McCandlish , Sam S

    barticle [author] Kaplan , Jared J. , McCandlish , Sam S. , Henighan , Tom T. , Brown , Tom B. T. B. , Chess , Benjamin B. , Child , Rewon R. , Gray , Scott S. , Radford , Alec A. , Wu , Jeffrey J. Amodei , Dario D. ( 2020 ). Scaling Laws for Neural Language Models . ArXiv preprint ``arXiv:2001.08361'' . barticle

  32. [32]

    , Malamud , Semyon S

    barticle [author] Kelly , Bryan B. , Malamud , Semyon S. Zhou , Kangying K. ( 2024 ). The Virtue of Complexity in Return Prediction . The Journal of Finance 79 459--503 . 10.1111/jofi.13298 barticle

  33. [33]

    , Kuznetsov , Boris B

    btechreport [author] Kelly , Bryan B. , Kuznetsov , Boris B. , Malamud , Semyon S. Xu , Teng Andrea T. A. ( 2025 ). Artificial Intelligence Asset Pricing Models Working Paper No. 33351 , National Bureau of Economic Research . 10.3386/w33351 btechreport

  34. [34]

    binproceedings [author] Kingma , Diederik P. D. P. Ba , Jimmy J. ( 2015 ). Adam: A Method for Stochastic Optimization . In International Conference on Learning Representations . binproceedings

  35. [35]

    Bassett , Gilbert G

    barticle [author] Koenker , Roger R. Bassett , Gilbert G. ( 1978 ). Regression Quantiles . Econometrica 46 33--50 . barticle

  36. [36]

    barticle [author] Li , Sophia Zhengzi S. Z. Tang , Yushan Y. ( 2025 ). Automated Volatility Forecasting . Management Science 71 6248--6274 . 10.1287/mnsc.2023.01520 barticle

  37. [37]

    Wang , Ruodu R

    barticle [author] Li , Hengxin H. Wang , Ruodu R. ( 2023 ). PELVE: Probability Equivalent Level of VaR and ES . Journal of Econometrics 234 353--370 . barticle

  38. [38]

    ( 1993 )

    bbook [author] Masters , Timothy T. ( 1993 ). Practical Neural Network Recipes in C++ . Academic Press . bbook

  39. [39]

    , Petrella , Lea L

    barticle [author] Merlo , Luca L. , Petrella , Lea L. Raponi , Valentina V. ( 2021 ). Forecasting VaR and ES Using a Joint Quantile Regression and Its Implications in Portfolio Allocation . Journal of Banking & Finance 133 106248 . 10.1016/j.jbankfin.2021.106248 barticle

  40. [40]

    ( 2025 )

    btechreport [author] Nagel , Stefan S. ( 2025 ). Seemingly Virtuous Complexity in Return Prediction Working Paper No. 34104 , National Bureau of Economic Research . 10.3386/w34104 btechreport

  41. [41]

    barticle [author] Newey , Whitney K. W. K. West , Kenneth D. K. D. ( 1987 ). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix . Econometrica 55 703--708 . 10.2307/1913610 barticle

  42. [42]

    Minimum Capital Requirements for Market Risk Basel Committee on Banking Supervision Publication , Bank for International Settlements

    btechreport [author] Basel Committee on Banking Supervision ( 2019 ). Minimum Capital Requirements for Market Risk Basel Committee on Banking Supervision Publication , Bank for International Settlements . btechreport

  43. [43]

    binproceedings [author] Oreshkin , Boris N. B. N. , Carpov , Dmitri D. , Chapados , Nicolas N. Bengio , Yoshua Y. ( 2020 ). N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting . In International Conference on Learning Representations . binproceedings

  44. [44]

    barticle [author] Patton , Andrew J. A. J. , Ziegel , Johanna F. J. F. Chen , Rui R. ( 2019 ). Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk) . Journal of Econometrics 211 388--413 . 10.1016/j.jeconom.2018.10.008 barticle

  45. [45]

    barticle [author] Rockafellar , R Tyrrell R. T. Uryasev , Stanislav S. ( 2002 ). Conditional Value-at-Risk for General Loss Distributions . Journal of Banking & Finance 26 1443--1471 . 10.1016/S0378-4266(02)00271-6 barticle

  46. [46]

    barticle [author] Taylor , James W. J. W. ( 2019 ). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution . Journal of Business & Economic Statistics 37 121--133 . 10.1080/07350015.2017.1281815 barticle

  47. [47]

    , Shazeer , Noam N

    binproceedings [author] Vaswani , Ashish A. , Shazeer , Noam N. , Parmar , Niki N. , Uszkoreit , Jakob J. , Jones , Llion L. , Gomez , Aidan N A. N. , Kaiser , ukasz . Polosukhin , Illia I. ( 2017 ). Attention is All you Need . In Advances in Neural Information Processing Systems 30 5998--6008 . binproceedings

  48. [48]

    , Cucurull , Guillem G

    binproceedings [author] Veli c kovi \'c , Petar P. , Cucurull , Guillem G. , Casanova , Arantxa A. , Romero , Adriana A. , Li \`o , Pietro P. Bengio , Yoshua Y. ( 2018 ). Graph Attention Networks . In International Conference on Learning Representations . binproceedings

  49. [49]

    , Wang , Ruodu R

    barticle [author] Wang , Qiuqi Q. , Wang , Ruodu R. Ziegel , Johanna J. ( 2025 ). E-Backtesting . forthcoming in Management Science . 10.1287/mnsc.2023.01659 barticle

  50. [50]

    , Zhu , Zhoufan Z

    barticle [author] Yang , Xuanling X. , Zhu , Zhoufan Z. , Li , Dong D. Zhu , Ke K. ( 2024 ). Asset Pricing via the Conditional Quantile Variational Autoencoder . Journal of Business & Economic Statistics 42 681--694 . barticle

  51. [51]

    , Yang , Chao-Han Huck C.-H

    binproceedings [author] Yao , Qingren Q. , Yang , Chao-Han Huck C.-H. H. , Jiang , Renhe R. , Liang , Yuxuan Y. , Jin , Ming M. Pan , Shirui S. ( 2025 ). Towards Neural Scaling Laws for Time Series Foundation Models . In International Conference on Learning Representations . binproceedings

  52. [52]

    , Chen , Muxi M

    binproceedings [author] Zeng , Ailing A. , Chen , Muxi M. , Zhang , Lei L. Xu , Qiang Q. ( 2023 ). Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence 37 11121--11128 . binproceedings

  53. [53]

    , Zhang , Shanghang S

    binproceedings [author] Zhou , Haoyi H. , Zhang , Shanghang S. , Peng , Jieqi J. , Zhang , Shuai S. , Li , Jianxin J. , Xiong , Hui H. Zhang , Wancai W. ( 2021 ). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting . In Proceedings of the AAAI Conference on Artificial Intelligence 35 11106--11115 . binproceedings

  54. [54]

    , Zhang , Ningning N

    barticle [author] Zhu , Zhoufan Z. , Zhang , Ningning N. Zhu , Ke K. ( 2024 ). Big Portfolio Selection by Graph-Based Conditional Moments Method . Journal of Empirical Finance 78 101533 . 10.1016/j.jempfin.2024.101533 barticle

  55. [55]

    Zhu , Ke K

    barticle [author] Zhu , Zhoufan Z. Zhu , Ke K. ( 2025 ). Machine Learning Vast Dynamic Conditional Covariance Matrices: the Spirit of ``Divide and Conquer'' . Minor revision for Management Science . barticle

  56. [56]

    The Review of Financial Studies , volume =

    Empirical Asset Pricing via Machine Learning , author =. The Review of Financial Studies , volume =. 2020 , doi =

  57. [57]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume =

    Are Transformers Effective for Time Series Forecasting? , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

  58. [58]

    Neural Computation , volume =

    Long Short-Term Memory , author =. Neural Computation , volume =. 1997 , doi =

  59. [59]

    2014 , journal =

    Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , author =. 2014 , journal =

  60. [60]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume =

    Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

  61. [61]

    International Conference on Learning Representations , year =

    Towards Neural Scaling Laws for Time Series Foundation Models , author =. International Conference on Learning Representations , year =

  62. [62]

    Advances in Neural Information Processing Systems , volume =

    Incorporating Second-Order Functional Knowledge for Better Option Pricing , author =. Advances in Neural Information Processing Systems , volume =

  63. [63]

    Practical Neural Network Recipes in

    Masters, Timothy , isbn =. Practical Neural Network Recipes in. 1993 , publisher =

  64. [64]

    Proceedings of the 32nd International Conference on Machine Learning , volume =

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , author =. Proceedings of the 32nd International Conference on Machine Learning , volume =

  65. [65]

    2016 , isbn =

    Deep Learning , author =. 2016 , isbn =

  66. [66]

    Basel III: A Global Regulatory Framework for More Resilient Banks and Banking Systems , institution =

  67. [67]

    Advances in Neural Information Processing Systems , volume =

    Attention is All you Need , author =. Advances in Neural Information Processing Systems , volume =

  68. [68]

    International Conference on Learning Representations , year =

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author =. International Conference on Learning Representations , year =

  69. [69]

    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , year =

    Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , author =. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , year =

  70. [70]

    Mathematical Finance , volume =

    Coherent Measures of Risk , author =. Mathematical Finance , volume =. 1999 , pages =

  71. [71]

    , journal =

    Fissler, Tobias and Ziegel, Johanna F. , journal =. Higher Order Elicitability and. 2016 , doi =

  72. [72]

    Journal of Econometrics , volume =

    Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk) , author =. Journal of Econometrics , volume =. 2019 , doi =

  73. [73]

    Journal of Applied Econometrics , volume =

    Generalized Autoregressive Score Models with Applications , author =. Journal of Applied Econometrics , volume =. 2013 , doi =

  74. [74]

    Journal of Econometrics , volume =

    Generalized Autoregressive Conditional Heteroskedasticity , author =. Journal of Econometrics , volume =. 1986 , doi =

  75. [75]

    Journal of Business & Economic Statistics , volume =

    Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution , author =. Journal of Business & Economic Statistics , volume =. 2019 , doi =

  76. [76]

    Forecasting

    Merlo, Luca and Petrella, Lea and Raponi, Valentina , journal =. Forecasting. 2021 , doi =

  77. [77]

    Management Science , volume =

    Automated Volatility Forecasting , author =. Management Science , volume =. 2025 , doi =

  78. [78]

    The Journal of Finance , volume =

    Is There a Replication Crisis in Finance? , author =. The Journal of Finance , volume =. 2023 , doi =

  79. [79]

    The 29th International Conference on Artificial Intelligence and Statistics , year =

    Retrieval Augmented Time Series Forecasting , author =. The 29th International Conference on Artificial Intelligence and Statistics , year =

  80. [80]

    Journal of Business & Economic Statistics , volume =

    Comparing Predictive Accuracy , author =. Journal of Business & Economic Statistics , volume =. 1995 , doi =

Showing first 80 references.