E-backtesting
Pith reviewed 2026-05-24 11:33 UTC · model grok-4.3
The pith
Unique backtest e-statistics for VaR and ES enable model-free e-processes for risk forecast validation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Backtest e-statistics are introduced to formulate e-processes for risk measure forecasts, and unique forms of backtest e-statistics for VaR and ES are characterized using recent results on identification functions. For a given backtest e-statistic, a few criteria for optimally constructing the e-processes are studied. The proposed method can be naturally applied to many other risk measures and statistical quantities.
What carries the argument
Backtest e-statistics, functions derived from identification functions that form the basis for constructing e-processes to test risk measure forecasts.
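A minimal sketch of how such e-statistics and the resulting e-process could be computed. The functional forms below are assumptions not stated on this page: the VaR e-statistic is taken as the scaled exceedance indicator, the ES e-statistic as the scaled excess over the VaR forecast, and the e-process as a running product of betting factors with a fixed bet size `lam`. All function and parameter names are hypothetical.

```python
import numpy as np


def var_e_statistic(loss, var_forecast, p):
    """Assumed VaR e-statistic: 1{loss > forecast} / (1 - p)."""
    loss = np.asarray(loss, dtype=float)
    return np.where(loss > var_forecast, 1.0 / (1.0 - p), 0.0)


def es_e_statistic(loss, es_forecast, var_forecast, p):
    """Assumed ES e-statistic: (loss - VaR)_+ / ((1 - p) * (ES - VaR))."""
    if np.any(np.asarray(es_forecast) <= np.asarray(var_forecast)):
        raise ValueError("ES forecast must exceed the VaR forecast")
    excess = np.maximum(np.asarray(loss, dtype=float) - var_forecast, 0.0)
    return excess / ((1.0 - p) * (es_forecast - var_forecast))


def e_process(e_values, lam):
    """Running product M_t = prod_{s<=t} (1 - lam + lam * e_s), lam in [0, 1]."""
    e_values = np.asarray(e_values, dtype=float)
    return np.cumprod(1.0 - lam + lam * e_values)
```

Each factor is non-negative for `lam` in [0, 1], and has expectation at most 1 whenever the e-statistic does, which is what makes the running product usable as a sequential test statistic.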
If this is right
- The method yields valid model-free tests for ES forecasts without distributional assumptions.
- It extends directly to backtesting many other risk measures and statistical quantities.
- Criteria for optimal e-process construction improve the power or efficiency of the tests.
- Extensive simulation studies and data analysis confirm practical advantages over existing methods in the literature.
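To make the idea of an optimality criterion concrete, here is a hedged sketch of one plausible rule: at each step, choose the bet size that maximizes the average log-growth over the e-values seen so far. Whether this matches any of the paper's criteria is an assumption; the grid, the cap of 0.5 on the bet size, and the function names are all hypothetical.

```python
import numpy as np


def empirical_growth_lambda(past_e, grid=None):
    """Pick lam on a grid maximizing mean log(1 - lam + lam * e) over past e-values."""
    if grid is None:
        grid = np.linspace(0.0, 0.5, 51)  # hypothetical cap on the bet size
    past_e = np.asarray(past_e, dtype=float)
    if past_e.size == 0:
        return 0.0  # no data yet: do not bet
    # log(1 - lam + lam * e) written as log1p(lam * (e - 1)), one row per grid value
    growth = np.log1p(grid[:, None] * (past_e[None, :] - 1.0)).mean(axis=1)
    return float(grid[np.argmax(growth)])


def adaptive_e_process(e_values, grid=None):
    """E-process whose bet at time t depends only on e-values before t."""
    e_values = np.asarray(e_values, dtype=float)
    wealth, path = 1.0, []
    for t, e in enumerate(e_values):
        lam = empirical_growth_lambda(e_values[:t], grid)
        wealth *= 1.0 - lam + lam * e
        path.append(wealth)
    return np.array(path)
```

Because the bet at time t uses only earlier observations, it is predictable, which is the property needed for the product to remain a valid e-process.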
Where Pith is reading between the lines
- Regulators could use these tests to oversee bank ES forecasts with fewer modeling restrictions.
- The technique connects e-value methods from statistics to regulatory risk management.
- Extensions to dependent data or multi-period forecasts would be natural next steps.
- A counterexample in which, under correct forecasts, the e-process crosses the rejection threshold 1/α with probability above α would falsify the validity claim.
Load-bearing premise
That e-values and e-processes can be applied directly via backtest e-statistics to produce valid model-free tests for ES without requiring additional assumptions about the underlying data distribution or model.
What would settle it
A simulation or real dataset in which the forecasts are correct but the constructed e-process exceeds a threshold 1/α (for example, 20 at α = 0.05) with probability greater than α under the null, violating the claimed type-I error control.
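A check of this kind could be sketched as a small Monte Carlo experiment. The setup below is an illustration under stated assumptions, not the paper's simulation design: losses are i.i.d. standard normal, the forecasts are the true VaR and ES, the ES e-statistic is assumed to be (L - z)_+ / ((1 - p)(r - z)), and the bet size `lam` is fixed. All names and default values are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def rejection_rate_under_null(p=0.975, lam=0.05, n_paths=4000,
                              horizon=250, threshold=20.0, seed=0):
    """Fraction of simulated paths whose e-process ever reaches `threshold`."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(p)  # true VaR_p of N(0, 1)
    # true ES_p of N(0, 1): standard normal density at z divided by (1 - p)
    r = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) / (1.0 - p)
    losses = rng.standard_normal((n_paths, horizon))
    # assumed ES e-statistic evaluated at the correct forecasts
    e = np.maximum(losses - z, 0.0) / ((1.0 - p) * (r - z))
    paths = np.cumprod(1.0 - lam + lam * e, axis=1)
    return float(np.mean(paths.max(axis=1) >= threshold))
```

If validity holds, the returned rate should stay at or below 1/threshold up to Monte Carlo error; a rate clearly above that level under correct forecasts would be the kind of evidence described above.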
Original abstract
In the recent Basel Accords, the Expected Shortfall (ES) replaces the Value-at-Risk (VaR) as the standard risk measure for market risk in the banking sector, making it the most important risk measure in financial regulation. One of the most challenging tasks in risk modeling practice is to backtest ES forecasts provided by financial institutions. To design a model-free backtesting procedure for ES, we make use of the recently developed techniques of e-values and e-processes. Backtest e-statistics are introduced to formulate e-processes for risk measure forecasts, and unique forms of backtest e-statistics for VaR and ES are characterized using recent results on identification functions. For a given backtest e-statistic, a few criteria for optimally constructing the e-processes are studied. The proposed method can be naturally applied to many other risk measures and statistical quantities. We conduct extensive simulation studies and data analysis to illustrate the advantages of the model-free backtesting method, and compare it with the ones in the literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce backtest e-statistics derived from identification functions to construct e-processes for model-free backtesting of VaR and ES forecasts. It characterizes unique forms of these e-statistics for VaR and ES, studies optimality criteria for e-process construction, and illustrates the approach with simulation studies and real-data analysis, asserting applicability to other risk measures.
Significance. If the central technical step holds, the work would provide a significant advance by enabling sequential, distribution-free backtesting of ES (now the regulatory standard under Basel), leveraging e-process theory for potentially anytime-valid tests. Credit is due for the systematic use of identification functions to derive the e-statistics and for the empirical comparisons.
major comments (2)
- §3 (characterization of backtest e-statistics for ES): the identification function for ES is joint with VaR and has mean zero under the null by construction, but the specific transformation to an e-statistic whose unconditional expectation is ≤1 for arbitrary distributions (including heavy-tailed returns with possibly infinite moments) is not derived or verified explicitly; this step is load-bearing for the model-free claim.
- §4 (e-process construction): the optimality criteria and validity of the resulting e-processes for ES presuppose that the backtest e-statistic satisfies the e-value property unconditionally; without an explicit proof or counterexample analysis under the paper's weakest assumption (no extra regularity), the guarantee for model-free ES backtesting remains open.
minor comments (2)
- [Abstract] The abstract states that 'unique forms' are characterized but does not reference the specific theorem or equation number; adding this cross-reference would improve readability.
- [§2] Notation for the identification functions and the resulting e-statistics could be introduced with a short table or displayed equation in §2 to aid readers unfamiliar with the recent e-process literature.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. The points raised highlight the need for more explicit technical details on the e-value property under minimal assumptions. We respond to each major comment below.
- Referee: §3 (characterization of backtest e-statistics for ES): the identification function for ES is joint with VaR and has mean zero under the null by construction, but the specific transformation to an e-statistic whose unconditional expectation is ≤1 for arbitrary distributions (including heavy-tailed returns with possibly infinite moments) is not derived or verified explicitly; this step is load-bearing for the model-free claim.
  Authors: We thank the referee for this observation. Section 3 characterizes the unique backtest e-statistics via identification functions, which are jointly defined for ES and VaR and satisfy conditional mean zero under the null by construction. The transformation to an e-statistic (non-negative with unconditional expectation ≤1) follows from standard e-value constructions applied to these mean-zero functions. We acknowledge that the explicit derivation and verification for arbitrary distributions, including heavy-tailed cases with possibly infinite moments, was not presented in full detail. In the revision we will add a dedicated paragraph (or short appendix) providing this derivation under the paper's stated weakest assumptions, confirming the model-free property.
  Revision: yes
- Referee: §4 (e-process construction): the optimality criteria and validity of the resulting e-processes for ES presuppose that the backtest e-statistic satisfies the e-value property unconditionally; without an explicit proof or counterexample analysis under the paper's weakest assumption (no extra regularity), the guarantee for model-free ES backtesting remains open.
  Authors: The referee is correct that the e-process validity and optimality criteria in Section 4 rest on the backtest e-statistic being an unconditional e-value. The constructions follow directly from e-process theory once this property holds. We agree that an explicit proof (or counterexample analysis) under the paper's weakest assumptions is required to close the argument for model-free ES backtesting. In the revision we will insert a short proof of the unconditional e-value property for the ES backtest e-statistic, together with a brief discussion of the no-extra-regularity case.
  Revision: yes
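The step both comments ask for can be sketched in a few lines. This is an illustration under assumptions, not the authors' proof: the form of the ES e-statistic below is not stated on this page, and the argument relies on the Rockafellar-Uryasev minimization formula for ES, which requires only that the positive part of the loss has a finite mean. Here L is the realized loss, z the VaR forecast, and r the ES forecast with r > z.

```latex
% Assumed ES backtest e-statistic; non-negative by construction.
\[
  e_p^{\mathrm{ES}}(L, r, z) = \frac{(L - z)_+}{(1-p)(r - z)} \;\ge\; 0 .
\]
% Rockafellar--Uryasev representation, valid whenever E[L_+] < \infty:
\[
  \mathrm{ES}_p(L) = \min_{x \in \mathbb{R}}
    \Bigl\{ x + \tfrac{1}{1-p}\,\mathbb{E}\bigl[(L - x)_+\bigr] \Bigr\},
\]
% with the minimum attained at x = \mathrm{VaR}_p(L). Hence, if z = \mathrm{VaR}_p(L):
\[
  \mathbb{E}\bigl[(L - z)_+\bigr] = (1-p)\bigl(\mathrm{ES}_p(L) - z\bigr),
\]
\[
  \mathbb{E}\bigl[e_p^{\mathrm{ES}}(L, r, z)\bigr]
    = \frac{\mathrm{ES}_p(L) - z}{r - z} \;\le\; 1
  \quad\Longleftrightarrow\quad \mathrm{ES}_p(L) \le r .
\]
```

Under these assumptions no moment condition beyond a finite ES is needed, so infinite variance does not break the bound. The same computation applied conditionally on the information available before each forecast would give the conditional version that an e-process requires.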
Circularity Check
No circularity: derivation applies external identification results to backtesting without self-referential reduction
The paper characterizes unique backtest e-statistics for VaR and ES by invoking recent external results on identification functions, then constructs e-processes from them. No equations or steps in the provided abstract or description reduce a claimed prediction or uniqueness result to a fitted parameter or self-citation chain internal to this work. The model-free application is presented as a direct use of those external tools rather than a re-derivation that loops back on the paper's own inputs. This is the normal case of an independent application of prior mathematical results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: identification functions exist for VaR and ES and can characterize unique backtest e-statistics
invented entities (1)
- backtest e-statistics: no independent evidence