
arxiv: 2605.03789 · v1 · submitted 2026-05-05 · 📊 stat.ML · cs.LG

Training-Free Probabilistic Time-Series Forecasting with Conformal Seasonal Pools

Pith reviewed 2026-05-09 15:35 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords conformal prediction · probabilistic forecasting · time series · training-free · seasonal naive · prediction intervals · calibration · DeepNPTS

The pith

A training-free method using conformal seasonal pools outperforms the deep learning forecaster DeepNPTS on accuracy, calibration, and speed, with no learned parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Conformal Seasonal Pools as a training-free way to produce probabilistic forecasts for time series. It works by mixing historical values from the same season with signed residuals drawn around a basic seasonal naive forecast. When tested on six standard datasets using a rolling-origin evaluation, the adaptive version of this method beats the deep learning baseline DeepNPTS on CRPS, quantile loss, and especially on how often the prediction intervals actually contain the true values. The coverage advantage matters because intervals that miss the truth too often create direct risks in applications such as energy grid management and financial decisions. The authors conclude that simple training-free conformal approaches should become required baselines whenever new learned forecasters are proposed.

Core claim

Conformal Seasonal Pools (CSP) is a training-free probabilistic time-series forecaster that mixes same-season empirical draws with signed residual draws around a seasonal naive forecast. In an audited rolling-origin benchmark on the six time-series datasets where DeepNPTS was originally evaluated, CSP-Adaptive significantly outperforms DeepNPTS on every metric reported, including CRPS, normalized mean quantile loss, and empirical 95% coverage (mean 0.89 versus 0.66), while running over 500 times faster on CPU. The paper notes that DeepNPTS coverage failures are especially severe in the worst windows where no horizon in the multi-step forecast is covered, posing risks in safety-critical uses.

What carries the argument

Conformal Seasonal Pools, a sampling procedure that draws from same-season historical observations combined with signed residuals around a seasonal naive point forecast to generate calibrated probabilistic predictions and intervals.
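
The mixing idea described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the paper's implementation: the function names, the fixed mixing probability `mix`, the choice to pool every past value at the same seasonal phase, and the plain empirical-quantile interval are all assumptions, and the adaptive variant is not shown.

```python
import random

def seasonal_naive(history, season, horizon):
    """Point forecast: repeat the most recent observed value at the same seasonal phase."""
    n = len(history)
    return [history[n - season + (h % season)] for h in range(horizon)]

def csp_sample_paths(history, season, horizon, n_samples=200, mix=0.5, seed=0):
    """Draw sample paths by mixing two sources (hypothetical sketch).

    Assumes len(history) > season. `mix` is an assumed mixing probability,
    not a value taken from the paper.
    """
    rng = random.Random(seed)
    n = len(history)
    point = seasonal_naive(history, season, horizon)
    # Signed in-sample residuals of the seasonal naive forecast: y_t - y_{t-season}.
    residuals = [history[t] - history[t - season] for t in range(season, n)]
    samples = []
    for _ in range(n_samples):
        path = []
        for h in range(horizon):
            if rng.random() < mix:
                # Same-season empirical draw: a past value at the same seasonal phase.
                phase = (n + h) % season
                path.append(rng.choice(history[phase::season]))
            else:
                # Signed residual draw around the seasonal naive point forecast.
                path.append(point[h] + rng.choice(residuals))
        samples.append(path)
    return samples

def interval(samples, h, alpha=0.05):
    """Empirical (alpha/2, 1 - alpha/2) interval at horizon step h."""
    vals = sorted(path[h] for path in samples)
    lo = vals[int((alpha / 2) * (len(vals) - 1))]
    hi = vals[int((1 - alpha / 2) * (len(vals) - 1))]
    return lo, hi
```

On a perfectly periodic series every residual is zero and every same-phase pool holds a single value, so all sampled paths collapse onto the seasonal pattern; interval width in this sketch comes entirely from departures from strict seasonality.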

If this is right

  • CSP-Adaptive produces prediction intervals whose empirical coverage (mean 0.89) is much closer to the nominal 95 percent level than that of DeepNPTS (mean 0.66).
  • No training step is needed, so forecasts can be generated immediately with far lower computational cost.
  • In its worst 10% of windows, DeepNPTS fails to cover the truth at any horizon of the multi-step trajectory.
  • Training-free conformal methods should serve as mandatory baselines when new non-parametric forecasters are evaluated.
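
The coverage quantities behind the first and third bullets can be computed as below. This is a minimal sketch under assumed inputs: each window is a hypothetical `(lower, upper, truth)` triple of equal-length lists, one entry per forecast horizon, and the helper names are not from the paper.

```python
def window_coverage(lower, upper, truth):
    """Fraction of horizons in one window whose interval contains the truth."""
    hits = [lo <= y <= hi for lo, hi, y in zip(lower, upper, truth)]
    return sum(hits) / len(hits)

def coverage_summary(windows):
    """Return (mean per-window coverage, share of windows covering no horizon)."""
    per_window = [window_coverage(lo, hi, y) for lo, hi, y in windows]
    mean_cov = sum(per_window) / len(per_window)
    zero_share = sum(c == 0.0 for c in per_window) / len(per_window)
    return mean_cov, zero_share
```

Reporting the zero-coverage share alongside the mean separates a forecaster that is mildly under-covering everywhere from one that misses entire trajectories in a subset of windows.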

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The strong performance on seasonal data suggests the pooling idea may extend naturally to other domains with repeating cycles such as retail demand or environmental monitoring.
  • If the required exchangeability holds more broadly, the results indicate that added model complexity does not automatically improve calibration in time-series uncertainty quantification.
  • The speed advantage could allow CSP to be used inside larger ensembles or updated more frequently in operational settings.

Load-bearing premise

The rolling-origin evaluation on the six datasets is free of data leakage and the time-series observations satisfy the exchangeability conditions required for conformal prediction to deliver the stated coverage.
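
One way to make the leakage condition concrete is to generate the rolling-origin splits explicitly and check that every training index precedes every test index. The sketch below is a generic construction, not the paper's protocol: the function name, the non-overlapping default `step`, and the expanding training window are assumptions.

```python
def rolling_origin_splits(n_obs, horizon, n_windows, step=None):
    """Return (train_range, test_range) pairs in chronological order.

    Each training range ends exactly where its test range begins, so a
    forecaster that only reads the training range cannot see future values.
    """
    step = horizon if step is None else step
    splits = []
    for k in range(n_windows):
        test_end = n_obs - k * step
        origin = test_end - horizon
        if origin <= 0:
            break  # not enough history left for another window
        splits.append((range(0, origin), range(origin, test_end)))
    return splits[::-1]

# Example: 10 observations, horizon 2, three windows.
for train, test in rolling_origin_splits(10, 2, 3):
    assert max(train) < min(test)  # no test index appears in training
```

The index check only rules out leakage through the split itself; preprocessing steps such as normalization fitted on the full series would need a separate audit.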

What would settle it

Re-running the rolling-origin experiments on the same six datasets and finding that CSP-Adaptive no longer produces statistically significant coverage improvements over DeepNPTS or that its empirical coverage drops well below the nominal level.

Figures

Figures reproduced from arXiv: 2605.03789 by Valery Manokhin.

Figure 1: Per-window rank distribution (380 forecast windows). Greener bars indicate more …
Figure 2: Per-window empirical coverage distribution across all 380 forecast windows. Each …
Figure 3: Per-dataset mean coverage, sorted by gap. CSP-Adaptive (green) covers near or …
Figure 4: Accuracy vs. runtime. Lower mean CRPS rank is better; the …

Original abstract

We propose Conformal Seasonal Pools (CSP), a training-free probabilistic time-series forecaster that mixes same-season empirical draws with signed residual draws around a seasonal naive forecast. In an audited rolling-origin benchmark on the six time-series datasets where DeepNPTS was originally evaluated (electricity, exchange_rate, solar_energy, taxi, traffic, wikipedia), CSP-Adaptive significantly outperforms DeepNPTS on every metric we report -- CRPS (per-window paired Wilcoxon $p \approx 4 \times 10^{-10}$), normalized mean quantile loss ($p \approx 7 \times 10^{-10}$), and empirical 95% coverage ($p \approx 8 \times 10^{-45}$, mean 0.89 vs 0.66) -- while running over 500x faster on CPU. Coverage is the most decision-critical of these: a 0.95 nominal interval that contains the truth in only ~66% of cases fails the basic calibration desideratum and would not survive deployment in safety- or decision-critical settings. The failure mode is also more severe than aggregate coverage suggests: in the worst 10% of windows, DeepNPTS's prediction interval covers none of the H forecast horizons -- the entire multi-step trajectory misses the truth at every step simultaneously. This poses serious risk in safety- and decision-critical applications such as healthcare, finance, energy operations, and autonomous systems, where prediction intervals that systematically miss the truth across the entire planning horizon translate directly into misclassified patients, regulatory capital failures, grid imbalances, and safety-case violations. CSP achieves all of this with no learned parameters and no training. We argue training-free conformal samplers should be mandatory baselines when evaluating learned non-parametric forecasters.
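
For readers unfamiliar with CRPS on sample-based forecasts, a standard estimator is the energy form, mean |X - y| minus half the mean |X - X'| over pairs of samples. The sketch below implements that generic estimator; whether the paper uses this exact form is not stated in the abstract, and the function name is hypothetical.

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    m = len(samples)
    # Average absolute error of the samples against the observed value.
    term_obs = sum(abs(x - y) for x in samples) / m
    # Average absolute difference over all ordered pairs of samples (spread term).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term_obs - 0.5 * term_spread
```

With a single sample the spread term vanishes and the score reduces to absolute error, which is why CRPS is often described as a probabilistic generalization of MAE. The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred draws.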

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Conformal Seasonal Pools (CSP), a training-free probabilistic time-series forecaster that mixes same-season empirical draws with signed residual draws around a seasonal naive forecast and applies conformal quantile selection. In a rolling-origin benchmark on the six datasets previously used for DeepNPTS (electricity, exchange_rate, solar_energy, taxi, traffic, wikipedia), the adaptive variant CSP-Adaptive is reported to outperform DeepNPTS on CRPS (paired Wilcoxon p ≈ 4 × 10^{-10}), normalized mean quantile loss (p ≈ 7 × 10^{-10}), and empirical 95% coverage (p ≈ 8 × 10^{-45}, mean 0.89 vs. 0.66) while running over 500× faster on CPU. The paper positions training-free conformal methods as mandatory baselines for evaluating learned forecasters and stresses the practical risks of poor coverage in safety-critical domains.

Significance. If the empirical superiority and coverage results are robust to the exchangeability concerns, the work is significant as a simple, parameter-free baseline that exposes calibration failures in learned methods and runs orders of magnitude faster. The emphasis on coverage as a decision-critical metric and the use of paired statistical tests across multiple datasets and horizons are strengths; the absence of any learned parameters or training is a clear methodological advantage that could shift evaluation standards in the field.

major comments (2)
  1. [Abstract and method description] The coverage superiority claim rests on the assumption that conformal quantile selection delivers valid finite-sample coverage. However, the signed seasonal residuals are unlikely to be exchangeable due to residual serial correlation in real time-series data (electricity, traffic, etc.), which seasonal adjustment does not eliminate. This directly bears on the reported empirical coverage of 0.89 (vs. nominal 0.95) and the p-value comparison; the manuscript must explicitly address whether the guarantee holds or is only heuristic.
  2. [Experiments section] The rolling-origin benchmark description provides no details on data preprocessing steps, the precise definition and implementation of the 'adaptive' variant, or verification that no post-hoc window exclusions or leakage occurred. These omissions are load-bearing for the reproducibility of the CRPS, quantile loss, and coverage results and the Wilcoxon tests.
minor comments (2)
  1. [Abstract] The abstract states 'audited rolling-origin benchmark' without defining the audit criteria or providing the exact hyperparameter choices for CSP-Adaptive; this should be clarified for readers.
  2. [Method] Notation for the mixing of empirical draws and signed residuals could be made more precise with an equation showing the construction of the calibration scores.
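
For context on major comment 1, the textbook split-conformal statement that the exchangeability objection refers to can be written as follows. The notation is generic and not taken from the paper: $s_1, \dots, s_n$ are calibration scores, $s_{(k)}$ is the $k$-th smallest of them, $s_{n+1}$ is the score of the new point, and $\alpha$ is the miscoverage level.

```latex
\hat{q} = s_{(k)}, \qquad k = \bigl\lceil (n+1)(1-\alpha) \bigr\rceil, \qquad
\Pr\bigl( s_{n+1} \le \hat{q} \bigr) \ge 1 - \alpha
\quad \text{if } s_1, \dots, s_{n+1} \text{ are exchangeable and } k \le n .
```

At $\alpha = 0.05$ the condition $k \le n$ requires at least 19 calibration scores. Under serial correlation the scores are not exchangeable, so the inequality is no longer guaranteed, which is the gap the referee asks the authors to address.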

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important points on theoretical assumptions and experimental reproducibility that we address below. We have prepared revisions to clarify the heuristic nature of the coverage guarantees and to expand the experimental details for full reproducibility.

Point-by-point responses
  1. Referee: [Abstract and method description] The coverage superiority claim rests on the assumption that conformal quantile selection delivers valid finite-sample coverage. However, the signed seasonal residuals are unlikely to be exchangeable due to residual serial correlation in real time-series data (electricity, traffic, etc.), which seasonal adjustment does not eliminate. This directly bears on the reported empirical coverage of 0.89 (vs. nominal 0.95) and the p-value comparison; the manuscript must explicitly address whether the guarantee holds or is only heuristic.

    Authors: We agree that the finite-sample coverage guarantee of conformal prediction requires exchangeability, which is unlikely to hold exactly for signed seasonal residuals in the presence of residual serial correlation, even after seasonal adjustment. In the revised manuscript we will add an explicit subsection stating that the coverage guarantee is heuristic under temporal dependence rather than strictly valid. We will include supporting analysis of empirical coverage as a function of autocorrelation strength across the datasets and will frame the reported superiority (including the Wilcoxon tests) as an empirical result. This revision will accurately qualify the theoretical claim while retaining the practical demonstration that CSP-Adaptive achieves substantially better calibration than the learned baseline. revision: yes

  2. Referee: [Experiments section] The rolling-origin benchmark description provides no details on data preprocessing steps, the precise definition and implementation of the 'adaptive' variant, or verification that no post-hoc window exclusions or leakage occurred. These omissions are load-bearing for the reproducibility of the CRPS, quantile loss, and coverage results and the Wilcoxon tests.

    Authors: We acknowledge that the current experimental description lacks sufficient detail. In the revision we will expand the Experiments section with: (i) complete preprocessing pipelines for each of the six datasets, including normalization, missing-value handling, and seasonal decomposition; (ii) a precise algorithmic definition of CSP-Adaptive, specifying the adaptation rule (dynamic pool-size selection based on recent calibration performance without any learned parameters or training); and (iii) explicit verification that the rolling-origin procedure uses only past data, with no post-hoc window exclusions or leakage. We will also release the full implementation code upon acceptance to allow independent verification of the reported metrics and paired statistical tests. revision: yes
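
The autocorrelation analysis promised in the first response could start from a diagnostic like the one below, which measures serial correlation left in the signed seasonal residuals. This is a hypothetical sketch: the helper names and the residual definition y_t - y_{t-season} are assumptions, and the authors' planned analysis may differ.

```python
def seasonal_residuals(series, season):
    """Signed residuals of the seasonal naive forecast: y_t - y_{t-season}."""
    return [series[t] - series[t - season] for t in range(season, len(series))]

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom
```

Lag-1 values far from zero on a dataset's residuals would indicate that the exchangeability assumption is strained there, and plotting empirical coverage against this statistic per dataset is one simple way to present the relationship.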

Circularity Check

0 steps flagged

No circularity in derivation or claims


The paper proposes CSP as a training-free algorithm that mixes seasonal empirical draws with signed residuals from a naive forecast and applies standard conformal quantile selection. All reported performance claims (CRPS, quantile loss, coverage) are obtained from direct rolling-origin evaluation on six external public datasets against the independent baseline DeepNPTS, with statistical tests on those results. No equation, parameter, or 'prediction' in the abstract or described method is defined in terms of itself or fitted to the target metric; the construction uses off-the-shelf conformal machinery without self-referential reduction. The central empirical superiority therefore rests on observable benchmark outcomes rather than tautological re-expression of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on domain assumptions about seasonal stability and exchangeability but introduces no free parameters or new entities; the full paper would be needed to confirm whether any design choices (e.g., pool size) function as implicit hyperparameters.

axioms (2)
  • Domain assumption: time series exhibit sufficiently stable seasonal patterns to allow reuse of same-season historical observations for sampling.
    Invoked directly in the construction of same-season empirical draws around the seasonal naive forecast.
  • Domain assumption: residuals around the seasonal naive forecast satisfy the approximate exchangeability required for conformal coverage guarantees.
    Necessary for the validity of the prediction intervals produced by the pool.


