Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Ashwin Pananjady; Hanyang Jiang; Rina Foygel Barber; Yao Xie

arxiv: 2605.30292 · v2 · pith:AD6ZKOVCnew · submitted 2026-05-28 · 📊 stat.ML · cs.LG · math.ST · stat.ME · stat.TH

Pith reviewed 2026-06-29 05:22 UTC · model grok-4.3

keywords: time series, jackknife, conformal prediction, predictive inference, coverage guarantee, stability, temporal dependence

The pith

The vanilla jackknife can lose coverage guarantees in time series data even with mild dependence, but a leave-a-window-out modification restores valid coverage when the predictor is stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that the standard leave-one-out jackknife procedure can fail to deliver valid coverage for prediction intervals in time series data, even when dependence is mild. To address this, it introduces the leave-a-window-out method, which excludes a block of recent observations during resampling. This approach guarantees coverage if the fitted model changes only modestly when training data is perturbed slightly. The argument relies on new coefficients that quantify departure from cyclic exchangeability due to temporal structure. A sympathetic reader would care because it enables reliable uncertainty estimates without data splitting in dependent data settings where split conformal methods lose efficiency.

Core claim

The vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, the leave-a-window-out (LWO) method achieves valid coverage provided that the model-fitting procedure satisfies mild stability properties. The proofs quantify the degree to which the data departs from cyclic exchangeability using newly introduced coefficients.
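
For orientation, cyclic exchangeability has a standard definition, written out below in LaTeX. This is the textbook notion only, not the paper's new coefficients, which this page does not reproduce. The symbols Z_t, n, and the shift map σ_j are notation chosen here for illustration.

```latex
% Cyclic shift by j positions on the indices 1..n (illustrative notation)
\sigma_j(t) = ((t + j - 1) \bmod n) + 1, \qquad j = 0, 1, \dots, n-1 .

% A sequence Z = (Z_1, \dots, Z_n) is cyclically exchangeable if
(Z_{\sigma_j(1)}, \dots, Z_{\sigma_j(n)}) \overset{d}{=} (Z_1, \dots, Z_n)
\quad \text{for every } j .
```

An i.i.d. sequence satisfies this, as does any exchangeable one, because cyclic shifts are a subset of all permutations. A stationary time series generally does not: the shift places Z_n next to Z_1, a pair whose joint law typically differs from that of true neighbours. That wrap-around mismatch is the kind of departure the coefficients are described as measuring.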

What carries the argument

The leave-a-window-out (LWO) jackknife, which modifies the standard jackknife by excluding a contiguous window of observations to account for temporal dependence while preserving coverage under stability.
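
The score computation can be sketched in Python, following the description in the Figure 2 caption: for each index k, the window of length τ starting at point k+1 is left out, the predictor is trained on the remaining data except point k, and the score is the error on point k. Everything else here is an assumption made for illustration: the function names, the least-squares base predictor, truncating the window at the end of the sample, and the jackknife-style interval (full-data prediction plus or minus a score quantile). This page does not spell out how the paper forms the final interval.

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares fit with intercept (stand-in base predictor).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def lwo_scores(X, y, tau, fit, predict):
    """Absolute-residual scores with point k and the tau points after it held out."""
    n = len(y)
    scores = np.empty(n)
    for k in range(n):
        keep = np.ones(n, dtype=bool)
        # Drop the evaluation point k and the window k+1..k+tau.
        # Assumption: the window is truncated at the end of the sample.
        keep[k:min(k + tau + 1, n)] = False
        model = fit(X[keep], y[keep])
        scores[k] = abs(y[k] - predict(model, X[k:k + 1])[0])
    return scores

def lwo_interval(X, y, x_new, tau, alpha=0.1):
    # Assumption: symmetric interval around the full-data prediction,
    # with radius given by a conformal-style quantile of the LWO scores.
    scores = np.sort(lwo_scores(X, y, tau, fit_ols, predict_ols))
    n = len(scores)
    rank = min(int(np.ceil((1 - alpha) * (n + 1))), n)
    radius = scores[rank - 1]
    center = predict_ols(fit_ols(X, y), x_new[None, :])[0]
    return center - radius, center + radius

# Toy data: linear signal with MA(1) noise (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, d, tau = 200, 3, 5
X = rng.normal(size=(n + 1, d))
eps = rng.normal(size=n + 2)
noise = eps[1:] + 0.8 * eps[:-1]
y = X @ np.array([1.0, -2.0, 0.5]) + noise
lo, hi = lwo_interval(X[:n], y[:n], X[n], tau)
print(lo, hi, y[n])
```

Setting `tau=0` reduces the loop to the ordinary leave-one-out jackknife, which makes the two procedures easy to compare on the same data.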

Load-bearing premise

The model-fitting procedure satisfies mild stability properties, meaning the predictor changes only modestly when a small window of training data is altered.
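
One rough way to probe this premise empirically is to refit the predictor with a short window removed and measure how far its predictions move. The sketch below is a crude proxy, not the paper's formal stability condition, which this page does not state. The ridge predictor, the probe points, and the names `fit_ridge` and `window_sensitivity` are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Ridge regression without intercept (stand-in predictor).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def window_sensitivity(X, y, X_probe, tau, fit, n_windows=20, seed=0):
    """Mean absolute change in predictions at X_probe after deleting
    a randomly placed window of tau consecutive training points."""
    rng = np.random.default_rng(seed)
    n = len(y)
    base = X_probe @ fit(X, y)
    changes = []
    for start in rng.integers(0, n - tau, size=n_windows):
        keep = np.ones(n, dtype=bool)
        keep[start:start + tau] = False
        refit = X_probe @ fit(X[keep], y[keep])
        changes.append(np.mean(np.abs(refit - base)))
    return np.array(changes)

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
n, d, tau = 300, 4, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)
X_probe = rng.normal(size=(50, d))
changes = window_sensitivity(X, y, X_probe, tau, fit_ridge)
print(changes.max())
```

If the largest change is small relative to the interval radius, the predictor behaves stably on this data in the informal sense above. A large change would suggest the premise deserves closer scrutiny for that predictor.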

What would settle it

A time series dataset paired with a demonstrably stable predictor where the empirical coverage of the LWO intervals falls below the nominal level would falsify the coverage claim.
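
Such a check could be scored as in the sketch below, which computes empirical coverage over repeated trials and a z-statistic against the nominal level. The function name and the synthetic intervals are assumptions for illustration. The standard error is only meaningful if the trials are independent, and the paper's guarantee may include slack terms, so the comparison should be made against the bound the theorem actually states.

```python
import numpy as np

def coverage_report(lower, upper, y_true, nominal=0.9):
    """Empirical coverage and a z-statistic against the nominal level.

    Assumes one interval and one outcome per independent trial.
    """
    covered = (y_true >= lower) & (y_true <= upper)
    m = covered.size
    cov = covered.mean()
    # Binomial standard error under the hypothesis that coverage equals nominal.
    se = np.sqrt(nominal * (1 - nominal) / m)
    return cov, (cov - nominal) / se

# Synthetic stand-in: standard normal outcomes with fixed 90% intervals.
rng = np.random.default_rng(0)
m = 2000
y_true = rng.normal(size=m)
cov, z = coverage_report(np.full(m, -1.645), np.full(m, 1.645), y_true)
print(f"coverage={cov:.3f}, z={z:.2f}")
```

A z-statistic far below zero, for a predictor already shown to be stable, is the kind of evidence that would count against the coverage claim.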

Figures

Figures reproduced from arXiv: 2605.30292 by Ashwin Pananjady, Hanyang Jiang, Rina Foygel Barber, Yao Xie.

**Figure 1.** (Left) Empirical coverage on the multidimensional MA(1) process using 2-nearest neighbors as the base predictor. Coverage is averaged over 1000 independent trials for split CP, the vanilla jackknife, and the LWO method. The nominal coverage level is 90% throughout. (Right) Average radius of the prediction regions produced by split CP, the vanilla jackknife, and LWO, scaled by 1/√d to account for the res…

**Figure 2.** Illustration of how the LWO score s_k is computed. We leave out a window of length τ starting at (and including) (X_{k+1}, Y_{k+1}), which is denoted by the white blocks. The predictor f̂_k is trained on the green points (i.e. all remaining data except (X_k, Y_k)), and the score s_k is obtained by evaluating the trained predictor f̂_k on (X_k, Y_k).

**Figure 3.** Empirical performance on the multidimensional MA(1) process across five base predictors. (Left) Empirical coverage of Split CP, Jackknife, and LWO, with the dashed horizontal line marking the nominal 90% coverage level. (Right) Average prediction region radius for the same methods and predictors. Bars show means across 500 repeated trials, and error bars indicate ±1 standard error.

**Figure 4.** Empirical performance on two real data benchmarks. (Top) Traffic dataset. (Bottom) Solar Energy dataset. In each row, the left panel reports empirical coverage, and the right panel reports average prediction-set size across base predictors. Bars show means across repeated trials, and error bars indicate ±1 standard error. The dashed horizontal line marks nominal 90% coverage.

**Figure 5.** Illustration of the circular embedding used to compute the scores s^⟳. We place the cyclically exchangeable sequence Z̃ = (Z̃_1, …, Z̃_{n+τ+1}) on a circle by gluing the end of the sequence back to its beginning, so that Z̃_{n+τ+1} is followed by Z̃_1. Black points indicate observations used for training, the blue point is the evaluation point, and the omitted LWO block is indicated by a red dashed arc togethe…

**Figure 6.** Empirical performance on the sticky Markov-chain process across five base predictors. (Left) Empirical coverage of Split CP, Jackknife, and LWO, with the dashed horizontal line marking the nominal 90% coverage level. (Right) Average prediction radius for the same methods and predictors. Bars show means across 500 repeated trials, and error bars indicate ±1 standard error. We use no additional lagged histo…

Original abstract

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from cyclic exchangeability, which we introduce new coefficients to measure. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a leave-a-window-out jackknife plus new coefficients for cyclic exchangeability to fix coverage loss in time series, but the guarantee rests on unverified stability for typical predictors.

The main thing to know is that the vanilla leave-one-out jackknife can lose coverage arbitrarily in standard time series models with mild dependence. The authors propose leaving out a window of points instead, backed by coefficients that quantify departure from cyclic exchangeability.

The LWO procedure is the concrete new piece. It keeps the full sample for fitting while adjusting the conformal step for temporal structure, which avoids the efficiency hit from split conformal. The coefficients look like a useful bookkeeping device that turns the stability assumption into a coverage bound. If the experiments really show valid coverage where jackknife fails and narrower intervals than splitting, that is the practical payoff.

The soft spot is exactly the stability premise. The coverage claim requires that the fitted predictor changes only modestly when a small window is dropped. The abstract gives no sign that this was checked for the concrete models in the experiments, such as AR processes or neural nets. If stability does not hold at the assumed level, the conversion from coefficients to coverage guarantee breaks. That makes the result conditional until the full proofs and checks are examined.

This is for researchers working on conformal methods or jackknife intervals for dependent data. A reader already thinking about time series forecasting would find the procedure and the exchangeability coefficients worth seeing.

It deserves a serious referee. The problem is real, the fix is targeted, and the new coefficients are original. The stability question is the main item for review, but the work is coherent enough to go out rather than desk reject.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, it proposes the leave-a-window-out (LWO) method and shows that it achieves valid coverage provided the model-fitting procedure satisfies mild stability properties. The proofs rely on new coefficients quantifying departure from cyclic exchangeability. Experiments on time series data indicate that LWO often attains valid coverage where the jackknife fails while yielding narrower intervals than split conformal prediction.

Significance. If the stability conditions are verified to hold for the predictors employed and the coverage bounds are rigorously derived, the work would meaningfully extend conformal prediction techniques to dependent data without requiring sample splitting, addressing a practical limitation in time-series settings.

major comments (1)

[theoretical results and experiments section] The central coverage guarantee for LWO is obtained by converting the new cyclic-exchangeability coefficients into a bound only when the model-fitting procedure satisfies the mild stability properties. The manuscript provides no verification that this stability holds for the concrete predictors (e.g., AR models or neural networks) used in the reported time-series experiments; without such verification the conversion step fails even if the coefficients themselves are small.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the connection between the theoretical conditions and the experimental results. We address the major comment below.

Referee: [theoretical results and experiments section] The central coverage guarantee for LWO is obtained by converting the new cyclic-exchangeability coefficients into a bound only when the model-fitting procedure satisfies the mild stability properties. The manuscript provides no verification that this stability holds for the concrete predictors (e.g., AR models or neural networks) used in the reported time-series experiments; without such verification the conversion step fails even if the coefficients themselves are small.

Authors: We agree that the coverage guarantee is obtained only when both the cyclic-exchangeability coefficients are controlled and the model-fitting procedure satisfies the stated stability properties. The manuscript introduces stability as a sufficient condition for the bound but does not provide empirical verification that this condition holds for the specific AR models or neural networks used in the experiments. The experiments instead report empirical coverage and interval widths to illustrate practical behavior. We will revise the manuscript to explicitly distinguish the theoretical guarantee from the empirical results and to note that verifying stability for concrete predictors remains an important direction for future work.

Revision: yes

Circularity Check

0 steps flagged

No circularity; coverage proof relies on external stability premise and new coefficients

The derivation introduces new coefficients to quantify departure from cyclic exchangeability and proves LWO coverage by converting those coefficients under the stated premise that the fitting procedure satisfies mild stability. This premise is an assumption external to the result rather than a quantity fitted or defined inside the paper's equations. No load-bearing step reduces by construction to a self-citation, a renamed known result, or an input that is statistically forced; the central claim therefore remains independent of the paper's own fitted quantities or prior self-referential theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The coverage guarantee rests on an unelaborated stability assumption for the model-fitting map and on the newly defined coefficients that quantify non-exchangeability; both are introduced in the paper rather than taken from prior literature.

axioms (1)

domain assumption: The model-fitting procedure satisfies mild stability properties.
Invoked to turn the cyclic-exchangeability coefficients into a finite-sample coverage guarantee for LWO.

invented entities (1)

Coefficients measuring departure from cyclic exchangeability (no independent evidence)
purpose: To quantify the degree to which the time series departs from the exchangeability needed for standard jackknife coverage.
Newly introduced in the paper; no independent evidence supplied in the abstract.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sequential statistical inference for Large Language Models: Representation, validity, and monitoring
cs.LG · 2026-05 · unverdicted · novelty 3.0

Argues for modeling LLM interactions as dependent stochastic processes to enable valid sequential uncertainty quantification and change-point monitoring for trustworthiness properties.

