arxiv: 2605.06541 · v1 · submitted 2026-05-07 · 💻 cs.LG · stat.ML

Hedging Memory Horizons for Non-Stationary Prediction via Online Aggregation

Yutong Wang, Yannig Goude, Qiwei Yao

Pith reviewed 2026-05-08 12:24 UTC · model grok-4.3

keywords: online aggregation · non-stationary prediction · exponentially weighted least squares · memory hedging · distribution shift · electricity load forecasting · oracle inequalities

The pith

Hedging across a grid of forgetting factors lets online predictors track unknown distribution shifts without external indicators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MELO, which wraps any pool of non-anticipating base predictors with exponentially weighted least-squares experts at several forgetting rates and then aggregates the raw and adapted forecasts using the parameter-free MLpol rule. It proves deterministic oracle inequalities showing that the resulting predictor competes with both the single best base predictor and the best bounded time-varying affine combination of them, at the price of a path-length-dependent tracking term plus a sublinear aggregation overhead. On French national electricity-load data through the COVID lockdown, MELO cuts RMSE by 34.7 percent versus plain MLpol aggregation while requiring only recursive updates and no regime labels or policy covariates.
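To make the adaptation layer concrete, here is a minimal sketch of one EWLS expert, written as a textbook recursive least-squares update with a forgetting factor on affine features of the base predictions. The class name `EWLSExpert`, the initial covariance scale `delta`, and the zero initialisation are assumptions of this sketch, and the small covariance inflation mentioned in the paper's figure captions is deliberately left out.

```python
class EWLSExpert:
    """Recursive least squares with forgetting factor gamma on features [1, z_1, ..., z_M].

    Hypothetical sketch: names, initialisation, and the absence of covariance
    inflation are illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, n_base, gamma, delta=1e3):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        d = n_base + 1  # intercept plus one coefficient per base predictor
        self.gamma = gamma
        self.theta = [0.0] * d
        # Inverse-covariance proxy, started at delta * identity (assumed value).
        self.P = [[delta if i == j else 0.0 for j in range(d)] for i in range(d)]

    @staticmethod
    def _features(z):
        return [1.0] + [float(v) for v in z]

    def predict(self, z):
        x = self._features(z)
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, z, y):
        """Update after the outcome y is revealed; returns the pre-update forecast."""
        x = self._features(z)
        forecast = sum(t * xi for t, xi in zip(self.theta, x))
        Px = [sum(p * xj for p, xj in zip(row, x)) for row in self.P]
        denom = self.gamma + sum(xi * v for xi, v in zip(x, Px))
        gain = [v / denom for v in Px]
        err = y - forecast
        self.theta = [t + g * err for t, g in zip(self.theta, gain)]
        d = len(x)
        # P <- (P - gain * x^T P) / gamma; P is symmetric, so x^T P equals Px.
        self.P = [
            [(self.P[i][j] - gain[i] * Px[j]) / self.gamma for j in range(d)]
            for i in range(d)
        ]
        return forecast
```

Running several such experts in parallel, one per forgetting factor, gives the adapted forecasts that are then aggregated together with the raw base predictions. Each update costs O(d^2) operations for d = M + 1 features, which is the sense in which the updates are lightweight.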

Core claim

MELO competes with the best raw predictor and the best bounded time-varying affine combinations of the base predictions up to a path-length-dependent tracking cost and a sublinear aggregation overhead.
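The shape of this claim can be written schematically as below. This is only a reading aid: the notation z_t, y_t, ŷ_t and M follows the paper's Figure 1 caption, while the affine parametrisation (a_t, b_t), the choice of norm in the path length P_T, and the placeholders C_track and o(T) are assumptions, since this page does not state the exact constants or rates.

```latex
% Schematic only: C_track and o(T) stand in for terms this page does not spell out.
\sum_{t=1}^{T} (\hat y_t - y_t)^2
\;\le\;
\min\Bigl\{
  \min_{1 \le m \le M} \sum_{t=1}^{T} (z_{m,t} - y_t)^2 ,\;
  \inf_{(a_t, b_t)_{t \le T}\ \text{bounded}}
  \Bigl[ \sum_{t=1}^{T} \bigl(a_t + b_t^{\top} z_t - y_t\bigr)^2
         + C_{\mathrm{track}}(P_T) \Bigr]
\Bigr\} + o(T),
\qquad
P_T = \sum_{t=2}^{T} \bigl\| (a_t, b_t) - (a_{t-1}, b_{t-1}) \bigr\| .
```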

What carries the argument

The MELO procedure hedges over a fixed grid of forgetting factors by running parallel EWLS adaptation experts on the base-predictor pool, then applies MLpol aggregation to the resulting raw and adapted forecasts.
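The aggregation step can be sketched as follows. This is a simplified, hypothetical rendering of an ML-Poly-style rule (polynomial potential with per-expert learning rates) using the gradient trick for squared loss. The class name `MLPolSketch`, the unscaled learning rate `1 / (1 + sum of squared instantaneous regrets)`, and the uniform fallback when no expert has positive regret are assumptions of this sketch; a production implementation would also rescale losses to a bounded range.

```python
class MLPolSketch:
    """Simplified ML-Poly-style aggregator for squared loss (illustrative only)."""

    def __init__(self, n_experts):
        self.n = n_experts
        self.regret = [0.0] * n_experts      # cumulative linearised regret per expert
        self.sq_regret = [0.0] * n_experts   # cumulative squared instantaneous regret

    def weights(self):
        # Weight is proportional to learning rate times positive part of regret.
        raw = [max(r, 0.0) / (1.0 + s) for r, s in zip(self.regret, self.sq_regret)]
        total = sum(raw)
        if total <= 0.0:
            # Assumed fallback: uniform weights when no expert is ahead.
            return [1.0 / self.n] * self.n
        return [w / total for w in raw]

    def predict(self, forecasts):
        w = self.weights()
        return sum(wi * fi for wi, fi in zip(w, forecasts))

    def update(self, forecasts, y):
        """Aggregate, then update regrets once y is revealed; returns the forecast."""
        y_hat = self.predict(forecasts)
        grad = 2.0 * (y_hat - y)  # derivative of (y_hat - y)^2 at the aggregate
        for k, f in enumerate(forecasts):
            r = grad * (y_hat - f)  # linearised instantaneous regret against expert k
            self.regret[k] += r
            self.sq_regret[k] += r * r
        return y_hat
```

In a MELO-style pipeline the `forecasts` list would hold the M raw base predictions followed by the K EWLS-adapted forecasts, so the aggregator never needs to know which candidates are adapted and which are raw.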

If this is right

Stable quiet-period performance is preserved while automatic adaptation occurs when regimes change.
The method works with any existing non-anticipating base predictors without retraining or external regime indicators.
Only lightweight recursive updates are needed at each time step.
The same hedging idea can be applied to other online aggregation rules beyond MLpol.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may extend naturally to other drifting environments such as financial returns or sensor streams where the right memory length is unknown in advance.
If the true shift dynamics are much faster or slower than the chosen grid, performance may degrade and a data-driven grid selection step could become necessary.
Pairing MELO with stronger modern base predictors could compound the observed error reductions.

Load-bearing premise

Predictions and outcomes stay bounded, base predictors are non-anticipating, and a fixed discrete grid of forgetting factors is sufficient to track arbitrary unknown regime shifts.

What would settle it

On a bounded synthetic stream with abrupt, sustained shifts whose optimal memory length lies outside the chosen forgetting grid, MELO should show no RMSE improvement over the single best fixed forgetting factor.
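A starting point for such a test is sketched below. It builds a bounded stream with one abrupt, sustained level shift and scores a discounted-mean predictor (intercept-only EWLS) at several fixed forgetting factors. It does not implement MELO itself; the stream parameters, the function names, and the candidate factors are all illustrative assumptions.

```python
import math
import random


def bounded_shift_stream(n=400, shift_at=200, noise=0.1, seed=0):
    """Level 0 before the shift, level 1 after, plus bounded uniform noise."""
    rng = random.Random(seed)
    return [
        (0.0 if t < shift_at else 1.0) + rng.uniform(-noise, noise)
        for t in range(n)
    ]


def discounted_mean_rmse(stream, gamma):
    """One-step-ahead RMSE of an exponentially discounted mean with factor gamma."""
    weighted_sum = 0.0
    weight_total = 0.0
    sq_err = 0.0
    for y in stream:
        # Predict before y is revealed; use 0.0 until any data has been seen.
        pred = weighted_sum / weight_total if weight_total > 0.0 else 0.0
        sq_err += (pred - y) ** 2
        weighted_sum = gamma * weighted_sum + y
        weight_total = gamma * weight_total + 1.0
    return math.sqrt(sq_err / len(stream))


stream = bounded_shift_stream()
rmse_by_gamma = {g: discounted_mean_rmse(stream, g) for g in (0.5, 0.9, 0.99, 1.0)}
```

To probe the claim, one would restrict the hedged grid to factors far from the best entry in `rmse_by_gamma` and check whether the aggregate still improves on the best fixed factor.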

Figures

Figures reproduced from arXiv: 2605.06541 by Qiwei Yao, Yannig Goude, Yutong Wang.

**Figure 1.** Overview. Causal base predictions z_t enter the expert pool directly and through K EWLS experts with different forgetting factors. MLpol aggregates the M + K raw and adapted candidates into ŷ_t; after y_t is observed, both EWLS states and MLpol weights are updated online. We study sequential one-step-ahead prediction. Let H_{t-1} denote all information available to the learner before y_t is revealed. By conventi…

**Figure 2.** No single forgetting scale is optimal across regimes. Each curve shows the excess RMSE of a standalone EWLS correction with fixed forgetting factor γ, relative to the best fixed scale within that regime. The horizontal axis is the nominal scale h(γ) = 1/(1 − γ), used as an ordered forgetting-scale index because the implemented experts also include small covariance inflation. Empty markers indicate the best…

**Figure 3.** Cumulative regret and MLpol weights over the test period. Top: Cumulative excess squared loss of three aggregators relative to the best static convex combination of all N = M + K = 23 experts, fitted in hindsight. Values below zero mean that the online aggregate beats this static hindsight benchmark. Endpoint labels are in 10^6 MW^2. Bottom: MLpol weights stack with EWLS experts aggregated into three bucket…

**Figure 4.** Walk-forward ε_0 sweep on 2018 out-of-sample. The validation curve has an interior minimum at ε_0 = 10^-8 (RMSE 677.3). The shaded region marks a conservative two-decade plateau [10^-9, 10^-7], where the RMSE remains within 1% of the best value. The neighbouring point at 10^-6 is also close, but the curve has already started to rise. The left tail worsens when the inflation is too small relative to the conditi…

**Figure 5.** Empirical bounded-iterate diagnostic on RTE-FR Load. Trajectory of the EWLS coefficient norm for the K − 1 = 15 finite-forgetting experts over the 2019–2021 test period. The no-forgetting endpoint γ = 1 is outside the finite-forgetting setting of Theorem 1 and is excluded. Curves are coloured by effective memory h(γ) = 1/(1 − γ), with short-memory experts in warm colours and long-memory experts in cool col…

**Figure 6.** IMM-KF mode probabilities μ_{i,t} on a five-mode grid q ∈ {10^-8, ..., 10^0}. Colours encode modes by Q-scale, from slow to fast. (a) Full test period. Dashed vertical lines mark the COVID-19 lockdown window. (b) Zoom on the lockdown transition. With the tuned π_stay = 0.5, IMM is relatively willing to change mode under the selected validation protocol. Around the lockdown onset, posterior mass shifts away…

**Figure 7.** Mean MLpol weight allocated to each member of the VIKING pool (left) and the VBAKF

**Figure 8.** VBAKF’s estimated state-uncertainty scale

**Figure 9.** Early-window residual correlation matrices across the three datasets, sorted left-to-right by base-pool homogeneity. Correlations are computed on an initial diagnostic segment of each chronological test stream: the first 300 test days for RTE-FR Load and the first 20,000 test instances for the two TabReD datasets. Each cell reports the Pearson correlation between two base models’ residuals (prediction minu…

**Figure 10.** Overall RMSE as a function of the number of base-expert families

**Figure 11.** (a) Decomposition of the aggregation gain into three pairwise contrasts as a function of H. Base+EWLS vs. Base-only (green) and EWLS-only vs. Base-only (blue) trace a near-identical monotone decline from ∼570 MW at H = 1 to ∼320 MW at H = 7, the signature of substitutability between EWLS and family heterogeneity. Base+EWLS vs. EWLS-only (orange) is flat at ≈ 12 MW for all H ≥ 2: base experts contribute a …

**Figure 12.** Per-regime RMSE as a function of H (log scale). In stable regimes (pre- and post-lockdown), Base-only catches up as heterogeneity grows. Under the lockdown structural break, the Base+EWLS curve is nearly flat in H while Base-only plateaus at ∼2400 MW: the drift-tracking contribution of the framework comes from EWLS and cannot be replaced by pool heterogeneity. 1. EWLS and family heterogeneity are partial …

**Figure 13.** Fully disaggregated MLpol weight trajectory. All N = M + K = 23 experts are shown as individual strata. Base experts (bottom of the stack) are coloured individually; EWLS experts (top) are coloured by forgetting factor γ on the RdYlBu ramp (red = fast forgetting, γ = 0.95; blue = slow / near-static, γ → 1). Dashed red vertical lines mark the COVID-19 lockdown window (2020-03-17 to 2020-05-11). The plot begin…

**Figure 14.** Within-base reallocation. Top: raw MLpol weight of each of the M = 7 base experts. Bottom: the same weights normalised to sum to one across the base pool, so one reads each expert’s share of the base allocation independently of the base-pool total. The first six weeks covering the uniform-weight warm-up transient are clipped. Dashed red vertical lines mark the COVID-19 lockdown window. Under the structura…

Original abstract

We study online prediction under distribution shift, where inputs arrive chronologically and outcomes are revealed only after prediction. In this setting, predictors must remain stable in quiet regimes yet adapt when regimes shift, and the right adaptation memory is unknown in advance. We propose MELO (Memory-hedged Exponentially Weighted Least-Squares Online aggregation), a model-agnostic method that hedges across adaptation scales: it wraps any non-anticipating base-predictor pool with exponentially weighted least-squares (EWLS) adaptation experts at multiple forgetting factors, and aggregates raw and EWLS-adapted forecasts with MLpol, a parameter-free online aggregation rule. Under boundedness conditions, we establish deterministic oracle inequalities showing that it competes with both the best raw predictor and the best bounded, time-varying affine combinations of the base predictions, up to a path-length-dependent tracking cost and a sublinear aggregation overhead. We evaluate MELO on French national electricity-load forecasting through the COVID-19 lockdown using no regime indicators, lockdown dates, or policy covariates. MELO reduces overall RMSE by 34.7% relative to base-only MLpol and achieves lower overall RMSE than a TabICL reference supplied with an external COVID policy-response covariate. Moreover, MELO requires only lightweight per-step recursive updates without model retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MELO hedges memory horizons via multi-scale EWLS plus MLpol, with clean oracle inequalities and a 34.7% RMSE gain on the electricity data, but the fixed forgetting grid has no discretization analysis.

MELO hedges across memory horizons: it runs EWLS experts at several forgetting factors on top of the base predictors, then aggregates the raw and adapted forecasts with MLpol. Under boundedness conditions, the paper proves deterministic oracle inequalities showing that MELO competes with the best raw predictor and the best bounded time-varying affine combination of the base predictions, paying only a path-length-dependent tracking cost and a sublinear aggregation overhead. The new part is this specific combination for handling unknown adaptation scales in a parameter-light way. It extends standard online aggregation with a memory-hedging layer that needs no regime detectors or retraining, and the recursive updates keep it efficient for streaming data.

On the French electricity-load data through the COVID-19 lockdown, it cuts overall RMSE by 34.7 percent versus base-only MLpol and does better than a TabICL reference that uses an external policy covariate. That is a solid real-world test, run without regime indicators, lockdown dates, or policy covariates.

The main soft spot is the fixed discrete grid of forgetting factors. If the shift requires a memory length that is not close to any grid point, the closest expert adds approximation error that sits outside the path-length term in the bound. The abstract provides neither a discretization analysis nor advice on grid density, so the claimed guarantee may not hold tightly in practice. The abstract reports only one dataset, which limits how far the empirical gains can be generalized.

This work suits people building online forecasters for non-stationary series, such as in energy, traffic, or finance. Readers who already use aggregation rules like MLpol will find the memory extension directly useful and easy to implement. It deserves a serious referee: the theory is grounded and the application is relevant. I would send it for peer review, asking reviewers to check the grid sensitivity and perhaps to request a second dataset or a simulation study on regime shifts.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MELO, a model-agnostic online aggregation method that wraps non-anticipating base predictors with exponentially weighted least-squares (EWLS) experts at a grid of forgetting factors and aggregates the raw and adapted forecasts via MLpol. Under boundedness assumptions it establishes deterministic oracle inequalities showing competition with both the best raw predictor and the best bounded time-varying affine combination of the bases, up to a path-length-dependent tracking cost plus sublinear aggregation overhead. On French national electricity-load data during the COVID-19 period, MELO achieves a 34.7% RMSE reduction relative to base-only MLpol and outperforms a TabICL reference that receives an external policy covariate.

Significance. If the oracle inequalities hold as stated, the work supplies a practical, parameter-light way to hedge unknown adaptation memory in non-stationary online prediction without external change indicators. The deterministic (non-probabilistic) nature of the bounds, the explicit path-length term, and the real-data evaluation on a regime-shift episode without post-hoc selection are clear strengths. The approach sits at the intersection of online aggregation and adaptive filtering and could be useful wherever memory horizons are unknown a priori.

major comments (2)

[Abstract and §3 (MELO construction)] The stated oracle inequality competes with the best bounded time-varying affine combination of the base predictors up to a path-length term. However, MELO realizes the competition by hedging only over a fixed finite grid of EWLS forgetting factors. No discretization-error analysis or grid-density guarantee is supplied; an optimal forgetting factor lying between or outside the grid points incurs an extra approximation error that is not absorbed into the path-length cost and can therefore make the realized regret exceed the claimed bound.

[§4 (empirical study)] The reported 34.7% RMSE reduction is measured against base-only MLpol on a single COVID-era electricity series. No ablation or sensitivity table is given for grid range, spacing, or number of forgetting factors, leaving open the possibility that performance is sensitive to the particular discretization chosen for that dataset.

minor comments (2)

[Theorem statement] The notation for the path-length functional (variation of the combination weights) should be given an explicit equation number in the statement of the main theorem so that readers can verify how it interacts with the discretization.
[§4] A short paragraph clarifying that the boundedness assumptions are verified (or approximately satisfied) on the electricity-load series would help readers assess applicability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and positive overall assessment. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract and §3 (MELO construction)] The stated oracle inequality competes with the best bounded time-varying affine combination of the base predictors up to a path-length term. However, MELO realizes the competition by hedging only over a fixed finite grid of EWLS forgetting factors. No discretization-error analysis or grid-density guarantee is supplied; an optimal forgetting factor lying between or outside the grid points incurs an extra approximation error that is not absorbed into the path-length cost and can therefore make the realized regret exceed the claimed bound.

Authors: We agree that the current statement of the oracle inequality requires clarification. The bound holds with respect to the best time-varying affine combination whose forgetting factor lies within the finite grid employed by MELO; the path-length term controls variation of the combination weights but does not explicitly bound the additional approximation error incurred when the optimal forgetting factor falls between or outside grid points. In the revised manuscript we will (i) restate the theorem to make the grid restriction explicit and (ii) add a short paragraph discussing practical grid selection (e.g., a geometrically spaced grid over [0,1] with spacing chosen so that the induced approximation error is absorbed into the existing sublinear term for typical path lengths). This change does not alter the deterministic nature of the bounds or the practical algorithm. Revision: yes.

Referee: [§4 (empirical study)] The reported 34.7% RMSE reduction is measured against base-only MLpol on a single COVID-era electricity series. No ablation or sensitivity table is given for grid range, spacing, or number of forgetting factors, leaving open the possibility that performance is sensitive to the particular discretization chosen for that dataset.

Authors: We acknowledge that the empirical evaluation would be strengthened by sensitivity analysis. In the revised version we will add a table (or set of plots) that reports RMSE for the same French electricity series under different grid configurations: varying the number of forgetting factors (e.g., 5, 10, 20), the range (e.g., [0.01,0.99] vs. [0.001,0.999]), and the spacing (linear vs. geometric). The table will also include the performance of the single best grid point chosen ex post, thereby quantifying the benefit of hedging versus using a fixed forgetting factor. These results will be obtained with the same experimental protocol already described. Revision: yes.
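The grid configurations discussed in these responses can be generated with a few lines. The sketch below builds a grid that is geometrically spaced in the nominal memory h(γ) = 1/(1 − γ) used in the paper's figure captions, alongside a linearly spaced alternative. The function names and the example endpoints are hypothetical; the page does not say how the paper's own grid was constructed.

```python
def effective_memory(gamma):
    """Nominal memory length h(gamma) = 1 / (1 - gamma) for 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return 1.0 / (1.0 - gamma)


def geometric_forgetting_grid(h_min, h_max, k):
    """k forgetting factors whose nominal memories are geometrically spaced."""
    if k < 2 or not 1.0 < h_min < h_max:
        raise ValueError("need k >= 2 and 1 < h_min < h_max")
    ratio = (h_max / h_min) ** (1.0 / (k - 1))
    # Invert h = 1 / (1 - gamma) at each geometrically spaced memory length.
    return [1.0 - 1.0 / (h_min * ratio ** i) for i in range(k)]


def linear_forgetting_grid(gamma_min, gamma_max, k):
    """k forgetting factors linearly spaced between gamma_min and gamma_max."""
    if k < 2 or not 0.0 < gamma_min < gamma_max < 1.0:
        raise ValueError("need k >= 2 and 0 < gamma_min < gamma_max < 1")
    step = (gamma_max - gamma_min) / (k - 1)
    return [gamma_min + i * step for i in range(k)]


# Example endpoints are illustrative: memories of 20, 200 and 2000 steps.
example_grid = geometric_forgetting_grid(20.0, 2000.0, 3)
```

Geometric spacing in h places as many experts between memories of 20 and 200 steps as between 200 and 2000, whereas linear spacing in γ concentrates almost all experts at short memories.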

Circularity Check

0 steps flagged

No circularity: oracle inequalities are external and derivation is self-contained

The paper establishes deterministic oracle inequalities bounding MELO's regret against the best raw base predictor and the best bounded time-varying affine combination of the base predictions, with explicit additive terms for path-length tracking cost and sublinear MLpol aggregation overhead. These bounds are derived from standard online learning techniques under stated boundedness assumptions and do not reduce to any fitted parameter, self-defined quantity, or prior self-citation by construction. The fixed grid of EWLS forgetting factors is an explicit algorithmic choice rather than a quantity fitted to the result; any discretization error it introduces bears on the tightness of the bound, not on circularity. The empirical RMSE reduction on real electricity data is an independent validation, not a tautological fit. No load-bearing step collapses to renaming or self-referential input.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard online learning assumptions plus the boundedness conditions needed for the regret bounds.

free parameters (1)

forgetting factors grid
A discrete set of forgetting factors is chosen to create the expert pool; the paper does not specify how the grid is selected but hedges over it rather than fitting a single value.

axioms (1)

domain assumption: Boundedness conditions on the loss and predictions
Invoked to establish the deterministic oracle inequalities.
