arxiv: 2606.02663 · v1 · pith:4QTVHL6Rnew · submitted 2026-06-01 · 💻 cs.LG · cs.AI

AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret

Pith reviewed 2026-06-28 15:53 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords probabilistic weather forecasting · adaptive mixing · mixture of experts · logarithmic regret · forecast combination · online learning · temperature prediction

The pith

AdaWeather adaptively mixes probabilistic forecasts to achieve logarithmic regret against the best static mixture in hindsight.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AdaWeather as a method to combine multiple probabilistic weather forecasts whose relative strengths change with location and time. It uses an adaptive weighting scheme drawn from both machine learning and expert-advice techniques. The central result is an extension of standard regret analysis showing that the adaptive weights incur only logarithmic regret relative to the best fixed mixture that could have been chosen after seeing all data. This benchmark is stronger than the usual comparison to the single best individual forecast, because any single forecast is itself a fixed mixture with all weight on one model. Experiments on temperature data indicate that the resulting combined forecasts improve on both individual models and existing non-adaptive combinations.

Core claim

AdaWeather is an adaptive framework that combines many probabilistic forecasts using both machine learning and mixture-of-experts methods; its analysis shows logarithmic regret with respect to the best static mixture of experts in hindsight rather than merely the best single expert.
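The two notions of regret contrasted in this claim can be written side by side. The notation below is assumed for illustration only (the paper's own symbols and choice of loss are not given in the material summarised here).

```latex
% Assumed notation: p_{i,t} is the forecast of expert i at step t, y_t the
% realised outcome, \ell a loss on (forecast, outcome), \hat{p}_t the combined
% forecast issued by the learner, and \Delta_N the probability simplex.
\[
R_T^{\text{expert}}
  = \sum_{t=1}^{T} \ell(\hat{p}_t, y_t)
  - \min_{1 \le i \le N} \sum_{t=1}^{T} \ell(p_{i,t}, y_t)
\]
\[
R_T^{\text{mix}}
  = \sum_{t=1}^{T} \ell(\hat{p}_t, y_t)
  - \min_{w \in \Delta_N} \sum_{t=1}^{T}
      \ell\Big(\sum_{i=1}^{N} w_i \, p_{i,t},\; y_t\Big)
\]
% Every vertex of \Delta_N is a single expert, so
% R_T^{\text{mix}} \ge R_T^{\text{expert}}; the claimed guarantee is
% R_T^{\text{mix}} = O(\log T).
```

Because the second comparator minimises over a larger set, a bound on the mixture regret also bounds the single-expert regret.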

What carries the argument

The adaptive weighting procedure that extends prediction-with-expert-advice techniques to produce logarithmic regret against the best fixed convex combination of input forecasts.
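To make "competing with the best fixed convex combination" concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes two experts, log loss, and a uniform prior over a finite grid of candidate mixture weights, and it runs a Bayesian (exponential-weights) update over that grid. The function name `grid_mixture_weights`, the grid size, and the input layout are hypothetical choices made for illustration.

```python
import numpy as np

def grid_mixture_weights(expert_densities, grid_size=101):
    """Return the weight placed on expert 0 at each step (hypothetical helper).

    expert_densities: array of shape (T, 2); entry [t, i] is the density that
    expert i assigned to the outcome actually observed at step t (must be > 0).
    """
    p = np.asarray(expert_densities, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)  # candidate static weights on expert 0
    log_post = np.zeros(grid_size)           # log of an unnormalised uniform prior
    used = np.empty(len(p))
    for t in range(len(p)):
        # Normalise the posterior over grid points in a numerically stable way.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # The Bayesian mixture density is linear in the weight, so predicting
        # with the posterior-mean weight reproduces it exactly.
        used[t] = post @ grid
        # Once the outcome is revealed, reweight each grid point by the
        # likelihood its static mixture gave to that outcome (log-loss update).
        mix = grid * p[t, 0] + (1.0 - grid) * p[t, 1]
        log_post += np.log(mix)
    return used
```

Under these assumptions the cumulative log loss of the sketch exceeds that of the best grid point by at most log(grid_size). How the paper treats the continuous simplex, and which loss it analyses, is not stated in the material summarised here.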

If this is right

  • Combined forecasts can improve on every individual input model across varying conditions.
  • The method supplies a regret guarantee that grows only logarithmically with the number of time steps.
  • Temperature forecasts produced by the mixture show measurable gains over both single models and prior combination techniques.
  • The same adaptive scheme can be applied to any collection of well-calibrated probabilistic forecasts whose accuracies differ by context.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The logarithmic regret bound may support stable long-horizon operation in automated forecasting pipelines.
  • Similar adaptive mixing could be tested on other variables such as precipitation or wind speed where context dependence is also strong.
  • The framework suggests a route to combine numerical weather prediction outputs with machine-learning models without committing to one or the other in advance.

Load-bearing premise

Relative performance among the input forecasts varies sufficiently with context that adaptive reweighting produces consistent gains while the forecasts stay well calibrated under mixing.

What would settle it

A large-scale weather dataset on which the adaptive method incurs linear (not logarithmic) regret or fails to beat the best static mixture chosen in hindsight.
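One way such a check could be set up is sketched below, under assumptions: per-step losses for the adaptive method and for a finite set of candidate static mixtures are already available as arrays, and the helper names `regret_curve` and `growth_fit` are hypothetical. The sketch compares least-squares fits of the regret curve against log t and against t. It is a diagnostic heuristic, not a proof of either growth rate.

```python
import numpy as np

def regret_curve(learner_loss, candidate_loss):
    """Regret after each step against the best candidate in hindsight so far.

    learner_loss: shape (T,), per-step loss of the adaptive method.
    candidate_loss: shape (T, K), per-step loss of K fixed candidate mixtures.
    """
    learner_cum = np.cumsum(np.asarray(learner_loss, dtype=float))
    # The best static candidate is re-selected for every horizon t.
    best_cum = np.cumsum(np.asarray(candidate_loss, dtype=float), axis=0).min(axis=1)
    return learner_cum - best_cum

def growth_fit(regret):
    """Mean squared residual of fitting regret ~ a*log(t)+b and regret ~ a*t+b."""
    regret = np.asarray(regret, dtype=float)
    t = np.arange(1, len(regret) + 1, dtype=float)
    result = {}
    for name, x in (("log", np.log(t)), ("linear", t)):
        design = np.column_stack([x, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, regret, rcond=None)
        result[name] = float(np.mean((regret - design @ coef) ** 2))
    return result
```

A clearly smaller residual for the "linear" fit on a long record would be evidence against the logarithmic bound holding in practice; a smaller residual for the "log" fit would be consistent with it.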

Figures

Figures reproduced from arXiv: 2606.02663 by Dhruman Gupta (Ashoka University), Manmeet Singh (Western Kentucky University), Mihir More (Ashoka University), Parthasarathi Mukhopadhyay (Ashoka University), Rushil Gupta (Ashoka University), Sandeep Juneja (Ashoka University), Saptarishi Dhanuka (Ashoka University), Sarvesh Iyer (Ashoka University).

Figure 1: Overall Framework. Multi-model combination in weather and time series: Operational forecasting has long combined models via grand multi-model means, super-ensemble regression [48], Bayesian model averaging [43], ensemble model output statistics [42], and proper-scoring-rule calibration [44]. The deep-learning era has reframed combination as a learnable problem. Mixture-of-experts [15, 16, 17, 18] has been a…
Figure 2: Evaluation of performance across rollout, lead-time, and regret-based metrics
Figure 3: Ablations with labels meaning held out models, showing remarkably better
Figure 8: the hybrid collapses onto the U-Net when the U-Net pseudo-expert band is high (and
Figure 4: Per-pixel % CRPS improvement over the best raw expert (green i.e. positive =
Figure 5: Per-pixel best raw expert showing variation (categorical, opacity proportional to
Figure 6: Difference map used to localise the BIH-evaluation patch (RED cells = hybrid
Figure 7: Three-group decomposition of expert weight contributions (top of each panel)
Figure 8: (Top) VT-MOS + U-Net stacked expert weights over 2025. (Bottom) 30-day rolling
Figure 9: Per-bucket CRPS conditional on truth percentile. VT-MOS+U-Net has lower
Figure 10: VT-MOS Monte-Carlo convergence: per-step RMSE of the predictor
Figure 11: Station-level evaluation across 546 StationBench-India sites, 2021–2024. There is
Figure 12: Per-pixel best raw expert showing spatial variation.
Figure 13: Per-pixel % CRPS improvement of the hybrid method over the best raw expert,
Figure 14: Per-pixel % CRPS improvement over the best raw expert, aggregated over
Figure 15: Difference map used to localise the BIH-evaluation patch. Green cells indicate
Figure 16: Rollouts and per-lead CRPS aggregated over 2024–2025.

Original abstract

Recent advances in machine learning have produced probabilistic weather forecasting models comparable to state-of-the-art numerical weather predictors. But no model consistently dominates spatio-temporally, and relative performance is highly context-dependent. This motivates adaptive methods for combining multiple forecasts to obtain improvements and robustness. While combined forecasts have been proposed in the literature, these are achieved either through supervised learning or through prediction with expert advice methods. We introduce AdaWeather, an adaptive framework that combines many probabilistic forecasts using both machine learning as well as mixture of experts to arrive at a unified improved probabilistic forecast. While traditional expert methods develop the regret bounds with respect to the best single expert in hindsight, we extend the algorithm and analysis to show our method has logarithmic regret compared to the best static mixture of experts in hindsight. Empirically, we focus on forecasting temperature, and observe improvements over existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript presents AdaWeather, a framework that adaptively combines probabilistic weather forecasts from multiple models using a combination of machine learning and mixture-of-experts techniques. The key theoretical contribution is an extension of online learning algorithms to achieve logarithmic regret bounds with respect to the best static convex combination of the input forecasts in hindsight, rather than the single best expert. The empirical evaluation focuses on temperature forecasting and reports improvements over existing methods.

Significance. If the logarithmic regret bound holds, this work provides a principled online learning method for combining forecasts with guarantees against the best fixed mixture, which is a stronger benchmark than the usual single best expert. This has potential significance for improving the robustness of probabilistic weather predictions in a domain where relative model performance is context-dependent.

minor comments (2)
  1. [Abstract] The abstract states the O(log T) regret claim against the best static mixture but supplies no derivation steps, assumptions, or loss-function details, making it difficult to assess the extension from standard Hedge-style analysis without reading the body.
  2. The empirical section reports improvements on temperature forecasting but lacks details on the number of input forecasts, the exact form of the probabilistic mixture (e.g., weighted densities vs. parameter averaging), and any statistical significance testing of the gains.
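The distinction raised in the second comment is not cosmetic. A small sketch, assuming Gaussian input forecasts (an assumption for illustration; the forecast family is not stated in the material reviewed), shows that mixing densities and averaging parameters give different spreads even when the means agree. Both function names are hypothetical.

```python
import numpy as np

def linear_pool_moments(weights, means, stds):
    """Mean and variance of the weighted mixture of Gaussian densities."""
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    s = np.asarray(stds, dtype=float)
    mean = float(np.sum(w * m))
    # Mixture second moment minus squared mean: disagreement between the
    # component means inflates the variance.
    var = float(np.sum(w * (s**2 + m**2)) - mean**2)
    return mean, var

def parameter_average_moments(weights, means, stds):
    """Mean and variance of one Gaussian built from averaged parameters."""
    w = np.asarray(weights, dtype=float)
    mean = float(np.sum(w * np.asarray(means, dtype=float)))
    std = float(np.sum(w * np.asarray(stds, dtype=float)))
    return mean, std**2

# Two unit-variance forecasts centred at 0 and 4, equally weighted.
pool = linear_pool_moments([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
avg = parameter_average_moments([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```

In this example both combinations have mean 2, but the density mixture has variance 5 while the parameter average keeps variance 1, so the two forms would score differently under any proper scoring rule.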

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and for recommending minor revision. We appreciate the recognition that logarithmic regret relative to the best static mixture provides a stronger benchmark than the single best expert.

Circularity Check

0 steps flagged

No significant circularity; derivation extends standard online learning results

full rationale

The paper introduces AdaWeather as an adaptive mixture of probabilistic forecasts and claims an extension of prediction-with-expert-advice algorithms to obtain O(log T) regret against the best fixed convex combination of experts rather than the single best expert. The abstract explicitly positions this as an extension of traditional methods, with no equations, fitted parameters, or self-citations presented as load-bearing for the regret bound. No self-definitional steps, fitted-input predictions, or ansatz smuggling are detectable from the given material. The central claim therefore remains an independent algorithmic extension whose validity rests on the correctness of the analysis rather than on re-labeling of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the regret analysis presumably relies on standard online-learning assumptions not enumerated here.

pith-pipeline@v0.9.1-grok · 5737 in / 916 out tokens · 21554 ms · 2026-06-28T15:53:46.125662+00:00 · methodology

Reference graph

Works this paper leans on

62 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

    Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, et al. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214, 2022

  2. [2]

    Accurate medium-range global weather forecasting with 3D neural networks

    Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970):533–538, 2023

  3. [3]

    Learning skillful medium-range global weather forecasting

    Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, 2023

  4. [4]

    AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score

    Simon Lang, Mihai Alexe, Mariana CA Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D Dueben, Sara Hahner, et al. AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score. arXiv preprint arXiv:2412.15832, 2024

  5. [5]

    GenCast: Diffusion-based ensemble forecasting for medium-range weather

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. GenCast: Diffusion-based ensemble forecasting for medium-range weather. Nature, 2024

  6. [6]

    Neural general circulation models for weather and climate

    Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, et al. Neural general circulation models for weather and climate. Nature, 2024

  7. [7]

    ClimaX: A foundation model for weather and climate

    Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, and Aditya Grover. ClimaX: A foundation model for weather and climate. Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

  8. [8]

    Aurora: A foundation model of the atmosphere

    Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Anna Vaughan, et al. Aurora: A foundation model of the atmosphere. arXiv preprint arXiv:2405.13063, 2024

  9. [9]

    Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

    Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Romit Maulik, Rao Kotamarthi, Ian Foster, Sandeep Madireddy, and Aditya Grover. Scaling transformer neural networks for skillful and reliable medium-range weather forecasting. arXiv preprint arXiv:2312.03876, 2024

  10. [10]

    WeatherBench 2: A benchmark for the next generation of data-driven global weather models

    Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, et al. WeatherBench 2: A benchmark for the next generation of data-driven global weather models. Journal of Advances in Modeling Earth Systems, 16(6), 2024

  11. [11]

    Mowe: A mixture of weather experts

    Dibyajyoti Chakraborty, Romit Maulik, Peter Harrington, Dallas Foster, Mohammad Amin Nabian, and Sanjay Choudhry. Mowe: A mixture of weather experts. arXiv preprint arXiv:2509.09052, 2025

  12. [12]

    Boosting weather forecast via generative superensemble

    Congyi Nai, Xi Chen, Shangshang Yang, Zimiu Xiao, and Baoxiang Pan. Boosting weather forecast via generative superensemble. npj Climate and Atmospheric Science, 2025

  13. [13]

    Breaking silos: Adaptive model fusion unlocks better time series forecasting

    Zhining Liu, Ze Yang, Xiao Lin, Ruizhong Qiu, Tianxin Wei, Yada Zhu, Hendrik Hamann, Jingrui He, and Hanghang Tong. Breaking silos: Adaptive model fusion unlocks better time series forecasting. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  14. [14]

    The quiet revolution of numerical weather prediction

    Peter Bauer, Alan Thorpe, and Gilbert Brunet. The quiet revolution of numerical weather prediction. Nature, 525(7567):47–55, 2015

  15. [15]

    Adaptive mixtures of local experts

    Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991

  16. [16]

    Hierarchical mixtures of experts and the EM algorithm

    Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994

  17. [17]

    Outrageously large neural networks: The sparsely-gated mixture-of-experts layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations (ICLR), 2017

  18. [18]

    Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

    William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23:1–39, 2022

  19. [19]

    Spatial mixture-of-experts

    Nikoli Dryden and Torsten Hoefler. Spatial mixture-of-experts. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  20. [20]

    VA-MoE: Variables-adaptive mixture of experts for incremental weather forecasting

    Hao Chen, Han Tao, Guo Song, Jie Zhang, Yonghan Dong, Yunlong Yu, and Lei Bai. VA-MoE: Variables-adaptive mixture of experts for incremental weather forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  21. [21]

    A mixture of experts model for predicting persistent weather patterns

    M. Pérez-Ortiz, P. A. Gutiérrez, P. Tino, C. Casanova-Mateo, and S. Salcedo-Sanz. A mixture of experts model for predicting persistent weather patterns. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2019

  22. [22]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11106–11115, 2021

  23. [23]

    Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting. NeurIPS, 2021

  24. [24]

    FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. ICML, 2022

  25. [25]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations (ICLR), 2023

  26. [26]

    N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations (ICLR), 2020

  27. [27]

    TimesNet: Temporal 2D-variation modeling for general time series analysis

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. TimesNet: Temporal 2D-variation modeling for general time series analysis. ICLR, 2023

  28. [28]

    OneNet: Enhancing time series forecasting models under concept drift by online ensembling

    Yi-Fan Zhang, Qingsong Wen, Xue Wang, Weiqi Chen, Liang Sun, Zhang Zhang, Liang Wang, Rong Jin, and Tieniu Tan. OneNet: Enhancing time series forecasting models under concept drift by online ensembling. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  29. [29]

    Learning fast and slow for online time series forecasting

    Quang Pham, Chenghao Liu, Doyen Sahoo, and Steven C. H. Hoi. Learning fast and slow for online time series forecasting. In International Conference on Learning Representations (ICLR), 2023

  30. [30]

    Fast and slow streams for online time series forecasting without information leakage

    Ying-ye Ava Lau, Zhiwen Shao, and Dit-Yan Yeung. Fast and slow streams for online time series forecasting without information leakage. In International Conference on Learning Representations (ICLR), 2025

  31. [31]

    Reinforcement learning based dynamic model combination for time series forecasting

    Yuwei Fu, Di Wu, and Benoit Boulet. Reinforcement learning based dynamic model combination for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022

  32. [32]

    Online mixture of experts: No-regret learning for optimal collective decision-making

    Larkin Liu and Jalal Etesami. Online mixture of experts: No-regret learning for optimal collective decision-making. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  33. [33]

    Aggregating strategies

    Volodimir G. Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 371–386. Morgan Kaufmann, 1990

  34. [34]

    A game of prediction with expert advice

    Vladimir G. Vovk. A game of prediction with expert advice. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 51–60. ACM, 1995

  35. [35]

    Predicting a binary sequence almost as well as the optimal biased coin

    Yoav Freund. Predicting a binary sequence almost as well as the optimal biased coin. Technical report, AT&T Research, 1996

  36. [36]

    The weighted majority algorithm

    Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994

  37. [37]

    Prediction, Learning, and Games

    Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006

  38. [38]

    Dynamic local regret for non-convex online forecasting

    Sergul Aydore, Tianhao Zhu, and Dean Foster. Dynamic local regret for non-convex online forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2019

  39. [39]

    Tracking the best expert

    Mark Herbster and Manfred K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151–178, 1998

  40. [40]

    Efficient learning algorithms for changing environments

    Elad Hazan and C. Seshadhri. Efficient learning algorithms for changing environments. In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009

  41. [41]

    Online optimization: Competing with dynamic comparators

    Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, and Karthik Sridharan. Online optimization: Competing with dynamic comparators. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015

  42. [42]

    Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation

    Tilmann Gneiting, Adrian E. Raftery, Anton H. Westveld, and Tom Goldman. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5):1098–1118, 2005

  43. [43]

    Using Bayesian model averaging to calibrate forecast ensembles

    Adrian E. Raftery, Tilmann Gneiting, Fadoua Balabdaoui, and Michael Polakowski. Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133(5):1155–1174, 2005

  44. [44]

    Strictly proper scoring rules, prediction, and estimation

    Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007

  45. [45]

    Universal portfolios with side information

    Thomas M. Cover and Erik Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2):348–363, 1996

  46. [46]

    Online aggregation of probability forecasts with confidence

    Vladimir V. V’yugin and Vladimir Trunov. Online aggregation of probability forecasts with confidence. arXiv preprint arXiv:2109.14309, 2021

  47. [47]

    Tight worst-case loss bounds for predicting with expert advice

    David Haussler, Jyrki Kivinen, and Manfred K. Warmuth. Tight worst-case loss bounds for predicting with expert advice. Technical Report UCSC-CRL-94-36, University of California, Santa Cruz, 1994

  48. [48]

    Improved weather and seasonal climate forecasts from multimodel superensemble

    T. N. Krishnamurti, C. M. Kishtawal, Timothy E. LaRow, David R. Bachiochi, Zhan Zhang, C. Eric Williford, Sulochana Gadgil, and Sajani Surendran. Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285(5433):1548–1550, 1999

  49. [49]

    DeepAR: Probabilistic forecasting with autoregressive recurrent networks

    David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020

  50. [50]

    Reversible instance normalization for accurate time-series forecasting against distribution shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations (ICLR), 2022

  51. [51]

    SAN: Self-adaptive normalization for non-stationary time series forecasting

    Zhiding Liu, Mingyue Cheng, Zhi Li, Zhenya Huang, Qi Liu, Yanyan Xie, and Enhong Chen. SAN: Self-adaptive normalization for non-stationary time series forecasting. Advances in Neural Information Processing Systems (NeurIPS), 2023

  52. [52]

    Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory

    James L. McClelland, Bruce L. McNaughton, and Randall C. O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995

  53. [53]

    What learning systems do intelligent agents need? Complementary learning systems theory updated

    Dharshan Kumaran, Demis Hassabis, and James L. McClelland. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534, 2016

  54. [54]

    Gradient episodic memory for continual learning

    David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017

  55. [55]

    The problem of concept drift: Definitions and related work

    Alexey Tsymbal. The problem of concept drift: Definitions and related work. Technical Report TCD-CS-2004-15, Trinity College Dublin, 2004

  56. [56]

    A survey on concept drift adaptation

    João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46(4):1–37, 2014

  57. [57]

    Online learning and online convex optimization

    Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012

  58. [58]

    Introduction to Online Convex Optimization

    Elad Hazan. Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2016

  59. [59]

    Distributional regression: CRPS-error bounds for model fitting, model selection and convex aggregation

    Clément Dombry and Ahmed Zaoui. Distributional regression: CRPS-error bounds for model fitting, model selection and convex aggregation. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  60. [60]

    Contribution of expert aggregation to temperature prediction, part I

    Leo Pfitzner, Olivier Wintenberger, Olivier Mestre, and Marion Riverain. Contribution of expert aggregation to temperature prediction, part I. Preprint, 2025

  61. [61]

    How to use expert advice

    Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. J. ACM, 44(3):427–485, May 1997

  62. [62]

    FiLM: Visual reasoning with a general conditioning layer

    Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. September 2017