pith.

arxiv: 2605.19231 · v1 · pith:53LKJRAX · submitted 2026-05-19 · 💻 cs.LG · stat.ML

DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

Pith reviewed 2026-05-20 07:51 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: probabilistic forecasting, distribution shift, regime mixtures, Gaussian process, time series, neural networks, uncertainty estimation

The pith

DeRegiME models residual uncertainty in time series as recurring regimes assigned by a shared sparse variational Gaussian process gate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeRegiME to separate the mean signal from latent uncertainty regimes in probabilistic forecasting. It uses a sparse variational GP with a nonstationary kernel to softly cluster residual patterns into a finite set of recurring regimes and combine them with per-regime noise processes through a single shared gate. This yields an interpretable decomposition that surfaces implicit changepoints when distribution shifts affect uncertainty rather than the conditional mean. A sympathetic reader would care because standard neural forecasters often discard or fail to expose this residual structure, leading to poorer handling of abrupt, gradual, or seasonal shifts. The result is a single sparse-GP posterior rather than a full mixture of experts.

Core claim

DeRegiME establishes that a direct-sum feature-space representation can capture recurring regimes in residual uncertainty. The representation is formed by a shared sparse variational GP gate with a nonstationary regime-mixing kernel and a Student-t likelihood. On ten benchmarks spanning abrupt, gradual, and seasonal shifts, the model improves negative log predictive density (NLPD), CRPS, and MSE over the strongest encoder-matched baseline, while the stick-breaking gate prunes the number of active regimes.
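The abstract names a direct-sum feature-space representation but does not state the kernel. As a hedged illustration only, assume the gate produces nonnegative weights g_r(x) for R regimes, and that each regime r has a valid sub-kernel k_r with feature map \phi_r. A gated-sum kernel then has the direct-sum feature map shown below. This form is an assumption, not the paper's stated definition.

```latex
\begin{aligned}
k(x, x') &= \sum_{r=1}^{R} g_r(x)\, g_r(x')\, k_r(x, x'),
\qquad
\phi(x) = \bigoplus_{r=1}^{R} g_r(x)\, \phi_r(x), \\
\langle \phi(x), \phi(x') \rangle
  &= \sum_{r=1}^{R} g_r(x)\, g_r(x')\, \langle \phi_r(x), \phi_r(x') \rangle
   = \sum_{r=1}^{R} g_r(x)\, g_r(x')\, k_r(x, x')
   = k(x, x').
\end{aligned}
```

Under this assumed form, each summand is positive semi-definite. The reason is that g_r(x) g_r(x') is a rank-one PSD kernel, and products and sums of PSD kernels are PSD. Two locations then share covariance only through regimes to which both assign gate mass.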

What carries the argument

The shared sparse variational GP gate with nonstationary regime-mixing kernel, which softly assigns each forecast location to learned recurring regimes and mixes per-regime sub-kernels plus noise processes into one posterior.
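A minimal NumPy sketch of such a gated regime-mixing kernel follows. It assumes the form k(x, x') = Σ_r g_r(x) g_r(x') k_r(x, x') with squared-exponential sub-kernels. The softmax gate over fixed regime centers, the parameter values, and all function names are hypothetical stand-ins. The sparse variational machinery (inducing points, Student-t likelihood) is omitted. The last lines check numerically that the resulting Gram matrix is symmetric and has no meaningfully negative eigenvalues.

```python
import numpy as np


def rbf(x, y, lengthscale, variance):
    """Squared-exponential sub-kernel k_r on 1-D inputs."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def softmax_gate(x, centers, temperature):
    """Hypothetical soft gate: softmax over negative squared distance to regime centers."""
    logits = -((x[:, None] - centers[None, :]) ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def regime_mixing_kernel(x, y, gate_x, gate_y, params):
    """Assumed form k(x, y) = sum_r g_r(x) g_r(y) k_r(x, y)."""
    K = np.zeros((len(x), len(y)))
    for r, (lengthscale, variance) in enumerate(params):
        K += np.outer(gate_x[:, r], gate_y[:, r]) * rbf(x, y, lengthscale, variance)
    return K


x = np.linspace(0.0, 10.0, 50)                  # forecast locations (illustrative)
centers = np.array([2.0, 5.0, 8.0])             # one hypothetical center per regime
params = [(0.5, 1.0), (2.0, 0.3), (1.0, 2.0)]   # (lengthscale, variance) per regime
G = softmax_gate(x, centers, temperature=1.0)
K = regime_mixing_kernel(x, x, G, G, params)

# Validity check: a PSD Gram matrix is symmetric with eigenvalues >= 0 up to round-off.
min_eig = np.linalg.eigvalsh(K).min()
print(f"symmetric: {np.allclose(K, K.T)}, min eigenvalue: {min_eig:.2e}")
```

Because each location's gate row sums to one, nearby locations dominated by the same regime inherit that regime's lengthscale and variance. This is one concrete way a single kernel can behave nonstationarily.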

If this is right

  • The approach produces an interpretable mean-residual-noise decomposition with regimes as clusters of residual similarity.
  • Regime transitions appear as implicit changepoints in the uncertainty structure.
  • The stick-breaking gate automatically determines the effective number of regimes.
  • Kernel validity and predictive-density propriety are formally established.
  • Gains hold consistently across abrupt, gradual, and seasonal distribution shifts.
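The stick-breaking point above can be illustrated with the standard truncated stick-breaking construction, π_k = v_k ∏_{j<k} (1 − v_j). The stick fractions and the 1% pruning threshold below are illustrative assumptions. The abstract does not give the paper's gate parameterization or its pruning rule, and a single global weight vector is used here rather than per-location gate outputs.

```python
import numpy as np


def stick_breaking_weights(v):
    """Truncated stick-breaking: pi_k = v_k * prod_{j<k} (1 - v_j).

    The last fraction is forced to 1 so the truncated weights sum to one.
    """
    v = np.asarray(v, dtype=float).copy()
    v[-1] = 1.0
    # Stick length remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining


def effective_regimes(pi, threshold=0.01):
    """Count regimes whose weight exceeds a (hypothetical) pruning threshold."""
    return int(np.sum(pi > threshold))


v = np.array([0.6, 0.7, 0.9, 0.5, 0.5, 0.5])  # illustrative stick fractions
pi = stick_breaking_weights(v)
print(np.round(pi, 4), pi.sum(), effective_regimes(pi))  # 3 regimes above 1%
```

Later sticks receive geometrically less mass, so only the first few regimes stay active. This is the mechanism by which a stick-breaking gate can shrink a generous truncation level down to a small effective number of regimes.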

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The regime assignments could serve as an unsupervised detector for entering new uncertainty states in unlabeled streams.
  • Decision systems might condition actions on the current regime rather than a single forecast distribution.
  • The nonstationary kernel structure may extend naturally to modeling horizon-dependent uncertainty changes.
  • Similar gate mechanisms could be adapted to other sequence tasks where residual structure carries the shift signal.

Load-bearing premise

Residual uncertainty follows patterns that repeat across a small number of recurring regimes whose transitions can be learned by the sparse variational GP gate.

What would settle it

Replacing the sparse GP gate with a standard mixture head without the nonstationary kernel or residual clustering and seeing the NLPD gains disappear on the same benchmarks would falsify the value of the regime mechanism.

Figures

Figures reproduced from arXiv: 2605.19231 by Kieran Wood, Stefan Zohren, Stephen J. Roberts.

Figure 1: DeRegiME pipeline, past-context routing (grey) and predictive to the forecast (teal).
Figure 2: Representative DeRegiME regime diagnostics: outbreak (Illness), eurozone-debt-crisis shift (GBP), few-state reuse (aggregate Traffic), and a horizon-indexed intraday transition (ETTm1). Gate colours per panel are local display ranks by average mass, with grey marking "Other".
Figure 3: Additional representative DeRegiME diagnostics for the six benchmark datasets not shown in …
Original abstract

We introduce DeRegiME (Deep Regime Mixture of Experts), a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads (whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples) rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
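The abstract scores forecasts by NLPD, CRPS, and MSE. The sketch below shows one way these three scores could be computed for Student-t predictive marginals. It treats each horizon step as an independent marginal and uses a sample-based CRPS estimator instead of a closed form. The function names, the synthetic data, and the degrees of freedom are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np
from scipy import stats


def student_t_nlpd(y, loc, scale, df):
    """Mean negative log predictive density of y under Student-t marginals."""
    return float(-np.mean(stats.t.logpdf(y, df, loc=loc, scale=scale)))


def sample_crps(y, samples):
    """CRPS of the empirical forecast distribution, averaged over locations.

    Uses CRPS = E|X - y| - 0.5 E|X - X'| with X, X' drawn from `samples`
    (shape: n_samples x n_locations).
    """
    term1 = np.mean(np.abs(samples - y[None, :]), axis=0)
    term2 = np.mean(np.abs(samples[:, None, :] - samples[None, :, :]), axis=(0, 1))
    return float(np.mean(term1 - 0.5 * term2))


def mse(y, point):
    """Mean squared error of a point forecast (here the predictive location)."""
    return float(np.mean((y - point) ** 2))


# Synthetic example: one 24-step horizon with hypothetical Student-t forecasts.
rng = np.random.default_rng(0)
horizon, df = 24, 5.0
loc = np.zeros(horizon)
scale = np.full(horizon, 1.5)
y = loc + scale * rng.standard_t(df, size=horizon)
samples = stats.t.rvs(df, loc=loc, scale=scale, size=(200, horizon), random_state=rng)

nlpd = student_t_nlpd(y, loc, scale, df)
crps = sample_crps(y, samples)
print(f"NLPD={nlpd:.3f}  CRPS={crps:.3f}  MSE={mse(y, loc):.3f}")
```

Lower is better for all three scores. MSE uses only the point forecast, while NLPD and CRPS also score the spread and tails of the predictive distribution.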

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DeRegiME, a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal via a mean-residual-noise decomposition. Residual uncertainty is modeled by softly assigning forecast locations to a finite set of recurring regimes using a shared sparse variational Gaussian process with a nonstationary regime-mixing kernel, stick-breaking gate, and per-regime Student-t noise processes, yielding a single sparse-GP posterior. The work proves kernel validity and predictive-density propriety, and reports consistent empirical gains of 20.3% in NLPD, 3.0% in CRPS, and 4.7% in MSE over an encoder-matched DeepAR/GluonTS-style dynamic Student-t baseline across ten benchmarks spanning abrupt, gradual, and seasonal distribution shifts.

Significance. If the central claims hold, the contribution would be significant for probabilistic time-series forecasting under distribution shift. By focusing regime modeling on residual uncertainty rather than the conditional mean and providing an interpretable direct-sum feature-space representation, the approach addresses a limitation of standard neural probabilistic heads. Explicit credit is due for the kernel-validity and predictive-density proofs as well as the consistent gains across three encoder grids and ten benchmarks with varied shift types; these elements strengthen the case for practical utility in heteroskedastic noisy series.

major comments (3)
  1. [Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.
  2. [Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence (such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning) showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.
  3. [Variational Inference] The variational approximation quality for the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (Skeptic note and variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.
minor comments (2)
  1. [Abstract] The abstract states improvements are 'consistent across all datasets' but does not report per-dataset breakdowns or statistical significance tests; adding these would improve transparency without altering the main claims.
  2. [Notation] Notation for the direct-sum feature-space representation anchoring regimes as clusters of residual similarity would benefit from an explicit equation or small diagram in the methods section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments on our manuscript. These suggestions have helped us identify areas where we can provide additional clarity and evidence. We address each major comment below and indicate the revisions we will make in the updated version of the paper.

Point-by-point responses
  1. Referee: [Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.

    Authors: We thank the referee for highlighting this important point regarding the attribution of performance gains. While our encoder-matched baseline comparison controls for the encoder architecture and uses a comparable dynamic Student-t head, we agree that an ablation specifically removing the regime-mixing components would more directly isolate their contribution. In the revised manuscript, we will add an ablation study where we replace the shared sparse variational GP with nonstationary kernel and stick-breaking gate with a simpler shared GP or fixed regime structure, keeping the encoder and likelihood the same. This will allow us to quantify the incremental benefit of the regime mechanism. We believe this will strengthen the empirical claims without altering the core results. revision: yes

  2. Referee: [Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence (such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning) showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.

    Authors: We appreciate the referee's concern about validating the interpretability of the learned regimes. Although the manuscript emphasizes the mean-residual-noise decomposition and direct-sum representation, we acknowledge the value of explicit diagnostics. In the revision, we will include visualizations of regime assignments over time for representative datasets, demonstrating alignment with known abrupt and gradual shifts. Additionally, we will report the effective number of regimes after stick-breaking pruning and sensitivity analysis to the truncation level. These additions will provide evidence that the regimes capture meaningful residual uncertainty structures rather than artifacts. revision: yes

  3. Referee: [Variational Inference] The variational approximation quality for the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (Skeptic note and variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.

    Authors: We recognize the importance of assessing the quality of the variational approximation, particularly given the combination of nonstationary kernel and Student-t processes. The design of a single shared sparse-GP posterior is intended to promote stability and avoid the collapse issues common in mixture models. To address this, we will incorporate ELBO convergence curves and a selection of posterior predictive checks in the supplementary material for the revised submission. While providing exhaustive diagnostics for all ten benchmarks and all encoder grids may exceed space constraints, we will present representative results across different shift types to demonstrate the reliability of the approximation. revision: partial
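Posterior predictive checks can take many forms. One simple calibration check applies the probability integral transform (PIT) u = F(y) and tests the PIT values for uniformity. The sketch below assumes per-location Student-t predictive marginals. The synthetic data and the deliberately overconfident comparison are illustrative only, and the sketch is not a reconstruction of the diagnostics promised in the rebuttal.

```python
import numpy as np
from scipy import stats


def pit_values(y, loc, scale, df):
    """Probability integral transform u = F(y) under Student-t predictive marginals."""
    return stats.t.cdf(y, df, loc=loc, scale=scale)


# Synthetic data drawn from the same Student-t marginals the forecaster reports.
rng = np.random.default_rng(1)
n, df = 2000, 4.0
loc = rng.normal(size=n)
scale = np.exp(rng.normal(scale=0.3, size=n))  # heteroskedastic scales
y = loc + scale * rng.standard_t(df, size=n)

u_calibrated = pit_values(y, loc, scale, df)
u_overconfident = pit_values(y, loc, 0.5 * scale, df)  # predictive scale halved

# For calibrated forecasts the PIT values should look Uniform(0, 1).
for name, u in [("calibrated", u_calibrated), ("overconfident", u_overconfident)]:
    res = stats.kstest(u, "uniform")
    print(f"{name}: KS statistic={res.statistic:.3f}, p-value={res.pvalue:.3g}")
```

A calibrated forecaster gives a small KS statistic. Halving the predictive scale pushes PIT mass toward 0 and 1 and inflates the statistic. This is the kind of miscalibration a collapsed or overconfident gate could produce.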

Circularity Check

0 steps flagged

Derivation chain is self-contained; no circular reductions identified

Full rationale

The paper constructs DeRegiME directly as a mean-residual-noise decomposition with a shared sparse variational GP gate using a nonstationary regime-mixing kernel and Student-t likelihood, yielding a single sparse-GP posterior. It reports empirical gains over encoder-matched baselines and states proofs for kernel validity and predictive-density propriety. No equations or steps in the abstract reduce a claimed prediction or result to a fitted parameter or self-citation by construction. The regime assignments and improvements are presented as outcomes of the model rather than inputs redefined as outputs. This is the normal case of an independent architectural proposal.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The abstract implies several modeling choices whose details are not provided. The stick-breaking gate and per-regime sub-kernels are central but their exact parameterization is not specified here.

free parameters (2)
  • stick-breaking gate parameters
    The effective number of regimes is pruned by the stick-breaking gate, requiring hyperparameters that control regime count and sparsity.
  • regime-mixing kernel hyperparameters
    Nonstationary regime-mixing kernel combines per-regime sub-kernels, implying fitted lengthscales or variances per regime.
axioms (2)
  • [standard math] The nonstationary regime-mixing kernel is a valid positive semi-definite kernel
    Abstract states that kernel validity is proved.
  • [standard math] The Student-t likelihood yields a proper predictive density when combined with the GP posterior
    Abstract claims predictive-density propriety is proved.

pith-pipeline@v0.9.0 · 5821 in / 1434 out tokens · 46285 ms · 2026-05-20T07:51:22.786882+00:00


