DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

Kieran Wood; Stefan Zohren; Stephen J. Roberts

arxiv: 2605.19231 · v1 · pith:53LKJRAXnew · submitted 2026-05-19 · 💻 cs.LG · stat.ML


Pith reviewed 2026-05-20 07:51 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: probabilistic forecasting · distribution shift · regime mixtures · Gaussian process · time series · neural networks · uncertainty estimation


The pith

DeRegiME models residual uncertainty in time series as recurring regimes assigned by a shared sparse variational Gaussian process gate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeRegiME to separate the mean signal from latent uncertainty regimes in probabilistic forecasting. It uses a sparse variational GP with a nonstationary kernel to softly cluster residual patterns into a finite set of recurring regimes and combine them with per-regime noise processes through a single shared gate. This yields an interpretable decomposition that surfaces implicit changepoints when distribution shifts affect uncertainty rather than the conditional mean. A sympathetic reader would care because standard neural forecasters often discard or fail to expose this residual structure, leading to poorer handling of abrupt, gradual, or seasonal shifts. The result is a single sparse-GP posterior rather than a full mixture of experts.

Core claim

DeRegiME establishes that a direct-sum feature-space representation formed by a shared sparse variational GP gate with nonstationary regime-mixing kernel and Student-t likelihood can capture recurring regimes in residual uncertainty, producing improved negative log predictive density, CRPS, and MSE on ten benchmarks spanning different shift types while pruning the number of active regimes via stick-breaking.

What carries the argument

The shared sparse variational GP gate with nonstationary regime-mixing kernel, which softly assigns each forecast location to learned recurring regimes and mixes per-regime sub-kernels plus noise processes into one posterior.
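The mixing rule quoted from the paper, K_mix(ξ,ξ′) = ∑_r π_r(ξ) π_r(ξ′) K_r(z_r(ξ), z_r(ξ′)), can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the softmax gate, the RBF sub-kernels, the identity embedding z_r(ξ) = ξ, and every name used here (`gate`, `rbf`, `k_mix`, `centers`, `lengthscales`) are assumptions made for the example.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate(xi, centers, sharpness=4.0):
    """Hypothetical gate: soft assignment of a scalar location xi to regimes."""
    return softmax([-sharpness * (xi - c) ** 2 for c in centers])

def rbf(a, b, lengthscale):
    """Stationary RBF kernel, standing in for a per-regime sub-kernel K_r."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def k_mix(xi, xj, centers, lengthscales):
    """K_mix(xi, xj) = sum_r pi_r(xi) * pi_r(xj) * K_r(xi, xj), with z_r = identity."""
    pi_i, pi_j = gate(xi, centers), gate(xj, centers)
    return sum(p * q * rbf(xi, xj, ell)
               for p, q, ell in zip(pi_i, pi_j, lengthscales))

centers = [0.0, 1.0, 2.0]        # assumed regime centres for the toy gate
lengthscales = [0.1, 0.5, 1.5]   # assumed per-regime lengthscales
xs = [i / 10 for i in range(25)]
K = [[k_mix(a, b, centers, lengthscales) for b in xs] for a in xs]

# Spot check (necessary, not sufficient) for positive semi-definiteness:
# the quadratic form v^T K v should be non-negative for any vector v.
rng = random.Random(0)

def quad_form(v):
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

min_quad = min(quad_form([rng.gauss(0.0, 1.0) for _ in xs]) for _ in range(50))
```

Because the gate weights depend on the location, the resulting kernel is nonstationary even though each sub-kernel in this toy version is stationary.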

If this is right

The approach produces an interpretable mean-residual-noise decomposition with regimes as clusters of residual similarity.
Regime transitions appear as implicit changepoints in the uncertainty structure.
The stick-breaking gate automatically determines the effective number of regimes.
Kernel validity and predictive-density propriety are formally established.
Gains hold consistently across abrupt, gradual, and seasonal distribution shifts.
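How a stick-breaking gate can prune the effective number of regimes is easiest to see in code. The following is a minimal sketch assuming a truncated stick-breaking construction with sigmoid break fractions; the paper's exact gate parameterization is not given on this page, so the function names, the truncation level of five regimes, and the 1% mass threshold are illustrative choices only.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def stick_breaking(logits):
    """Map R-1 unconstrained logits to R regime weights that sum to 1.

    v_r = sigmoid(logit_r) is the fraction broken off the remaining stick;
    the final regime receives whatever is left.
    """
    weights, remaining = [], 1.0
    for a in logits:
        v = sigmoid(a)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def effective_regimes(gate_rows, threshold=0.01):
    """Count regimes whose average gate mass exceeds a (hypothetical) threshold."""
    n = len(gate_rows)
    n_regimes = len(gate_rows[0])
    avg = [sum(row[r] for row in gate_rows) / n for r in range(n_regimes)]
    return sum(m > threshold for m in avg), avg

# Toy gate logits (made-up numbers) for 4 forecast locations, truncation R = 5.
# Large break fractions early in the stick leave almost no mass for later regimes.
logit_rows = [
    [0.0, 3.0, 6.0, 0.0],
    [0.5, 2.5, 6.0, 0.0],
    [-0.5, 3.5, 6.0, 0.0],
    [0.2, 3.0, 6.0, 0.0],
]
gate_rows = [stick_breaking(row) for row in logit_rows]
n_active, avg_mass = effective_regimes(gate_rows)
```

In this toy setup the last two regimes receive negligible average mass, which is the sense in which a stick-breaking gate can leave unused regimes switched off without fixing the regime count in advance.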

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The regime assignments could serve as an unsupervised detector for entering new uncertainty states in unlabeled streams.
Decision systems might condition actions on the current regime rather than a single forecast distribution.
The nonstationary kernel structure may extend naturally to modeling horizon-dependent uncertainty changes.
Similar gate mechanisms could be adapted to other sequence tasks where residual structure carries the shift signal.
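Using regime assignments as a detector amounts to tracking when the dominant regime switches. Below is a minimal sketch, assuming the gate output is available as a T x R table of soft assignments; the `min_run` debouncing rule and all function names are hypothetical additions for illustration and do not come from the paper.

```python
def dominant_regime(gate_rows):
    """Hard assignment: index of the largest gate weight at each time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in gate_rows]

def implicit_changepoints(gate_rows, min_run=2):
    """Return time indices where the dominant regime switches.

    A switch is reported only if the new regime persists for `min_run`
    consecutive steps (a hypothetical rule to ignore one-step blips).
    """
    labels = dominant_regime(gate_rows)
    changepoints, current = [], labels[0]
    for t in range(1, len(labels)):
        if labels[t] != current:
            run = labels[t:t + min_run]
            if len(run) == min_run and all(lab == labels[t] for lab in run):
                changepoints.append(t)
                current = labels[t]
    return changepoints

# Toy soft assignments over 8 steps and 3 regimes (made-up numbers).
gate_rows = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],                    # one-step blip towards regime 1
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],   # sustained move to regime 1
    [0.1, 0.2, 0.7], [0.1, 0.1, 0.8],   # sustained move to regime 2
]
labels = dominant_regime(gate_rows)
cps = implicit_changepoints(gate_rows)
```

A decision system that conditions on the current regime would read `labels[-1]`, while a monitoring system would watch `cps` for new entries.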

Load-bearing premise

Residual uncertainty follows patterns that repeat across a small number of recurring regimes whose transitions can be learned by the sparse variational GP gate.

What would settle it

Replacing the sparse GP gate with a standard mixture head without the nonstationary kernel or residual clustering and seeing the NLPD gains disappear on the same benchmarks would falsify the value of the regime mechanism.

Figures

Figures reproduced from arXiv: 2605.19231 by Kieran Wood, Stefan Zohren, Stephen J. Roberts.

**Figure 1.** DeRegiME pipeline, past-context routing (grey) and predictive to the forecast (teal). [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]

**Figure 2.** Representative DeRegiME regime diagnostics: outbreak (Illness), eurozone-debt-crisis shift (GBP), few-state reuse (aggregate Traffic), and a horizon-indexed intraday transition (ETTm1). Gate colours per panel are local display ranks by average mass, with grey marking “Other”.

**Figure 3.** Additional representative DeRegiME diagnostics for the six benchmark datasets not shown in… [PITH_FULL_IMAGE:figures/full_fig_p027_3.png]

Original abstract

We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
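The three metrics reported in the abstract are standard and can be computed as sketched below. This assumes a location-scale Student-t predictive density (the same family as the baseline's dynamic Student-t head) and a sample-based CRPS estimator; the function names are illustrative and the toy values are not taken from the paper.

```python
import math
import random

def student_t_logpdf(y, loc, scale, df):
    """Log density of a location-scale Student-t distribution."""
    z = (y - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))

def nlpd(ys, locs, scales, dfs):
    """Average negative log predictive density (lower is better)."""
    total = sum(student_t_logpdf(y, m, s, v)
                for y, m, s, v in zip(ys, locs, scales, dfs))
    return -total / len(ys)

def crps_from_samples(samples, y):
    """Sample estimate of CRPS = E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

def student_t_sample(rng, loc, scale, df):
    """Draw one value via the normal / chi-square ratio representation."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / df)

rng = random.Random(0)
y_obs = 0.3                                  # made-up observation
samples = [student_t_sample(rng, 0.0, 1.0, 5.0) for _ in range(500)]
score_nlpd = nlpd([y_obs], [0.0], [1.0], [5.0])
score_crps = crps_from_samples(samples, y_obs)
score_se = (y_obs - 0.0) ** 2                # squared error of the predictive mean
```

NLPD rewards sharp, well-calibrated densities and penalizes overconfident tails heavily, which may be one reason a model that changes only the uncertainty structure can move NLPD much more than CRPS or MSE.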

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DeRegiME places a sparse variational GP gate with a nonstationary kernel on residual uncertainty for regime-aware probabilistic forecasting and reports consistent NLPD gains, though the assumption of a finite set of recurring regimes is the least secure part.


The main point is that DeRegiME separates residual uncertainty into a finite set of recurring regimes using a shared sparse variational GP with a nonstationary regime-mixing kernel and Student-t noise per regime, all while producing a single posterior rather than a mixture of experts. It adds stick-breaking to prune regimes and claims a direct-sum feature representation that surfaces implicit changepoints in the residuals. The paper proves kernel validity and predictive-density propriety, then shows a 20.3% NLPD lift over an encoder-matched DeepAR-style baseline across ten benchmarks that include abrupt, gradual, and seasonal shifts, with smaller gains on CRPS and MSE that hold for three different encoder grids.

Referee Report

3 major / 2 minor

Summary. The paper introduces DeRegiME, a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal via a mean-residual-noise decomposition. Residual uncertainty is modeled by softly assigning forecast locations to a finite set of recurring regimes using a shared sparse variational Gaussian process with a nonstationary regime-mixing kernel, stick-breaking gate, and per-regime Student-t noise processes, yielding a single sparse-GP posterior. The work proves kernel validity and predictive-density propriety, and reports consistent empirical gains of 20.3% in NLPD, 3.0% in CRPS, and 4.7% in MSE over an encoder-matched DeepAR/GluonTS-style dynamic Student-t baseline across ten benchmarks spanning abrupt, gradual, and seasonal distribution shifts.

Significance. If the central claims hold, the contribution would be significant for probabilistic time-series forecasting under distribution shift. By focusing regime modeling on residual uncertainty rather than the conditional mean and providing an interpretable direct-sum feature-space representation, the approach addresses a limitation of standard neural probabilistic heads. Explicit credit is due for the kernel-validity and predictive-density proofs as well as the consistent gains across three encoder grids and ten benchmarks with varied shift types; these elements strengthen the case for practical utility in heteroskedastic noisy series.

major comments (3)

[Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.
[Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence—such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning—showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.
[Variational Inference] The variational approximation quality for the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (Skeptic note and variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.

minor comments (2)

[Abstract] The abstract states improvements are 'consistent across all datasets' but does not report per-dataset breakdowns or statistical significance tests; adding these would improve transparency without altering the main claims.
[Notation] Notation for the direct-sum feature-space representation anchoring regimes as clusters of residual similarity would benefit from an explicit equation or small diagram in the methods section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments on our manuscript. These suggestions have helped us identify areas where we can provide additional clarity and evidence. We address each major comment below and indicate the revisions we will make in the updated version of the paper.


Referee: [Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.

Authors: We thank the referee for highlighting this important point regarding the attribution of performance gains. While our encoder-matched baseline comparison controls for the encoder architecture and uses a comparable dynamic Student-t head, we agree that an ablation specifically removing the regime-mixing components would more directly isolate their contribution. In the revised manuscript, we will add an ablation study in which the shared sparse variational GP (with its nonstationary kernel and stick-breaking gate) is replaced by a simpler shared GP or a fixed regime structure, keeping the encoder and likelihood the same. This will allow us to quantify the incremental benefit of the regime mechanism. We believe this will strengthen the empirical claims without altering the core results.
revision: yes

Referee: [Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence—such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning—showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.

Authors: We appreciate the referee's concern about validating the interpretability of the learned regimes. Although the manuscript emphasizes the mean-residual-noise decomposition and direct-sum representation, we acknowledge the value of explicit diagnostics. In the revision, we will include visualizations of regime assignments over time for representative datasets, demonstrating alignment with known abrupt and gradual shifts. Additionally, we will report the effective number of regimes after stick-breaking pruning and a sensitivity analysis with respect to the truncation level. These additions will provide evidence that the regimes capture meaningful residual uncertainty structures rather than artifacts.
revision: yes

Referee: [Variational Inference] The variational approximation quality for the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (Skeptic note and variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.

Authors: We recognize the importance of assessing the quality of the variational approximation, particularly given the combination of nonstationary kernel and Student-t processes. The design of a single shared sparse-GP posterior is intended to promote stability and avoid the collapse issues common in mixture models. To address this, we will incorporate ELBO convergence curves and a selection of posterior predictive checks in the supplementary material for the revised submission. While providing exhaustive diagnostics for all ten benchmarks and all encoder grids may exceed space constraints, we will present representative results across different shift types to demonstrate the reliability of the approximation.
revision: partial

Circularity Check

0 steps flagged

Derivation chain is self-contained; no circular reductions identified


The paper constructs DeRegiME directly as a mean-residual-noise decomposition with a shared sparse variational GP gate using a nonstationary regime-mixing kernel and Student-t likelihood, yielding a single sparse-GP posterior. It reports empirical gains over encoder-matched baselines and states proofs for kernel validity and predictive-density propriety. No equations or steps in the abstract reduce a claimed prediction or result to a fitted parameter or self-citation by construction. The regime assignments and improvements are presented as outcomes of the model rather than inputs redefined as outputs. This is the normal case of an independent architectural proposal.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The abstract implies several modeling choices whose details are not provided. The stick-breaking gate and per-regime sub-kernels are central but their exact parameterization is not specified here.

free parameters (2)

stick-breaking gate parameters
The effective number of regimes is pruned by the stick-breaking gate, requiring hyperparameters that control regime count and sparsity.
regime-mixing kernel hyperparameters
Nonstationary regime-mixing kernel combines per-regime sub-kernels, implying fitted lengthscales or variances per regime.

axioms (2)

standard math: The nonstationary regime-mixing kernel is a valid positive semi-definite kernel.
Abstract states that kernel validity is proved.
standard math: The Student-t likelihood yields a proper predictive density when combined with the GP posterior.
Abstract claims predictive-density propriety is proved.
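
For orientation, the first axiom can be motivated by the standard closure argument for kernels. The sketch below takes the mixing form K_mix(ξ,ξ′) = ∑_r π_r(ξ) π_r(ξ′) K_r(z_r(ξ), z_r(ξ′)) quoted from the paper and assumes each sub-kernel K_r admits a feature map φ_r; the symbols φ_r, Φ, R, and c_i are notation introduced here, and this is a reading aid rather than a reproduction of the paper's own proof.

```latex
% Assumption: each sub-kernel has a feature map, K_r(u, u') = <phi_r(u), phi_r(u')>.
\begin{aligned}
K_{\mathrm{mix}}(\xi,\xi')
  &= \sum_{r=1}^{R} \pi_r(\xi)\,\pi_r(\xi')\,K_r\!\big(z_r(\xi), z_r(\xi')\big) \\
  &= \sum_{r=1}^{R} \big\langle \pi_r(\xi)\,\phi_r(z_r(\xi)),\;
                               \pi_r(\xi')\,\phi_r(z_r(\xi')) \big\rangle \\
  &= \big\langle \Phi(\xi), \Phi(\xi') \big\rangle,
  \qquad
  \Phi(\xi) = \bigoplus_{r=1}^{R} \pi_r(\xi)\,\phi_r\big(z_r(\xi)\big).
\end{aligned}

% Hence, for any points xi_1, ..., xi_n and real coefficients c_1, ..., c_n:
\sum_{i=1}^{n}\sum_{j=1}^{n} c_i\,c_j\,K_{\mathrm{mix}}(\xi_i,\xi_j)
  = \Big\| \sum_{i=1}^{n} c_i\,\Phi(\xi_i) \Big\|^2 \;\ge\; 0 .
```

The stacked map Φ is one natural reading of a direct-sum feature-space representation: each regime contributes its own block of features, scaled by the gate weight at that location.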



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We formulate direct multi-horizon forecasting as a decomposition into a shared conditional mean, structured GP residual signal, and regime-dependent residual uncertainty... regime-mixing kernel K_mix(ξ,ξ′) = ∑_r π_r(ξ) π_r(ξ′) K_r(z_r(ξ), z_r(ξ′))

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Proposition 1 (Direct-sum feature-space representation)... Theorem 2 (Positive semi-definiteness)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

100 extracted references · 100 canonical work pages · 3 internal anchors

[1]

and MacKay, David J

Adams, Ryan P. and MacKay, David J. C. , title =. 2007 , note =

work page 2007
[2]

and Rangapuram, Syama and Salinas, David and Schulz, Jasper and Stella, Lorenzo and T

Alexandrov, Alexander and Benidis, Konstantinos and Bohlke-Schneider, Michael and Flunkert, Valentin and Gasthaus, Jan and Januschowski, Tim and Maddix, Danielle C. and Rangapuram, Syama and Salinas, David and Schulz, Jasper and Stella, Lorenzo and T. Journal of Machine Learning Research , volume =

work page
[3]

, title =

Bishop, Christopher M. , title =

work page
[4]

Bayesian Inference for Finite Mixtures of Univariate and Multivariate Skew-Normal and Skew- t Distributions , journal =

Fr. Bayesian Inference for Finite Mixtures of Univariate and Multivariate Skew-Normal and Skew- t Distributions , journal =

work page
[5]

Journal of Machine Learning Research , volume =

Fedus, William and Zoph, Barret and Shazeer, Noam , title =. Journal of Machine Learning Research , volume =

work page
[6]

and Reece, Steven and Rogers, Alex and Roberts, Stephen J

Garnett, Roman and Osborne, Michael A. and Reece, Steven and Rogers, Alex and Roberts, Stephen J. , title =. The Computer Journal , volume =

work page
[7]

Garnelo, Marta and Rosenbaum, Dan and Maddison, Christopher and Ramalho, Tiago and Saxton, David and Shanahan, Murray and Teh, Yee Whye and Rezende, Danilo and Eslami, S. M. Ali , title =. Proceedings of the International Conference on Machine Learning , year =

work page
[8]

and Eslami, S

Garnelo, Marta and Schwarz, Jonathan and Rosenbaum, Dan and Viola, Fabio and Rezende, Danilo J. and Eslami, S. M. Ali and Teh, Yee Whye , title =. 2018 , note =

work page 2018
[9]

, title =

Gneiting, Tilmann and Raftery, Adrian E. , title =. Journal of the American Statistical Association , volume =

work page
[10]

, title =

Hamilton, James D. , title =. Econometrica , volume =

work page
[11]

, title =

Hensman, James and Fusi, Nicolo and Lawrence, Neil D. , title =. Proceedings of the Conference on Uncertainty in Artificial Intelligence , year =

work page
[12]

Scandinavian Journal of Statistics , volume =

Holzmann, Hajo and Munk, Axel and Gneiting, Tilmann , title =. Scandinavian Journal of Statistics , volume =

work page
[13]

, title =

Ishwaran, Hemant and James, Lancelot F. , title =. Journal of the American Statistical Association , volume =

work page
[14]

and Jordan, Michael I

Jacobs, Robert A. and Jordan, Michael I. and Nowlan, Steven J. and Hinton, Geoffrey E. , title =. Neural Computation , volume =

work page
[15]

and Jacobs, Robert A

Jordan, Michael I. and Jacobs, Robert A. , title =. Neural Computation , volume =

work page
[16]

International Conference on Learning Representations , year =

Kim, Taesung and Kim, Jinhee and Tae, Yunwon and Park, Cheonbok and Choi, Jang-Ho and Choo, Jaegul , title =. International Conference on Learning Representations , year =

work page
[17]

Advances in Neural Information Processing Systems , year =

Kollovieh, Marcel and Ansari, Abdul Fatir and Bohlke-Schneider, Michael and Zschiegner, Jasper and Wang, Hao and Wang, Yuyang , title =. Advances in Neural Information Processing Systems , year =

work page
[18]

and Little, Roderick J

Lange, Kenneth L. and Little, Roderick J. A. and Taylor, Jeremy M. G. , title =. Journal of the American Statistical Association , volume =

work page
[19]

Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting , journal =

Lim, Bryan and Arik, Sercan. Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting , journal =

work page
[20]

2023 , note =

Liu, Yong and Hu, Tengge and Zhang, Haoran and Wu, Haixu and Wang, Shiyu and Ma, Lintao and Long, Mingsheng , title =. 2023 , note =

work page 2023
[21]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Zeng, Ailing and Chen, Muxi and Zhang, Lei and Xu, Qiang , title =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =. 2023 , doi =

work page 2023
[22]

and Basford, Kaye E

McLachlan, Geoffrey J. and Basford, Kaye E. , title =

work page
[23]

and Peel, David , title =

McLachlan, Geoffrey J. and Peel, David , title =

work page
[24]

and Sinthong, Phanwadee and Kalagnanam, Jayant , title =

Nie, Yuqi and Nguyen, Nam H. and Sinthong, Phanwadee and Kalagnanam, Jayant , title =. International Conference on Learning Representations , year =

work page
[25]

, title =

Peel, David and McLachlan, Geoffrey J. , title =. Statistics and Computing , volume =

work page
[26]

and Sheng, Zhenli and Yang, Bin , title =

Qiu, Xiangfei and Hu, Jilin and Zhou, Lekui and Wu, Xingjian and Du, Junyang and Zhang, Buang and Guo, Chenjuan and Zhou, Aoying and Jensen, Christian S. and Sheng, Zhenli and Yang, Bin , title =. Proceedings of the VLDB Endowment , volume =. 2024 , doi =

work page 2024
[27]

and Ghahramani, Zoubin , title =

Rasmussen, Carl E. and Ghahramani, Zoubin , title =. Advances in Neural Information Processing Systems , year =

work page
[28]

and Williams, Christopher K

Rasmussen, Carl E. and Williams, Christopher K. I. , title =

work page
[29]

International Conference on Learning Representations , year =

Rasul, Kashif and Sheikh, Abdul-Saboor and Schuster, Ingmar and Bergmann, Urs and Vollgraf, Roland , title =. International Conference on Learning Representations , year =

work page
[30]

Proceedings of the International Conference on Machine Learning , year =

Rasul, Kashif and Seward, Calvin and Schuster, Ingmar and Vollgraf, Roland , title =. Proceedings of the International Conference on Machine Learning , year =

work page
[31]

International Journal of Forecasting , volume =

Salinas, David and Flunkert, Valentin and Gasthaus, Jan and Januschowski, Tim , title =. International Journal of Forecasting , volume =

work page
[32]

Statistica Sinica , volume =

Sethuraman, Jayaram , title =. Statistica Sinica , volume =

work page
[33]

International Conference on Learning Representations , year =

Shazeer, Noam and Mirhoseini, Azalia and Maziarz, Krzysztof and Davis, Andy and Le, Quoc and Hinton, Geoffrey and Dean, Jeff , title =. International Conference on Learning Representations , year =

work page
[34]

Annals of Mathematical Statistics , volume =

Teicher, Henry , title =. Annals of Mathematical Statistics , volume =

work page
[35]

Proceedings of the International Conference on Artificial Intelligence and Statistics , year =

Titsias, Michalis , title =. Proceedings of the International Conference on Artificial Intelligence and Statistics , year =

work page
[36]

, title =

Trefethen, Lloyd N. , title =. SIAM Review , volume =

work page
[37]

Advances in Neural Information Processing Systems , year =

Tresp, Volker , title =. Advances in Neural Information Processing Systems , year =

work page
[38]

SIAM/ASA Journal on Uncertainty Quantification , volume =

Volodina, Victoria and Williamson, Daniel , title =. SIAM/ASA Journal on Uncertainty Quantification , volume =

work page
[39]

and Hu, Zhiting and Salakhutdinov, Ruslan and Xing, Eric P

Wilson, Andrew G. and Hu, Zhiting and Salakhutdinov, Ruslan and Xing, Eric P. , title =. Proceedings of the International Conference on Artificial Intelligence and Statistics , year =

work page
[40]

Advances in Neural Information Processing Systems , year =

Wu, Haixu and Xu, Jiehui and Wang, Jianmin and Long, Mingsheng , title =. Advances in Neural Information Processing Systems , year =

work page
[41]

The Journal of Financial Data Science , volume =

Wood, Kieran and Roberts, Stephen and Zohren, Stefan , title =. The Journal of Financial Data Science , volume =. 2022 , doi =

work page 2022
[42]

and Spragins, John D

Yakowitz, Sidney J. and Spragins, John D. , title =. Annals of Mathematical Statistics , volume =

work page
[43]

2021 , note =

Yan, Tijin and Zhang, Hongwei and Zhou, Tong and Zhan, Yufeng and Xia, Yuanqing , title =. 2021 , note =

work page 2021
[44]

Proceedings of the AAAI Conference on Artificial Intelligence , year =

Zhou, Haoyi and Zhang, Shanghang and Peng, Jieqi and Zhang, Shuai and Li, Jianxin and Xiong, Hui and Zhang, Wancai , title =. Proceedings of the AAAI Conference on Artificial Intelligence , year =

work page
[45]

and Mahoney, Michael W

Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Arango, Sebastian Pineda and Kapoor, Shubham and Zschiegner, Jasper and Maddix, Danielle C. and Mahoney, Michael W. and Torkkola, Kari and Wilson, Andrew Gordon and Bohlke-Schneider, Michae...

work page
[46]

Proceedings of the International Conference on Machine Learning , year =

Goswami, Mononito and Szafer, Konrad and Choudhry, Arjun and Cai, Yifu and Li, Shuo and Dubrawski, Artur , title =. Proceedings of the International Conference on Machine Learning , year =

work page
[47]

Proceedings of the International Conference on Machine Learning , year =

Das, Abhimanyu and Kong, Weihao and Sen, Rajat and Zhou, Yichen , title =. Proceedings of the International Conference on Machine Learning , year =

work page
[48]

Proceedings of the International Conference on Machine Learning , year =

Woo, Gerald and Liu, Chenghao and Kumar, Akshat and Xiong, Caiming and Savarese, Silvio and Sahoo, Doyen , title =. Proceedings of the International Conference on Machine Learning , year =

work page
[49]

and Zhou, Jun , title =

Wang, Shiyu and Wu, Haixu and Shi, Xiaoming and Hu, Tengge and Luo, Huakun and Ma, Lintao and Zhang, James Y. and Zhou, Jun , title =. Proceedings of the International Conference on Learning Representations , year =

work page
[50]

Modelling Extremal Events for Insurance and Finance , publisher =

Embrechts, Paul and Kl. Modelling Extremal Events for Insurance and Finance , publisher =

work page
[51]

Bayesian Online Changepoint Detection

Ryan P. Adams and David J. C. MacKay. Bayesian online changepoint detection, 2007. arXiv:0710.3742

work page internal anchor Pith review Pith/arXiv arXiv 2007
[52]

Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner T \"u rkmen, and Yuyang Wang

Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner T \"u rkmen, and Yuyang Wang. GluonTS : Probabilistic and neural time series modeling in python. Journal of Machine Learning Research, 21 0 (1...

work page 2020
[53]

Maddix, Michael W

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the language of time se...

work page 2024
[54]

Christopher M. Bishop. Mixture density networks. Technical Report NCRG/94/004, Aston University, 1994

work page 1994
[55]

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In Proceedings of the International Conference on Machine Learning, 2024

work page 2024
[56]

Modelling Extremal Events for Insurance and Finance, volume 33 of Applications of Mathematics

Paul Embrechts, Claudia Kl \"u ppelberg, and Thomas Mikosch. Modelling Extremal Events for Insurance and Finance, volume 33 of Applications of Mathematics. Springer, 1997

work page 1997
[57]

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23 0 (120): 0 1--39, 2022

work page 2022
[58]

Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew- t distributions

Sylvia Fr \"u hwirth-Schnatter and Saumyadipta Pyne. Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew- t distributions. Biostatistics, 11 0 (2): 0 317--336, 2010

work page 2010
[59]

Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, and S. M. Ali Eslami. Conditional neural processes. In Proceedings of the International Conference on Machine Learning, 2018 a

work page 2018
[60]

Neural Processes

Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. Neural processes, 2018 b . arXiv:1807.01622

[61]

Roman Garnett, Michael A. Osborne, Steven Reece, Alex Rogers, and Stephen J. Roberts. Sequential Bayesian prediction in the presence of changepoints and faults. The Computer Journal, 53(9):1430–1446, 2010

[62]

Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007

[63]

Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. MOMENT: A family of open time-series foundation models. In Proceedings of the International Conference on Machine Learning, 2024

[64]

James D. Hamilton. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2):357–384, 1989

[65]

James Hensman, Nicolo Fusi, and Neil D. Lawrence. Gaussian processes for big data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2013

[66]

Hajo Holzmann, Axel Munk, and Tilmann Gneiting. Identifiability of finite mixtures of elliptical distributions. Scandinavian Journal of Statistics, 33(4):753–763, 2006

[67]

Hemant Ishwaran and Lancelot F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, 2001

[68]

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991

[69]

Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994

[70]

Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022

[71]

Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Wang. Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting. In Advances in Neural Information Processing Systems, 2023

[72]

Kenneth L. Lange, Roderick J. A. Little, and Jeremy M. G. Taylor. Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84(408):881–896, 1989

[73]

Bryan Lim, Sercan Ö. Arik, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021

[74]

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. iTransformer: Inverted transformers are effective for time series forecasting, 2023. arXiv:2310.06625

[75]

Geoffrey J. McLachlan and Kaye E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, 1988

[76]

Geoffrey J. McLachlan and David Peel. Finite Mixture Models. Wiley, 2000

[77]

Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023

[78]

David Peel and Geoffrey J. McLachlan. Robust mixture modelling using the t distribution. Statistics and Computing, 10(4):339–348, 2000

[79]

Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. TFB: Towards comprehensive and fair benchmarking of time series forecasting methods. Proceedings of the VLDB Endowment, 17(9):2363–2377, 2024. doi:10.14778/3665844.3665863

[80]

Carl E. Rasmussen and Zoubin Ghahramani. Infinite mixtures of Gaussian process experts. In Advances in Neural Information Processing Systems, 2002

