DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift
Pith reviewed 2026-05-20 07:51 UTC · model grok-4.3
The pith
DeRegiME models residual uncertainty in time series as recurring regimes assigned by a shared sparse variational Gaussian process gate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeRegiME establishes that a shared sparse variational GP gate, equipped with a nonstationary regime-mixing kernel (which admits a direct-sum feature-space representation) and a Student-t likelihood, can capture recurring regimes in residual uncertainty. On ten benchmarks spanning abrupt, gradual, and seasonal shifts, it improves negative log predictive density, CRPS, and MSE, while stick-breaking prunes the number of active regimes.
What carries the argument
The shared sparse variational GP gate with nonstationary regime-mixing kernel, which softly assigns each forecast location to learned recurring regimes and mixes per-regime sub-kernels plus noise processes into one posterior.
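As a rough illustration of how one shared gate can mix regime-specific covariances, the sketch below builds a Gram matrix from the form the paper gives for its regime-mixing kernel, K_mix(ξ,ξ′) = ∑_r π_r(ξ)π_r(ξ′)K_r(z_r(ξ),z_r(ξ′)). It is a minimal sketch under stated assumptions, not the authors' implementation: the RBF sub-kernels, the identity projections z_r(ξ) = ξ, the softmax-over-linear-logits gate standing in for the learned gate π_r(ξ), and all function names are hypothetical, and the per-regime noise processes are omitted.

```python
import numpy as np

def rbf(x, y, lengthscale):
    """Squared-exponential sub-kernel K_r on 1-D inputs (hypothetical choice)."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def softmax_gate(x, slopes, offsets):
    """Hypothetical stand-in for pi_r(x): softmax over linear logits, rows sum to 1."""
    logits = x[:, None] * slopes[None, :] + offsets[None, :]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def regime_mixing_kernel(x, y, lengthscales, slopes, offsets):
    """K_mix(x, y) = sum_r pi_r(x) pi_r(y) K_r(x, y), with z_r taken as the identity."""
    pi_x = softmax_gate(x, slopes, offsets)  # shape (n, R)
    pi_y = softmax_gate(y, slopes, offsets)  # shape (m, R)
    K = np.zeros((len(x), len(y)))
    for r, ell in enumerate(lengthscales):
        # Regime r contributes only where both locations load on regime r.
        K += np.outer(pi_x[:, r], pi_y[:, r]) * rbf(x, y, ell)
    return K

# Two hypothetical regimes: short lengthscale for x < 5, long lengthscale for x > 5.
x = np.linspace(0.0, 10.0, 50)
K = regime_mixing_kernel(x, x, lengthscales=np.array([0.5, 3.0]),
                         slopes=np.array([-1.0, 1.0]), offsets=np.array([5.0, -5.0]))
print(np.linalg.eigvalsh(K).min() > -1e-8)  # PSD check, up to round-off
```

The outer product π_r(ξ)π_r(ξ′) is what keeps the result a single kernel rather than a mixture of separate GP experts: covariance between two points assigned to different regimes is suppressed because no single regime weights both of them heavily.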
If this is right
- The approach produces an interpretable mean-residual-noise decomposition with regimes as clusters of residual similarity.
- Regime transitions appear as implicit changepoints in the uncertainty structure.
- The stick-breaking gate automatically determines the effective number of regimes.
- Kernel validity and predictive-density propriety are formally established.
- Gains hold consistently across abrupt, gradual, and seasonal distribution shifts.
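The stick-breaking gate can be illustrated with the standard truncated construction (as in Ishwaran and James, 2001): π_r = v_r ∏_{j<r}(1 − v_j), with the last stick absorbing the remaining mass. This sketch assumes fixed stick fractions v_r, whereas in DeRegiME the weights depend on the forecast location ξ; the 1% activity threshold in `effective_regimes` is a hypothetical pruning rule, not one stated in the paper.

```python
import numpy as np

def stick_breaking(v):
    """Truncated stick-breaking: pi_r = v_r * prod_{j<r} (1 - v_j).

    The final stick fraction is overridden to 1 so the weights sum to exactly 1.
    """
    v = np.asarray(v, dtype=float).copy()
    v[-1] = 1.0  # truncation: the last regime takes whatever mass is left
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def effective_regimes(pi, threshold=0.01):
    """Hypothetical pruning rule: count regimes whose weight exceeds `threshold`."""
    return int(np.sum(pi > threshold))

# Truncation level 6; the stick fractions here are illustrative, not learned.
pi = stick_breaking([0.6, 0.7, 0.9, 0.5, 0.5, 0.5])
print(np.round(pi, 4), effective_regimes(pi))
```

Here the weights are 0.6, 0.28, 0.108, 0.006, 0.003, 0.003, so only three of the six truncated regimes carry more than 1% of the mass. This is the sense in which stick-breaking prunes unused regimes instead of requiring their number to be fixed in advance.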
Where Pith is reading between the lines
- The regime assignments could serve as an unsupervised detector for entering new uncertainty states in unlabeled streams.
- Decision systems might condition actions on the current regime rather than a single forecast distribution.
- The nonstationary kernel structure may extend naturally to modeling horizon-dependent uncertainty changes.
- Similar gate mechanisms could be adapted to other sequence tasks where residual structure carries the shift signal.
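One way to read regime transitions as implicit changepoints, and to use them as the unsupervised detector suggested above, is to track where the dominant regime switches. The sketch below assumes the soft assignments are available as a (T, R) array. The `min_margin` hysteresis rule, which ignores low-confidence switches and keeps the previous label, is a hypothetical heuristic and not part of DeRegiME.

```python
import numpy as np

def implicit_changepoints(pi, min_margin=0.0):
    """Return time indices t at which the (confident) dominant regime changes.

    pi: array of shape (T, R); each row holds soft regime assignments summing to 1.
    min_margin: hypothetical hysteresis; a step updates the label only if its top
    regime leads the runner-up by at least this much, otherwise the old label is kept.
    """
    pi = np.asarray(pi, dtype=float)
    top2 = np.sort(pi, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    current = int(pi[0].argmax())
    labels = []
    for t in range(len(pi)):
        if margin[t] >= min_margin:
            current = int(pi[t].argmax())
        labels.append(current)
    labels = np.array(labels)
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1

# Illustrative assignments for two regimes over six forecast locations.
pi = np.array([[0.9, 0.1], [0.8, 0.2], [0.45, 0.55],
               [0.2, 0.8], [0.1, 0.9], [0.7, 0.3]])
print(implicit_changepoints(pi))
print(implicit_changepoints(pi, min_margin=0.3))
```

With `min_margin=0.0` every argmax switch counts, giving changepoints at t = 2 and t = 5. With `min_margin=0.3` the ambiguous step at t = 2 (0.45 vs 0.55) is ignored, so the first switch is dated to t = 3.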
Load-bearing premise
Residual uncertainty follows patterns that repeat across a small number of recurring regimes whose transitions can be learned by the sparse variational GP gate.
What would settle it
If replacing the sparse GP gate with a standard mixture head (without the nonstationary kernel or residual clustering) matched DeRegiME's NLPD on the same benchmarks, so that its advantage disappeared, that would falsify the value of the regime mechanism.
Original abstract
We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
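The three scores reported in the abstract can be made concrete for a Student-t predictive. The sketch below assumes a per-point location mu and scale sigma with a shared degrees-of-freedom nu. It estimates CRPS from predictive samples via the identity CRPS(F, y) = E|X − y| − ½E|X − X′| (Gneiting and Raftery, 2007). The abstract does not specify how the paper aggregates over horizons and series, or how a percentage change is defined when NLPD is negative, so `relative_improvement` is a hypothetical convention.

```python
import math
import numpy as np

def student_t_nlpd(y, mu, sigma, nu):
    """Mean negative log density of y under Student-t(loc=mu, scale=sigma, df=nu).

    y, mu, sigma are arrays of equal shape; nu is a scalar.
    """
    z = (y - mu) / sigma
    log_pdf = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
               - 0.5 * math.log(nu * math.pi) - np.log(sigma)
               - (nu + 1) / 2 * np.log1p(z ** 2 / nu))
    return float(-np.mean(log_pdf))

def crps_from_samples(samples, y):
    """Plug-in estimate of CRPS = E|X - y| - 0.5 * E|X - X'| for one observation y."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term1 - term2)

def relative_improvement(baseline, model):
    """Hypothetical convention: fractional reduction of a lower-is-better score."""
    return (baseline - model) / abs(baseline)

# Toy usage on synthetic heavy-tailed data (not the paper's benchmarks).
rng = np.random.default_rng(0)
y = rng.standard_t(df=4, size=200)
mu, sigma = np.zeros_like(y), np.ones_like(y)
nlpd = student_t_nlpd(y, mu, sigma, nu=4.0)
crps = float(np.mean([crps_from_samples(rng.standard_t(df=4, size=500), yi)
                      for yi in y[:20]]))
mse = float(np.mean((y - mu) ** 2))
```

NLPD rewards a well-calibrated scale and tail weight, CRPS rewards the whole predictive distribution in the units of y, and MSE only sees the point forecast. That split is consistent with the abstract reporting a much larger NLPD gain than CRPS or MSE gain for a method that targets residual uncertainty rather than the conditional mean.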
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DeRegiME, a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal via a mean-residual-noise decomposition. Residual uncertainty is modeled by softly assigning forecast locations to a finite set of recurring regimes using a shared sparse variational Gaussian process with a nonstationary regime-mixing kernel, stick-breaking gate, and per-regime Student-t noise processes, yielding a single sparse-GP posterior. The work proves kernel validity and predictive-density propriety, and reports consistent empirical gains of 20.3% in NLPD, 3.0% in CRPS, and 4.7% in MSE over an encoder-matched DeepAR/GluonTS-style dynamic Student-t baseline across ten benchmarks spanning abrupt, gradual, and seasonal distribution shifts.
Significance. If the central claims hold, the contribution would be significant for probabilistic time-series forecasting under distribution shift. By focusing regime modeling on residual uncertainty rather than the conditional mean and providing an interpretable direct-sum feature-space representation, the approach addresses a limitation of standard neural probabilistic heads. Explicit credit is due for the kernel-validity and predictive-density proofs as well as the consistent gains across three encoder grids and ten benchmarks with varied shift types; these elements strengthen the case for practical utility in heteroskedastic noisy series.
major comments (3)
- [Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.
- [Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence—such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning—showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.
- [Variational Inference] The quality of the variational approximation to the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.
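To make the requested diagnostics concrete: in the uncollapsed sparse variational GP bound (Hensman et al., 2013, building on Titsias, 2009), the ELBO is an expected log-likelihood term minus KL(q(u) ‖ p(u)), with q(u) = N(m, S) over M inducing values and p(u) = N(0, K_uu). The sketch below computes that KL term. Whether DeRegiME uses this non-whitened parameterization is an assumption, and the function name is hypothetical. Logging the KL separately from the expected log-likelihood during training is one way to spot an approximate posterior that has drifted back onto the prior (KL near zero).

```python
import numpy as np

def gaussian_kl(m, S, K):
    """KL( N(m, S) || N(0, K) ) for M-dimensional Gaussians.

    0.5 * [ tr(K^{-1} S) + m^T K^{-1} m - M + log det K - log det S ],
    evaluated through Cholesky factors for numerical stability.
    """
    M = len(m)
    L_K = np.linalg.cholesky(K)
    L_S = np.linalg.cholesky(S)
    # With A = L_K^{-1} L_S, tr(K^{-1} S) equals the squared Frobenius norm of A.
    A = np.linalg.solve(L_K, L_S)
    # With alpha = L_K^{-1} m, m^T K^{-1} m equals alpha . alpha.
    alpha = np.linalg.solve(L_K, m)
    logdet_K = 2.0 * np.sum(np.log(np.diag(L_K)))
    logdet_S = 2.0 * np.sum(np.log(np.diag(L_S)))
    return 0.5 * (np.sum(A ** 2) + alpha @ alpha - M + logdet_K - logdet_S)

# Toy check: a posterior with a shifted mean pays a KL penalty against the prior.
kl_shifted = gaussian_kl(np.array([1.0, 0.0]), np.eye(2), np.eye(2))
```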
minor comments (2)
- [Abstract] The abstract states improvements are 'consistent across all datasets' but does not report per-dataset breakdowns or statistical significance tests; adding these would improve transparency without altering the main claims.
- [Notation] Notation for the direct-sum feature-space representation anchoring regimes as clusters of residual similarity would benefit from an explicit equation or small diagram in the methods section.
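Per-dataset paired tests need per-dataset scores, which are not reproduced here. As a minimal sketch of the kind of test the referee requests, an exact sign test counts the datasets on which DeRegiME beats the baseline. It assumes the datasets are independent and ignores effect sizes. If, as the abstract states, all ten datasets improve, the two-sided p-value is 2/1024 ≈ 0.002. The function name is hypothetical.

```python
from math import comb

def sign_test_p_value(wins, n, two_sided=True):
    """Exact binomial sign test under H0: each dataset is a win with probability 1/2.

    Assumes independent datasets and no ties; effect sizes are ignored.
    """
    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    if two_sided:
        # Symmetric null, so double the tail beyond the more extreme count.
        return min(1.0, 2 * upper_tail(max(wins, n - wins)))
    return upper_tail(wins)

print(sign_test_p_value(10, 10))  # all ten benchmarks improve
```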
Simulated Author's Rebuttal
We are grateful to the referee for the detailed and insightful comments on our manuscript. These suggestions have helped us identify areas where we can provide additional clarity and evidence. We address each major comment below and indicate the revisions we will make in the updated version of the paper.
- Referee: [Empirical Evaluation] The central empirical claim attributes the NLPD gains specifically to the regime mechanism in the residual uncertainty (Abstract). However, without an ablation that isolates the shared sparse variational GP gate and nonstationary kernel from other components (e.g., the Student-t likelihood or encoder), it remains unclear whether the 20.3% improvement is driven by regime discovery or by the overall model capacity; this directly affects whether the encoder-matched baseline comparison isolates the contribution of the regime structure.
Authors: We thank the referee for highlighting this important point regarding the attribution of performance gains. While our encoder-matched baseline comparison controls for the encoder architecture and uses a comparable dynamic Student-t head, we agree that an ablation specifically removing the regime-mixing components would more directly isolate their contribution. In the revised manuscript, we will add an ablation study in which we replace the shared sparse variational GP (with its nonstationary kernel and stick-breaking gate) by a simpler shared GP or a fixed regime structure, keeping the encoder and likelihood the same. This will allow us to quantify the incremental benefit of the regime mechanism. We believe this will strengthen the empirical claims without altering the core results. Revision: yes.
- Referee: [Model Description] The weakest modeling assumption is that residual uncertainty can be captured by a finite set of recurring regimes whose transitions are learnable via the shared sparse variational GP gate with nonstationary kernel (Abstract and model description). The manuscript provides no diagnostic evidence—such as regime-assignment visualizations, changepoint alignment with known shifts, or sensitivity to the effective number of regimes after stick-breaking pruning—showing that the soft assignments correspond to actual distribution-shift structure rather than spurious clusters in the residuals.
Authors: We appreciate the referee's concern about validating the interpretability of the learned regimes. Although the manuscript emphasizes the mean-residual-noise decomposition and direct-sum representation, we acknowledge the value of explicit diagnostics. In the revision, we will include visualizations of regime assignments over time for representative datasets, demonstrating alignment with known abrupt and gradual shifts. Additionally, we will report the effective number of regimes after stick-breaking pruning and sensitivity analysis to the truncation level. These additions will provide evidence that the regimes capture meaningful residual uncertainty structures rather than artifacts. Revision: yes.
- Referee: [Variational Inference] The variational approximation quality for the single sparse-GP posterior under multi-horizon forecasting is load-bearing for avoiding gate collapse or overfitting (variational inference section). The paper lacks convergence diagnostics, ELBO gap analysis, or posterior predictive checks across the ten benchmarks that would confirm the approximation remains reliable when the nonstationary kernel and Student-t processes are combined.
Authors: We recognize the importance of assessing the quality of the variational approximation, particularly given the combination of nonstationary kernel and Student-t processes. The design of a single shared sparse-GP posterior is intended to promote stability and avoid the collapse issues common in mixture models. To address this, we will incorporate ELBO convergence curves and a selection of posterior predictive checks in the supplementary material for the revised submission. While providing exhaustive diagnostics for all ten benchmarks and all encoder grids may exceed space constraints, we will present representative results across different shift types to demonstrate the reliability of the approximation. Revision: partial.
Circularity Check
Derivation chain is self-contained; no circular reductions identified
The paper constructs DeRegiME directly as a mean-residual-noise decomposition with a shared sparse variational GP gate using a nonstationary regime-mixing kernel and Student-t likelihood, yielding a single sparse-GP posterior. It reports empirical gains over encoder-matched baselines and states proofs for kernel validity and predictive-density propriety. No equations or steps in the abstract reduce a claimed prediction or result to a fitted parameter or self-citation by construction. The regime assignments and improvements are presented as outcomes of the model rather than inputs redefined as outputs. This is the normal case of an independent architectural proposal.
Axiom & Free-Parameter Ledger
free parameters (2)
- stick-breaking gate parameters
- regime-mixing kernel hyperparameters
axioms (2)
- standard math: The nonstationary regime-mixing kernel is a valid positive semi-definite kernel.
- standard math: The Student-t likelihood yields a proper predictive density when combined with the GP posterior.
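The first axiom can be tied to the direct-sum representation that the minor comments ask to see written out. The sketch below starts from the regime-mixing kernel the paper quotes, K_mix(ξ,ξ′) = ∑_r π_r(ξ)π_r(ξ′)K_r(z_r(ξ),z_r(ξ′)), and assumes each PSD sub-kernel K_r is expressed through a feature map φ_r into a Hilbert space H_r (one always exists, e.g. the canonical RKHS map). It is not a reproduction of the paper's Proposition 1 or Theorem 2, and it does not address predictive-density propriety.

```latex
\begin{aligned}
K_r(z, z') &= \langle \phi_r(z), \phi_r(z') \rangle_{\mathcal{H}_r}, \qquad
\Phi(\xi) = \bigoplus_{r=1}^{R} \pi_r(\xi)\, \phi_r\big(z_r(\xi)\big) \in \bigoplus_{r=1}^{R} \mathcal{H}_r, \\
\langle \Phi(\xi), \Phi(\xi') \rangle
  &= \sum_{r=1}^{R} \pi_r(\xi)\, \pi_r(\xi')\,
     \big\langle \phi_r(z_r(\xi)), \phi_r(z_r(\xi')) \big\rangle_{\mathcal{H}_r}
   = K_{\mathrm{mix}}(\xi, \xi'), \\
\sum_{i,j=1}^{n} c_i c_j\, K_{\mathrm{mix}}(\xi_i, \xi_j)
  &= \Big\lVert \sum_{i=1}^{n} c_i\, \Phi(\xi_i) \Big\rVert^2 \;\ge\; 0
   \qquad \text{for all } c \in \mathbb{R}^n .
\end{aligned}
```

Each regime occupies its own orthogonal summand, scaled by its gate weight. This is one precise reading of "regimes as clusters of residual similarity": two locations are similar under K_mix only through regimes on which both load.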
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
We formulate direct multi-horizon forecasting as a decomposition into a shared conditional mean, structured GP residual signal, and regime-dependent residual uncertainty... regime-mixing kernel K_mix(ξ,ξ′) = ∑_r π_r(ξ) π_r(ξ′) K_r(z_r(ξ), z_r(ξ′))
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Proposition 1 (Direct-sum feature-space representation)... Theorem 2 (Positive semi-definiteness)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.