Recognition: 2 Lean theorem links
TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
Pith reviewed 2026-05-17 05:38 UTC · model grok-4.3
The pith
TimePre unifies MLP efficiency with MCL flexibility using stabilized normalization for probabilistic time-series forecasting
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TimePre is a simple framework that unifies the efficiency of MLP-based models with the distributional flexibility of MCL for probabilistic time-series forecasting. Stabilized Instance Normalization stabilizes the hybrid architecture by correcting channel-wise statistical shifts, thereby resolving catastrophic hypothesis collapse. Extensive experiments on six benchmark datasets demonstrate that TimePre achieves state-of-the-art accuracy on key probabilistic metrics, with inference speeds orders of magnitude faster than sampling-based models and greater stability than prior MCL approaches.
What carries the argument
Stabilized Instance Normalization (SIN), a normalization layer that corrects channel-wise statistical shifts to stabilize the MLP-MCL hybrid and prevent hypothesis collapse.
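The excerpt gives no equations for SIN. As a baseline for comparison only, here is a minimal sketch of plain channel-wise instance normalization with a learnable affine correction, the family of layer SIN appears to refine; the class name, tensor layout, and anything beyond standard instance normalization are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class ChannelInstanceNorm(nn.Module):
    """Hypothetical baseline: per-instance, per-channel normalization
    over the time axis, followed by a learnable affine correction.
    The paper's actual SIN layer is not specified in the excerpt."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_channels))   # per-channel scale
        self.beta = nn.Parameter(torch.zeros(num_channels))   # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        mean = x.mean(dim=1, keepdim=True)                 # channel-wise location
        std = x.std(dim=1, keepdim=True, unbiased=False)   # channel-wise scale
        x_hat = (x - mean) / (std + self.eps)              # remove statistical shift
        return x_hat * self.gamma + self.beta
```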
If this is right
- TimePre can serve as a drop-in replacement for slower sampling-based probabilistic forecasters in latency-sensitive settings.
- The stabilized hybrid enables reliable distributional outputs without the instability that previously limited MCL use in time series.
- Inference speed improvements allow deployment of full probabilistic models on hardware with limited compute resources.
- The normalization technique provides a general way to combine MLP efficiency with flexible output modeling in forecasting pipelines.
Where Pith is reading between the lines
- If the stabilization works broadly, similar channel-wise corrections could be tested in other hybrid models that mix deterministic and stochastic components.
- The speed advantage might open probabilistic forecasting to real-time applications such as dynamic pricing or sensor networks.
- Greater stability could make uncertainty estimates more usable for downstream decision systems that rely on calibrated probabilities.
- Extending the evaluation to irregularly sampled or multivariate series with strong non-stationarity would test whether the claimed generality holds.
Load-bearing premise
That Stabilized Instance Normalization resolves catastrophic hypothesis collapse in the MLP-MCL hybrid without introducing new statistical biases or requiring dataset-specific tuning that undermines generality.
What would settle it
Observing hypothesis collapse, or loss of the accuracy and speed gains, when TimePre is tested on a seventh benchmark dataset outside the original six would indicate that the stabilization does not hold generally.
Original abstract
We propose TimePre, a simple framework that unifies the efficiency of Multilayer Perceptron (MLP)-based models with the distributional flexibility of Multiple Choice Learning (MCL) for Probabilistic Time-Series Forecasting (PTSF). Stabilized Instance Normalization (SIN), the core of TimePre, is a normalization layer that explicitly addresses the trade-off among accuracy, efficiency, and stability. SIN stabilizes the hybrid architecture by correcting channel-wise statistical shifts, thereby resolving the catastrophic hypothesis collapse. Extensive experiments on six benchmark datasets demonstrate that TimePre achieves state-of-the-art (SOTA) accuracy on key probabilistic metrics. Critically, TimePre achieves inference speeds that are orders of magnitude faster than sampling-based models, and is more stable than prior MCL approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TimePre, a framework that unifies the efficiency of MLP-based models with the distributional flexibility of Multiple Choice Learning (MCL) for probabilistic time-series forecasting. The core innovation is Stabilized Instance Normalization (SIN), which corrects channel-wise statistical shifts to resolve catastrophic hypothesis collapse in the MLP-MCL hybrid. The authors claim SOTA accuracy on key probabilistic metrics across six benchmark datasets, inference speeds orders of magnitude faster than sampling-based models, and improved stability over prior MCL approaches.
Significance. If the claims hold after proper validation, the work could meaningfully advance practical probabilistic time-series forecasting by balancing accuracy, speed, and stability in a simple hybrid architecture. The targeted use of normalization to stabilize MCL is a relevant direction for addressing collapse issues. However, the absence of isolated ablations, detailed baselines, and quantitative definitions of key phenomena currently limits assessability of the contribution.
major comments (3)
- [§3] §3 (Method): The description of Stabilized Instance Normalization (SIN) provides no quantitative definition or metric for 'catastrophic hypothesis collapse', nor an ablation isolating SIN from other architectural choices in the MLP-MCL hybrid. This is load-bearing for the central stability and generality claims, as it leaves open whether SIN resolves the issue without new biases or per-dataset tuning.
- [§4] §4 (Experiments): No details are given on baselines, exact probabilistic metrics (e.g., CRPS, NLL), number of runs, or statistical significance tests supporting the SOTA accuracy claims on the six datasets. This prevents evaluation of the accuracy and speed results.
- [§4.2] §4.2 (Inference results): The 'orders of magnitude faster' inference claim lacks hardware specifications, implementation details of compared sampling-based models, or controlled conditions, undermining the efficiency advantage over prior approaches.
minor comments (2)
- [Abstract] The abstract would be clearer with a brief enumeration of the specific probabilistic metrics and datasets used to support the SOTA claim.
- [§3.1] Notation for the MCL component and loss function could be made more explicit to aid reproducibility.
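On the notation point: since the excerpt never writes the MCL objective down, a generic winner-takes-all loss of the kind used in prior MCL work (e.g., Guzman-Rivera et al. [17]; Lee et al. [34]) is sketched below; the tensor shapes and squared-error base loss are illustrative assumptions, not TimePre's actual objective.

```python
import torch

def wta_loss(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Generic winner-takes-all MCL loss: for each sample, only the best
    of the K hypotheses contributes to the objective and thus receives
    gradient. Assumed shapes: preds (K, batch, horizon, channels),
    target (batch, horizon, channels)."""
    err = (preds - target.unsqueeze(0)) ** 2                        # (K, B, H, C)
    per_hyp = err.reshape(err.shape[0], err.shape[1], -1).mean(-1)  # (K, B)
    winner = per_hyp.min(dim=0).values                              # best loss per sample
    return winner.mean()
```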
Simulated Author's Rebuttal
We appreciate the referee's detailed and constructive feedback, which identifies important areas for improving the clarity and rigor of our presentation. We address each major comment below and will make the necessary revisions to enhance the manuscript's assessability while preserving the core contributions of TimePre.
Point-by-point responses
- Referee: [§3] §3 (Method): The description of Stabilized Instance Normalization (SIN) provides no quantitative definition or metric for 'catastrophic hypothesis collapse', nor an ablation isolating SIN from other architectural choices in the MLP-MCL hybrid. This is load-bearing for the central stability and generality claims, as it leaves open whether SIN resolves the issue without new biases or per-dataset tuning.
Authors: We agree that an explicit quantitative definition and isolated ablation would strengthen the stability claims. In the revised manuscript, we will add a precise metric for catastrophic hypothesis collapse, defined as the point at which the variance across MCL hypotheses drops below a threshold (e.g., 0.01 in normalized prediction space), leading to effective single-mode behavior. We will also include a dedicated ablation study comparing the full TimePre model to an MLP-MCL variant without SIN, with all other components held constant across the six datasets. This will confirm that SIN addresses collapse without introducing per-dataset tuning or new biases. revision: yes
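A minimal reading of the proposed collapse metric in code; the tensor layout and the choice to average the variance over batch, horizon, and channels before thresholding are assumptions the response does not pin down.

```python
import torch

def is_collapsed(preds: torch.Tensor, threshold: float = 0.01) -> bool:
    """Flag catastrophic hypothesis collapse as defined in the response:
    variance across the K MCL hypotheses falling below a threshold in
    normalized prediction space, i.e. effective single-mode behavior.
    Assumed shape: preds (K, batch, horizon, channels)."""
    var_across_heads = preds.var(dim=0, unbiased=False)  # variance over hypotheses
    return bool(var_across_heads.mean() < threshold)     # aggregate, then threshold
```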
- Referee: [§4] §4 (Experiments): No details are given on baselines, exact probabilistic metrics (e.g., CRPS, NLL), number of runs, or statistical significance tests supporting the SOTA accuracy claims on the six datasets. This prevents evaluation of the accuracy and speed results.
Authors: We concur that additional experimental details are required for full reproducibility and evaluation. The revision will expand Section 4 with: (i) a complete table of baselines including citations and implementation sources, (ii) explicit formulas and computation details for all probabilistic metrics (CRPS, NLL, and others), (iii) the number of runs (five independent runs with distinct random seeds), and (iv) statistical significance results using paired t-tests with p-values reported for SOTA comparisons. Hyperparameters, data preprocessing, and train/validation/test splits will also be fully specified. revision: yes
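For concreteness, one standard sample-based CRPS estimator (the energy form associated with Gneiting and Raftery [16]) is sketched below; whether TimePre evaluates CRPS this way, and at what aggregation level, is not stated in the excerpt.

```python
import torch

def crps_from_samples(samples: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Empirical CRPS averaged over all forecast points:
    CRPS ~ E|X - y| - 0.5 * E|X - X'|, with X, X' independent draws.
    samples: (S, ...) forecast draws; target: (...) observations."""
    abs_err = (samples - target.unsqueeze(0)).abs().mean(dim=0)                # E|X - y|
    spread = (samples.unsqueeze(0) - samples.unsqueeze(1)).abs().mean((0, 1))  # E|X - X'|
    return (abs_err - 0.5 * spread).mean()
```

Note that the pairwise spread term is O(S²) in the number of draws S, which is one reason sample-based evaluation of this metric favors fast forecasters.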
- Referee: [§4.2] §4.2 (Inference results): The 'orders of magnitude faster' inference claim lacks hardware specifications, implementation details of compared sampling-based models, or controlled conditions, undermining the efficiency advantage over prior approaches.
Authors: We thank the referee for this observation. The revised version will specify the exact hardware (NVIDIA A100 80GB GPU with PyTorch 2.0), the sampling procedures and sample counts used for the compared models, and confirm that all timing measurements were performed under identical conditions (same batch size, sequence length, and input preprocessing). We will report both mean and standard deviation of inference latency per sample to ensure the efficiency comparison is transparent and controlled. revision: yes
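A sketch of the controlled timing protocol the response describes (warm-up, fixed batch, explicit CUDA synchronization, mean and standard deviation of latency); the helper name and run counts are illustrative, not from the paper, and a CUDA device is assumed.

```python
import time
import torch

@torch.no_grad()
def latency_stats(model, batch, n_warmup: int = 10, n_runs: int = 100):
    """Mean and standard deviation of per-batch inference latency under
    fixed conditions. Synchronizing after each timed call keeps
    asynchronous GPU execution from distorting the measurements."""
    model.eval()
    for _ in range(n_warmup):        # discarded warm-up runs
        model(batch)
    torch.cuda.synchronize()
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        model(batch)
        torch.cuda.synchronize()     # wait for GPU work to finish
        times.append(time.perf_counter() - start)
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()
```

Dividing the returned mean by the batch size yields the per-sample latency the authors promise to report.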
Circularity Check
No significant circularity; claims rest on empirical results rather than self-referential derivations
full rationale
The provided abstract and description introduce TimePre as an empirical framework combining MLP efficiency with MCL flexibility, using Stabilized Instance Normalization (SIN) to address hypothesis collapse via channel-wise corrections. No equations, mathematical derivations, fitted parameters renamed as predictions, or self-citation chains appear in the text. Claims of SOTA accuracy, faster inference, and improved stability are tied to experiments on six benchmark datasets, without any reduction of outputs to inputs by construction or load-bearing self-references. The central premise does not reduce to a definition or prior fit within the paper itself, so the claims stand or fall against external benchmarks rather than against the paper's own constructions.
Axiom & Free-Parameter Ledger
invented entities (1)
- Stabilized Instance Normalization (SIN): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "Stabilized Instance Normalization (SIN) ... correcting channel-wise statistical shifts, thereby resolving the catastrophic hypothesis collapse"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "Relaxed WTA Objective ... (1−ε)L(k∗) + ε/(K−1) Σ L(j)"
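For readability, the relaxed WTA objective quoted in the second passage is conventionally written as below. The restriction of the sum to the non-winning hypotheses follows the standard ε-relaxed WTA of Rupprecht et al. [53]; the excerpt elides the index set, so that restriction is an assumption.

```latex
\mathcal{L}_{\text{relaxed-WTA}}
  = (1-\varepsilon)\,\mathcal{L}(k^{*})
  + \frac{\varepsilon}{K-1}\sum_{j \neq k^{*}} \mathcal{L}(j),
\qquad
k^{*} = \arg\min_{1 \le k \le K} \mathcal{L}(k)
```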
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, and Yuyang Wang. GluonTS: Probabilistic time series models in Python, 2019.
- [2] Sanjeev Arora, Nadav Cohen, Noah Golowich, and Wei Hu. A convergence analysis of gradient descent for deep linear neural networks. In International Conference on Learning Representations, 2019.
- [3] Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, and Alexandre Drouin. TACTiS-2: Better, faster, simpler attentional copulas for multivariate time series. In The Twelfth International Conference on Learning Representations, 2024.
- [4] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016.
- [5]
- [6] Wenjing Chen and Victoria Crawford. Bicriteria approximation algorithms for the submodular cover problem. Advances in Neural Information Processing Systems, 36:72705–72716, 2023.
- [7] Wenjing Chen, Shuo Xing, and Victoria G. Crawford. Adaptive threshold sampling for pure exploration in submodular bandits. In The 41st Conference on Uncertainty in Artificial Intelligence, 2025.
- [8] Wenjing Chen, Shuo Xing, Samson Zhou, and Victoria G. Crawford. Fair submodular cover. arXiv preprint arXiv:2407.04804, 2024.
- [9] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, 2014.
- [10] Adrien Cortes, Remi Rehm, and Victor Letzelter. Winner-takes-all for multivariate probabilistic time series forecasting. In Forty-second International Conference on Machine Learning, 2025.
- [11] Kazunori D. Yamada, Fangzhou Lin, and Tsukasa Nakamura. Developing a novel recurrent neural network architecture with fewer parameters and good learning performance. Interdisciplinary Information Sciences, 27(1):25–40, 2021.
- [12] Kazunori D. Yamada, Samy Baladram, and Fangzhou Lin. Progress in research on implementing machine consciousness. Interdisciplinary Information Sciences, 28(1):95–105.
- [13] Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan K. Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023.
- [14] Jerome Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 2000.
- [15] Allen Gersho and Robert M. Gray. Vector Quantization and Signal Compression. Springer, 1992.
- [16] Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
- [17] Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs. In Advances in Neural Information Processing Systems, pages 1799–1807, 2012.
- [18] Hansika Hewamalage, Christoph Bergmeir, and Kasun Bandara. Recurrent neural networks for time series forecasting: Current status and future directions. International Journal of Forecasting, 37(1):388–427, 2021.
- [19]
- [20] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- [21] Yang Hu, Xiao Wang, Zezhen Ding, Lirong Wu, Huatian Zhang, Stan Z. Li, Sheng Wang, Jiheng Zhang, Ziyun Li, and Tianlong Chen. FlowTS: Time series generation via rectified flow, 2025.
- [22] Lei Huang, Dawei Yang, Bo Lang, and Jia Deng. Decorrelated batch normalization. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 791–800, 2018.
- [23] Robin John Hyndman and George Athanasopoulos. Forecasting: Principles and Practice. OTexts, Australia, 2nd edition, 2018.
- [24] Rob J. Hyndman, Anne B. Koehler, Ralph D. Snyder, and Simone Grose. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18(3):439–454, 2002.
- [25] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 448–456, 2015.
- [26] Lingyu Jiang, Yuping Wang, Yao Su, Shuo Xing, Wenjing Chen, Xin Zhang, Zhengzhong Tu, Ziming Zhang, Fangzhou Lin, Michael Zielewski, et al. KANMixer: Can KAN serve as a new modeling core for long-term time series forecasting? arXiv preprint arXiv:2508.01575, 2025.
- [27] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4401–4410, 2019.
- [28] Jongseon Kim, Hyungjoon Kim, HyunGi Kim, Dongjun Lee, and Sungroh Yoon. A comprehensive survey of deep learning for time series forecasting: Architectural diversity and open challenges, 2025.
- [29] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited, 2019.
- [30] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2017.
- [31] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
- [32] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–444, 2015.
- [33] Yann A. LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient BackProp, pages 9–48. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
- [34] Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, and Dhruv Batra. Stochastic multiple choice learning for training diverse deep ensembles. In Advances in Neural Information Processing Systems, pages 2119–2127, 2016.
- [35] Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, and Patrick Pérez. Winner-takes-all learners are geometry-aware conditional density estimators, 2024.
- [36] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In 6th International Conference on Learning Representations (ICLR 2018), Conference Track Proceedings. OpenReview.net, 2018.
- [37] Yutong Li, Ming Yang, Muzi Yang, and Chen Wang. RMLP: A reparameterized MLP-like network for long-term time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 13589–13597, 2024.
- [38] Bryan Lim and Stefan Zohren. Time series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194):20200209, 2021.
- [39] Jean-Michel Loubes and Bertrand Pelletier. A functional view of quantization and clustering. ESAIM: Probability and Statistics, 21:93–114, 2017.
- [40] Yecheng Lyu, Ming Li, Xinming Huang, Ulkuhan Guler, Patrick Schaumont, and Ziming Zhang. TreeRNN: Topology-preserving deep graph embedding and learning. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 7493–7499. IEEE, 2021.
- [41] Maggie, Oren Anava, Vitaly Kuznetsov, and Will Cukierski. Web traffic time series forecasting. https://kaggle.com/competitions/web-traffic-time-series-forecasting, 2017. Kaggle.
- [42] Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, and Nathan Srebro. Implicit regularization in deep learning: A view from function space. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.
- [43] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023.
- [44] David Perera, François Derrida, Théo Mariotte, Gaël Richard, and Slim Essid. Multiple choice learning for efficient speech separation with many speakers, 2024.
- [45] David Perera, Victor Letzelter, Theo Mariotte, Adrien Cortes, Mickael Chen, Slim Essid, and Gaël Richard. Annealed multiple choice learning: Overcoming limitations of winner-takes-all with annealing. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [46] Ruwan Perera, Dhruv Batra, David Crandall, and Zsolt Kira. Multi-choice learning for multimodal sequence prediction. Transactions on Machine Learning Research (TMLR), 2024.
- [47] Joaquin Quionero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence. Dataset Shift in Machine Learning. The MIT Press, 2009.
- [48] Rial A. Rajagukguk, Raden A. A. Ramadhan, and Hyun-Jin Lee. A review on deep learning models for forecasting time series data of solar irradiance and photovoltaic power. Energies, 13(24), 2020.
- [49] Kashif Rasul, Abdul-Saboor Sheikh, Ingmar Schuster, Urs Bergmann, and Roland Vollgraf. Multivariate probabilistic time series forecasting via conditioned normalizing flows. CoRR, abs/2002.06103, 2020.
- [50] Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. CoRR, abs/2101.12072, 2021.
- [51] Alejandro Rodriguez Domínguez, Muhammad Shahzad, and Xia Hong. Structured basis function networks: Loss-centric multi-hypothesis ensembles with controllable diversity. 2025.
- [52] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
- [53] Christian Rupprecht, Iro Laina, Robert DiPietro, Maximilian Baust, Federico Tombari, Nassir Navab, and Gregory D. Hager. Learning in an uncertain world: Representing ambiguity through multiple hypotheses. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 3591–3600, 2017.
- [54] David Salinas, Valentin Flunkert, and Jan Gasthaus. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
- [55] Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, and Pieter Abbeel. Trajectory-wise multiple choice learning for dynamics generalization in reinforcement learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, 2020.
- [56] Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, and Pieter Abbeel. Trajectory-wise multiple choice learning for dynamics generalization in reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 17672–17683, 2020.
- [57] Souhaib Ben Taieb and Rob J. Hyndman. Recursive and direct multi-step forecasting: the best of both worlds. 2012.
- [58] Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky. Instance normalization: The missing ingredient for fast stylization. ArXiv, abs/1607.08022, 2016.
- [59] Edoardo Urettini, Daniele Atzeni, Reshawn J. Ramjattan, and Antonio Carta. GAS-Norm: Score-driven adaptive normalization for non-stationary time series forecasting in deep learning. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 2282–2291. Association for Computing Machinery, New York, NY, USA, 2024.
- [60] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017.
- [61] Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. TimeMixer: Decomposable multiscale mixing for time series forecasting. In The Twelfth International Conference on Learning Representations, 2024.
- [62] Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. Deep factors for forecasting, 2019.
- [63] Ruofeng Wen, Kari Torkkola, and Balakrishnan Narayanaswamy. A multi-horizon quantile recurrent forecaster. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [64] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Advances in Neural Information Processing Systems, pages 22419–22430, 2021.
- [65] Haixu Wu, Hang Zhou, Mingsheng Long, and Jianmin Wang. Interpretable weather forecasting for worldwide stations with a unified deep model. Nature Machine Intelligence, 5(6):602–611, 2023.
- [66] Yuxin Wu and Kaiming He. Group normalization. In Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIII, pages 3–19. Springer-Verlag, Berlin, Heidelberg, 2018.
- [67] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, and Chengqi Zhang. Graph WaveNet for deep spatial-temporal graph modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1907–1913. AAAI Press, 2019.
- [68] Kazunori D. Yamada, M. Samy Baladram, and Fangzhou Lin. Relation is an option for processing context information. Frontiers in Artificial Intelligence, 5:924688, 2022.
- [69] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction, 2021.
- [70] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11121–11128, 2023.
- [71] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
- [72] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations (ICLR), 2018.
- [73] Junbo Zhang, Yu Zheng, and Dekang Qi. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 1655–1661. AAAI Press, 2017.
- [74] Ziming Zhang, Fangzhou Lin, Haotian Liu, Jose Morales, Haichong Zhang, Kazunori Yamada, Vijaya B. Kolachalama, and Venkatesh Saligrama. GPS: A probabilistic distributional similarity with Gumbel priors for set-to-set matching. In The Thirteenth International Conference on Learning Representations, 2025.
- [75] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting, 2021.
- [76] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022.
- [77] Supplementary-material excerpt (Related Work, §6.1 Multiple Choice Learning): "The Multiple Choice Learning framework provides an effective paradigm for modeling diverse outcomes under uncertainty. Originally proposed by Guzmán-Rivera et al. [17] as an assignment-based multi-model training framework, MCL was later reformulated into a differentiable winner-takes-all (WTA) loss by..."
- [78] Supplementary-material excerpt (Experiment Details, §7.1 Datasets): "We evaluate our method on six widely used probabilistic time-series forecasting benchmarks from the GluonTS library, namely Solar, Electricity, Exchange, Traffic, Taxi, and Wikipedia. All datasets contain strictly positive real-valued sequences and come with standard train–test splits defined in prior work. An overview of the..."