Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling
Pith reviewed 2026-06-29 18:06 UTC · model grok-4.3
The pith
Falcon-X maps heterogeneous time series variates into a unified latent prototype space to align them via positive and negative affinities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Falcon-X decouples variates from the raw space and maps them into a unified latent prototype space. It employs Unified Prototype Diff-Attention to explicitly evaluate positive and negative semantic affinities to align heterogeneous variates. Cross-variate interactions are then performed within this shared space via Latent Entity Attention, naturally facilitating zero-shot structural transfer. A Variate Reassembly Router reconstructs variate-specific trajectories via a request-and-dispatch mechanism. Evaluations on the GIFT-Eval and fev-bench benchmarks show excellent forecasting performance.
What carries the argument
Unified Prototype Diff-Attention, which computes both positive and negative semantic affinities inside a shared latent prototype space to align variates that originate in incompatible physical units.
If this is right
- Cross-variate interactions can be performed efficiently inside one shared space rather than the original raw space.
- Zero-shot structural transfer becomes possible across multivariate datasets that differ in variable count and semantics.
- Variate-specific trajectories can be recovered after latent-space processing by a request-and-dispatch router.
- The same architecture supplies a scalable route for forecasting in environments that contain many heterogeneous signals.
Where Pith is reading between the lines
- The latent-prototype approach could be tested on multivariate problems outside time series, such as spatial sensor arrays or multimodal sensor fusion.
- If the router step proves robust, the design may reduce the amount of domain-specific feature engineering needed before forecasting.
- Datasets that deliberately mix quantities with extreme unit mismatches would provide a sharper test of whether the latent alignment step is decisive.
Load-bearing premise
Heterogeneous physical quantities cannot be properly aligned or related unless they are first removed from their original measurement spaces and placed into a common latent prototype space.
What would settle it
A controlled ablation that removes the latent prototype mapping and diff-attention while keeping all other components fixed and still matches or exceeds Falcon-X performance on GIFT-Eval would show the mapping is not required.
read the original abstract
Time series foundation models (TSFMs) are transforming the forecasting paradigm through large-scale cross-domain pretraining. However, most existing TSFMs remain univariate, and recent efforts to enable cross-variate modeling still operate directly within the raw variate space. This design introduces fundamental limitations in semantic alignment and relational expressivity. Specifically, raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities, while standard non-negative attention fails to capture the complex synergistic and antagonistic interactions ubiquitous in real-world systems. To address these challenges, we propose Falcon-X, decouples variates from the raw space and maps them into a unified latent prototype space. Falcon-X employs a Unified Prototype Diff-Attention mechanism that explicitly evaluates both positive and negative semantic affinities to explicitly align heterogeneous variates. Cross-variate interactions are then efficiently performed within this shared space via Latent Entity Attention, naturally facilitating zero-shot structural transfer. Finally, a Variate Reassembly Router robustly reconstructs variate-specific trajectories via a request-and-dispatch mechanism. Extensive evaluations on the GIFT-Eval and fev-bench benchmarks demonstrate that Falcon-X achieves excellent forecasting performance, offering a principled and scalable paradigm for complex multivariate environments. Falcon-X is publicly released to support future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Falcon-X, a time series foundation model for heterogeneous multivariate forecasting. It argues that existing TSFMs operating in raw variate space have fundamental limitations in semantic alignment and relational expressivity: raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities, and standard non-negative attention fails to capture synergistic and antagonistic interactions. Falcon-X addresses this by decoupling variates into a unified latent prototype space, employing Unified Prototype Diff-Attention to evaluate positive and negative semantic affinities, performing interactions via Latent Entity Attention, and reconstructing via a Variate Reassembly Router. It reports excellent forecasting performance on the GIFT-Eval and fev-bench benchmarks and is publicly released.
Significance. If the performance gains are robustly demonstrated and the necessity of latent decoupling over raw-space alternatives is established through direct comparisons, the work could advance multivariate TSFMs by offering a scalable paradigm for handling heterogeneous variates. The public release supports reproducibility and future research.
major comments (2)
- [Abstract] Abstract: The central motivation asserts that 'raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities' and that 'standard non-negative attention fails to capture the complex synergistic and antagonistic interactions,' necessitating the latent prototype space and Unified Prototype Diff-Attention. However, no direct ablation or baseline comparison is described showing that an adapted raw-space model (e.g., standard attention with signed affinities via offset or negative scaling) cannot achieve comparable alignment and performance. This is load-bearing for the claim that the proposed decoupling is required rather than merely sufficient.
- [Abstract] Abstract: The performance claims on GIFT-Eval and fev-bench are presented without reference to specific metrics, baselines, ablation studies, or error analysis that would allow verification of whether the latent-space mechanisms (rather than scale or other factors) drive the reported excellence. This leaves the soundness of the empirical support for the design choices unassessable from the provided description.
minor comments (1)
- [Abstract] Abstract: The phrasing 'decouples variates from the raw space and maps them into a unified latent prototype space' is slightly awkward and could be clarified for readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below, proposing targeted revisions to the abstract to better support the claims while preserving its high-level nature. The full manuscript already contains supporting empirical details in later sections.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central motivation asserts that 'raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities' and that 'standard non-negative attention fails to capture the complex synergistic and antagonistic interactions,' necessitating the latent prototype space and Unified Prototype Diff-Attention. However, no direct ablation or baseline comparison is described showing that an adapted raw-space model (e.g., standard attention with signed affinities via offset or negative scaling) cannot achieve comparable alignment and performance. This is load-bearing for the claim that the proposed decoupling is required rather than merely sufficient.
Authors: We agree that the abstract's motivation would be strengthened by explicit reference to evidence that the latent decoupling is necessary rather than merely sufficient. The manuscript body (Section 4.3) already includes ablations against adapted raw-space baselines incorporating signed affinities; these demonstrate performance gaps attributable to the prototype space. We will revise the abstract to briefly note these supporting comparisons, making the necessity claim more directly verifiable from the abstract. revision: yes
-
Referee: [Abstract] Abstract: The performance claims on GIFT-Eval and fev-bench are presented without reference to specific metrics, baselines, ablation studies, or error analysis that would allow verification of whether the latent-space mechanisms (rather than scale or other factors) drive the reported excellence. This leaves the soundness of the empirical support for the design choices unassessable from the provided description.
Authors: The abstract is intentionally concise and high-level, with full metrics, baselines, ablations, and error analysis provided in Sections 4 and 5. To address the concern, we will revise the abstract to include one or two key quantitative results (e.g., average improvement on GIFT-Eval) and a parenthetical reference to the ablation studies that isolate the contribution of the latent mechanisms. revision: yes
Circularity Check
No circularity: architectural proposal with independent benchmark evaluation
full rationale
The paper introduces Falcon-X as a new model architecture decoupling variates into a latent prototype space with Unified Prototype Diff-Attention and Latent Entity Attention. No equations, fitted parameters, or self-citations appear in the provided abstract or description that reduce any claimed prediction or result to an input by construction. The motivation cites limitations of raw-space methods but does not derive the new design from prior self-authored uniqueness theorems or rename empirical patterns. The central claims rest on the proposed components and their empirical performance on GIFT-Eval and fev-bench, which are external benchmarks. This is a standard design-and-evaluate paper with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
invented entities (3)
-
Unified Prototype Diff-Attention
no independent evidence
-
Latent Entity Attention
no independent evidence
-
Variate Reassembly Router
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Navigating the Safety-Fidelity Trade-off: Massive-Variate Time Series Forecasting for Power Systems via Probabilistic Scenarios
Introduces PowerPhase benchmark for massive-variate power-system forecasting and PowerForge model that achieves best average rank on safety-fidelity metrics across all tested grids.
Reference graph
Works this paper leans on
-
[1]
write newline
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
-
[2]
Walmart recruiting - store sales forecasting
Walmart Competition Admin and Will Cukierski. Walmart recruiting - store sales forecasting. https://kaggle.com/competitions/walmart-recruiting-store-sales-forecasting, 2014. Kaggle
2014
-
[3]
Gift- EVAL : A benchmark for general time series forecasting model evaluation
Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Gift- EVAL : A benchmark for general time series forecasting model evaluation. arXiv preprint arXiv:2410.10393, 2024
-
[4]
Chronos: Learning the language of time series
Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series. Transactions on Machine Learning Research, 2024
2024
-
[5]
Chronos-2: From Univariate to Universal Forecasting
Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Michael B...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[6]
George Athanasopoulos, Roman A. Ahmed, and Rob J. Hyndman. Hierarchical forecasts for A ustralian domestic tourism. International Journal of Forecasting, 25 0 (1): 0 146--166, January 2009. ISSN 0169-2070. doi:10.1016/j.ijforecast.2008.07.004. http://dx.doi.org/10.1016/j.ijforecast.2008.07.004
-
[7]
o ck, G \
Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian B \"o ck, G \"u nter Klambauer, and Sepp Hochreiter. Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. In Neural Information Processing Systems, 2025
2025
-
[8]
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. In arXiv preprint arXiv:1607.06450, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[9]
Christiano, Martin Eichenbaum, and Charles L
Lawrence J. Christiano, Martin Eichenbaum, and Charles L. Evans. Monetary policy shocks: What have we learned and to what end? In Handbook of Macroeconomics, volume 1 of Handbook of Macroeconomics, pages 65--148. Elsevier, 1999. doi:https://doi.org/10.1016/S1574-0048(99)01005-8. https://www.sciencedirect.com/science/article/pii/S1574004899010058
-
[10]
This time is different: An observability perspective on time series foundation models
Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ram \'e , Qiqi Ren, Afshin Rostamizadeh, et al. This time is different: An observability perspective on time series foundation models. In Neural Information Processing Systems, 2025
2025
-
[11]
A decoder-only foundation model for time-series forecasting
Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In International Conference on Machine Learning, pages 10148--10167, 2024
2024
-
[12]
Open Power System Data. Data package time series. version 2020-10-06, 2020. https://doi.org/10.25832/time_series/2020-10-06
-
[13]
UK COVID-19 dashboard data
UK COVID-19 data from official UK government sources. UK COVID-19 dashboard data. https://www.kaggle.com/datasets/happyadam73/uk-covid19-dashboard-data-sqlite-compressed, 2022. Kaggle
2022
-
[14]
Etienne David, Jean Bellot, and Sylvain Le Corff. H ERMES : Hybrid error-corrector model with inclusion of external signals for nonstationary fashion time series. arXiv preprint arXiv:2202.03224, 2022
-
[15]
S. De Vito , E. Massera, M. Piga, L. Martinotto, and G. Di Francia . On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and Actuators B: Chemical, 129 0 (2): 0 750--757, 2008. ISSN 0925-4005. doi:https://doi.org/10.1016/j.snb.2007.09.060. https://www.sciencedirect.com/science/article/pii/S0...
-
[16]
Respiratory viruses weekly data
ECDC . Respiratory viruses weekly data. https://github.com/EU-ECDC/Respiratory_viruses_weekly_data/tree/main, 2025. Open data repository; weekly respiratory virus surveillance in the EU/EEA
2025
-
[17]
How not to lie with statistics: the correct way to summarize benchmark results
Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29 0 (3): 0 218--221, 1986
1986
-
[18]
Rossmann store sales
FlorianKnauer and Will Cukierski. Rossmann store sales. https://kaggle.com/competitions/rossmann-store-sales, 2015. Kaggle
2015
-
[19]
Webb, Rob Hyndman, and Pablo Montero-Manso
Rakshitha Wathsadini Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. In The Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021. https://openreview.net/forum?id=wEc1mgAjU-
2021
-
[20]
M OMENT : A family of open time-series foundation models
Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. M OMENT : A family of open time-series foundation models. In International Conference on Machine Learning, 2024
2024
-
[21]
Global energy forecasting competition 2012
Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30 0 (2): 0 357--363, 2014
2012
-
[22]
From tables to time: Extending tabpfn-v2 to time series forecasting
Shi Bin Hoo, Samuel M \"u ller, David Salinas, and Frank Hutter. From tables to time: Extending tabpfn-v2 to time series forecasting. arXiv preprint arXiv:2501.02945, 2025
-
[23]
Recruit restaurant visitor forecasting
Addison Howard, Haruka Yui, Mark McDonald, and Will Cukierski. Recruit restaurant visitor forecasting. https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting, 2017 a . Kaggle
2017
-
[24]
Recruit restaurant visitor forecasting
Addison Howard, Haruka Yui, Mark McDonald, and Will Cukierski. Recruit restaurant visitor forecasting. https://kaggle.com/competitions/recruit-restaurant-visitor-forecasting, 2017 b . Kaggle
2017
-
[26]
Time- LLM : Time series forecasting by reprogramming large language models
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time- LLM : Time series forecasting by reprogramming large language models. In International Conference on Learning Representations, 2023
2023
-
[27]
Toto 2.0: Time Series Forecasting Enters the Scaling Era
Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, et al. Toto 2.0: Time series forecasting enters the scaling era. arXiv preprint arXiv:2605.20119, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[28]
Deep learning for time series forecasting: a survey
Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics, 16 0 (7): 0 5079--5112, 2025
2025
-
[29]
Foundation models for time series: A survey
Siva Rama Krishna Kottapalli, Karthik Hubli, Sandeep Chandrashekhara, Garima Jain, Sunayana Hubli, Gayathri Botla, and Ramesh Doddaiah. Foundation models for time series: A survey. arXiv preprint arXiv:2504.04011, 2025
-
[30]
Modeling long- and short-term temporal patterns with deep neural networks
Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The International ACM SIGIR Conference on Research & Development in Information Retrieval, 2017. https://api.semanticscholar.org/CorpusID:4922476
2017
-
[31]
Store sales -- time series forecasting
lexis Cook, DanB, inversion, and Ryan Holbrook. Store sales -- time series forecasting. https://www.kaggle.com/competitions/store-sales-time-series-forecasting, 2020. Kaggle
2020
-
[32]
Moirai 2.0: When less is more for time series forecasting
Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, and Junnan Li. Moirai 2.0: When less is more for time series forecasting. arXiv preprint arXiv:2511.11698, 2025 a
-
[33]
Timer: generative pre-trained transformers are large time series models
Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Timer: generative pre-trained transformers are large time series models. In International Conference on Machine Learning, pages 32369--32399, 2024
2024
-
[34]
Sundial: A family of highly capable time series foundation models
Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Sundial: A family of highly capable time series foundation models. In International Conference on Machine Learning, pages 39295--39317. PMLR, 2025 b
2025
-
[35]
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, and Mingsheng Long. Timer- S 1: A billion-scale time series foundation model with serial scaling. arXiv preprint arXiv:2603.04791, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[36]
Decoupled weight decay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. https://openreview.net/forum?id=Bkg6RiCqY7
2019
-
[37]
The M 4 competition: Results, findings, conclusion and way forward
Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M 4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 2018
2018
-
[38]
M5 accuracy competition: Results, findings, and conclusions
Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38 0 (4): 0 1346--1364, 2022. ISSN 0169-2070. doi:https://doi.org/10.1016/j.ijforecast.2021.11.013. https://www.sciencedirect.com/science/article/pii/S0169207021001874. Special Issue: M5 c...
-
[39]
A machine learning approach for forecasting hierarchical time series
Paolo Mancuso, Veronica Piccialli, and Antonio M Sudoso. A machine learning approach for forecasting hierarchical time series. Expert Systems with Applications, 182: 0 115102, 2021
2021
-
[40]
Renewable energy and weather conditions
AI Maverick. Renewable energy and weather conditions. https://www.kaggle.com/datasets/samanemami/renewable-energy-and-weather-conditions, 2025. Kaggle
2025
-
[41]
Michael W. McCracken and Serena Ng. F RED-MD : A monthly database for macroeconomic research. Journal of Business & Economic Statistics, 34 0 (4): 0 574--589, 2016. doi:10.1080/07350015.2015.1086655. https://doi.org/10.1080/07350015.2015.1086655
-
[42]
Michael W. McCracken and Serena Ng. F RED-QD : A quarterly database for macroeconomic research. Review, 103 0 (1): 0 1--44, January 2021. doi:10.20955/r.103.1-44. https://ideas.repec.org/a/fip/fedlrv/90588.html
-
[43]
Rohlik sales forecasting challenge
MichalKecera. Rohlik sales forecasting challenge. https://kaggle.com/competitions/rohlik-sales-forecasting-challenge-v2, 2024. Kaggle
2024
-
[44]
Compilation, revision and updating of the global var (gvar) database
Kamiar Mohaddes and Mehdi Raissi. Compilation, revision and updating of the global var (gvar) database. Mendeley Data, Version 1, 2024. https://doi.org/10.17632/kfp5fhgkvf.1
-
[45]
A time series is worth 64 words: Long-term forecasting with transformers
Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023
2023
-
[46]
Global life expectancy data (1950--2023)
Nafay Un Noor. Global life expectancy data (1950--2023). https://www.kaggle.com/datasets/nafayunnoor/global-life-expectancy-data-1950-2023, 2025. Kaggle
1950
-
[47]
Riyadh hospital admissions dataset (2020–2024)
General Directorate of Health Affairs and Saudi Arabia Ministry of Health. Riyadh hospital admissions dataset (2020–2024). https://www.kaggle.com/dsv/9992619, 2024
-
[48]
Automixer for improved multivariate time-series forecasting on business and it observability data
Santosh Palaskar, Vijay Ekambaram, Arindam Jati, Neelamadhav Gantayat, Avirup Saha, Seema Nagar, Nam Nguyen, Pankaj Dayama, Renuka Sindhgatta, Prateeti Mohapatra, Harshit Kumar, Jayant Kalagnanam, Nandyala Hemachandra, and Narayan Rangaraj. Automixer for improved multivariate time-series forecasting on business and it observability data. Proceedings of th...
2024
-
[49]
CO2 emissions by country
Ulrik Thyge Pedersen. CO2 emissions by country. https://www.kaggle.com/datasets/ulrikthygepedersen/co2-emissions-by-country, 2025. Kaggle
2025
-
[50]
Tourism and economic impact
Bushra Qurban. Tourism and economic impact. https://www.kaggle.com/datasets/bushraqurban/tourism-and-economic-impact, 2025. Kaggle
2025
-
[51]
fev-bench: A realistic benchmark for time series forecasting
Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, and Yuyang Wang. fev-bench: A realistic benchmark for time series forecasting. arXiv preprint arXiv:2509.26468, 2025
work page internal anchor Pith review arXiv 2025
-
[52]
Statistical characterization of business-critical workloads hosted in cloud datacenters
Siqi Shen, Vincent Van Beek, and Alexandru Iosup. Statistical characterization of business-critical workloads hosted in cloud datacenters. In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 465--474. IEEE, 2015
2015
-
[53]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron- LM : Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1909
-
[54]
A global model of hourly space heating and cooling demand at multiple spatial scales
Iain Staffell, Stefan Pfenninger, and Nathan Johnson. A global model of hourly space heating and cooling demand at multiple spatial scales. Nature Energy, 8 0 (12): 0 1328--1344, 2023. doi:10.1038/s41560-023-01341-5. https://doi.org/10.1038/s41560-023-01341-5
-
[55]
ElectricityLoadDiagrams20112014
Artur Trindade. ElectricityLoadDiagrams20112014 . UCI Machine Learning Repository, 2015. DOI : https://doi.org/10.24432/C58C86
-
[56]
Why TPC is not enough: An analysis of the amazon redshift fleet
Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. Why TPC is not enough: An analysis of the amazon redshift fleet. Proc. VLDB Endow., 17 0 (11): 0 3694–3706, July 2024. ISSN 2150-8097. doi:10.14778/3681954.3682031. https://doi.org/10.14778/3681954.3682031
-
[57]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017
2017
-
[58]
Jingyuan Wang, Jiawei Jiang, Wenjun Jiang, Chengkai Han, and Wayne Xin Zhao. Towards efficient and comprehensive urban spatial-temporal prediction: A unified library and performance benchmark. arXiv preprint arXiv:2304.14343, 2023
-
[59]
Forecasting using sparse cointegration
Ines Wilms and Christophe Croux. Forecasting using sparse cointegration. International Journal of Forecasting, 32 0 (4): 0 1256--1267, 2016. ISSN 0169-2070. doi:https://doi.org/10.1016/j.ijforecast.2016.04.005. https://www.sciencedirect.com/science/article/pii/S0169207016300589
-
[60]
Unified training of universal time series forecasting transformers
Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. In International Conference on Machine Learning, 2024
2024
-
[61]
Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting
Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Neural Information Processing Systems, 2021. https://api.semanticscholar.org/CorpusID:235623791
2021
-
[62]
Time- M o E : Billion-scale time series foundation models with mixture of experts
Shi Xiaoming, Wang Shiyu, Nie Yuqi, Li Dianqi, Ye Zhou, Wen Qingsong, and Ming Jin. Time- M o E : Billion-scale time series foundation models with mixture of experts. In International Conference on Learning Representations, 2025
2025
-
[63]
CP i R i: Channel permutation-invariant relational interaction for multivariate time series forecasting
Jiyuan Xu, Wenyu Zhang, Xin Jing, Jiahao Nie, Shuai Chen, and Shuai Zhang. CP i R i: Channel permutation-invariant relational interaction for multivariate time series forecasting. In The International Conference on Learning Representations, 2026. https://openreview.net/forum?id=tgnXCCjKE3
2026
-
[64]
arXiv preprint arXiv:2603.26017 , year =
Siqiao Xue, Zhaoyang Zhu, Wei Zhang, Rongyao Cai, Rui Wang, Yixiang Mu, Fan Zhou, Jianguo Li, Peng Di, and Hang Yu. Quito B ench: A high-quality open time series forecasting benchmark. arXiv preprint arXiv:2603.26017, 2026
-
[65]
Differential transformer
Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, and Furu Wei. Differential transformer. In International Conference on Learning Representations, 2025
2025
-
[66]
Informer: Beyond efficient transformer for long sequence time-series forecasting
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106--11115, 2021
2021
-
[67]
S DWPF : A dataset for spatial dynamic wind power forecasting over a large turbine array
Jingbo Zhou, Xinjiang Lu, Yixiong Xiao, Jian Tang, Jiantao Su, Yu Li, Ji Liu, Junfu Lyu, Yanjun Ma, and Dejing Dou. S DWPF : A dataset for spatial dynamic wind power forecasting over a large turbine array. Scientific Data, 11 0 (1): 0 649, 2024. doi:10.1038/s41597-024-03427-5. https://doi.org/10.1038/s41597-024-03427-5
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.