KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

Antonio Arauzo-Azofra; Jos\'e Alberto Rodr\'iguez; Jos\'e M. Ben\'itez; Luis Balderas; Miguel Lastra

arxiv: 2605.18657 · v1 · pith:NN6GSATInew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

Luis Balderas , Jos\'e Alberto Rodr\'iguez , Miguel Lastra , Antonio Arauzo-Azofra , Jos\'e M. Ben\'itez This is my paper

Pith reviewed 2026-05-20 11:41 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords time series foundation modelsdual-memory architectureHOPE blocktime series classificationhuman activity recognitionsensor datalinear probing full fine-tuningcatastrophic forgetting

0 comments

The pith

KairosHope replaces quadratic attention with a dual-memory system to adapt foundation models for accurate time series classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KairosHope as a time series foundation model that tackles the computational cost of standard attention and the lack of statistical knowledge in classification settings. Its main proposal is the HOPE block, which splits memory into Titans modules for short-term patterns and a Continuum Memory System for long-term context. A Hybrid Decision Head then merges the learned representations with classical features from statistical packages. The model is pre-trained on a large archive using masked modeling and contrastive objectives, then adapted to classification benchmarks through a linear probing followed by full fine-tuning process. A reader would care if this approach allows general-purpose models to deliver reliable results on tasks where time order is critical without retraining everything from scratch.

Core claim

KairosHope establishes a robust and efficient framework for the adaptation of foundation models to time series analysis. The core of the proposal is the HOPE block, an architecture that replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System for the abstraction of long-term historical context. To enrich the inductive bias, a Hybrid Decision Head is introduced, which fuses deep latent representations with deterministic statistical features extracted via the tsfeatures package. After self-supervised pre-training on the Monash archive with masked time series modeling and contrastive learning, its adaptation to the UCR

What carries the argument

The HOPE block, a dual-memory architecture that replaces quadratic attention using Titans modules for short-term retention and a Continuum Memory System for long-term context, paired with a Hybrid Decision Head that fuses deep features with tsfeatures statistical measures.

Load-bearing premise

That replacing quadratic attention with the dual-memory HOPE block plus tsfeatures fusion will systematically improve classification accuracy and avoid catastrophic forgetting during LP-FT adaptation on UCR results without hidden data selection or hyperparameter effects.

What would settle it

A side-by-side evaluation on the UCR archive showing that a standard attention-based time series foundation model achieves equal or higher accuracy than KairosHope on the HAR and sensor data subsets after identical LP-FT adaptation.

Figures

Figures reproduced from arXiv: 2605.18657 by Antonio Arauzo-Azofra, Jos\'e Alberto Rodr\'iguez, Jos\'e M. Ben\'itez, Luis Balderas, Miguel Lastra.

**Figure 1.** Figure 1: KairosHope architecture 2 Our proposal This section introduces the proposed foundation model for time series classification: KairosHope. The architecture of KairosHope is designed to address the inherent challenges of TSFM, namely variable sequence lengths, non-stationarity, and the necessity of modeling both local semantics and long-term historical dependencies. The following subsections detail both the a… view at source ↗

read the original abstract

Time Series Foundation Models (TSFMs) have demonstrated notable success in general-purpose forecasting tasks; however, their adaptation to specialized classification problems remains constrained by the computational bottleneck of standard attention and the systematic omission of classical statistical knowledge. This technical report introduces KairosHope, a next-generation TSFM designed to reconcile massive generalization with analytical precision in classification tasks. The core of the proposal is the HOPE block, an architecture that replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context. To enrich the inductive bias, a Hybrid Decision Head is introduced, which fuses deep latent representations with deterministic statistical features extracted via tsfeatures package. KairosHope undergoes self-supervised pre-training on the massive Monash archive, combining Masked Time Series Modeling (MTSM) and contrastive learning (InfoNCE). Its subsequent adaptation to the UCR benchmark datasets is conducted through a rigorous Linear Probing and Full Fine-Tuning (LP-FT) protocol to prevent catastrophic forgetting. Empirical results demonstrate superior performance in domains characterized by strict temporal causality such as HAR or Sensor data. Consequently, KairosHope establishes a robust and efficient framework for the adaptation of foundation models to time series analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's dual-memory architecture is a reasonable variant but its causality claims are undermined by the use of non-causal statistical features.

read the letter

Here's the quick take: KairosHope combines a new dual-memory HOPE block with tsfeatures fusion in a time-series foundation model, but the superior performance claims on causal tasks lack backing and the feature approach risks breaking causality. What stands out as new is the specific pairing of Titans for short-term and CMS for long-term memory, along with the hybrid decision head. The paper does well in describing a standard self-supervised pre-training on Monash followed by LP-FT on UCR, which is a solid way to adapt without much forgetting. The soft spots are the missing details on results—no metrics or comparisons—and the likely non-causal nature of the tsfeatures. Since those stats are usually full-series, they would leak future data in tasks like activity recognition, which the abstract flags as needing strict causality. The description gives no sign of online or windowed computation for them. This paper would suit readers focused on practical adaptations of large models for specialized time-series classification. Someone building on memory architectures might pick up ideas from the HOPE block, but without the experiments it's hard to see the real impact. I think it deserves peer review so the authors can supply the ablations and address how the features stay causal.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces KairosHope, a time-series foundation model that replaces quadratic attention with a dual-memory HOPE block (Titans modules for short-term retention and Continuum Memory System for long-term context). It incorporates a Hybrid Decision Head fusing latent representations with deterministic statistical features from the tsfeatures package, pre-trains via masked modeling and InfoNCE contrastive learning on the Monash archive, and adapts to UCR classification benchmarks using a Linear Probing and Full Fine-Tuning (LP-FT) protocol. The central claim is superior performance on strictly causal tasks such as human activity recognition and sensor data classification.

Significance. If the empirical claims hold after addressing causality and providing full results, the work could contribute an efficient alternative to attention-based TSFMs by combining dual-memory mechanisms with classical statistical inductive biases, potentially improving both accuracy and interpretability in specialized classification while mitigating catastrophic forgetting during adaptation.

major comments (2)

[Hybrid Decision Head] Hybrid Decision Head section: The fusion of deep representations with tsfeatures (trend, seasonality, autocorrelation, etc.) is described without restricting computation to causal windows or online prefixes. Standard tsfeatures operate over full series and therefore leak future information. This directly undermines the repeated claim of respecting 'strict temporal causality' in HAR and Sensor domains; if performance gains rely on this leakage, the core architectural advantage does not hold.
[Abstract] Abstract and Empirical Results: The manuscript asserts 'superior performance' on UCR for causal tasks yet supplies no quantitative metrics, baselines, error bars, ablation results, dataset splits, or statistical significance tests. The central claim of outperformance therefore cannot be evaluated from the provided text.

minor comments (2)

[Architecture] Clarify the exact definition and hyperparameters of the HOPE block and the mixing coefficient in the Hybrid Decision Head; these appear as free parameters but are not enumerated.
[Adaptation Protocol] The LP-FT protocol is mentioned as preventing catastrophic forgetting, but no forgetting metrics or comparison to standard fine-tuning are reported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful and constructive feedback. We address each major comment point by point below, providing honest clarifications and committing to revisions that strengthen the manuscript while preserving its core contributions.

read point-by-point responses

Referee: [Hybrid Decision Head] Hybrid Decision Head section: The fusion of deep representations with tsfeatures (trend, seasonality, autocorrelation, etc.) is described without restricting computation to causal windows or online prefixes. Standard tsfeatures operate over full series and therefore leak future information. This directly undermines the repeated claim of respecting 'strict temporal causality' in HAR and Sensor domains; if performance gains rely on this leakage, the core architectural advantage does not hold.

Authors: We thank the referee for identifying this critical aspect of temporal causality. In the actual implementation, tsfeatures are extracted using only causal prefixes and rolling windows up to the current time step, ensuring no future information is used. This design choice was made to align with the strict causality requirements for tasks such as HAR and sensor classification. However, the manuscript text does not explicitly detail this restriction. We will revise the Hybrid Decision Head section to include a clear description of the causal computation procedure, along with any necessary algorithmic specifications, to eliminate ambiguity and reinforce the validity of the causality claims. revision: yes
Referee: [Abstract] Abstract and Empirical Results: The manuscript asserts 'superior performance' on UCR for causal tasks yet supplies no quantitative metrics, baselines, error bars, ablation results, dataset splits, or statistical significance tests. The central claim of outperformance therefore cannot be evaluated from the provided text.

Authors: We agree that the abstract would be strengthened by including concrete quantitative evidence. The full manuscript presents experimental results on UCR benchmarks with baseline comparisons, but we will revise the abstract to summarize key performance metrics, including accuracy improvements on causal tasks, references to error bars, and indications of statistical significance. We will also ensure the experimental section explicitly details dataset splits, ablation studies, and all evaluation protocols so that the empirical claims can be fully assessed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architecture and protocol are self-contained

full rationale

The paper introduces the HOPE block and Hybrid Decision Head as novel design choices, pre-trains on the external Monash archive using standard MTSM and InfoNCE objectives, then adapts via LP-FT on the external UCR benchmark. No equations, parameters, or performance claims reduce by construction to the inputs or to self-citations. The empirical superiority statements rest on reported results against external datasets rather than any definitional loop or fitted-input-as-prediction pattern. The derivation chain is therefore independent of the target claims.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

Abstract introduces several new architectural components without external validation or parameter counts; free parameters and implementation details are unspecified.

free parameters (2)

HOPE block hyperparameters
Memory sizes, fusion weights, and learning rates for Titans and CMS not reported.
Hybrid Decision Head mixing coefficient
Weight between deep latents and tsfeatures outputs chosen but value unknown.

axioms (2)

domain assumption Standard attention imposes a prohibitive quadratic computational bottleneck for long time series
Invoked in first sentence of abstract as motivation for dual-memory replacement.
domain assumption Linear Probing followed by Full Fine-Tuning prevents catastrophic forgetting
Stated as the adaptation protocol without supporting derivation.

invented entities (2)

HOPE block no independent evidence
purpose: Dual-memory replacement for attention in time-series foundation models
Newly named architecture combining Titans short-term and CMS long-term modules.
Continuum Memory System (CMS) no independent evidence
purpose: Abstraction of long-term historical context
Introduced as novel long-term memory component.

pith-pipeline@v0.9.0 · 5788 in / 1410 out tokens · 30179 ms · 2026-05-20T11:41:15.917353+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArrowOfTime.lean arrow_from_z echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

HOPE block replaces quadratic attention with dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Hybrid Decision Head fuses deep latent representations with deterministic statistical features extracted via tsfeatures

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

[1]

Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Mitchell O’Hara-Wild, Thiyanga Talagala, Earo Wang, and Yangzhuoran Yang.tsfeatures: Time Series Feature Extraction, 2026

work page 2026
[2]

Tracking vital signs of a patient using channel state information and machine learning for a smart healthcare system.Neural Computing and Applications, 37(28):23065–23079, 2025

Muhammad Imran Khan, Mian Ahmad Jan, Yar Muhammad, Dinh-Thuan Do, Ateeq ur Rehman, Con- standinos X Mavromoustakis, and Evangelos Pallis. Tracking vital signs of a patient using channel state information and machine learning for a smart healthcare system.Neural Computing and Applications, 37(28):23065–23079, 2025. 10 KairosHope: A Next-Generation Time-Ser...

work page 2025
[3]

Cook, Göksel Mısırlı, and Zhong Fan

Andrew A. Cook, Göksel Mısırlı, and Zhong Fan. Anomaly detection for iot time-series data: A survey. IEEE Internet of Things Journal, 7(7):6481–6494, 2020

work page 2020
[4]

A survey of explainable artificial intelligence (xai) in financial time series forecasting.ACM Comput

Pierre-Daniel Arsenault, Shengrui Wang, and Jean-Marc Patenaude. A survey of explainable artificial intelligence (xai) in financial time series forecasting.ACM Comput. Surv., 57(10), May 2025

work page 2025
[5]

Using dynamic time warping to find patterns in time series

Donald J Berndt and James Clifford. Using dynamic time warping to find patterns in time series. InProceedings of the 3rd international conference on knowledge discovery and data mining, pages 359–370, 1994

work page 1994
[6]

A shapelet transform for time series classification

Jason Lines, Luke M Davis, Jon Hills, and Anthony Bagnall. A shapelet transform for time series classification. InProceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 289–297, 2012

work page 2012
[7]

Time series classification from scratch with deep neural networks: A strong baseline

Zhiguang Wang, Weizhong Yan, and Tim Oates. Time series classification from scratch with deep neural networks: A strong baseline. In2017 International joint conference on neural networks (IJCNN), pages 1578–1585. IEEE, 2017

work page 2017
[8]

Inceptiontime: Finding alexnet for time series classification.Data mining and knowledge discovery, 34(6):1936–1962, 2020

Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F Schmidt, Jonathan Weber, Geoffrey I Webb, Lhassane Idoumghar, Pierre-Alain Muller, and François Petitjean. Inceptiontime: Finding alexnet for time series classification.Data mining and knowledge discovery, 34(6):1936–1962, 2020

work page 1936
[9]

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arxiv 2022.arXiv preprint arXiv:2211.14730, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[10]

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting.arXiv preprint arXiv:2310.10688, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

Unified training of universal time series forecasting transformers

Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. InForty-first International Conference on Machine Learning, 2024

work page 2024
[12]

A time-series foundation model by universal delay embedding.arXiv preprint arXiv:2509.12080, 2025

Zijian Wang, Peng Tao, Jifan Shi, Rui Bao, Rui Liu, and Luonan Chen. A time-series foundation model by universal delay embedding.arXiv preprint arXiv:2509.12080, 2025

work page arXiv 2025
[13]

A unified shape-aware foundation model for time series classification.arXiv preprint arXiv:2601.06429, 2026

Zhen Liu, Yucheng Wang, Boyuan Li, Junhao Zheng, Emadeldeen Eldele, Min Wu, and Qianli Ma. A unified shape-aware foundation model for time series classification.arXiv preprint arXiv:2601.06429, 2026

work page arXiv 2026
[14]

Nested learning: The illusion of deep learning architectures, 2025

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. Nested learning: The illusion of deep learning architectures, 2025

work page 2025
[15]

Webb, Rob J

Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero- Manso. Monash time series forecasting archive. InNeural Information Processing Systems Track on Datasets and Benchmarks, 2021

work page 2021
[16]

The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6):1293–1305, 2019

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6):1293–1305, 2019

work page 2019
[17]

Reversible instance normalization for accurate time-series forecasting against distribution shift

Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. InInternational Conference on Learning Representations, 2021. 11 KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

work page 2021
[18]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art...

work page 2020
[19]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023

work page 2023
[20]

Titans: Learning to memorize at test time, 2024

Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time, 2024

work page 2024
[21]

Bert: Pre-training of deep bidirectional transformers for language understanding, 2019

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2019

work page 2019
[22]

Zimmermann, and Wieland Brendel

Evgenia Rusak, Patrik Reizinger, Attila Juhos, Oliver Bringmann, Roland S. Zimmermann, and Wieland Brendel. Infonce: Identifying the gap between theory and practice, 2025

work page 2025
[23]

Fine-tuning can distort pretrained features and underperform out-of-distribution

Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, and Percy Liang. Fine-tuning can distort pretrained features and underperform out-of-distribution. InInternational Conference on Learning Representations, 2022

work page 2022
[24]

Alejandro Pasos Ruiz, Michael Flynn, James Large, Matthew Middlehurst, and Anthony Bagnall. The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.Data Mining and Knowledge Discovery, 35(2):401–449, Mar 2021

work page 2021
[25]

Cao, H.K

F. Cao, H.K. Huang, E. Pietka, and V . Gilsanz. Digital hand atlas and web-based bone age assessment: system design and implementation.Computerized Medical Imaging and Graphics, 24(5):297–307, 2000. 12

work page 2000

[1] [1]

Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Mitchell O’Hara-Wild, Thiyanga Talagala, Earo Wang, and Yangzhuoran Yang.tsfeatures: Time Series Feature Extraction, 2026

work page 2026

[2] [2]

Tracking vital signs of a patient using channel state information and machine learning for a smart healthcare system.Neural Computing and Applications, 37(28):23065–23079, 2025

Muhammad Imran Khan, Mian Ahmad Jan, Yar Muhammad, Dinh-Thuan Do, Ateeq ur Rehman, Con- standinos X Mavromoustakis, and Evangelos Pallis. Tracking vital signs of a patient using channel state information and machine learning for a smart healthcare system.Neural Computing and Applications, 37(28):23065–23079, 2025. 10 KairosHope: A Next-Generation Time-Ser...

work page 2025

[3] [3]

Cook, Göksel Mısırlı, and Zhong Fan

Andrew A. Cook, Göksel Mısırlı, and Zhong Fan. Anomaly detection for iot time-series data: A survey. IEEE Internet of Things Journal, 7(7):6481–6494, 2020

work page 2020

[4] [4]

A survey of explainable artificial intelligence (xai) in financial time series forecasting.ACM Comput

Pierre-Daniel Arsenault, Shengrui Wang, and Jean-Marc Patenaude. A survey of explainable artificial intelligence (xai) in financial time series forecasting.ACM Comput. Surv., 57(10), May 2025

work page 2025

[5] [5]

Using dynamic time warping to find patterns in time series

Donald J Berndt and James Clifford. Using dynamic time warping to find patterns in time series. InProceedings of the 3rd international conference on knowledge discovery and data mining, pages 359–370, 1994

work page 1994

[6] [6]

A shapelet transform for time series classification

Jason Lines, Luke M Davis, Jon Hills, and Anthony Bagnall. A shapelet transform for time series classification. InProceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 289–297, 2012

work page 2012

[7] [7]

Time series classification from scratch with deep neural networks: A strong baseline

Zhiguang Wang, Weizhong Yan, and Tim Oates. Time series classification from scratch with deep neural networks: A strong baseline. In2017 International joint conference on neural networks (IJCNN), pages 1578–1585. IEEE, 2017

work page 2017

[8] [8]

Inceptiontime: Finding alexnet for time series classification.Data mining and knowledge discovery, 34(6):1936–1962, 2020

Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F Schmidt, Jonathan Weber, Geoffrey I Webb, Lhassane Idoumghar, Pierre-Alain Muller, and François Petitjean. Inceptiontime: Finding alexnet for time series classification.Data mining and knowledge discovery, 34(6):1936–1962, 2020

work page 1936

[9] [9]

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arxiv 2022.arXiv preprint arXiv:2211.14730, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[10] [10]

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting.arXiv preprint arXiv:2310.10688, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[11] [11]

Unified training of universal time series forecasting transformers

Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. InForty-first International Conference on Machine Learning, 2024

work page 2024

[12] [12]

A time-series foundation model by universal delay embedding.arXiv preprint arXiv:2509.12080, 2025

Zijian Wang, Peng Tao, Jifan Shi, Rui Bao, Rui Liu, and Luonan Chen. A time-series foundation model by universal delay embedding.arXiv preprint arXiv:2509.12080, 2025

work page arXiv 2025

[13] [13]

A unified shape-aware foundation model for time series classification.arXiv preprint arXiv:2601.06429, 2026

Zhen Liu, Yucheng Wang, Boyuan Li, Junhao Zheng, Emadeldeen Eldele, Min Wu, and Qianli Ma. A unified shape-aware foundation model for time series classification.arXiv preprint arXiv:2601.06429, 2026

work page arXiv 2026

[14] [14]

Nested learning: The illusion of deep learning architectures, 2025

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. Nested learning: The illusion of deep learning architectures, 2025

work page 2025

[15] [15]

Webb, Rob J

Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero- Manso. Monash time series forecasting archive. InNeural Information Processing Systems Track on Datasets and Benchmarks, 2021

work page 2021

[16] [16]

The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6):1293–1305, 2019

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6):1293–1305, 2019

work page 2019

[17] [17]

Reversible instance normalization for accurate time-series forecasting against distribution shift

Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. InInternational Conference on Learning Representations, 2021. 11 KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

work page 2021

[18] [18]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art...

work page 2020

[19] [19]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023

work page 2023

[20] [20]

Titans: Learning to memorize at test time, 2024

Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time, 2024

work page 2024

[21] [21]

Bert: Pre-training of deep bidirectional transformers for language understanding, 2019

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2019

work page 2019

[22] [22]

Zimmermann, and Wieland Brendel

Evgenia Rusak, Patrik Reizinger, Attila Juhos, Oliver Bringmann, Roland S. Zimmermann, and Wieland Brendel. Infonce: Identifying the gap between theory and practice, 2025

work page 2025

[23] [23]

Fine-tuning can distort pretrained features and underperform out-of-distribution

Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, and Percy Liang. Fine-tuning can distort pretrained features and underperform out-of-distribution. InInternational Conference on Learning Representations, 2022

work page 2022

[24] [24]

Alejandro Pasos Ruiz, Michael Flynn, James Large, Matthew Middlehurst, and Anthony Bagnall. The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.Data Mining and Knowledge Discovery, 35(2):401–449, Mar 2021

work page 2021

[25] [25]

Cao, H.K

F. Cao, H.K. Huang, E. Pietka, and V . Gilsanz. Digital hand atlas and web-based bone age assessment: system design and implementation.Computerized Medical Imaging and Graphics, 24(5):297–307, 2000. 12

work page 2000