Non-Stationarity in the Embedding Space of Time Series Foundation Models
Pith reviewed 2026-05-10 19:28 UTC · model grok-4.3
The pith
In time series foundation models, the linear detectability of non-stationarity in embedding space degrades smoothly as shift strength decreases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Distributional non-stationarity (mean shifts, variance changes, and linear trends) and temporal non-stationarity from persistence both become linearly accessible in TSFM embedding spaces under controlled conditions, yet this detectability degrades smoothly as shift strength weakens, and different models display distinct failure modes.
What carries the argument
Linear probes applied to embeddings from synthetic time series containing controlled mean shifts, variance changes, linear trends, and persistence violations.
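As a rough illustration of the probing setup described above, the sketch below fits a logistic-regression probe that separates stationary AR(1) windows from windows with a half-window mean shift, and scores it on held-out windows. Almost everything here is an assumption made for illustration: `embed` is a hypothetical stand-in for a TSFM encoder (it only returns the two half-window means), the shift size of 0.2 and the window length of 64 are invented, and the AR(1) parameters are borrowed from the data-generation excerpt quoted further down this page.

```python
import math
import random

def ar1_window(rng, length=64, mu=0.5, phi=0.6, sigma=0.06):
    """One stationary AR(1) window, started from its stationary distribution."""
    x = rng.gauss(mu, sigma / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(length):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def add_mean_shift(window, delta):
    """Half-window change: add `delta` to the second half only (assumed form)."""
    half = len(window) // 2
    return window[:half] + [v + delta for v in window[half:]]

def embed(window):
    """Hypothetical stand-in for a TSFM encoder: the two half-window means."""
    half = len(window) // 2
    return [sum(window[:half]) / half, sum(window[half:]) / (len(window) - half)]

def make_dataset(rng, n_per_class, delta):
    X, y = [], []
    for _ in range(n_per_class):
        X.append(embed(ar1_window(rng)))
        y.append(0)  # stationary class
        X.append(embed(add_mean_shift(ar1_window(rng), delta)))
        y.append(1)  # mean-shift class
    return X, y

def fit_standardizer(X):
    dims = range(len(X[0]))
    means = [sum(r[j] for r in X) / len(X) for j in dims]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / len(X)) or 1.0
            for j in dims]
    return means, stds

def standardize(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in X]

def fit_linear_probe(X, y, lr=0.5, epochs=300):
    """Binary logistic regression trained with full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            gw = [g + err * xi for g, xi in zip(gw, row)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if b + sum(wi * xi for wi, xi in zip(w, r)) > 0 else 0 for r in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

rng = random.Random(0)
X_train, y_train = make_dataset(rng, 100, delta=0.2)
X_test, y_test = make_dataset(rng, 100, delta=0.2)   # held-out windows
means, stds = fit_standardizer(X_train)              # statistics from train only
w, b = fit_linear_probe(standardize(X_train, means, stds), y_train)
held_out_acc = accuracy(w, b, standardize(X_test, means, stds), y_test)
print(f"held-out probe accuracy: {held_out_acc:.2f}")
```

Repeating the last block for progressively smaller `delta` would trace out a detectability curve of the kind the paper reports, though with a real encoder in place of `embed` the shape of that curve is an empirical question.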
If this is right
- Detectability of non-stationarity in embeddings is gradual and scales with shift magnitude instead of being binary.
- Each TSFM exhibits model-specific sensitivities and blind spots for particular non-stationarity types.
- Classical SPC-style diagnostics for mean, variance, and trend changes can be partially recovered from embedding spaces.
- Model choice for monitoring applications depends on the expected non-stationarity forms in the target data.
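For comparison, the classical SPC diagnostics mentioned in the list above operate directly on the raw series instead of on embeddings. A minimal sketch of one such diagnostic, a two-sided tabular CUSUM for mean shifts, is given below. The in-control mean, the slack `k`, the threshold `h`, and the toy series are illustrative assumptions and are not taken from the paper.

```python
def cusum_first_alarm(series, target, k, h):
    """Two-sided tabular CUSUM.

    Returns the index of the first sample at which either cumulative sum
    exceeds the threshold `h`, or None if no alarm is raised.
    """
    s_hi = 0.0  # accumulates evidence for an upward mean shift
    s_lo = 0.0  # accumulates evidence for a downward mean shift
    for i, x in enumerate(series):
        s_hi = max(0.0, s_hi + (x - target - k))
        s_lo = max(0.0, s_lo + (target - x - k))
        if s_hi > h or s_lo > h:
            return i
    return None

# Toy example: in control at 0.5 for 50 samples, then a step up to 0.7.
in_control = [0.5] * 50
shifted = in_control + [0.7] * 50

print(cusum_first_alarm(in_control, target=0.5, k=0.03, h=0.3))  # None
print(cusum_first_alarm(shifted, target=0.5, k=0.03, h=0.3))     # 51
```

The alarm fires two samples after the step because each post-shift sample adds roughly 0.17 to the upper sum, so the 0.3 threshold is crossed on the second one.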
Where Pith is reading between the lines
- Applications in anomaly detection may benefit from testing multiple TSFMs to cover different non-stationarity types.
- Embeddings could be post-processed with explicit stationarity detectors to compensate for model-specific gaps.
- Training objectives for future TSFMs might include explicit preservation of non-stationarity signals.
Load-bearing premise
The controlled synthetic non-stationarities sufficiently represent the forms of non-stationarity encountered in real-world time series data, and linear probes are an appropriate measure of accessibility.
What would settle it
An observation of abrupt rather than smooth drops in detectability, or uniform failure modes across models, when applying the same linear probes to real time series with independently verified non-stationarities would challenge the findings.
Original abstract
Time series foundation models (TSFMs) are widely used as generic feature extractors, yet the notion of non-stationarity in their embedding spaces remains poorly understood. Recent work often conflates non-stationarity with distribution shift, blurring distinctions fundamental to classical time-series analysis and long-standing methodologies such as statistical process control (SPC). In SPC, non-stationarity signals a process leaving a stable regime - via shifts in mean, variance, or emerging trends - and detecting such departures is central to quality monitoring and change-point analysis. Motivated by this diagnostic tradition, we study how different forms of distributional non-stationarity - mean shifts, variance changes, and linear trends - become linearly accessible in TSFM embedding spaces under controlled conditions. We further examine temporal non-stationarity arising from persistence, which reflects violations of weak stationarity due to long-memory or near-unit-root behavior rather than explicit distributional shifts. By sweeping shift strength and probing multiple TSFMs, we find that embedding-space detectability of non-stationarity degrades smoothly and that different models exhibit distinct, model-specific failure modes.
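To make the persistence case concrete, the standard variance calculation for an AR(1) process is worked through below, using the baseline form and parameters quoted in the data-generation excerpt further down this page (σ = 0.06, ϕ = 0.6). It is textbook material and not a derivation from the paper. It shows why weak stationarity requires |ϕ| < 1 and why the process variance grows without bound as ϕ approaches 1.

```latex
\begin{aligned}
x_t &= \mu + \phi\,(x_{t-1} - \mu) + \varepsilon_t,
      \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2), \\
\operatorname{Var}(x_t) &= \phi^2 \operatorname{Var}(x_{t-1}) + \sigma^2
      \;\Longrightarrow\;
      \gamma_0 = \frac{\sigma^2}{1 - \phi^2} \quad (|\phi| < 1), \\
\gamma_0 \,\big|_{\sigma = 0.06,\; \phi = 0.6}
      &= \frac{0.0036}{0.64} = 0.005625,
      \qquad \sqrt{\gamma_0} = 0.075, \\
\operatorname{Corr}(x_t, x_{t-h}) &= \phi^{|h|},
      \qquad \gamma_0 \to \infty \ \text{as}\ |\phi| \to 1 .
\end{aligned}
```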
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines how different forms of non-stationarity (mean shifts, variance changes, linear trends, and persistence) manifest as linearly accessible features in the embedding spaces of time series foundation models (TSFMs). Using controlled synthetic injections and linear probes across multiple models, it reports that detectability degrades smoothly as shift strength decreases and that models display distinct, model-specific failure modes. The work distinguishes non-stationarity from general distribution shift and draws motivation from statistical process control traditions.
Significance. If the central empirical findings hold under more general conditions, the results would clarify the diagnostic capabilities of TSFM embeddings for change-point and regime-shift detection tasks, with potential value for applications in quality monitoring and time-series analysis. The controlled sweep over shift strength and comparison across models is a positive design element that allows quantitative characterization of degradation behavior.
major comments (2)
- [Experimental setup and probing methodology (as outlined in abstract)] The central claim that embedding-space detectability of non-stationarity 'degrades smoothly' and exhibits 'model-specific failure modes' rests exclusively on linear probes applied to four families of synthetic injections (mean shifts, variance changes, linear trends, persistence). This setup does not demonstrate that linear separability is an adequate proxy for accessibility, as any non-linear encoding of non-stationarity in the embeddings would remain invisible to the reported probes.
- [Data generation and non-stationarity definitions] The chosen synthetic generators (additive mean/variance shifts, linear trends, persistence) do not span common real-world non-stationarities such as regime-switching, multiplicative seasonality, or non-additive trend-noise interactions. Without evidence that these synthetics are representative of the data regimes where TSFMs are deployed, the reported degradation behavior cannot be generalized to 'embedding-space detectability' in general.
minor comments (2)
- Clarify the exact definition of 'linear accessibility' and the training procedure for the probes (e.g., whether probes are trained on held-out data or use the full embedding set).
- Provide quantitative details on the number of models tested, the range of shift strengths, and the statistical tests used to establish 'smooth degradation' and 'distinct failure modes'.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments. We address each major point below, clarifying the intentional scope of our controlled study while agreeing to strengthen the manuscript with additional discussion of limitations and generalizability.
Point-by-point responses
-
Referee: [Experimental setup and probing methodology (as outlined in abstract)] The central claim that embedding-space detectability of non-stationarity 'degrades smoothly' and exhibits 'model-specific failure modes' rests exclusively on linear probes applied to four families of synthetic injections (mean shifts, variance changes, linear trends, persistence). This setup does not demonstrate that linear separability is an adequate proxy for accessibility, as any non-linear encoding of non-stationarity in the embeddings would remain invisible to the reported probes.
Authors: We thank the referee for highlighting this distinction. Our claims are explicitly limited to linear accessibility and detectability, as stated throughout the abstract, introduction, and title. Linear probes were chosen deliberately because they provide a direct, quantifiable measure of whether non-stationarity signals are present in a linearly decodable form, aligning with the linear statistical tools central to statistical process control. We agree that this does not rule out non-linear encodings and will revise the manuscript to (i) restate the linear focus more prominently in the abstract and methods, and (ii) add a dedicated limitations paragraph acknowledging that non-linear probes could reveal additional structure and suggesting this as future work.
Revision: partial
-
Referee: [Data generation and non-stationarity definitions] The chosen synthetic generators (additive mean/variance shifts, linear trends, persistence) do not span common real-world non-stationarities such as regime-switching, multiplicative seasonality, or non-additive trend-noise interactions. Without evidence that these synthetics are representative of the data regimes where TSFMs are deployed, the reported degradation behavior cannot be generalized to 'embedding-space detectability' in general.
Authors: The synthetic generators were selected to isolate specific, well-defined forms of non-stationarity (mean shifts, variance changes, linear trends, and persistence) so that degradation with shift strength could be measured cleanly and model-specific failure modes identified. This controlled design follows directly from the SPC motivation in the introduction. We do not claim these cover all real-world non-stationarities, and we will revise the manuscript to (i) add an explicit scope statement in the introduction and conclusion, (ii) discuss how the chosen forms relate to (but do not exhaust) phenomena such as regime-switching, and (iii) note that broader generalization would require additional experiments on real-world datasets with mixed non-stationarities.
Revision: partial
Circularity Check
No circularity: purely empirical experimental reporting
full rationale
The paper conducts controlled synthetic injections of non-stationarity (mean shifts, variance changes, linear trends, persistence) into time series, extracts embeddings from several TSFMs, and measures linear probe performance as shift strength varies. All reported findings are direct experimental outcomes from these sweeps; no equations, fitted parameters, or predictions are presented that reduce by construction to the input generators or probe definitions. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The derivation chain is therefore self-contained observational reporting rather than any of the enumerated circular patterns.
Reference graph
Works this paper leans on
-
[2]
Chronos-2: From Univariate to Universal Forecasting
URL https://arxiv.org/abs/2510.15821. Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, and Sepp Hochreiter. Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. (arXiv:2505.23719), May
-
[3]
Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning
doi: 10.48550/arXiv.2505.23719. URL http://arxiv.org/abs/2505.23719. arXiv:2505.23719 [cs]. J.D. Cryer and K.S. Chan. Time Series Analysis: With Applications in R. Springer Texts in Statistics. Springer New York,
-
[4]
URL http://www.jstor.org/stable/2286348
ISSN 01621459, 1537274X. URL http://www.jstor.org/stable/2286348. Wei Fan, Pengyang Wang, Dongkun Wang, Dongjie Wang, Yuanchun Zhou, and Yanjie Fu. Dish-ts: A general paradigm for alleviating distribution shift in time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp. 7522–7529,
-
[5]
5 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) Denis Kwiatkowski, Peter CB Phillips, Peter Schmidt, and Yongcheol Shin
URL https://openreview.net/forum?id=cGDAkQo1C0p. Denis Kwiatkowski, Peter CB Phillips, Peter Schmidt, and Yongcheol Shin. Testing the null hypothesis of stationarity against the alternative of a unit root. How sure are we that economic time series have a unit root? Journal of Econometri...
2026
-
[6]
doi: 10.1016/0304-4076(92)90104-Y
ISSN 0304-4076. doi: 10.1016/0304-4076(92)90104-Y. Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’24, pp. 6555–6565, New York, NY, USA,
-
[7]
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
Association for Computing Machinery. ISBN 9798400704901. doi: 10.1145/3637528.3671451. URL https://doi.org/10.1145/3637528.3671451. Peiyuan Liu, Beiliang Wu, Yifan Hu, Naiqi Li, Tao Dai, Jigang Bao, and Shu-Tao Xia. Timebridge: Non-stationarity matters for long-term time series forecasting. International Conference on Machine Learning,
-
[8]
URL http://www.jstor.org/stable/2336182
ISSN 00063444. URL http://www.jstor.org/stable/2336182. Lina Sjösten. A comparative study of the KPSS and ADF tests in terms of size and power,
-
[9]
Timemixer++: A general time series pattern machine for universal predictive analysis
ISSN 2835-8856. URL https://openreview.net/forum?id=QlTLkH6xRC. Shiyu Wang, Jiawei Li, Xiaoming Shi, Zhou Ye, Baichuan Mo, Wenze Lin, Shengtong Ju, Zhixuan Chu, and Ming Jin. Timemixer++: A general time series pattern machine for universal predictive analysis. arXiv preprint arXiv:2410.16032,
-
[10]
All experiments are conducted at the window level with sequence length L =
A DATA GENERATING PROCESS
We generate synthetic time series using a controlled AR(1) process to study how distributional and temporal non-stationarity manifest in embedding space. All experiments are conducted at the window level with sequence length L =
2026
-
[11]
Unless otherwise specified, we use ϕ = 0.6
A.1 BASELINE AR(1) PROCESS
The stationary baseline is defined as an AR(1) process
$$x_t = \mu + \phi\,(x_{t-1} - \mu) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2),$$
where μ = 0.5, σ = 0.06, and |ϕ| < 1 ensures weak stationarity. Unless otherwise specified, we use ϕ = 0.6. This baseline defines the stationary class.
A.2 DISTRIBUTIONAL SHIFT TYPES
We consider three forms of distributional non-stationarity applie...
2026
-
[12]
These models were selected to span diverse architectural paradigms and training objectives while enabling consistent extraction of window-level embeddings
Shift structure      Half-window change (continuous)
Trend
  Slope α            Uniform(0.3s, 0.6s) · sign
  Trend form         Additive linear ramp
Shift strength
  Strength levels    {1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12, 0.08}
  Interpretation     s = 1: strongest shift; s → 0: indistinguishable
B MODELS
We evaluate three representative time series foundation models (TSFMs): Chronos2 (Ansari et al., 2025), ...
2025
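The trend parameters in the excerpt above (slope α drawn from Uniform(0.3s, 0.6s) with a random sign, an additive linear ramp, and eight strength levels) can be sketched as follows. The excerpt is truncated, so several details are assumptions made only for illustration: that the ramp rises from 0 to α across the whole window, that the sign is chosen with equal probability, and the helper names themselves.

```python
import random

# Strength levels quoted in the excerpt: s = 1 is the strongest shift.
STRENGTH_LEVELS = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12, 0.08]

def sample_trend_slope(rng, s):
    """Draw alpha ~ Uniform(0.3 s, 0.6 s) and attach a random sign."""
    magnitude = rng.uniform(0.3 * s, 0.6 * s)
    sign = rng.choice([-1.0, 1.0])  # ASSUMPTION: both signs equally likely
    return sign * magnitude

def add_linear_trend(window, alpha):
    """Additive linear ramp.

    ASSUMPTION: the ramp goes from 0 at the first sample to `alpha` at the
    last one. Requires a window with at least two samples.
    """
    n = len(window)
    return [v + alpha * t / (n - 1) for t, v in enumerate(window)]

rng = random.Random(0)
flat = [0.5] * 5  # toy window held at the baseline mean
for s in STRENGTH_LEVELS:
    alpha = sample_trend_slope(rng, s)
    ramped = add_linear_trend(flat, alpha)
    rise = ramped[-1] - ramped[0]
    print(f"s={s:<5} alpha={alpha:+.3f} end-to-end rise={rise:+.3f}")
```

Because the slope range scales linearly with s, lowering s shrinks the injected ramp toward zero, which matches the excerpt's reading of s → 0 as indistinguishable from the stationary class.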
-
[13]
is a pretrained time series foundation model designed for universal forecasting across univariate and multivariate settings.¹ It extends the Chronos family with group-attention mechanisms that enable cross-series information sharing and in-context learning. The model processes normalized input ...
¹ https://github.com/amazon-science/chronos-forecasting
2026
-
[14]
is a family of open-source foundation models for general-purpose time series analysis, trained via large-scale multi-dataset pretraining.² It learns representations through self-supervised objectives such as masked reconstruction, enabling a single model to support forecasting, classification, anomaly detection, and imputation tasks. By reconstructing...
2026
-
[15]
B.5 WHY THESE TSFMS? We selected these models for three primary reasons
These features are used to train a logistic regression model to perform shift-type classification.
B.5 WHY THESE TSFMS?
We selected these models for three primary reasons. (1) Comparable embedding extraction. All three models provide encoder outputs that can be converted into fixed-length window embeddings without task-specific fine-tuning, enabling consistent represent...
2026
-
[16]
Longer windows consistently improve separability for all methods, reflecting the benefit of additional temporal context
C.2.1 FIXED PERSISTENCE (ϕ = 0.6)
Table 3 summarizes Macro-F1 across sequence lengths under fixed persistence (ϕ = 0.6). Longer windows consistently improve separability for all methods, reflecting the benefit of additional temporal context. Importantly, the qualitative model ranking is unchanged across L, and the weak-shift regime (s = 0.12) continues to rev...
2026
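Macro-F1, the metric named in the excerpt above, averages per-class F1 scores with equal weight, so a rarely predicted shift type counts as much as a common one. A minimal sketch follows. The class names and toy labels are made up for illustration, and scoring a class with no true or predicted samples as 0 is a convention chosen here, not something the paper states.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical shift-type labels for six windows.
y_true = ["stationary", "stationary", "mean", "mean", "variance", "trend"]
y_pred = ["stationary", "mean", "mean", "mean", "variance", "stationary"]

# Per-class F1: mean 0.8, stationary 0.5, trend 0.0, variance 1.0.
print(f"{macro_f1(y_true, y_pred):.3f}")  # 0.575
```

The missed `trend` window pulls the average down by a full quarter even though it is only one sample in six, which is the behaviour that makes macro averaging sensitive to model-specific blind spots.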
-
[17]
These formulations further reinforce the view of non-stationarity as a form of distribution shift that disrupts stable representation learning
distinguishes between intra-space shift, referring to temporal changes within a single representation space, and inter-space shift, describing misalignment across representations learned under different temporal regimes. These formulations further reinforce the view of non-stationarity as a form of distribution shift that disrupts stable representation lear...
2022