Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
Pith reviewed 2026-05-08 16:32 UTC · model grok-4.3
The pith
Transformer representations for time series forecasting do not rely on superposition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A narrow single-layer transformer already matches the accuracy of deeper PatchTST configurations on standard forecasting benchmarks. Sparse autoencoders fitted to its post-GELU FFN activations produce representations that remain sparse, show negligible performance change when the dictionary is expanded to 4x overcompleteness, and yield only minimal forecast shifts when dominant latents are causally edited.
What carries the argument
Sparse autoencoders trained on post-GELU intermediate FFN activations, with dictionary sizes varied from 0.5x to 4.0x the native dimensionality to test whether the representations depend on superposition.
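To make the setup above concrete, here is a minimal sketch of a sparse autoencoder whose dictionary size is set as a multiple of the native FFN width. The summary does not give the paper's exact SAE design, so the ReLU encoder, L1 sparsity penalty, tied initialisation, and every name below (`make_sae`, `encode`, `decode`, `sae_loss`, `l1_coeff`) are assumptions for illustration, not the authors' implementation. It is written in plain Python to stay dependency-free and shows only the forward pass and loss, not training.

```python
import math
import random


def make_sae(d_native, ratio, seed=0):
    """Randomly initialised SAE with round(ratio * d_native) dictionary latents."""
    rng = random.Random(seed)
    d_dict = max(1, round(ratio * d_native))
    scale = 1.0 / math.sqrt(d_native)
    # Encoder: one row of weights per dictionary latent.
    w_enc = [[rng.gauss(0.0, scale) for _ in range(d_native)] for _ in range(d_dict)]
    # Decoder starts as the transpose of the encoder (an assumed, common choice).
    w_dec = [[w_enc[j][i] for j in range(d_dict)] for i in range(d_native)]
    return {"w_enc": w_enc, "b_enc": [0.0] * d_dict,
            "w_dec": w_dec, "b_dec": [0.0] * d_native}


def encode(sae, x):
    """ReLU(W_enc (x - b_dec) + b_enc): non-negative latent activations."""
    centered = [xi - b for xi, b in zip(x, sae["b_dec"])]
    return [max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
            for row, b in zip(sae["w_enc"], sae["b_enc"])]


def decode(sae, z):
    """W_dec z + b_dec: reconstruction in the native activation space."""
    return [sum(w * zj for w, zj in zip(row, z)) + b
            for row, b in zip(sae["w_dec"], sae["b_dec"])]


def sae_loss(sae, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the latents."""
    z = encode(sae, x)
    x_hat = decode(sae, z)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + l1_coeff * sum(abs(v) for v in z)


if __name__ == "__main__":
    d_native = 16  # hypothetical FFN width, chosen only for the demo
    x = [0.1 * i for i in range(d_native)]
    for ratio in (0.5, 4.0):  # the two endpoints of the range named above
        sae = make_sae(d_native, ratio)
        print(ratio, len(sae["w_enc"]), sae_loss(sae, x))
```

Under this framing, a ratio below 1.0 gives an undercomplete dictionary and a ratio above 1.0 an overcomplete one, which is why the expansion sweep is informative about superposition.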
If this is right
- Superposition is not necessary for competitive performance on standard time series forecasting benchmarks.
- Representations remain sparse and stable under aggressive dictionary expansion.
- Large portions of overcomplete dictionaries remain inactive.
- Targeted causal interventions on dominant latent features produce minimal forecast perturbation.
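Two of the quantities in the list above, the inactive share of a dictionary and the forecast shift under a latent edit, can be sketched as follows. The summary does not specify the paper's intervention protocol, so zero-ablation of the latent with the largest mean activation is an assumption here, and `decode` and `forecast_head` are hypothetical callables standing in for the SAE decoder and the rest of the forecasting model.

```python
def dead_fraction(latent_rows, eps=0.0):
    """Fraction of dictionary latents that never exceed eps on any sample."""
    n_latents = len(latent_rows[0])
    alive = [False] * n_latents
    for row in latent_rows:
        for j, value in enumerate(row):
            if value > eps:
                alive[j] = True
    return 1.0 - sum(alive) / n_latents


def dominant_latent(latent_rows):
    """Index of the latent with the largest mean activation across samples."""
    n_latents = len(latent_rows[0])
    means = [sum(row[j] for row in latent_rows) / len(latent_rows)
             for j in range(n_latents)]
    return max(range(n_latents), key=means.__getitem__)


def ablation_shift(z, index, decode, forecast_head):
    """Relative L1 change in the forecast when latent `index` is zeroed."""
    baseline = forecast_head(decode(z))
    edited = list(z)
    edited[index] = 0.0  # zero-ablation: an assumed intervention type
    perturbed = forecast_head(decode(edited))
    change = sum(abs(a - b) for a, b in zip(baseline, perturbed))
    norm = sum(abs(a) for a in baseline) or 1.0  # avoid division by zero
    return change / norm
```

A small `ablation_shift` for the dominant latent is only meaningful if the SAE reconstructs the activations well in the first place, which is the concern the referee report raises.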
Where Pith is reading between the lines
- Time series data may present fewer overlapping features than language, lowering the pressure to use superposition.
- Mechanistic tools developed for language models can be reused to compare representational strategies across domains.
- Simpler architectures could be sufficient for forecasting once the absence of superposition is confirmed.
Load-bearing premise
The sparse autoencoders accurately detect whether superposition is present or absent in the FFN activations.
What would settle it
A demonstration that larger dictionaries recover additional active features, or that alternative intervention methods identify features whose removal substantially worsens forecast accuracy, would contradict the claim.
Original abstract
Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (average 0.214%), with large portions of overcomplete dictionaries remaining inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting they may not demand the rich compositional representations that drive transformer success in language modeling, and helping explain the persistent competitiveness of simple linear models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies sparse autoencoders (SAEs) to post-GELU FFN activations in a PatchTST model for time series forecasting. It reports that dictionary sizes from 0.5x to 4x native dimension produce negligible downstream performance change (avg. 0.214%), large inactive feature fractions in overcomplete dictionaries, and minimal forecast perturbation under targeted causal interventions on dominant latents. The central claim is that these representations exhibit no strong superposition, are sparse and stable, and that superposition is therefore not necessary for competitive performance on standard forecasting benchmarks (in contrast to language modeling).
Significance. If the empirical findings are robust, the work supplies a mechanistic account for the persistent competitiveness of linear baselines such as DLinear against transformers on time-series tasks. It demonstrates that standard MI tools (SAEs plus causal interventions) can be productively transferred to a non-language domain and yields a falsifiable negative result: the absence of superposition under dictionary expansion and intervention. This could inform both architecture design and the scope of interpretability methods outside NLP.
major comments (2)
- [SAE training and evaluation (results on dictionary expansion)] The manuscript provides no reconstruction MSE, variance explained, or L0 sparsity curves for the SAEs at any dictionary size. Because the central negative claim (no strong superposition) rests on the premise that the learned dictionaries faithfully recover the structure of the post-GELU activations, the absence of these fidelity diagnostics leaves open the possibility that the reported stability and inactivity are artifacts of underfit or low-capacity autoencoders rather than properties of the transformer representations themselves.
- [Experimental results on interventions and performance] The reported average performance change of 0.214% is presented without accompanying statistical tests, per-run variance, or explicit controls for SAE reconstruction quality. This weakens the claim that the representations are 'largely insensitive to latent interventions' and 'stable under aggressive dictionary expansion,' as the magnitude of change cannot be assessed against noise or baseline variability.
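The fidelity diagnostics requested in the first major comment could be computed along these lines. This is a sketch under stated assumptions: `xs` holds the original post-GELU activation vectors, `x_hats` the SAE reconstructions, and `latent_rows` the SAE latent activations, all as lists of equal-length float lists. The function names are illustrative and do not come from the paper.

```python
def reconstruction_mse(xs, x_hats):
    """Mean squared error over every element of every activation vector."""
    total, count = 0.0, 0
    for x, x_hat in zip(xs, x_hats):
        for a, b in zip(x, x_hat):
            total += (a - b) ** 2
            count += 1
    return total / count


def variance_explained(xs, x_hats):
    """1 - SS_res / SS_tot, with SS_tot taken around the per-dimension mean."""
    n, d = len(xs), len(xs[0])
    means = [sum(x[i] for x in xs) / n for i in range(d)]
    ss_res = sum((a - b) ** 2 for x, x_hat in zip(xs, x_hats)
                 for a, b in zip(x, x_hat))
    ss_tot = sum((x[i] - means[i]) ** 2 for x in xs for i in range(d))
    if ss_tot == 0.0:
        raise ValueError("activations have zero variance; metric is undefined")
    return 1.0 - ss_res / ss_tot


def mean_l0(latent_rows, eps=0.0):
    """Average number of latents above eps per sample (the L0 'sparsity')."""
    return sum(sum(1 for v in row if v > eps) for row in latent_rows) / len(latent_rows)
```

Reporting these three numbers at each dictionary size would let a reader separate "the dictionary is mostly unused because the activations are simple" from "the dictionary is mostly unused because the SAE underfits".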
minor comments (2)
- [Abstract and §4] The abstract and main text should specify the exact forecasting metric (MSE, MAE, etc.) underlying the 0.214% figure and the precise benchmark datasets used.
- [Introduction / preliminary experiments] The statement that a single-layer narrow transformer matches deeper configurations requires an explicit reference to the supporting table or figure.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which identify important opportunities to strengthen the rigor of our analysis. We address each major comment below and commit to revisions that directly incorporate the requested diagnostics and statistical controls.
Point-by-point responses
- Referee: The manuscript provides no reconstruction MSE, variance explained, or L0 sparsity curves for the SAEs at any dictionary size. Because the central negative claim (no strong superposition) rests on the premise that the learned dictionaries faithfully recover the structure of the post-GELU activations, the absence of these fidelity diagnostics leaves open the possibility that the reported stability and inactivity are artifacts of underfit or low-capacity autoencoders rather than properties of the transformer representations themselves.
  Authors: We agree that explicit fidelity metrics are necessary to substantiate the claim that the SAEs recover the true structure of the post-GELU activations. The large fractions of inactive features we already report for overcomplete dictionaries provide indirect evidence against severe underfitting, since additional capacity remains unused. To address the concern directly, we will add reconstruction MSE, variance explained, and L0 sparsity curves across all dictionary sizes (0.5x–4x) to a new appendix section in the revised manuscript. These diagnostics will allow readers to evaluate SAE quality independently and confirm that the observed stability and inactivity reflect properties of the transformer representations rather than SAE limitations.
  Revision: yes
- Referee: The reported average performance change of 0.214% is presented without accompanying statistical tests, per-run variance, or explicit controls for SAE reconstruction quality. This weakens the claim that the representations are 'largely insensitive to latent interventions' and 'stable under aggressive dictionary expansion,' as the magnitude of change cannot be assessed against noise or baseline variability.
  Authors: We acknowledge that the 0.214% figure requires statistical context to support claims of stability and insensitivity. In the revised manuscript we will report per-run variances, standard deviations across seeds, and results of appropriate statistical tests (e.g., paired tests across multiple runs) for both the dictionary-expansion and intervention experiments. We will further add an analysis that correlates SAE reconstruction quality with the magnitude of forecast perturbations, providing explicit controls for reconstruction fidelity. These additions will place the small observed changes in proper perspective relative to experimental noise.
  Revision: yes
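One way the paired comparison across seeds promised above could look is an exact sign-flip permutation test on per-seed forecast errors. The rebuttal names no specific test, so this choice is an assumption, as are the function names; `baseline` and `variant` are hypothetical lists of the same error metric measured on matched seeds (for example, the original model versus the model with SAE-reconstructed activations).

```python
from itertools import product
from statistics import mean, stdev


def paired_summary(baseline, variant):
    """Mean and sample standard deviation of the per-seed differences."""
    diffs = [v - b for b, v in zip(baseline, variant)]
    return mean(diffs), stdev(diffs)


def sign_flip_p_value(baseline, variant):
    """Exact two-sided paired permutation test on the summed difference.

    Under the null hypothesis each per-seed difference is equally likely to
    have either sign, so all 2**n sign assignments are enumerated.
    """
    diffs = [v - b for b, v in zip(baseline, variant)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return hits / total
```

With n matched seeds the smallest attainable p-value is 2 / 2**n, so five seeds cannot go below 0.0625. A handful of runs can therefore show that a change is small, but cannot by itself establish significance at conventional thresholds.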
Circularity Check
Empirical SAE analysis yields observational claim with no derivational reduction
The paper advances an empirical claim that FFN representations in PatchTST lack strong superposition, supported by direct measurements of downstream performance stability (avg. 0.214% change), inactive dictionary fractions, and intervention insensitivity after training SAEs on post-GELU activations at 0.5x–4x expansion. No mathematical derivation, first-principles prediction, or self-referential definition is presented whose output reduces to its inputs by construction. The analysis is self-contained as an experimental probe; external SAE fidelity concerns affect validity but do not constitute circularity under the specified patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Sparse autoencoders trained on post-GELU activations can accurately recover whether superposition is present in the model's representations.