Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

Alper Yıldırım

arxiv: 2605.05151 · v1 · submitted 2026-05-06 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-08 16:32 UTC · model grok-4.3

keywords superposition, sparse autoencoders, mechanistic interpretability, time series forecasting, transformer, PatchTST, feed-forward networks, representational analysis

The pith

Transformer representations for time series forecasting do not rely on superposition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains sparse autoencoders on the post-GELU activations of the feed-forward layers inside PatchTST, a transformer used for time series forecasting. Across dictionary sizes from half to four times the native width, expanding the dictionary leaves performance almost unchanged while most extra features stay inactive. Interventions that edit the dominant latents also leave forecasts nearly untouched. These patterns indicate that the representations stay sparse and do not depend on the overlapping features that superposition would produce. The finding supplies a mechanistic reason why simple linear models remain competitive on the same benchmarks.

Core claim

A narrow single-layer transformer already matches the accuracy of deeper PatchTST configurations on standard forecasting benchmarks. Sparse autoencoders fitted to its post-GELU FFN activations produce representations that remain sparse, show negligible performance change when the dictionary is expanded to 4x overcompleteness, and yield only minimal forecast shifts when dominant latents are causally edited.

What carries the argument

Sparse autoencoders trained on post-GELU intermediate FFN activations, varied from 0.5x to 4.0x overcomplete to test whether representations depend on superposition.
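
The probe described above can be sketched in a few lines. This is a minimal, untrained illustration and not the author's implementation. The ReLU encoder, the L1 penalty, the unit-norm decoder rows, the width `d_ff = 64`, the intermediate ratios 1.0 and 2.0, the random stand-in activations, and every function name are assumptions; only the post-GELU input and the 0.5x and 4.0x endpoints come from the paper.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, used only to fabricate stand-in activations."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def init_sae(d_in, ratio, rng):
    """Initialise a ReLU sparse autoencoder with round(ratio * d_in) dictionary latents."""
    d_dict = max(1, int(round(ratio * d_in)))
    w_enc = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_dict))
    w_dec = rng.normal(size=(d_dict, d_in))
    # Unit-norm decoder rows are a common SAE convention (assumed, not from the paper).
    w_dec /= np.linalg.norm(w_dec, axis=1, keepdims=True)
    return {"w_enc": w_enc, "b_enc": np.zeros(d_dict), "w_dec": w_dec, "b_dec": np.zeros(d_in)}

def sae_forward(params, x):
    """Encode activations x (n, d_in) into non-negative latents z and reconstruct x."""
    z = np.maximum(0.0, (x - params["b_dec"]) @ params["w_enc"] + params["b_enc"])
    x_hat = z @ params["w_dec"] + params["b_dec"]
    return z, x_hat

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    """Mean over samples of the summed squared error, plus an L1 penalty on the latents."""
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(z), axis=1))
    return float(mse + l1_coeff * l1)

rng = np.random.default_rng(0)
d_ff = 64                                   # hypothetical FFN width
acts = gelu(rng.normal(size=(256, d_ff)))   # random stand-in for post-GELU activations

dict_sizes, losses = {}, {}
for ratio in (0.5, 1.0, 2.0, 4.0):          # intermediate ratios are assumed
    sae = init_sae(d_ff, ratio, rng)
    z, x_hat = sae_forward(sae, acts)
    dict_sizes[ratio] = z.shape[1]
    losses[ratio] = sae_loss(acts, x_hat, z)

print(dict_sizes)
```

In a real run each autoencoder would be trained to minimise this loss before any comparison across dictionary sizes is made.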

If this is right

Superposition is not necessary for competitive performance on standard time series forecasting benchmarks.
Representations remain sparse and stable under aggressive dictionary expansion.
Large portions of overcomplete dictionaries remain inactive.
Targeted causal interventions on dominant latent features produce minimal forecast perturbation.
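
The last two consequences can be made concrete with a small sketch. It assumes that "inactive" means a latent never fires on the evaluation set and that "dominant" means the latent with the largest mean activation; the paper summary does not define either term. The latents, decoder, and linear forecast head below are random stand-ins, and all names are hypothetical.

```python
import numpy as np

def inactive_fraction(z, eps=1e-8):
    """Fraction of dictionary latents that never exceed eps on any sample."""
    fired = (z > eps).any(axis=0)
    return float(1.0 - fired.mean())

def ablate_dominant_latent(z):
    """Zero the latent with the largest mean activation; return the edited copy and its index."""
    idx = int(np.argmax(z.mean(axis=0)))
    z_edit = z.copy()
    z_edit[:, idx] = 0.0
    return z_edit, idx

def relative_change(baseline, edited):
    """Relative L2 shift of the edited forecasts with respect to the baseline forecasts."""
    return float(np.linalg.norm(edited - baseline) / np.linalg.norm(baseline))

rng = np.random.default_rng(0)
n, d_dict, d_ff, horizon = 128, 32, 8, 4
z = np.maximum(0.0, rng.normal(size=(n, d_dict)))
z[:, 20:] = 0.0                              # pretend latents 20..31 never activate
z[:, 3] += 2.0                               # make latent 3 the dominant one
w_dec = rng.normal(size=(d_dict, d_ff))      # stand-in SAE decoder
w_head = rng.normal(size=(d_ff, horizon))    # hypothetical linear stand-in for the rest of the model

baseline = (z @ w_dec) @ w_head
z_edit, idx = ablate_dominant_latent(z)
edited = (z_edit @ w_dec) @ w_head

print(inactive_fraction(z), idx, relative_change(baseline, edited))
```

With a real model, the edited latents would be decoded and passed through the remaining layers of the forecaster instead of a single linear head.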

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Time series data may present fewer overlapping features than language, lowering the pressure to use superposition.
Mechanistic tools developed for language models can be reused to compare representational strategies across domains.
Simpler architectures could be sufficient for forecasting once the absence of superposition is confirmed.

Load-bearing premise

The sparse autoencoders accurately detect whether superposition is present or absent in the FFN activations.
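
For readers new to the term, the sketch below illustrates what superposition would look like geometrically and why an overcomplete dictionary is the natural tool for detecting it. It is a toy illustration, not an experiment from the paper: the dimension `d = 16`, the random directions, and the function name are assumptions. When the number of feature directions equals the dimension they can be mutually orthogonal; when it exceeds the dimension, some pairs must overlap.

```python
import numpy as np

def max_interference(directions):
    """Largest absolute cosine similarity between two distinct feature directions (rows)."""
    unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    gram = unit @ unit.T
    np.fill_diagonal(gram, 0.0)
    return float(np.abs(gram).max())

rng = np.random.default_rng(0)
d = 16  # hypothetical activation width

# As many features as dimensions: an orthonormal basis has no interference.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
no_superposition = max_interference(q.T)

# Four times as many features as dimensions: orthogonality is impossible.
packed = rng.normal(size=(4 * d, d))
with_superposition = max_interference(packed)

print(no_superposition, with_superposition)
```

If the FFN activations stored more features than dimensions in this way, a 4x dictionary would be expected to recover additional active features; the paper reports that it does not.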

What would settle it

Finding that larger dictionaries or alternative intervention methods activate new features whose removal substantially worsens forecast accuracy would contradict the claim.

Original abstract

Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (average 0.214%), with large portions of overcomplete dictionaries remaining inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting they may not demand the rich compositional representations that drive transformer success in language modeling, and helping explain the persistent competitiveness of simple linear models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds no evidence of superposition in PatchTST FFN layers on forecasting tasks via SAEs and interventions, but the analysis skips basic checks on whether the autoencoders actually reconstruct the activations.

The main takeaway is that the author probes PatchTST's post-GELU FFN activations with sparse autoencoders and sees sparse, stable representations that change little when the dictionary expands or when the dominant learned features are edited. Performance shifts average just 0.214 percent, many features stay inactive, and forecasts hold up. This lines up with why simple linear models stay competitive on these benchmarks and suggests time-series transformers may not need the dense overlapping features common in language models.

Referee Report

2 major / 2 minor

Summary. The paper applies sparse autoencoders (SAEs) to post-GELU FFN activations in a PatchTST model for time series forecasting. It reports that dictionary sizes from 0.5x to 4x native dimension produce negligible downstream performance change (avg. 0.214%), large inactive feature fractions in overcomplete dictionaries, and minimal forecast perturbation under targeted causal interventions on dominant latents. The central claim is that these representations exhibit no strong superposition, are sparse and stable, and that superposition is therefore not necessary for competitive performance on standard forecasting benchmarks (in contrast to language modeling).

Significance. If the empirical findings are robust, the work supplies a mechanistic account for the persistent competitiveness of linear baselines such as DLinear against transformers on time-series tasks. It demonstrates that standard MI tools (SAEs plus causal interventions) can be productively transferred to a non-language domain and yields a falsifiable negative result: the absence of superposition under dictionary expansion and intervention. This could inform both architecture design and the scope of interpretability methods outside NLP.

major comments (2)

[SAE training and evaluation (results on dictionary expansion)] The manuscript provides no reconstruction MSE, variance explained, or L0 sparsity curves for the SAEs at any dictionary size. Because the central negative claim (no strong superposition) rests on the premise that the learned dictionaries faithfully recover the structure of the post-GELU activations, the absence of these fidelity diagnostics leaves open the possibility that the reported stability and inactivity are artifacts of underfit or low-capacity autoencoders rather than properties of the transformer representations themselves.
[Experimental results on interventions and performance] The reported average performance change of 0.214% is presented without accompanying statistical tests, per-run variance, or explicit controls for SAE reconstruction quality. This weakens the claim that the representations are 'largely insensitive to latent interventions' and 'stable under aggressive dictionary expansion,' as the magnitude of change cannot be assessed against noise or baseline variability.
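
The fidelity diagnostics requested in the first major comment are inexpensive to compute. The sketch below shows one common way to define them; the exact definitions (element-wise MSE, variance explained relative to a per-dimension mean baseline, L0 as the mean count of non-zero latents per sample) and the function name are assumptions, since the manuscript reports none of them.

```python
import numpy as np

def sae_fidelity(x, x_hat, z, eps=1e-8):
    """Return reconstruction MSE, fraction of variance explained, and mean L0 of the latents."""
    mse = float(np.mean((x - x_hat) ** 2))
    # Baseline error of predicting each dimension by its mean over samples.
    total_var = float(np.mean((x - x.mean(axis=0)) ** 2))
    fve = 1.0 - mse / total_var
    # L0: average number of active (non-zero) latents per sample.
    l0 = float(np.mean((np.abs(z) > eps).sum(axis=1)))
    return {"mse": mse, "fve": fve, "l0": l0}

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # stand-in activations
z = np.zeros((4, 6))             # stand-in latents
z[0, :2] = 1.0                   # sample 0 uses two latents
z[1, :4] = 0.5                   # sample 1 uses four latents

perfect = sae_fidelity(x, x, z)
mean_only = sae_fidelity(x, np.broadcast_to(x.mean(axis=0), x.shape), z)

print(perfect, mean_only)
```

A perfect reconstruction gives a variance explained of 1 and a mean-only reconstruction gives 0, so values near 0 at any dictionary size would indicate that inactivity reflects an underfit autoencoder.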

minor comments (2)

[Abstract and §4] The abstract and main text should specify the exact forecasting metric (MSE, MAE, etc.) underlying the 0.214% figure and the precise benchmark datasets used.
[Introduction / preliminary experiments] The statement that a single-layer narrow transformer matches deeper configurations requires an explicit reference to the supporting table or figure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which identify important opportunities to strengthen the rigor of our analysis. We address each major comment below and commit to revisions that directly incorporate the requested diagnostics and statistical controls.

Referee: The manuscript provides no reconstruction MSE, variance explained, or L0 sparsity curves for the SAEs at any dictionary size. Because the central negative claim (no strong superposition) rests on the premise that the learned dictionaries faithfully recover the structure of the post-GELU activations, the absence of these fidelity diagnostics leaves open the possibility that the reported stability and inactivity are artifacts of underfit or low-capacity autoencoders rather than properties of the transformer representations themselves.

Authors: We agree that explicit fidelity metrics are necessary to substantiate the claim that the SAEs recover the true structure of the post-GELU activations. The large fractions of inactive features we already report for overcomplete dictionaries provide indirect evidence against severe underfitting, since additional capacity remains unused. To address the concern directly, we will add reconstruction MSE, variance explained, and L0 sparsity curves across all dictionary sizes (0.5x–4x) to a new appendix section in the revised manuscript. These diagnostics will allow readers to evaluate SAE quality independently and confirm that the observed stability and inactivity reflect properties of the transformer representations rather than SAE limitations.

revision: yes

Referee: The reported average performance change of 0.214% is presented without accompanying statistical tests, per-run variance, or explicit controls for SAE reconstruction quality. This weakens the claim that the representations are 'largely insensitive to latent interventions' and 'stable under aggressive dictionary expansion,' as the magnitude of change cannot be assessed against noise or baseline variability.

Authors: We acknowledge that the 0.214% figure requires statistical context to support claims of stability and insensitivity. In the revised manuscript we will report per-run variances, standard deviations across seeds, and results of appropriate statistical tests (e.g., paired tests across multiple runs) for both the dictionary-expansion and intervention experiments. We will further add an analysis that correlates SAE reconstruction quality with the magnitude of forecast perturbations, providing explicit controls for reconstruction fidelity. These additions will place the small observed changes in proper perspective relative to experimental noise.

revision: yes
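
One paired test that suits a handful of seeds is an exact sign-flip permutation test on the per-seed differences. The sketch below is an illustration of that idea, not the test the rebuttal commits to (it names none). The function name is hypothetical and the per-seed error values are made-up placeholders, not results from the paper.

```python
from itertools import product
from statistics import mean

def paired_sign_flip_test(a, b):
    """Exact two-sided sign-flip permutation test on the paired differences a[i] - b[i].

    Returns the mean difference and the p-value. Enumerates all 2**n sign
    patterns, so it is only practical for a small number of pairs.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(flipped)) >= observed - 1e-12:
            count += 1
    return mean(diffs), count / total

# Placeholder per-seed test errors (NOT from the paper), one entry per seed.
baseline_err = [0.400, 0.402, 0.398, 0.401, 0.399]
spliced_err = [0.401, 0.402, 0.399, 0.400, 0.400]

mean_diff, p_value = paired_sign_flip_test(spliced_err, baseline_err)
print(mean_diff, p_value)
```

With five seeds the smallest attainable two-sided p-value is 2/32, so more seeds are needed before a 0.214% shift can be distinguished from noise at conventional thresholds.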

Circularity Check

0 steps flagged

Empirical SAE analysis yields observational claim with no derivational reduction

The paper advances an empirical claim that FFN representations in PatchTST lack strong superposition, supported by direct measurements of downstream performance stability (avg. 0.214% change), inactive dictionary fractions, and intervention insensitivity after training SAEs on post-GELU activations at 0.5x–4x expansion. No mathematical derivation, first-principles prediction, or self-referential definition is presented whose output reduces to its inputs by construction. The analysis is self-contained as an experimental probe; external SAE fidelity concerns affect validity but do not constitute circularity under the specified patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the domain assumption that sparse autoencoders serve as a reliable probe for detecting superposition in transformer activations; no free parameters or invented entities are introduced in the reported analysis.

axioms (1)

domain assumption: Sparse autoencoders trained on post-GELU activations can accurately recover whether superposition is present in the model's representations.
This is the core premise of the mechanistic interpretability approach used; the paper treats it as given without additional validation in the abstract.

