Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification

Daniela Rus; Mónika Farsang; Radu Grosu; Ramin Hasani

arxiv: 2605.16048 · v1 · pith:KB6AIX3Lnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 20:55 UTC · model grok-4.3

keywords: state space models, time series classification, depth recurrence, parameter sharing, inductive bias, input reshaping

The pith

Looping the same SSM block across layers matches or beats expanded models with independent parameters per layer on time series tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a state space model block reused L times with only k parameters performs as well as or better than a version using k times L separate parameters. This holds for four different SSM architectures on six classification benchmarks. Because the expanded model can always simulate the looped one by setting all copies equal, the result cannot come from greater expressive power. Instead the shared parameters appear to act as an inductive bias that makes training easier. Input reshaping adds separate gains of one to six percent that combine with the looping effect.

Core claim

A looped SSM with k parameters iterated L times consistently closely matches or outperforms a standard SSM with k · L independent parameters across four architectures and six benchmarks, despite a strictly smaller hypothesis space. The advantage arises because parameter sharing across depth functions as a beneficial inductive bias that simplifies optimization, independent of the models' inherent sequence recurrence.
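
The inclusion argument behind this claim can be written out explicitly. The notation below ($F_\theta$ for one SSM block with parameters $\theta$, $\Theta$ for its parameter space, $\mathcal{L}$ for the training loss) is introduced here for illustration and is not taken from the paper.

```latex
\begin{align}
\mathcal{H}_{\text{loop}} &= \{\, \underbrace{F_{\theta} \circ \cdots \circ F_{\theta}}_{L \text{ times}} : \theta \in \Theta \,\}, \\
\mathcal{H}_{\text{std}} &= \{\, F_{\theta_L} \circ \cdots \circ F_{\theta_1} : \theta_1, \dots, \theta_L \in \Theta \,\}, \\
\mathcal{H}_{\text{loop}} &\subseteq \mathcal{H}_{\text{std}}
  \quad \text{(choose } \theta_1 = \cdots = \theta_L = \theta \text{)}, \\
\inf_{f \in \mathcal{H}_{\text{std}}} \mathcal{L}(f) &\le \inf_{f \in \mathcal{H}_{\text{loop}}} \mathcal{L}(f).
\end{align}
```

The last line says only that the best achievable training loss of the standard model is never worse. It is silent on which solution an optimizer actually reaches and on test accuracy, which is where the claimed inductive bias would have to act. The strictness of the inclusion is the paper's claim; tying the parameters gives only $\subseteq$.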

What carries the argument

Depth-recurrence via looping, in which the identical SSM block is applied repeatedly across layers to enforce parameter sharing.
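
A minimal sketch of the looping idea, assuming a toy diagonal linear block with a residual connection. The block form and the names `make_block`, `apply_block`, `run_stack`, and `count_params` are hypothetical and are not the paper's LRU, S5, LinOSS, or LrcSSM implementations. The sketch shows the parameter counts (k versus k · L) and that a standard stack with tied copies reproduces the looped model exactly.

```python
import copy
import random

def make_block(dim, rng):
    """One toy diagonal block: per-channel decay a, input gain b, output gain c."""
    return {
        "a": [rng.uniform(0.5, 0.95) for _ in range(dim)],
        "b": [rng.uniform(-1.0, 1.0) for _ in range(dim)],
        "c": [rng.uniform(-1.0, 1.0) for _ in range(dim)],
    }

def apply_block(block, seq):
    """Sequence recurrence s_t = a*s_{t-1} + b*u_t with output y_t = u_t + c*s_t."""
    dim = len(block["a"])
    state = [0.0] * dim  # the sequence state is reset for every pass (assumption)
    out = []
    for u in seq:
        state = [block["a"][i] * state[i] + block["b"][i] * u[i] for i in range(dim)]
        out.append([u[i] + block["c"][i] * state[i] for i in range(dim)])
    return out

def run_stack(blocks, seq):
    """Depth axis: feed each block's output sequence into the next block."""
    for block in blocks:
        seq = apply_block(block, seq)
    return seq

def count_params(blocks):
    """Count scalar parameters, counting a shared block object only once."""
    unique = {id(b): b for b in blocks}
    return sum(len(values) for b in unique.values() for values in b.values())

rng = random.Random(0)
DIM, DEPTH, STEPS = 4, 3, 5
shared = make_block(DIM, rng)
looped = [shared] * DEPTH                                # one block reused L times
standard = [make_block(DIM, rng) for _ in range(DEPTH)]  # L independent blocks
tied = [copy.deepcopy(shared) for _ in range(DEPTH)]     # standard stack, equal copies
seq = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(STEPS)]

print(count_params(looped), count_params(standard))      # 12 36
print(run_stack(looped, seq) == run_stack(tied, seq))    # True
```

The `tied` stack has the parameter count of the standard model but computes the same function as the looped one, which is the special case the expressivity argument relies on.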

If this is right

Looped models can reach comparable accuracy with far fewer total parameters.
Depth-recurrence supplies benefits orthogonal to the sequence recurrence already present in SSMs.
Input reshaping by timestep concatenation or feature-time flattening produces consistent accuracy lifts.
The two techniques can be stacked for additive gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sharing pattern could be tested in other recurrent or attention-based sequence models to reduce parameter count.
Optimal loop depth may vary with input dimensionality or task length.
Hybrid designs that share parameters only in early or late layers might offer further efficiency.

Load-bearing premise

The performance advantage comes specifically from the inductive bias created by sharing parameters across depth rather than from training dynamics or the particular choice of benchmarks.

What would settle it

A controlled experiment in which looped and expanded models are trained to convergence on the same data with identical random seeds and the looped version still underperforms.

Figures

Figures reproduced from arXiv: 2605.16048 by Daniela Rus, Mónika Farsang, Radu Grosu, Ramin Hasani.

**Figure 2.** Illustration of the two input reshaping strategies. A concentration factor

Original abstract

State Space Models (SSMs) are inherently recurrent along the sequence dimension, yet depth-recurrence - reusing the same block repeatedly across layers, as recently applied in looped transformers - has not been explored in this model family. We show that a looped SSM with $k$ parameters iterated $L$ times consistently closely matches or outperforms a standard SSM with $k \cdot L$ independent parameters across four architectures (LRU, S5, LinOSS, LrcSSM) and six time series classification benchmarks, despite operating within a strictly smaller hypothesis space, as we formally establish. Since the larger model contains the looped model as a special case, this dominance cannot be explained by expressivity and instead points to parameter sharing across depth as a beneficial inductive bias that simplifies optimization. These results demonstrate that depth-recurrence is orthogonal to sequence-recurrence and independently beneficial. We further show that input reshaping is an equally neglected design axis: concatenating timesteps for low-dimensional inputs, or flattening and rechunking the joint feature-time dimension for high-dimensional ones, yields accuracy gains of 1-6% across all models, confirmed over 5 random seeds. Both techniques provide standalone improvements that compound when combined, suggesting that depth and input reshaping are two independent and underexplored design axes for SSMs on time series.
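
The two reshaping strategies named in the abstract can be sketched as follows. This is one plausible reading, not the paper's code: the function names, the zero-padding of incomplete chunks, and the time-major flattening order are all assumptions.

```python
def concat_timesteps(seq, factor):
    """(T, d) -> (ceil(T / factor), factor * d): join consecutive timesteps.

    Intended for low-dimensional inputs. The tail is zero-padded (assumption).
    """
    d = len(seq[0])
    padded = list(seq) + [[0.0] * d] * (-len(seq) % factor)
    return [
        [x for step in padded[i:i + factor] for x in step]
        for i in range(0, len(padded), factor)
    ]

def flatten_rechunk(seq, width):
    """(T, d) -> (ceil(T * d / width), width): flatten time-major, then rechunk.

    Intended for high-dimensional inputs, where width < d gives a longer
    sequence of narrower tokens. The tail is zero-padded (assumption).
    """
    flat = [x for step in seq for x in step]
    flat += [0.0] * (-len(flat) % width)
    return [flat[i:i + width] for i in range(0, len(flat), width)]

low_dim = [[1, 2], [3, 4], [5, 6], [7, 8]]             # T=4, d=2
print(concat_timesteps(low_dim, 2))   # [[1, 2, 3, 4], [5, 6, 7, 8]]

high_dim = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]  # T=2, d=6
print(flatten_rechunk(high_dim, 3))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

Under the time-major assumption used here, concatenating `factor` timesteps is the same as rechunking with `width = factor * d`, so the two strategies differ mainly in whether the token width ends up above or below the original feature dimension.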

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Looped SSMs with shared parameters across depth match or beat larger independent models on time series tasks, but the comparison needs tighter controls on hyperparameter tuning.

The main point here is that reusing the same SSM block L times with k parameters can match or outperform a standard stack with k times L independent parameters on time series classification. This holds for LRU, S5, LinOSS, and LrcSSM across six benchmarks, and they pair it with simple input reshaping that adds 1-6% accuracy. They formally note that the looped version sits in a smaller hypothesis space, which rules out extra capacity as the explanation and points to sharing as an optimization-friendly bias. Depth-recurrence is treated as orthogonal to the usual sequence recurrence in SSMs. Input reshaping gets its own confirmation over five random seeds, and the two ideas compound when used together.

The paper does a reasonable job showing consistency across architectures and datasets without overclaiming theoretical breakthroughs. The empirical scope is broad enough to make the pattern worth noticing.

The soft spot is the one the stress-test note flags. If the standard deeper models did not receive independent hyperparameter tuning, differences in gradient flow, initialization sensitivity, or training dynamics could produce the observed edge without isolating the claimed inductive bias. The abstract only ties the seed confirmation to the reshaping experiments, so the core looped-versus-standard results sit on thinner ground until the full methods section clarifies the training protocol.

This work is aimed at researchers already using or extending SSMs for time series, especially those interested in parameter-efficient design choices. A reader looking for practical tweaks rather than new theory would get the most out of it. The paper is coherent on its own terms and shows clear engagement with the relevant literature, so it deserves a serious referee even if revisions on experimental controls are likely needed. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper introduces depth-recurrence to State Space Models by reusing the same SSM block (k parameters) across L layers and demonstrates that this looped variant consistently matches or outperforms a standard SSM with k·L independent parameters on six time series classification benchmarks across four architectures (LRU, S5, LinOSS, LrcSSM). It formally notes that the larger model contains the looped case as a special case, attributing the result to parameter sharing as an inductive bias that aids optimization rather than increased expressivity. The work additionally proposes input reshaping (concatenation for low-dimensional inputs or flattening/rechunking for high-dimensional ones) that yields 1-6% accuracy gains, with both techniques compounding when combined.

Significance. If the central empirical claim holds after controlling for training dynamics, the result would establish depth-recurrence as an orthogonal and beneficial design axis for SSMs, separate from their inherent sequence-recurrence. This could inform more efficient scaling of deep SSM architectures for time series tasks by leveraging parameter sharing to simplify optimization landscapes. The multi-architecture, multi-benchmark scope and the explicit smaller-hypothesis-space argument provide a solid foundation for follow-up work on inductive biases in recurrent sequence models.

major comments (2)

[§4] §4 (Experiments): The looped-vs-standard comparison does not report independent hyperparameter tuning (learning rate, optimizer settings, epochs, or initialization) for the k·L parameter models. If identical regimes were used for both, differences in gradient flow through L independent layers versus looped recurrence could explain the observed performance without isolating the claimed inductive bias of parameter sharing.
[§3.2] §3.2 and Table 2: The formal argument that the standard model contains the looped model as a special case is used to rule out expressivity, yet the empirical results would be strengthened by an ablation that explicitly initializes the standard model to recover the looped weights and verifies that optimization still diverges under the same training protocol.

minor comments (2)

[Abstract] The abstract states results for input reshaping are 'confirmed over 5 random seeds' but does not clarify whether the main looped-vs-standard tables also use multiple seeds or report variance; adding this detail would improve reproducibility.
[§3.1] Notation for the looped iteration (e.g., how the hidden state is passed between iterations of the same block) could be clarified with a small diagram or explicit recurrence equation in §3.1.
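
One way the requested recurrence could be written is sketched below. This is an illustrative guess at the structure, not the paper's notation: it assumes a linear sequence recurrence inside the block, a sequence state reset to zero at every loop iteration, and a generic readout $g$ (for example a nonlinearity with a residual connection).

```latex
\begin{align}
u^{(0)}_t &= x_t, \\
s^{(\ell)}_t &= A\, s^{(\ell)}_{t-1} + B\, u^{(\ell-1)}_t, \qquad s^{(\ell)}_0 = 0, \\
u^{(\ell)}_t &= g\!\left(C\, s^{(\ell)}_t,\; u^{(\ell-1)}_t\right),
  \qquad \ell = 1, \dots, L, \quad t = 1, \dots, T.
\end{align}
```

Here $t$ indexes the sequence recurrence and $\ell$ the depth recurrence. In the looped model the same $(A, B, C)$ and $g$ are used for every $\ell$; a standard stack would instead carry layer-specific $(A^{(\ell)}, B^{(\ell)}, C^{(\ell)})$. Whether the paper resets or carries $s^{(\ell)}$ between iterations is exactly what the comment asks the authors to state.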

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications on our experimental protocol and theoretical claims, and we outline the revisions we will make to the manuscript.

Referee: [§4] §4 (Experiments): The looped-vs-standard comparison does not report independent hyperparameter tuning (learning rate, optimizer settings, epochs, or initialization) for the k·L parameter models. If identical regimes were used for both, differences in gradient flow through L independent layers versus looped recurrence could explain the observed performance without isolating the claimed inductive bias of parameter sharing.

Authors: We appreciate this point on experimental controls. Our comparisons deliberately employed identical hyperparameter regimes (including the same learning rate, optimizer, epoch count, and initialization) for both looped and standard models to ensure a fair, controlled evaluation under the same training dynamics. This design choice allows us to attribute performance differences to the inductive bias of parameter sharing rather than to separately optimized training for the larger model. Gradient flow distinctions are a direct consequence of the architectural choice and thus part of the optimization benefit we claim. In the revision we will explicitly document this identical-regime protocol in §4 and add a brief discussion of how it supports isolating the inductive bias effect.

Revision: yes

Referee: [§3.2] §3.2 and Table 2: The formal argument that the standard model contains the looped model as a special case is used to rule out expressivity, yet the empirical results would be strengthened by an ablation that explicitly initializes the standard model to recover the looped weights and verifies that optimization still diverges under the same training protocol.

Authors: The formal argument in §3.2 shows that any looped weight configuration is realizable inside the standard model by repeating parameters across layers, thereby ruling out greater expressivity as an explanation for the observed advantage. We agree that an initialization ablation (setting the standard model’s layers to identical looped weights and checking whether optimization diverges from that point) would provide further insight into the optimization landscape. Our current experiments use standard random initialization for both models, which is the conventional protocol and already demonstrates the practical benefit of depth-recurrence. We will add a paragraph in the revised §3.2 discussing this ablation as a valuable direction for future work while noting that the existing random-initialization results suffice to support our claims.

Revision: partial
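
The bookkeeping for such an initialization ablation might look like the sketch below. It is hypothetical throughout: `tied_init` and `layer_drift` are illustrative helpers, the two-parameter block is a placeholder, and the manual perturbation stands in for a real optimizer step, which is not modelled here.

```python
import copy

def tied_init(block, depth):
    """Standard stack whose `depth` independent layers all start as copies of one block."""
    return [copy.deepcopy(block) for _ in range(depth)]

def layer_drift(layers):
    """Largest absolute parameter difference between any layer and layer 0."""
    ref = layers[0]
    return max(
        (abs(p - q)
         for layer in layers[1:]
         for name in ref
         for p, q in zip(layer[name], ref[name])),
        default=0.0,
    )

# Placeholder block parameters (not values from the paper).
looped_block = {"a": [0.5, 0.75], "b": [1.0, -1.0]}

layers = tied_init(looped_block, depth=3)
drift_at_init = layer_drift(layers)   # 0.0: the stack starts inside the looped family

layers[1]["a"][0] += 0.25             # placeholder for a real training update
drift_after = layer_drift(layers)     # 0.25: layer 1 has left the tied solution

print(drift_at_init, drift_after)
```

Tracking this drift during training, alongside accuracy, would show whether the standard model stays near the tied solution or moves away from it under the same protocol.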

Circularity Check

0 steps flagged

Empirical comparisons with set-inclusion argument show no circularity

The paper's core claims rest on direct experimental comparisons of looped SSMs (k parameters, L iterations) versus standard SSMs (k·L independent parameters) across four architectures and six benchmarks. The statement that the larger model contains the looped model as a special case is a standard expressivity argument establishing that performance gains cannot be attributed to greater hypothesis-space size; this is a logical inclusion, not a self-referential definition or fitted parameter renamed as prediction. No equations, derivations, or predictions are shown to reduce to their own inputs by construction. No load-bearing self-citations or ansatzes imported from prior author work appear in the provided claims. The results are therefore self-contained against external benchmarks and receive a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the work relies on standard machine-learning assumptions such as random initialization and gradient-based optimization; no explicit free parameters, ad-hoc axioms, or new invented entities are introduced or detailed.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and embed_strictMono · relation: unclear
Paper passage: "We formally establish that the latter contains the former as a special case... this dominance cannot be explained by expressivity and instead points to parameter sharing across depth as a beneficial inductive bias"

IndisputableMonolith/Foundation/ArrowOfTime.lean · forward_accumulates / z_monotone_absolute · relation: unclear
Paper passage: "depth-recurrence is precisely orthogonal to sequence-recurrence"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

The UEA multivariate time series classification archive, 2018

Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. The uea multivariate time series classification archive, 2018. arXiv preprint arXiv:1811.00075, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[2] [2]

Flexivit: One model for all patch sizes

Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, and Filip Pavetic. Flexivit: One model for all patch sizes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496--14506, 2023

work page 2023

[3] [3]

A spelling device for the paralysed

Niels Birbaumer, Nimr Ghanayim, Thilo Hinterberger, Iver Iversen, Boris Kotchoubey, Andrea K \"u bler, Juri Perelmouter, Edward Taub, and Herta Flor. A spelling device for the paralysed. Nature, 398 0 (6725): 0 297--298, 1999

work page 1999

[4] [4]

A Mechanistic Analysis of Looped Reasoning Language Models

Hugh Blayney, \'A lvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M Bronstein, and Xiaowen Dong. A mechanistic analysis of looped reasoning language models. arXiv preprint arXiv:2604.11791, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[5] [5]

Learning to dissipate energy in oscillatory state-space models

Jared Boyer, T Konstantin Rusch, and Daniela Rus. Learning to dissipate energy in oscillatory state-space models. arXiv preprint arXiv:2505.12171, 2025

work page arXiv 2025

[6] [6]

Universal transformers

Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyzdRiR9Y7

work page 2019

[7] [7]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https...

work page 2021

[8] [8]

Parallelization of non-linear state-space models: Scaling up liquid-resistance liquid-capacitance networks for efficient sequence modeling

M \'o nika Farsang and Radu Grosu. Parallelization of non-linear state-space models: Scaling up liquid-resistance liquid-capacitance networks for efficient sequence modeling. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=ENYvdnyhLl

work page 2026

[9] [9]

Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein

Jonas Geiping, Sean Michael McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein. Scaling up test-time compute with latent reasoning: A recurrent depth approach. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum...

work page 2026

[10] [10]

Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals

Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. circulation, 101 0 (23): 0 e215--e220, 2000

work page 2000

[11] [11]

Mamba: Linear-time sequence modeling with selective state spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=tEYskw1VY2

work page 2024

[12] [12]

Efficiently modeling long sequences with structured state spaces

Albert Gu, Karan Goel, and Christopher Re. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=uYLFoz1vlAC

work page 2022

[13] [13]

The capacity and robustness trade-off: Revisiting the channel independent strategy for multivariate time series forecasting

Lu Han, Han-Jia Ye, and De-Chuan Zhan. The capacity and robustness trade-off: Revisiting the channel independent strategy for multivariate time series forecasting. IEEE Transactions on Knowledge and Data Engineering, 36 0 (11): 0 7129--7142, 2024

work page 2024

[14] [14]

Liquid structural state-space models

Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, and Daniela Rus. Liquid structural state-space models. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[15] [15]

Anderson Keller, Carmen Amo Alonso, Terrence Sejnowski, and Hava T Siegelmann

Arjun Karuvally, Franz Nowak, T. Anderson Keller, Carmen Amo Alonso, Terrence Sejnowski, and Hava T Siegelmann. Bridging expressivity and scalability with adaptive unitary SSM s. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=s4zitEu2R8

[16]

Thomas Lal, Thilo Hinterberger, Guido Widman, Michael Schröder, N Hill, Wolfgang Rosenstiel, Christian Elger, Niels Birbaumer, and Bernhard Schölkopf. Methods towards invasive human brain computer interfaces. Advances in Neural Information Processing Systems, 17, 2004

[17]

James Large, E Kate Kemsley, Nikolaus Wellner, Ian Goodall, and Anthony Bagnall. Detecting forged alcohol non-invasively through vibrational spectroscopy and machine learning. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 298--309. Springer, 2018

[18]

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012--10022, 2021

[19]

Fernando Moreno-Pino, Álvaro Arroyo, Harrison Waldon, Xiaowen Dong, and Álvaro Cartea. Rough transformers: Lightweight and continuous time series modelling through signature patching. Advances in Neural Information Processing Systems, 37:106264--106294, 2024

[20]

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022

[21]

Roussel Desmond Nzoyem, Nawid Keshtmand, Enrique Crespo Fernandez, Idriss Tsayem, Raul Santos-Rodriguez, David AW Barton, and Tom Deakin. Weight-space linear recurrent neural networks. arXiv preprint arXiv:2506.01153, 2025

[22]

Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, and Soham De. Resurrecting recurrent neural networks for long sequences. In International Conference on Machine Learning, pages 26670--26698. PMLR, 2023

[23]

Francesco Pappone, Donato Crisostomi, and Emanuele Rodolà. Two-scale latent dynamics for recurrent-depth transformers. arXiv preprint arXiv:2509.23314, 2025

[24]

Guillaume Pourcel and Maxence Ernoult. Learning long range dependencies through time reversal symmetry breaking. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=w1ihNiIBOc

[25]

Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, and Daniel Y Fu. Parcae: Scaling laws for stable looped language models. arXiv preprint arXiv:2604.12946, 2026

[26]

T Konstantin Rusch and Daniela Rus. Oscillatory state-space models. In The Thirteenth International Conference on Learning Representations, 2025

[27]

Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, and Sashank J. Reddi. Reasoning with latent thoughts: On the power of looped transformers. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=din0lGfZFd

[28]

Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. In ICLR, 2023

[29]

Benjamin Walker, Andrew Donald McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, and Terry Lyons. Log neural controlled differential equations: The lie brackets make a difference. In Forty-first International Conference on Machine Learning, 2024a. URL https://openreview.net/forum?id=0tYrMtQyPT

[30]

Benjamin Walker, Andrew Donald McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, and Terry Lyons. Log neural controlled differential equations: The lie brackets make a difference. In Forty-first International Conference on Machine Learning, 2024b

[31]

Eviatar Yemini, Tadas Jucikas, Laura J Grundy, André EX Brown, and William R Schafer. A database of Caenorhabditis elegans behavioral phenotypes. Nature Methods, 10(9):877--879, 2013

[32]

Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, et al. Scaling latent reasoning via looped language models. arXiv preprint arXiv:2510.25741, 2025
