Learning the Koopman Operator using Attention Free Transformers
Pith reviewed 2026-06-26 08:47 UTC · model grok-4.3
The pith
Koopman autoencoders gain stability over long horizons by adding an attention-free latent memory block and change-point re-encoding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An attention-free latent memory block aggregates a short window of past latents to produce a corrected state before each Koopman step. Dynamic re-encoding, triggered by EWMA, CUSUM, or two-sample tests, detects drift and resets predictions to the autoencoder manifold. Together, on three benchmark systems, these yield lower accumulated error over horizons up to 1000 steps than plain Koopman autoencoders or capacity-matched multi-head attention models, while preserving lower inference latency.
What carries the argument
The attention-free latent memory (AFT) block, which aggregates past latents in linear time to correct the input to each Koopman operator update.
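A minimal NumPy sketch of such a block, assuming the AFT-full form from the attention-free transformer paper: three d x d projections plus a T x T table of position biases, which matches the 3d^2 + T^2 count quoted in the abstract. The class name, the initialization, the residual connection, and the choice to compute only the newest position are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class AFTLatentMemory:
    """Hypothetical AFT-full-style memory over a window of T latents of width d."""

    def __init__(self, d, T, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d)
        # Three d x d projections: query, key, value (3 * d^2 parameters).
        self.Wq = rng.normal(0.0, scale, (d, d))
        self.Wk = rng.normal(0.0, scale, (d, d))
        self.Wv = rng.normal(0.0, scale, (d, d))
        # Pairwise position biases w[t, t'] (T^2 parameters), zero-initialized here.
        self.w = np.zeros((T, T))

    def num_params(self):
        return self.Wq.size + self.Wk.size + self.Wv.size + self.w.size

    def correct(self, window):
        """Return a corrected copy of the newest latent.

        window: array of shape (T, d), oldest latent first.
        """
        if window.shape != (self.w.shape[0], self.Wq.shape[0]):
            raise ValueError("window must have shape (T, d)")
        q = window[-1] @ self.Wq              # query for the newest position only
        k = window @ self.Wk                  # keys, shape (T, d)
        v = window @ self.Wv                  # values, shape (T, d)
        logits = k + self.w[-1][:, None]      # add the position biases of the last row
        logits = logits - logits.max(axis=0)  # subtract per-feature max for stability
        weights = np.exp(logits)              # element-wise weights, no T x T attention map
        pooled = (weights * v).sum(axis=0) / weights.sum(axis=0)
        # Residual form is an assumption: gate the pooled context and add it back.
        return window[-1] + sigmoid(q) * pooled
```

With d = 8 and T = 5, `AFTLatentMemory(8, 5).num_params()` is 3 * 64 + 25 = 217. The pooling is a per-feature weighted average over the window, so the cost of one correction grows linearly with T.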
If this is right
- Error accumulation drops consistently across the three benchmark systems over long prediction horizons.
- Inference latency stays lower than that of matched-capacity multi-head attention models.
- Gains appear both with and without the optional re-encoding step.
- Ablations confirm that different trigger policies still deliver the error reduction.
Where Pith is reading between the lines
- The linear-time correction could let the same idea stabilize other latent-space linear predictors where attention cost would be prohibitive.
- If the detectors work, hybrid linear-nonlinear forecasting pipelines become practical for real-time settings that need to stay on a learned manifold.
- Applying the triggers to systems dominated by continuous spectra would test whether the current change-point logic generalizes beyond the reported benchmarks.
Load-bearing premise
Lightweight change-point detectors can reliably flag latent drift and trigger re-encoding without creating new phase or amplitude errors on the target systems.
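To make the premise concrete, here is a minimal sketch of two of the named detector families, EWMA and one-sided CUSUM, run on a synthetic residual trace. The class names, thresholds, and the idea of feeding the detectors a scalar off-manifold residual are assumptions for illustration; the summary above does not say which statistic the paper monitors.

```python
class EWMADetector:
    """Alarm when the exponentially weighted mean of the residual exceeds a threshold."""

    def __init__(self, alpha=0.2, threshold=0.3):
        self.alpha = alpha
        self.threshold = threshold
        self.mean = 0.0

    def update(self, residual):
        self.mean = self.alpha * residual + (1.0 - self.alpha) * self.mean
        return self.mean > self.threshold

    def reset(self):
        self.mean = 0.0


class CUSUMDetector:
    """One-sided CUSUM: accumulate the residual in excess of an allowance."""

    def __init__(self, allowance=0.1, threshold=1.0):
        self.allowance = allowance
        self.threshold = threshold
        self.stat = 0.0

    def update(self, residual):
        self.stat = max(0.0, self.stat + residual - self.allowance)
        return self.stat > self.threshold

    def reset(self):
        self.stat = 0.0


def first_alarm(detector, residuals):
    """Index of the first alarm, or None if the detector never fires."""
    for i, r in enumerate(residuals):
        if detector.update(r):
            return i
    return None


# Synthetic residual trace (assumption): on-manifold for 20 steps, then sustained drift.
residuals = [0.0] * 20 + [0.5] * 10
ewma_alarm = first_alarm(EWMADetector(), residuals)
cusum_alarm = first_alarm(CUSUMDetector(), residuals)
```

On this trace the CUSUM fires at index 22 and the EWMA at index 24. Both thresholds trade detection delay against false alarms, which is exactly where the premise could fail: a late alarm lets phase error build up, while an early one re-encodes a prediction that was still good.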
What would settle it
If the 1000-step rollout error on the Duffing oscillator or Repressilator is not lower for the AFT model than for the plain Koopman autoencoder, the central claim does not hold.
Original abstract
Learning Koopman operators with autoencoders enables linear prediction in a latent space, but long-horizon rollouts often drift off the learned manifold, leading to phase and amplitude errors on systems with switching, continuous spectra, or strong transients. We introduce two complementary components that make Koopman predictors more robust. First, we add an attention-free latent memory (AFT) block that aggregates a short window of past latents to produce a corrected latent before each Koopman update. Unlike multi-head attention, AFT operates in linear time and adds only $\approx$30k parameters ($3d^2 + T^2$, fewer than matched multi-head attention), yet captures the local temporal context needed to suppress error divergence. Second, we propose dynamic re-encoding: lightweight, online change-point triggers (EWMA, CUSUM, and sequential two-sample tests) that detect latent drift and project predictions back onto the autoencoder manifold. Across three benchmark systems -- Duffing oscillator, Repressilator, IRMA -- our model consistently reduces error accumulation compared to a Koopman autoencoder and matched-capacity multi-head attention. We also compare against GRU and Transformer autoencoders, evaluated both from initial conditions and with a 50-step context, and find that Koopman+AFT (with optional re-encoding) attains markedly lower long-horizon error while maintaining lower inference latency. We report improvements over horizons up to 1000 steps, together with ablations over trigger policies. The result is a fast, compact predictor that stays on the learned manifold over long horizons.
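The re-encoding loop described in the abstract can be sketched on a toy system. Everything below is an assumption for illustration: the latent manifold is the unit circle, the "Koopman" step is a rotation with a small radial gain that stands in for accumulated model error, and a fixed residual threshold stands in for the EWMA, CUSUM, or two-sample triggers.

```python
import math

GAIN = 1.01   # hypothetical per-step radial error of the latent map
ANGLE = 0.1   # rotation per step, in radians


def koopman_step(z):
    """Toy linear latent update: rotate by ANGLE and inflate by GAIN."""
    a, b = z
    c, s = math.cos(ANGLE), math.sin(ANGLE)
    return (GAIN * (c * a - s * b), GAIN * (s * a + c * b))


def decode(z):
    """Toy decoder: the state is the phase angle of the latent."""
    return math.atan2(z[1], z[0])


def encode(x):
    """Toy encoder: the learned manifold is the unit circle."""
    return (math.cos(x), math.sin(x))


def rollout(z0, horizon, threshold=None):
    """Roll the latent forward, re-encoding when the off-manifold residual is too large."""
    z, radii, n_reencodes = z0, [], 0
    for _ in range(horizon):
        z = koopman_step(z)
        z_proj = encode(decode(z))          # nearest point on the manifold
        residual = math.dist(z, z_proj)     # distance off the manifold
        if threshold is not None and residual > threshold:
            z = z_proj                      # project back, keeping the phase
            n_reencodes += 1
        radii.append(math.hypot(*z))
    return radii, n_reencodes


plain_radii, _ = rollout((1.0, 0.0), horizon=1000)
guarded_radii, n_reencodes = rollout((1.0, 0.0), horizon=1000, threshold=0.05)
```

In this toy the unguarded latent radius grows without bound over 1000 steps, while the guarded rollout stays within a few percent of the manifold. The projection preserves phase here by construction; whether that holds for a learned encoder and decoder is what the paper's ablations would need to show.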
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes two additions to Koopman autoencoders for improved long-horizon prediction: (1) an Attention-Free Transformer (AFT) block that aggregates a short window of past latents in linear time with ~30k parameters to correct the latent state before each Koopman step, and (2) optional dynamic re-encoding triggered by lightweight online change-point detectors (EWMA, CUSUM, sequential two-sample tests) to project predictions back onto the learned manifold. Experiments on the Duffing oscillator, Repressilator, and IRMA systems report lower error accumulation than a baseline Koopman autoencoder, matched-capacity multi-head attention, GRU, and Transformer autoencoders, with ablations on trigger policies and evaluations from both initial conditions and 50-step context, up to 1000-step horizons.
Significance. If the reported error reductions and latency advantages are confirmed by detailed quantitative results, the work offers a compact, linear-time mechanism for mitigating manifold drift in Koopman predictors. The explicit parameter count (3d² + T²) and optional re-encoding design are falsifiable strengths that could make the method attractive for resource-constrained forecasting or control tasks involving switching or transient dynamics.
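As a worked check of the parameter formula, the sketch below compares 3d² + T² with the weight count of a standard multi-head attention layer without biases (four d x d matrices: query, key, value, output). The sizes d = 100 and T = 16 are hypothetical, chosen only because they land near the quoted ~30k; the paper's actual dimensions and its matched baseline may differ.

```python
def aft_block_params(d, T):
    """Parameter count quoted for the AFT block: three d x d projections plus T^2 biases."""
    return 3 * d * d + T * T


def mha_block_params(d):
    """Weights of standard multi-head attention without biases: Wq, Wk, Wv, Wo."""
    return 4 * d * d


d, T = 100, 16  # hypothetical sizes, not taken from the paper
aft = aft_block_params(d, T)
mha = mha_block_params(d)
```

With these sizes `aft` is 30,256 and `mha` is 40,000, so the saving comes from dropping the output projection, provided the window T stays small relative to d.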
major comments (2)
- [Abstract] The central claim of 'consistent' and 'markedly lower' error reduction across three benchmarks is stated without any numerical values, tables, error bars, or statistical tests. The manuscript must supply these quantitative results (including per-system, per-horizon metrics and ablation tables) to support the empirical contribution.
- [Method (dynamic re-encoding)] The reliability of the change-point detectors for triggering re-encoding without introducing phase/amplitude errors is load-bearing for the second component. The manuscript should provide explicit analysis or ablation results demonstrating that EWMA/CUSUM/sequential tests do not degrade prediction quality on the target systems when triggered.
minor comments (2)
- Clarify whether the reported ~30k parameter count for AFT includes the full encoder/decoder or only the AFT block, and confirm the matched-capacity multi-head attention baseline uses identical total parameters.
- The abstract mentions comparisons 'evaluated both from initial conditions and with a 50-step context'; the manuscript should explicitly state the context length used for all baselines to ensure fair comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the requested quantitative details and expanded analysis.
- Referee: [Abstract] The central claim of 'consistent' and 'markedly lower' error reduction across three benchmarks is stated without any numerical values, tables, error bars, or statistical tests. The manuscript must supply these quantitative results (including per-system, per-horizon metrics and ablation tables) to support the empirical contribution.
  Authors: We agree that the abstract would be strengthened by including concrete numerical support for the claims. In the revised manuscript we will add key quantitative results (per-system and per-horizon error reductions with error bars) and explicit references to the ablation tables already present in the experiments section.
  Revision: yes
- Referee: [Method (dynamic re-encoding)] The reliability of the change-point detectors for triggering re-encoding without introducing phase/amplitude errors is load-bearing for the second component. The manuscript should provide explicit analysis or ablation results demonstrating that EWMA/CUSUM/sequential tests do not degrade prediction quality on the target systems when triggered.
  Authors: The current manuscript already reports ablations over trigger policies. To directly address the concern, we will expand the analysis in the revision with explicit side-by-side comparisons of long-horizon prediction error (including phase and amplitude metrics) on Duffing, Repressilator, and IRMA when the detectors are active versus inactive, confirming that triggering does not introduce degradation.
  Revision: yes
Circularity Check
No significant circularity identified
The paper introduces an empirical architecture (Koopman autoencoder augmented with an AFT block for latent aggregation and optional online change-point triggers for re-encoding) and evaluates it on three benchmark dynamical systems against matched baselines. No derivation chain, equation, or central claim reduces by construction to fitted inputs, self-citations, or renamed known results; performance improvements are presented as falsifiable experimental outcomes with explicit ablations and latency comparisons. The method description remains independent of the reported error metrics.