Recognition: 2 theorem links
· Lean Theorem
Adversarial Robustness of Deep State Space Models for Forecasting
Pith reviewed 2026-05-13 19:43 UTC · model grok-4.3
The pith
Spacetime SSM forecasters can exactly match the optimal Kalman predictor for autoregressive processes, yet their error grows with instability and decoder size under adversarial attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive, a property no other SSM possesses. This equivalence enables closed-form bounds on adversarial forecasting error that quantify the amplifying effects of open-loop instability, closed-loop instability, and decoder state dimension. Robust forecaster design is formulated as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget and solved via adversarial training, while model-free attacks that bypass gradient computations are shown to be more effective than projected gradient descent.
What carries the argument
The decoder-only Spacetime SSM architecture, which achieves exact equivalence to the optimal Kalman predictor under autoregressive assumptions, together with the Stackelberg game formulation that solves for robust design against detection-budget-constrained adversaries.
If this is right
- When data follows an autoregressive process, the decoder-only Spacetime model achieves the exact optimal Kalman predictor.
- Adversarial forecasting error increases with both open-loop and closed-loop instability of the underlying system.
- Larger decoder state dimensions directly amplify the upper bound on adversarial error.
- Adversarial training solves the Stackelberg game and yields more robust forecasters.
- Model-free attacks that use only local linear input-output behavior outperform gradient-based attacks without requiring model access.
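The last claim can be made concrete with a small sketch: a gradient-free adversary probes the forecaster as a black box, estimates its locally linear input-output map by finite differences, and perturbs the input window along the most-amplified direction. The toy `forecast` function and the `budget` value below are illustrative assumptions, not the paper's actual attack implementation.

```python
# Sketch of a model-free attack exploiting local linearity (assumed setup,
# not the paper's code): estimate a Jacobian by finite differences, then
# perturb along its top right singular vector via power iteration.

def finite_diff_jacobian(f, x, eps=1e-5):
    """J[j][i] ~= d f_j / d x_i, estimated from black-box queries only."""
    y0 = f(x)
    cols = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        cols.append([(a - b) / eps for a, b in zip(f(xp), y0)])
    return [[cols[i][j] for i in range(len(x))] for j in range(len(y0))]

def top_right_singular_vector(J, iters=200):
    """Power iteration on J^T J: the input direction J amplifies most."""
    n = len(J[0])
    v = [1.0] * n
    for _ in range(iters):
        Jv = [sum(row[i] * v[i] for i in range(n)) for row in J]
        w = [sum(J[j][i] * Jv[j] for j in range(len(J))) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

def model_free_attack(f, x, budget):
    """Spend the whole perturbation budget along the worst-case direction."""
    v = top_right_singular_vector(finite_diff_jacobian(f, x))
    return [xi + budget * vi for xi, vi in zip(x, v)]

# Toy linear "forecaster": one-step prediction from an AR(3)-style window.
def forecast(x):
    return [0.9 * x[-1] - 0.2 * x[-2] + 0.05 * x[-3]]

x = [0.3, -0.1, 0.5]
x_adv = model_free_attack(forecast, x, budget=0.1)
```

For a linear forecaster the induced output change equals the budget times the largest singular value of the local map, which is the best any norm-bounded perturbation can do.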
Where Pith is reading between the lines
- Stability metrics could serve as a practical design criterion when selecting SSM architectures for forecasting under potential attacks.
- The local-linearity exploitation in model-free attacks may extend to other recurrent or state-space time-series models beyond Spacetime.
- The Stackelberg formulation with detection budgets could be adapted to robust learning in related control or dynamical systems problems.
- Evaluating the equivalence on processes that deviate from strict autoregression would clarify the practical scope of the Kalman representation result.
Load-bearing premise
The data-generating process must be autoregressive for the Spacetime model to match the optimal Kalman predictor, and adversaries must operate under a fixed detection budget in the Stackelberg formulation.
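The detection-budget constraint can be sketched minimally as a projection step: any candidate perturbation exceeding the budget is scaled back onto the feasible set. The l2 ball and the budget value are illustrative assumptions; the paper's detector may use a different statistic.

```python
# Minimal sketch of a stealthiness constraint (assumed form, not the
# paper's exact detector): keep the adversary's perturbation within a
# detection budget by projecting onto an l2 ball.

def project_to_budget(delta, budget):
    """Scale delta onto the l2 ball of radius `budget` if it exceeds it."""
    n = sum(d * d for d in delta) ** 0.5
    if n <= budget:
        return list(delta)
    return [d * budget / n for d in delta]

delta = [0.6, -0.8]                      # candidate perturbation, norm 1.0
stealthy = project_to_budget(delta, budget=0.5)
```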
What would settle it
A counterexample on a synthetic autoregressive dataset where the Spacetime model fails to achieve the same one-step prediction error as the true Kalman filter, or empirical results on the Monash benchmarks where model-free attacks do not exceed projected gradient descent error by at least 33%.
Original abstract
State-space models (SSMs) for time-series forecasting have demonstrated strong empirical performance on benchmark datasets, yet their robustness under adversarial perturbations is poorly understood. We address this gap through a control-theoretic lens, focusing on the recently proposed Spacetime SSM forecaster. We first establish that the decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive - a property no other SSM possesses. Building on this, we formulate robust forecaster design as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget, and solve it via adversarial training. We derive closed-form bounds on adversarial forecasting error that expose how open-loop instability, closed-loop instability, and decoder state dimension each amplify vulnerability - offering actionable principles towards robust forecaster design. Finally, we show that even adversaries with no access to the forecaster can nonetheless construct effective attacks by exploiting the model's locally linear input-output behavior, bypassing gradient computations entirely. Experiments on the Monash benchmark datasets highlight that model-free attacks, without any gradient computation, can cause at least 33% more error than projected gradient descent with a small step size.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that decoder-only Spacetime SSMs can exactly represent the optimal Kalman predictor for autoregressive data-generating processes (a property asserted to be unique among SSMs). It formulates robust forecasting as a Stackelberg game against stealthy adversaries under a detection budget, derives closed-form bounds showing how open-loop instability, closed-loop instability, and decoder state dimension amplify adversarial error, and reports that model-free attacks (exploiting local linearity) cause at least 33% more forecasting error than small-step PGD on Monash benchmarks.
Significance. If the Kalman equivalence holds exactly and the closed-form bounds are valid, the work would supply concrete theoretical principles for robust SSM forecaster design and introduce practical model-free attack methods. The use of standard Monash benchmarks and the control-theoretic framing add relevance to time-series robustness literature, though the load-bearing nature of the equivalence means the significance is conditional on verification of that step.
Major comments (2)
- [Abstract / Kalman equivalence derivation] Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.
- [Experiments] Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.
Minor comments (2)
- [Bounds derivation] Clarify in the notation or bounds section whether the closed-form expressions assume exact linearity or hold under the locally linear approximation used for the model-free attack.
- [Introduction / Related work] Add explicit comparison (even a brief remark) showing why other SSM architectures cannot achieve the exact Kalman representation, to substantiate the uniqueness claim.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.
Point-by-point responses
-
Referee: Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.
Authors: We appreciate the referee's focus on the foundational Kalman equivalence claim. Our derivation in the main text parameterizes the Spacetime SSM's hidden state transition matrix as the companion form of the AR coefficients and the observation map to extract the one-step prediction, with the decoder state dimension set to match the AR order. This allows exact recovery of the Kalman recursion, including the optimal gain from the Riccati solution. To address the request for explicit exhibition, we will revise the manuscript by adding a new appendix or subsection that provides the full matrix expressions and verifies the equivalence for arbitrary AR orders, ensuring no residual error and validating the bounds. revision: yes
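The construction the rebuttal describes can be sketched numerically, assuming a noise-free AR(p) process: stack the last p observations as the SSM state, use the companion matrix of the AR coefficients as the transition, and read the one-step prediction off the first coordinate. The coefficient and data values are illustrative, not the paper's experiments.

```python
# Minimal sketch of the companion-form parameterization (illustrative,
# under the AR assumption): the decoder-only state-space one-step
# prediction coincides with the direct AR forecast, which for a
# noise-free AR(p) process is the optimal (Kalman) one-step predictor.

def companion(a):
    """Companion-form transition matrix for AR coefficients a = [a1..ap]."""
    p = len(a)
    A = [list(a)]  # first row carries the AR coefficients
    for i in range(p - 1):
        A.append([1.0 if j == i else 0.0 for j in range(p)])
    return A

def ssm_predict(A, h):
    """One-step prediction: first coordinate of A @ h."""
    return sum(A[0][j] * h[j] for j in range(len(h)))

a = [0.5, -0.3, 0.1]   # AR(3) coefficients (made up for illustration)
y = [1.0, 0.2, -0.4]   # state h_t = (y_t, y_{t-1}, y_{t-2}), newest first
pred_ssm = ssm_predict(companion(a), y)
pred_ar = sum(ai * yi for ai, yi in zip(a, y))  # direct AR forecast
```

Here the decoder state dimension equals the AR order, mirroring the dimension-matching condition the rebuttal cites.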
-
Referee: Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.
Authors: We concur that including error bars, seed counts, and statistical tests will improve the reliability of the experimental results. We will update the Experiments section to report the forecasting errors averaged over 5 independent random seeds, with standard error bars shown in the relevant tables and figures. Additionally, we will include p-values from appropriate statistical tests to confirm the significance of the observed 33% or greater error increase for model-free attacks compared to PGD. revision: yes
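As a hypothetical illustration of the committed reporting protocol, the per-seed summary could look like the following; every error value below is an invented placeholder, not a result from the paper.

```python
# Illustrative reporting sketch: per-seed errors for both attacks,
# mean +/- standard error, and a paired t-statistic on per-seed
# differences. All numbers are made up for demonstration.
import statistics as st

pgd_err        = [0.91, 0.88, 0.95, 0.90, 0.93]   # 5 seeds, placeholder
model_free_err = [1.25, 1.21, 1.30, 1.27, 1.24]   # 5 seeds, placeholder

def mean_se(xs):
    """Sample mean and standard error over seeds."""
    return st.mean(xs), st.stdev(xs) / len(xs) ** 0.5

diffs = [m - p for m, p in zip(model_free_err, pgd_err)]
t_stat = st.mean(diffs) / (st.stdev(diffs) / len(diffs) ** 0.5)
rel_increase = st.mean(model_free_err) / st.mean(pgd_err) - 1.0
```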
Circularity Check
Minor self-citation for Spacetime architecture; Kalman equivalence and bounds derived independently
Full rationale
The derivation begins by explicitly matching the decoder-only Spacetime state-update and output equations to the Kalman predictor recursion for autoregressive processes, then proceeds to Stackelberg-game bounds on error amplification via open- and closed-loop instability and state dimension. These steps use the architecture's linear realization and standard game-theoretic setup rather than re-using fitted parameters or self-referential definitions. Experiments rely on external Monash benchmarks, and model-free attacks exploit local linearity without gradient fitting. The sole self-citation (for the recently proposed Spacetime forecaster) is not load-bearing for the equivalence claim or the subsequent bounds.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Underlying data-generating process is autoregressive
- domain assumption: Adversaries are constrained by a detection budget in the Stackelberg game
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
Proposition 3: sup ||Hε|| = σ_max(H) with H built from encoder/decoder state matrices; bounds involve ρ(A) and ρ(Ā+BK)
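The two quantities Proposition 3 trades in can be computed with plain power iteration: the spectral norm σ_max(H) as the worst-case amplification of a perturbation, and the spectral radius ρ(A), assuming a real dominant eigenvalue. The 2x2 matrices below are toy stand-ins, not the paper's encoder/decoder state matrices.

```python
# Toy computation of sigma_max(H) and rho(A) (illustrative matrices).
# sigma_max via power iteration on H^T H; rho via power iteration on A,
# valid here because A has a real dominant eigenvalue.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def sigma_max(H, iters=500):
    """Largest singular value: power iteration on H^T H."""
    Ht = [list(col) for col in zip(*H)]
    v = [1.0] * len(H[0])
    for _ in range(iters):
        w = matvec(Ht, matvec(H, v))
        v = [x / norm(w) for x in w]
    return norm(matvec(H, v))

def spectral_radius(A, iters=500):
    """|dominant eigenvalue| via power iteration (real dominant case)."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        v = [x / norm(w) for x in w]
    return norm(matvec(A, v))

H = [[2.0, 0.0], [0.0, 0.5]]   # stand-in perturbation-to-error map
A = [[0.9, 0.5], [0.0, 0.8]]   # stand-in state matrix, rho(A) < 1
```

In the bound's terms, larger σ_max(H) means a stealthy perturbation of a given budget can induce proportionally more forecasting error, and ρ(A) near or above 1 marks open-loop instability.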
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] T. G. Andersen, T. Bollerslev, F. X. Diebold, and P. Labys, "Modeling and forecasting realized volatility," Econometrica, vol. 71, no. 2, pp. 579–625, 2003.
- [2] M. H. Amini, A. Kargarian, and O. Karabasoglu, "ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation," Electric Power Systems Research, vol. 140, pp. 378–390, 2016.
- [3] M. Mudelsee, "Trend analysis of climate time series: A review of methods," Earth-Science Reviews, vol. 190, pp. 310–322, 2019.
- [4] M. Jin, H. Y. Koh, Q. Wen, D. Zambon, C. Alippi, G. I. Webb, I. King, and S. Pan, "A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
- [5] S. Bai, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," arXiv:1803.01271, 2018.
- [6] A. Das, W. Kong, R. Sen, and Y. Zhou, "A decoder-only foundation model for time-series forecasting," in Forty-first International Conference on Machine Learning, 2024.
- [7] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski, "Deep state space models for time series forecasting," Advances in Neural Information Processing Systems, vol. 31, 2018.
- [8] A. Zeng, M. Chen, L. Zhang, and Q. Xu, "Are transformers effective for time series forecasting?," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 11121–11128, 2023.
- [9] M. Zhang, K. K. Saab, M. Poli, T. Dao, K. Goel, and C. Ré, "Effectively modeling time series with simple discrete state spaces," in The Eleventh International Conference on Learning Representations, 2023.
- [10] A. Gu, K. Goel, and C. Ré, "Efficiently modeling long sequences with structured state spaces," in The International Conference on Learning Representations (ICLR), 2022.
- [11] A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," in First Conference on Language Modeling, 2024.
- [12] F. Liu, S. Jiang, L. Miranda-Moreno, S. Choi, and L. Sun, "Adversarial vulnerabilities in large language models for time series forecasting," in NeurIPS Safe Generative AI Workshop 2024, 2024.
- [13] X. Lin, Z. Liu, D. Fu, R. Qiu, and H. Tong, "Backtime: Backdoor attacks on multivariate time series forecasting," Advances in Neural Information Processing Systems, vol. 37, pp. 131344–131368, 2024.
- [14] M. Gallagher, N. Pitropakis, C. Chrysoulas, P. Papadopoulos, A. Mylonas, and S. Katsikas, "Investigating machine learning attacks on financial time series models," Computers & Security, vol. 123, p. 102933, 2022.
- [15] T. Wu, X. Wang, S. Qiao, X. Xian, Y. Liu, and L. Zhang, "Small perturbations are enough: Adversarial attacks on time series prediction," Information Sciences, vol. 587, pp. 794–812, 2022.
- [16] G. Zizzo, C. Hankin, S. Maffeis, and K. Jones, "Adversarial attacks on time-series intrusion detection for industrial control systems," in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 899–910, IEEE, 2020.
- [17] B. Qi, Y. Luo, J. Gao, P. Li, K. Tian, Z. Ma, and B. Zhou, "Exploring adversarial robustness of deep state space models," Advances in Neural Information Processing Systems, vol. 37, pp. 6549–6573, 2024.
- [18] S. Das, S. Bhattacharya, S. Kundu, A. Raha, S. Kundu, and K. Basu, "Rambo: Reliability analysis for mamba through bit-flip attack optimization," arXiv preprint arXiv:2512.15778, 2025.
- [19] C.-Y. Lee, Y.-H. Chiang, Z.-Y. Wu, C.-M. Yu, and C.-S. Lu, "Badvim: Unveiling backdoor threats in visual state space model," arXiv preprint arXiv:2408.11679, 2024.
- [20] S. C. Anand, K. Hassan, and H. Sandberg, "Conditions for effective mitigation of attack impact via randomized detector tuning," in 2025 IEEE 64th Conference on Decision and Control (CDC), pp. 5002–5007, IEEE, 2025.
- [21] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," in International Conference on Learning Representations, 2018.
- [22] T. Yoon, Y. Park, E. K. Ryu, and Y. Wang, "Robust probabilistic time series forecasting," in International Conference on Artificial Intelligence and Statistics, pp. 1336–1358, PMLR, 2022.
- [23] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, "A secure control framework for resource-limited adversaries," Automatica, vol. 51, pp. 135–148, 2015.
- [24] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
- [25] G. Goel and P. Bartlett, "Can a transformer represent a Kalman filter?," in 6th Annual Learning for Dynamics & Control Conference, pp. 1502–1512, PMLR, 2024.
- [26] A. Trindade, "ElectricityLoadDiagrams20112014." UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86.
- [27] R. W. Godahewa, C. Bergmeir, G. Webb, R. Hyndman, and P. Montero-Manso, "Monash time series forecasting archive," in Proc. of the Neural Information Processing Systems Track on Datasets and Benchmarks (J. Vanschoren and S. Yeung, eds.), vol. 1, 2021.
- [28] C. Yin, S. Zhang, J. Wang, and N. N. Xiong, "Anomaly detection based on convolutional recurrent autoencoder for IoT time series," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 112–122, 2020.
- [29] Y. Liu, P. Ning, and M. K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Trans. on Information and System Security (TISSEC), vol. 14, no. 1, pp. 1–33, 2011.

Appendix (fragment): The results obtained on three additional Monash benchmark datasets [27] are presented in Table II. Fine-tuning is performed with around 1% of at...