Pith · machine review for the scientific record

arxiv: 2604.03427 · v1 · submitted 2026-04-03 · 💻 cs.LG · cs.SY · eess.SY

Recognition: 2 theorem links


Adversarial Robustness of Deep State Space Models for Forecasting

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:43 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords adversarial robustness · state space models · time series forecasting · Kalman predictor · Stackelberg game · model-free attacks · autoregressive processes · Spacetime architecture

The pith

Spacetime SSM forecasters can exactly match the optimal Kalman predictor for autoregressive processes, yet their error grows with instability and decoder size under adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the decoder-only Spacetime architecture is the only SSM able to represent the optimal Kalman predictor when the data-generating process is autoregressive. Using this equivalence, the authors derive closed-form bounds showing that open-loop instability, closed-loop instability, and larger decoder state dimensions each increase vulnerability to worst-case stealthy adversaries. They cast robust design as a Stackelberg game solved by adversarial training and demonstrate that model-free attacks, which exploit local linearity without needing gradients, produce at least 33 percent more forecasting error than small-step projected gradient descent on Monash benchmarks.

Core claim

The decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive, a property no other SSM possesses. This equivalence enables closed-form bounds on adversarial forecasting error that quantify the amplifying effects of open-loop instability, closed-loop instability, and decoder state dimension. Robust forecaster design is formulated as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget and solved via adversarial training, while model-free attacks that bypass gradient computations are shown to be more effective than projected gradient descent.
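The equivalence turns on the fact that any AR(p) process admits a companion-form state-space realization whose one-step Kalman predictor collapses to the AR recursion itself. A minimal numpy sketch of that realization (illustrative only; the paper's Spacetime parameterization and the general noisy-observation case are richer than this):

```python
import numpy as np

def ar_companion_ssm(coeffs):
    """Companion-form state-space realization of an AR(p) process:
    x_{t+1} = A x_t + B w_t, y_t = C x_t, with x_t stacking the last
    p observations. (Illustrative; not the paper's exact construction.)
    """
    p = len(coeffs)
    A = np.zeros((p, p))
    A[0, :] = coeffs            # y_{t+1} = a1 y_t + ... + ap y_{t-p+1} + w
    A[1:, :-1] = np.eye(p - 1)  # shift the remaining past values down
    B = np.zeros((p, 1)); B[0, 0] = 1.0
    C = np.zeros((1, p)); C[0, 0] = 1.0
    return A, B, C

# For a fully observed AR process, the optimal one-step (Kalman)
# predictor through the SSM equals the direct AR prediction.
coeffs = np.array([0.5, -0.3])
A, B, C = ar_companion_ssm(coeffs)
x = np.array([[1.0], [2.0]])            # state: [y_t, y_{t-1}]
y_pred_ssm = float(C @ A @ x)           # predictor via the realization
y_pred_ar = float(coeffs @ x.ravel())   # direct AR recursion
```

Any architecture that can realize this companion structure in its decoder state can, in principle, carry the same predictor; the paper's uniqueness claim is that among deep SSMs only decoder-only Spacetime does so exactly.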

What carries the argument

The decoder-only Spacetime SSM architecture, which achieves exact equivalence to the optimal Kalman predictor under autoregressive assumptions, together with the Stackelberg game formulation that solves for robust design against detection-budget-constrained adversaries.

If this is right

  • When data follows an autoregressive process, the decoder-only Spacetime model achieves the exact optimal Kalman predictor.
  • Adversarial forecasting error increases with both open-loop and closed-loop instability of the underlying system.
  • Larger decoder state dimensions directly amplify the upper bound on adversarial error.
  • Adversarial training solves the Stackelberg game and yields more robust forecasters.
  • Model-free attacks that use only local linear input-output behavior outperform gradient-based attacks without requiring model access.
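The last bullet's local-linearity idea can be sketched as follows: probe the forecaster with finite differences to estimate its local linear map, then spend the entire perturbation budget along that map's top right singular vector. This is a hypothetical reconstruction, not the paper's exact procedure; `local_linear_attack`, the probe step `delta`, and the toy forecaster are all invented for illustration.

```python
import numpy as np

def local_linear_attack(forecaster, x, eps, delta=1e-4):
    """Model-free attack sketch: finite-difference estimate of the
    local linear map, then a perturbation of norm eps along its top
    right singular vector (the locally worst-case direction)."""
    y0 = forecaster(x)
    n = x.size
    J = np.zeros((y0.size, n))
    for i in range(n):                  # one probe per input coordinate
        e = np.zeros(n); e[i] = delta
        J[:, i] = (forecaster(x + e) - y0) / delta
    _, _, Vt = np.linalg.svd(J)
    return eps * Vt[0]

# toy linear forecaster standing in for a trained SSM
W = np.array([[3.0, 0.0], [0.0, 1.0]])
f = lambda x: W @ x
x0 = np.array([1.0, 1.0])
a = local_linear_attack(f, x0, eps=0.1)
rand = 0.1 * np.array([0.0, 1.0])       # same budget, arbitrary direction
```

No gradients of the model are ever queried, which is why this style of attack needs only input-output access.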

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Stability metrics could serve as a practical design criterion when selecting SSM architectures for forecasting under potential attacks.
  • The local-linearity exploitation in model-free attacks may extend to other recurrent or state-space time-series models beyond Spacetime.
  • The Stackelberg formulation with detection budgets could be adapted to robust learning in related control or dynamical systems problems.
  • Evaluating the equivalence on processes that deviate from strict autoregression would clarify the practical scope of the Kalman representation result.

Load-bearing premise

The data-generating process must be autoregressive for the Spacetime model to match the optimal Kalman predictor, and adversaries must operate under a fixed detection budget in the Stackelberg formulation.
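Under these premises, the Stackelberg structure is a bilevel problem: the follower maximizes forecasting error inside the detection budget, and the leader then minimizes the resulting worst-case loss. A toy numpy sketch with a linear forecaster, where the budget is modeled as an ℓ∞ bound `eps` and the inner maximization has a closed form (all names and constants here are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic regression data standing in for the forecasting task
w_star = np.array([1.0, -2.0])
X = rng.normal(size=(256, 2))
y = X @ w_star + 0.1 * rng.normal(size=256)

theta = np.zeros(2)          # leader's forecaster parameters
eps, lr = 0.1, 0.05          # detection budget (l-inf), leader step size

for _ in range(200):
    # follower: worst-case perturbation inside the budget. For a linear
    # model with squared loss, the maximizing l-inf perturbation is
    # eps * sign(residual_i) * sign(theta_j), entrywise.
    resid = X @ theta - y
    delta = eps * np.sign(resid)[:, None] * np.sign(theta)[None, :]
    Xa = X + delta
    # leader: gradient step on the worst-case (perturbed) loss,
    # i.e. one round of adversarial training
    grad = Xa.T @ (Xa @ theta - y) / len(y)
    theta -= lr * grad
```

Adversarial training as used in the paper plays exactly this alternation, with PGD replacing the closed-form inner step for nonlinear forecasters.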

What would settle it

A counterexample on a synthetic autoregressive dataset where the Spacetime model fails to achieve the same one-step prediction error as the true Kalman filter, or empirical results on the Monash benchmarks where model-free attacks do not exceed projected gradient descent error by at least 33 percent.
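The first half of that test is cheap to sketch: on synthetic AR data the optimal one-step predictor is the AR recursion, whose mean squared error equals the innovation variance σ², so any candidate forecaster can be measured against that floor. A least-squares AR fit stands in for the trained Spacetime model below; the coefficients and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2, sigma = 0.5, -0.3, 0.2     # stable AR(2) coefficients, noise std
T = 20000
y = np.zeros(T)
for t in range(2, T):
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + sigma * rng.normal()

# For a fully observed AR process the optimal (Kalman) one-step
# predictor is the AR recursion; its MSE is the innovation variance.
pred_opt = a1 * y[1:-1] + a2 * y[:-2]
mse_opt = np.mean((y[2:] - pred_opt) ** 2)      # ~ sigma**2

# least-squares AR fit as a stand-in for a trained forecaster; a model
# matching the Kalman predictor should approach the same error floor
Phi = np.column_stack([y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
mse_fit = np.mean((y[2:] - Phi @ coef) ** 2)
```

A trained Spacetime model whose one-step MSE stays bounded away from σ² on such data would be the counterexample described above.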

Figures

Figures reproduced from arXiv: 2604.03427 by George J. Pappas, Sribalaji C. Anand.

Figure 1: Problem setup: the adversary (red) injects the attack signal into the data.
Figure 2: Spacetime architecture (left) and layer components (right).
Figure 3: Forecaster performance on test data excerpt (left) and distribution of …
Figure 4: Adversarial error as a function of ℓ (left) and h (right), with approximately constant spectral radius across models in both experiments.
Figure 5: Adversarial error caused by PGD attacks and data-driven attacks.
Original abstract

State-space models (SSMs) for time-series forecasting have demonstrated strong empirical performance on benchmark datasets, yet their robustness under adversarial perturbations is poorly understood. We address this gap through a control-theoretic lens, focusing on the recently proposed Spacetime SSM forecaster. We first establish that the decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive - a property no other SSM possesses. Building on this, we formulate robust forecaster design as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget, and solve it via adversarial training. We derive closed-form bounds on adversarial forecasting error that expose how open-loop instability, closed-loop instability, and decoder state dimension each amplify vulnerability - offering actionable principles towards robust forecaster design. Finally, we show that even adversaries with no access to the forecaster can nonetheless construct effective attacks by exploiting the model's locally linear input-output behavior, bypassing gradient computations entirely. Experiments on the Monash benchmark datasets highlight that model-free attacks, without any gradient computation, can cause at least 33% more error than projected gradient descent with a small step size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that decoder-only Spacetime SSMs can exactly represent the optimal Kalman predictor for autoregressive data-generating processes (a property asserted to be unique among SSMs). It formulates robust forecasting as a Stackelberg game against stealthy adversaries under a detection budget, derives closed-form bounds showing how open-loop instability, closed-loop instability, and decoder state dimension amplify adversarial error, and reports that model-free attacks (exploiting local linearity) cause at least 33% more forecasting error than small-step PGD on Monash benchmarks.

Significance. If the Kalman equivalence holds exactly and the closed-form bounds are valid, the work would supply concrete theoretical principles for robust SSM forecaster design and introduce practical model-free attack methods. The use of standard Monash benchmarks and the control-theoretic framing add relevance to time-series robustness literature, though the load-bearing nature of the equivalence means the significance is conditional on verification of that step.

major comments (2)
  1. [Abstract / Kalman equivalence derivation] Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.
  2. [Experiments] Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.
minor comments (2)
  1. [Bounds derivation] Clarify in the notation or bounds section whether the closed-form expressions assume exact linearity or hold under the locally linear approximation used for the model-free attack.
  2. [Introduction / Related work] Add explicit comparison (even a brief remark) showing why other SSM architectures cannot achieve the exact Kalman representation, to substantiate the uniqueness claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.

    Authors: We appreciate the referee's focus on the foundational Kalman equivalence claim. Our derivation in the main text parameterizes the Spacetime SSM's hidden state transition matrix as the companion form of the AR coefficients and the observation map to extract the one-step prediction, with the decoder state dimension set to match the AR order. This allows exact recovery of the Kalman recursion, including the optimal gain from the Riccati solution. To address the request for explicit exhibition, we will revise the manuscript by adding a new appendix or subsection that provides the full matrix expressions and verifies the equivalence for arbitrary AR orders, ensuring no residual error and validating the bounds. revision: yes

  2. Referee: Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.

    Authors: We concur that including error bars, seed counts, and statistical tests will improve the reliability of the experimental results. We will update the Experiments section to report the forecasting errors averaged over 5 independent random seeds, with standard error bars shown in the relevant tables and figures. Additionally, we will include p-values from appropriate statistical tests to confirm the significance of the observed 33% or greater error increase for model-free attacks compared to PGD. revision: yes
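The promised reporting format is straightforward to pin down: mean ± standard error over seeds, plus the error ratio the claim rests on. A minimal sketch (the per-seed numbers below are invented placeholders, not the paper's results):

```python
import statistics as st

def mean_se(values):
    """Mean and standard error over independent seeds."""
    m = st.mean(values)
    se = st.stdev(values) / len(values) ** 0.5
    return m, se

# hypothetical per-seed forecasting errors (placeholders, not results)
pgd_errors = [1.10, 1.05, 1.12, 1.08, 1.07]
mf_errors = [1.52, 1.47, 1.55, 1.50, 1.49]

m_pgd, se_pgd = mean_se(pgd_errors)
m_mf, se_mf = mean_se(mf_errors)
ratio = m_mf / m_pgd    # a ratio >= 1.33 would match the paper's claim
```

Reporting the ratio with its uncertainty over seeds is exactly what would let a reader judge whether the "at least 33%" gap is robust.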

Circularity Check

0 steps flagged

Minor self-citation for Spacetime architecture; Kalman equivalence and bounds derived independently

full rationale

The derivation begins by explicitly matching the decoder-only Spacetime state-update and output equations to the Kalman predictor recursion for autoregressive processes, then proceeds to Stackelberg-game bounds on error amplification via open- and closed-loop instability and state dimension. These steps use the architecture's linear realization and standard game-theoretic setup rather than re-using fitted parameters or self-referential definitions. Experiments rely on external Monash benchmarks, and model-free attacks exploit local linearity without gradient fitting. The sole self-citation (for the recently proposed Spacetime forecaster) is not load-bearing for the equivalence claim or the subsequent bounds.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the autoregressive data assumption for Kalman equivalence and the constrained adversary model; no free parameters or invented entities are introduced beyond standard control-theoretic constructs.

axioms (2)
  • domain assumption Underlying data-generating process is autoregressive
    Invoked to establish that decoder-only Spacetime represents the optimal Kalman predictor.
  • domain assumption Adversaries are constrained by a detection budget in the Stackelberg game
    Used to formulate and solve the robust forecaster design problem.

pith-pipeline@v0.9.0 · 5507 in / 1330 out tokens · 51755 ms · 2026-05-13T19:43:55.030494+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

  1. [1]

    Modeling and forecasting realized volatility,

    T. G. Andersen, T. Bollerslev, F. X. Diebold, and P. Labys, “Modeling and forecasting realized volatility,” Econometrica, vol. 71, no. 2, pp. 579–625, 2003

  2. [2]

    ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation,

    M. H. Amini, A. Kargarian, and O. Karabasoglu, “ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation,” Electric Power Systems Research, vol. 140, pp. 378–390, 2016

  3. [3]

    Trend analysis of climate time series: A review of methods,

    M. Mudelsee, “Trend analysis of climate time series: A review of methods,” Earth-Science Reviews, vol. 190, pp. 310–322, 2019

  4. [4]

    A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection,

    M. Jin, H. Y. Koh, Q. Wen, D. Zambon, C. Alippi, G. I. Webb, I. King, and S. Pan, “A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  5. [5]

    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

    S. Bai, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv:1803.01271, 2018

  6. [6]

    A decoder-only foundation model for time-series forecasting,

    A. Das, W. Kong, R. Sen, and Y. Zhou, “A decoder-only foundation model for time-series forecasting,” in Forty-first International Conference on Machine Learning, 2024

  7. [7]

    Deep state space models for time series forecasting,

    S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski, “Deep state space models for time series forecasting,” Advances in Neural Information Processing Systems, vol. 31, 2018

  8. [8]

    Are transformers effective for time series forecasting?,

    A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 11121–11128, 2023

  9. [9]

    Effectively modeling time series with simple discrete state spaces,

    M. Zhang, K. K. Saab, M. Poli, T. Dao, K. Goel, and C. Ré, “Effectively modeling time series with simple discrete state spaces,” in The Eleventh International Conference on Learning Representations, 2023

  10. [10]

    Efficiently modeling long sequences with structured state spaces,

    A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” in The International Conference on Learning Representations (ICLR), 2022

  11. [11]

    Mamba: Linear-time sequence modeling with selective state spaces,

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” in First Conference on Language Modeling, 2024

  12. [12]

    Adversarial vulnerabilities in large language models for time series forecasting,

    F. Liu, S. Jiang, L. Miranda-Moreno, S. Choi, and L. Sun, “Adversarial vulnerabilities in large language models for time series forecasting,” in NeurIPS Safe Generative AI Workshop, 2024

  13. [13]

    Backtime: Backdoor attacks on multivariate time series forecasting,

    X. Lin, Z. Liu, D. Fu, R. Qiu, and H. Tong, “Backtime: Backdoor attacks on multivariate time series forecasting,” Advances in Neural Information Processing Systems, vol. 37, pp. 131344–131368, 2024

  14. [14]

    Investigating machine learning attacks on financial time series models,

    M. Gallagher, N. Pitropakis, C. Chrysoulas, P. Papadopoulos, A. Mylonas, and S. Katsikas, “Investigating machine learning attacks on financial time series models,” Computers & Security, vol. 123, p. 102933, 2022

  15. [15]

    Small perturbations are enough: Adversarial attacks on time series prediction,

    T. Wu, X. Wang, S. Qiao, X. Xian, Y. Liu, and L. Zhang, “Small perturbations are enough: Adversarial attacks on time series prediction,” Information Sciences, vol. 587, pp. 794–812, 2022

  16. [16]

    Adversarial attacks on time-series intrusion detection for industrial control systems,

    G. Zizzo, C. Hankin, S. Maffeis, and K. Jones, “Adversarial attacks on time-series intrusion detection for industrial control systems,” in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 899–910, IEEE, 2020

  17. [17]

    Exploring adversarial robustness of deep state space models,

    B. Qi, Y. Luo, J. Gao, P. Li, K. Tian, Z. Ma, and B. Zhou, “Exploring adversarial robustness of deep state space models,” Advances in Neural Information Processing Systems, vol. 37, pp. 6549–6573, 2024

  18. [18]

    Rambo: Reliability analysis for mamba through bit-flip attack optimization,

    S. Das, S. Bhattacharya, S. Kundu, A. Raha, S. Kundu, and K. Basu, “Rambo: Reliability analysis for mamba through bit-flip attack optimization,” arXiv preprint arXiv:2512.15778, 2025

  19. [19]

    Badvim: Unveiling backdoor threats in visual state space model,

    C.-Y. Lee, Y.-H. Chiang, Z.-Y. Wu, C.-M. Yu, and C.-S. Lu, “Badvim: Unveiling backdoor threats in visual state space model,” arXiv preprint arXiv:2408.11679, 2024

  20. [20]

    Conditions for effective mitigation of attack impact via randomized detector tuning,

    S. C. Anand, K. Hassan, and H. Sandberg, “Conditions for effective mitigation of attack impact via randomized detector tuning,” in 2025 IEEE 64th Conference on Decision and Control (CDC), pp. 5002–5007, IEEE, 2025

  21. [21]

    Towards deep learning models resistant to adversarial attacks,

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018

  22. [22]

    Robust probabilistic time series forecasting,

    T. Yoon, Y. Park, E. K. Ryu, and Y. Wang, “Robust probabilistic time series forecasting,” in International Conference on Artificial Intelligence and Statistics, pp. 1336–1358, PMLR, 2022

  23. [23]

    A secure control framework for resource-limited adversaries,

    A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, “A secure control framework for resource-limited adversaries,” Automatica, vol. 51, pp. 135–148, 2015

  24. [24]

    G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015

  25. [25]

    Can a transformer represent a Kalman filter?,

    G. Goel and P. Bartlett, “Can a transformer represent a Kalman filter?,” in 6th Annual Learning for Dynamics & Control Conference, pp. 1502–1512, PMLR, 2024

  26. [26]

    ElectricityLoadDiagrams20112014

    A. Trindade, “ElectricityLoadDiagrams20112014.” UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86

  27. [27]

    Monash time series forecasting archive,

    R. W. Godahewa, C. Bergmeir, G. Webb, R. Hyndman, and P. Montero-Manso, “Monash time series forecasting archive,” in Proc. of the Neural Information Processing Systems Track on Datasets and Benchmarks (J. Vanschoren and S. Yeung, eds.), vol. 1, 2021

  28. [28]

    Anomaly detection based on convolutional recurrent autoencoder for IoT time series,

    C. Yin, S. Zhang, J. Wang, and N. N. Xiong, “Anomaly detection based on convolutional recurrent autoencoder for IoT time series,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 112–122, 2020

  29. [29]

    False data injection attacks against state estimation in electric power grids,

    Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Trans. on Information and System Security (TISSEC), vol. 14, no. 1, pp. 1–33, 2011