Pith · machine review for the scientific record

arxiv: 2604.03427 · v1 · submitted 2026-04-03 · 💻 cs.LG · cs.SY · eess.SY

Recognition: 2 theorem links


Adversarial Robustness of Deep State Space Models for Forecasting

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:43 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords adversarial robustness · state space models · time series forecasting · Kalman predictor · Stackelberg game · model-free attacks · autoregressive processes · Spacetime architecture

The pith

Spacetime SSM forecasters can exactly match the optimal Kalman predictor for autoregressive processes, yet their error grows with instability and decoder size under adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the decoder-only Spacetime architecture is the only SSM able to represent the optimal Kalman predictor when the data-generating process is autoregressive. Using this equivalence, the authors derive closed-form bounds showing that open-loop instability, closed-loop instability, and larger decoder state dimensions each increase vulnerability to worst-case stealthy adversaries. They cast robust design as a Stackelberg game solved by adversarial training and demonstrate that model-free attacks, which exploit local linearity without needing gradients, produce at least 33 percent more forecasting error than small-step projected gradient descent on Monash benchmarks.

Core claim

The decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive, a property no other SSM possesses. This equivalence enables closed-form bounds on adversarial forecasting error that quantify the amplifying effects of open-loop instability, closed-loop instability, and decoder state dimension. Robust forecaster design is formulated as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget and solved via adversarial training, while model-free attacks that bypass gradient computations are shown to be more effective than projected gradient descent.
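The equivalence turns on the fact that any AR(p) process admits a companion-form state-space realization whose one-step Kalman predictor collapses to the AR recursion itself. A minimal numpy sketch of that realization (illustrative only; the paper's Spacetime parameterization and the general noisy-observation case are richer than this):

```python
import numpy as np

def ar_companion_ssm(coeffs):
    """Companion-form state-space realization of an AR(p) process:
    x_{t+1} = A x_t + B w_t, y_t = C x_t, with x_t stacking the last
    p observations. (Illustrative; not the paper's exact construction.)
    """
    p = len(coeffs)
    A = np.zeros((p, p))
    A[0, :] = coeffs            # y_{t+1} = a1 y_t + ... + ap y_{t-p+1} + w
    A[1:, :-1] = np.eye(p - 1)  # shift the remaining past values down
    B = np.zeros((p, 1)); B[0, 0] = 1.0
    C = np.zeros((1, p)); C[0, 0] = 1.0
    return A, B, C

# For a fully observed AR process, the optimal one-step (Kalman)
# predictor through the SSM equals the direct AR prediction.
coeffs = np.array([0.5, -0.3])
A, B, C = ar_companion_ssm(coeffs)
x = np.array([[1.0], [2.0]])            # state: [y_t, y_{t-1}]
y_pred_ssm = float(C @ A @ x)           # predictor via the realization
y_pred_ar = float(coeffs @ x.ravel())   # direct AR recursion
```

Any architecture that can realize this companion structure in its decoder state can, in principle, carry the same predictor; the paper's uniqueness claim is that among deep SSMs only decoder-only Spacetime does so exactly.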

What carries the argument

The decoder-only Spacetime SSM architecture, which achieves exact equivalence to the optimal Kalman predictor under autoregressive assumptions, together with the Stackelberg game formulation that solves for robust design against detection-budget-constrained adversaries.

If this is right

  • When data follows an autoregressive process, the decoder-only Spacetime model achieves the exact optimal Kalman predictor.
  • Adversarial forecasting error increases with both open-loop and closed-loop instability of the underlying system.
  • Larger decoder state dimensions directly amplify the upper bound on adversarial error.
  • Adversarial training solves the Stackelberg game and yields more robust forecasters.
  • Model-free attacks that use only local linear input-output behavior outperform gradient-based attacks without requiring model access.
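The last bullet's local-linearity idea can be sketched as follows: probe the forecaster with finite differences to estimate its local linear map, then spend the entire perturbation budget along that map's top right singular vector. This is a hypothetical reconstruction, not the paper's exact procedure; `local_linear_attack`, the probe step `delta`, and the toy forecaster are all invented for illustration.

```python
import numpy as np

def local_linear_attack(forecaster, x, eps, delta=1e-4):
    """Model-free attack sketch: finite-difference estimate of the
    local linear map, then a perturbation of norm eps along its top
    right singular vector (the locally worst-case direction)."""
    y0 = forecaster(x)
    n = x.size
    J = np.zeros((y0.size, n))
    for i in range(n):                  # one probe per input coordinate
        e = np.zeros(n); e[i] = delta
        J[:, i] = (forecaster(x + e) - y0) / delta
    _, _, Vt = np.linalg.svd(J)
    return eps * Vt[0]

# toy linear forecaster standing in for a trained SSM
W = np.array([[3.0, 0.0], [0.0, 1.0]])
f = lambda x: W @ x
x0 = np.array([1.0, 1.0])
a = local_linear_attack(f, x0, eps=0.1)
rand = 0.1 * np.array([0.0, 1.0])       # same budget, arbitrary direction
```

No gradients of the model are ever queried, which is why this style of attack needs only input-output access.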

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Stability metrics could serve as a practical design criterion when selecting SSM architectures for forecasting under potential attacks.
  • The local-linearity exploitation in model-free attacks may extend to other recurrent or state-space time-series models beyond Spacetime.
  • The Stackelberg formulation with detection budgets could be adapted to robust learning in related control or dynamical systems problems.
  • Evaluating the equivalence on processes that deviate from strict autoregression would clarify the practical scope of the Kalman representation result.

Load-bearing premise

The data-generating process must be autoregressive for the Spacetime model to match the optimal Kalman predictor, and adversaries must operate under a fixed detection budget in the Stackelberg formulation.
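Under these premises, the Stackelberg structure is a bilevel problem: the follower maximizes forecasting error inside the detection budget, and the leader then minimizes the resulting worst-case loss. A toy numpy sketch with a linear forecaster, where the budget is modeled as an ℓ∞ bound `eps` and the inner maximization has a closed form (all names and constants here are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic regression data standing in for the forecasting task
w_star = np.array([1.0, -2.0])
X = rng.normal(size=(256, 2))
y = X @ w_star + 0.1 * rng.normal(size=256)

theta = np.zeros(2)          # leader's forecaster parameters
eps, lr = 0.1, 0.05          # detection budget (l-inf), leader step size

for _ in range(200):
    # follower: worst-case perturbation inside the budget. For a linear
    # model with squared loss, the maximizing l-inf perturbation is
    # eps * sign(residual_i) * sign(theta_j), entrywise.
    resid = X @ theta - y
    delta = eps * np.sign(resid)[:, None] * np.sign(theta)[None, :]
    Xa = X + delta
    # leader: gradient step on the worst-case (perturbed) loss,
    # i.e. one round of adversarial training
    grad = Xa.T @ (Xa @ theta - y) / len(y)
    theta -= lr * grad
```

Adversarial training as used in the paper plays exactly this alternation, with PGD replacing the closed-form inner step for nonlinear forecasters.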

What would settle it

A counterexample on a synthetic autoregressive dataset where the Spacetime model fails to achieve the same one-step prediction error as the true Kalman filter, or empirical results on the Monash benchmarks where model-free attacks do not exceed projected gradient descent error by at least 33 percent.
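The first half of that test is cheap to sketch: on synthetic AR data the optimal one-step predictor is the AR recursion, whose mean squared error equals the innovation variance σ², so any candidate forecaster can be measured against that floor. A least-squares AR fit stands in for the trained Spacetime model below; the coefficients and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2, sigma = 0.5, -0.3, 0.2     # stable AR(2) coefficients, noise std
T = 20000
y = np.zeros(T)
for t in range(2, T):
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + sigma * rng.normal()

# For a fully observed AR process the optimal (Kalman) one-step
# predictor is the AR recursion; its MSE is the innovation variance.
pred_opt = a1 * y[1:-1] + a2 * y[:-2]
mse_opt = np.mean((y[2:] - pred_opt) ** 2)      # ~ sigma**2

# least-squares AR fit as a stand-in for a trained forecaster; a model
# matching the Kalman predictor should approach the same error floor
Phi = np.column_stack([y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
mse_fit = np.mean((y[2:] - Phi @ coef) ** 2)
```

A trained Spacetime model whose one-step MSE stays bounded away from σ² on such data would be the counterexample described above.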

Figures

Figures reproduced from arXiv: 2604.03427 by George J. Pappas, Sribalaji C. Anand.

Figure 1: Problem setup: the adversary (red) injects the attack signal into the data.
Figure 2: Spacetime architecture (left) and layer components (right).
Figure 3: Forecaster performance on test data excerpt (left) and distribution of …
Figure 4: Adversarial error as a function of ℓ (left) and h (right), with approximately constant spectral radius across models in both experiments.
Figure 5: Adversarial error caused by PGD attacks and data-driven attacks.
Original abstract

State-space models (SSMs) for time-series forecasting have demonstrated strong empirical performance on benchmark datasets, yet their robustness under adversarial perturbations is poorly understood. We address this gap through a control-theoretic lens, focusing on the recently proposed Spacetime SSM forecaster. We first establish that the decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive - a property no other SSM possesses. Building on this, we formulate robust forecaster design as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget, and solve it via adversarial training. We derive closed-form bounds on adversarial forecasting error that expose how open-loop instability, closed-loop instability, and decoder state dimension each amplify vulnerability - offering actionable principles towards robust forecaster design. Finally, we show that even adversaries with no access to the forecaster can nonetheless construct effective attacks by exploiting the model's locally linear input-output behavior, bypassing gradient computations entirely. Experiments on the Monash benchmark datasets highlight that model-free attacks, without any gradient computation, can cause at least 33% more error than projected gradient descent with a small step size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that decoder-only Spacetime SSMs can exactly represent the optimal Kalman predictor for autoregressive data-generating processes (a property asserted to be unique among SSMs). It formulates robust forecasting as a Stackelberg game against stealthy adversaries under a detection budget, derives closed-form bounds showing how open-loop instability, closed-loop instability, and decoder state dimension amplify adversarial error, and reports that model-free attacks (exploiting local linearity) cause at least 33% more forecasting error than small-step PGD on Monash benchmarks.

Significance. If the Kalman equivalence holds exactly and the closed-form bounds are valid, the work would supply concrete theoretical principles for robust SSM forecaster design and introduce practical model-free attack methods. The use of standard Monash benchmarks and the control-theoretic framing add relevance to time-series robustness literature, though the load-bearing nature of the equivalence means the significance is conditional on verification of that step.

major comments (2)
  1. [Abstract / Kalman equivalence derivation] Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.
  2. [Experiments] Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.
minor comments (2)
  1. [Bounds derivation] Clarify in the notation or bounds section whether the closed-form expressions assume exact linearity or hold under the locally linear approximation used for the model-free attack.
  2. [Introduction / Related work] Add explicit comparison (even a brief remark) showing why other SSM architectures cannot achieve the exact Kalman representation, to substantiate the uniqueness claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: Abstract and the section deriving the Kalman equivalence: the assertion that decoder-only Spacetime exactly recovers the optimal Kalman predictor recursion (including the gain and one-step predictor) for arbitrary AR order is load-bearing for the subsequent Stackelberg formulation and closed-form bounds. The manuscript must exhibit the precise parameterization of the hidden-state transition matrix and observation map that matches the Kalman filter equations without residual approximation error; any mismatch would render the instability-based bounds inapplicable to the trained model.

    Authors: We appreciate the referee's focus on the foundational Kalman equivalence claim. Our derivation in the main text parameterizes the Spacetime SSM's hidden state transition matrix as the companion form of the AR coefficients and the observation map to extract the one-step prediction, with the decoder state dimension set to match the AR order. This allows exact recovery of the Kalman recursion, including the optimal gain from the Riccati solution. To address the request for explicit exhibition, we will revise the manuscript by adding a new appendix or subsection that provides the full matrix expressions and verifies the equivalence for arbitrary AR orders, ensuring no residual error and validating the bounds. revision: yes

  2. Referee: Experiments section reporting the 33% error gap: the claim that model-free attacks cause at least 33% more error than PGD lacks reported error bars, number of random seeds, or statistical tests. Without these, the quantitative comparison cannot reliably support the superiority statement, especially given the low verification level of the theoretical claims.

    Authors: We concur that including error bars, seed counts, and statistical tests will improve the reliability of the experimental results. We will update the Experiments section to report the forecasting errors averaged over 5 independent random seeds, with standard error bars shown in the relevant tables and figures. Additionally, we will include p-values from appropriate statistical tests to confirm the significance of the observed 33% or greater error increase for model-free attacks compared to PGD. revision: yes
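The promised reporting format is straightforward to pin down: mean ± standard error over seeds, plus the error ratio the claim rests on. A minimal sketch (the per-seed numbers below are invented placeholders, not the paper's results):

```python
import statistics as st

def mean_se(values):
    """Mean and standard error over independent seeds."""
    m = st.mean(values)
    se = st.stdev(values) / len(values) ** 0.5
    return m, se

# hypothetical per-seed forecasting errors (placeholders, not results)
pgd_errors = [1.10, 1.05, 1.12, 1.08, 1.07]
mf_errors = [1.52, 1.47, 1.55, 1.50, 1.49]

m_pgd, se_pgd = mean_se(pgd_errors)
m_mf, se_mf = mean_se(mf_errors)
ratio = m_mf / m_pgd    # a ratio >= 1.33 would match the paper's claim
```

Reporting the ratio with its uncertainty over seeds is exactly what would let a reader judge whether the "at least 33%" gap is robust.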

Circularity Check

0 steps flagged

Minor self-citation for Spacetime architecture; Kalman equivalence and bounds derived independently

full rationale

The derivation begins by explicitly matching the decoder-only Spacetime state-update and output equations to the Kalman predictor recursion for autoregressive processes, then proceeds to Stackelberg-game bounds on error amplification via open- and closed-loop instability and state dimension. These steps use the architecture's linear realization and standard game-theoretic setup rather than re-using fitted parameters or self-referential definitions. Experiments rely on external Monash benchmarks, and model-free attacks exploit local linearity without gradient fitting. The sole self-citation (for the recently proposed Spacetime forecaster) is not load-bearing for the equivalence claim or the subsequent bounds.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the autoregressive data assumption for Kalman equivalence and the constrained adversary model; no free parameters or invented entities are introduced beyond standard control-theoretic constructs.

axioms (2)
  • domain assumption Underlying data-generating process is autoregressive
    Invoked to establish that decoder-only Spacetime represents the optimal Kalman predictor.
  • domain assumption Adversaries are constrained by a detection budget in the Stackelberg game
    Used to formulate and solve the robust forecaster design problem.

pith-pipeline@v0.9.0 · 5507 in / 1330 out tokens · 51755 ms · 2026-05-13T19:43:55.030494+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

  1. [1]

    Modeling and forecasting realized volatility,

    T. G. Andersen, T. Bollerslev, F. X. Diebold, and P. Labys, “Modeling and forecasting realized volatility,” Econometrica, vol. 71, no. 2, pp. 579–625, 2003

  2. [2]

    ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation,

    M. H. Amini, A. Kargarian, and O. Karabasoglu, “ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation,” Electric Power Systems Research, vol. 140, pp. 378–390, 2016

  3. [3]

    Trend analysis of climate time series: A review of methods,

    M. Mudelsee, “Trend analysis of climate time series: A review of methods,” Earth-Science Reviews, vol. 190, pp. 310–322, 2019

  4. [4]

    A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection,

    M. Jin, H. Y. Koh, Q. Wen, D. Zambon, C. Alippi, G. I. Webb, I. King, and S. Pan, “A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  5. [5]

    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

    S. Bai, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv:1803.01271, 2018

  6. [6]

    A decoder-only foundation model for time-series forecasting,

    A. Das, W. Kong, R. Sen, and Y. Zhou, “A decoder-only foundation model for time-series forecasting,” in Forty-first International Conference on Machine Learning, 2024

  7. [7]

    Deep state space models for time series forecasting,

    S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski, “Deep state space models for time series forecasting,” Advances in Neural Information Processing Systems, vol. 31, 2018

  8. [8]

    Are transformers effective for time series forecasting?,

    A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 11121–11128, 2023

  9. [9]

    Effectively modeling time series with simple discrete state spaces,

    M. Zhang, K. K. Saab, M. Poli, T. Dao, K. Goel, and C. Ré, “Effectively modeling time series with simple discrete state spaces,” in The Eleventh International Conference on Learning Representations, 2023

  10. [10]

    Efficiently modeling long sequences with structured state spaces,

    A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” in The International Conference on Learning Representations (ICLR), 2022

  11. [11]

    Mamba: Linear-time sequence modeling with selective state spaces,

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” in First Conference on Language Modeling, 2024

  12. [12]

    Adversarial vulnerabilities in large language models for time series forecasting,

    F. Liu, S. Jiang, L. Miranda-Moreno, S. Choi, and L. Sun, “Adversarial vulnerabilities in large language models for time series forecasting,” in NeurIPS Safe Generative AI Workshop, 2024

  13. [13]

    Backtime: Backdoor attacks on multivariate time series forecasting,

    X. Lin, Z. Liu, D. Fu, R. Qiu, and H. Tong, “Backtime: Backdoor attacks on multivariate time series forecasting,” Advances in Neural Information Processing Systems, vol. 37, pp. 131344–131368, 2024

  14. [14]

    Investigating machine learning attacks on financial time series models,

    M. Gallagher, N. Pitropakis, C. Chrysoulas, P. Papadopoulos, A. Mylonas, and S. Katsikas, “Investigating machine learning attacks on financial time series models,” Computers & Security, vol. 123, p. 102933, 2022

  15. [15]

    Small perturbations are enough: Adversarial attacks on time series prediction,

    T. Wu, X. Wang, S. Qiao, X. Xian, Y. Liu, and L. Zhang, “Small perturbations are enough: Adversarial attacks on time series prediction,” Information Sciences, vol. 587, pp. 794–812, 2022

  16. [16]

    Adversarial attacks on time-series intrusion detection for industrial control systems,

    G. Zizzo, C. Hankin, S. Maffeis, and K. Jones, “Adversarial attacks on time-series intrusion detection for industrial control systems,” in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 899–910, IEEE, 2020

  17. [17]

    Exploring adversarial robustness of deep state space models,

    B. Qi, Y. Luo, J. Gao, P. Li, K. Tian, Z. Ma, and B. Zhou, “Exploring adversarial robustness of deep state space models,” Advances in Neural Information Processing Systems, vol. 37, pp. 6549–6573, 2024

  18. [18]

    Rambo: Reliability analysis for mamba through bit-flip attack optimization,

    S. Das, S. Bhattacharya, S. Kundu, A. Raha, S. Kundu, and K. Basu, “Rambo: Reliability analysis for mamba through bit-flip attack optimization,” arXiv preprint arXiv:2512.15778, 2025

  19. [19]

    Badvim: Unveiling backdoor threats in visual state space model,

    C.-Y. Lee, Y.-H. Chiang, Z.-Y. Wu, C.-M. Yu, and C.-S. Lu, “Badvim: Unveiling backdoor threats in visual state space model,” arXiv preprint arXiv:2408.11679, 2024

  20. [20]

    Conditions for effective mitigation of attack impact via randomized detector tuning,

    S. C. Anand, K. Hassan, and H. Sandberg, “Conditions for effective mitigation of attack impact via randomized detector tuning,” in 2025 IEEE 64th Conference on Decision and Control (CDC), pp. 5002–5007, IEEE, 2025

  21. [21]

    Towards deep learning models resistant to adversarial attacks,

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018

  22. [22]

    Robust probabilistic time series forecasting,

    T. Yoon, Y. Park, E. K. Ryu, and Y. Wang, “Robust probabilistic time series forecasting,” in International Conference on Artificial Intelligence and Statistics, pp. 1336–1358, PMLR, 2022

  23. [23]

    A secure control framework for resource-limited adversaries,

    A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, “A secure control framework for resource-limited adversaries,” Automatica, vol. 51, pp. 135–148, 2015

  24. [24]

    G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015

  25. [25]

    Can a transformer represent a Kalman filter?,

    G. Goel and P. Bartlett, “Can a transformer represent a Kalman filter?,” in 6th Annual Learning for Dynamics & Control Conference, pp. 1502–1512, PMLR, 2024

  26. [26]

    ElectricityLoadDiagrams20112014

    A. Trindade, “ElectricityLoadDiagrams20112014.” UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86

  27. [27]

    Monash time series forecasting archive,

    R. W. Godahewa, C. Bergmeir, G. Webb, R. Hyndman, and P. Montero-Manso, “Monash time series forecasting archive,” in Proc. of the Neural Information Processing Systems Track on Datasets and Benchmarks (J. Vanschoren and S. Yeung, eds.), vol. 1, 2021

  28. [28]

    Anomaly detection based on convolutional recurrent autoencoder for IoT time series,

    C. Yin, S. Zhang, J. Wang, and N. N. Xiong, “Anomaly detection based on convolutional recurrent autoencoder for IoT time series,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 112–122, 2020

  29. [29]

    False data injection attacks against state estimation in electric power grids,

    Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Trans. on Information and System Security (TISSEC), vol. 14, no. 1, pp. 1–33, 2011