Adversarial observations in probabilistic State-Space Models for robust Reinforcement Learning

D. R\'ios Insua; M. Santos-Pascual

arxiv: 2606.20880 · v1 · pith:OJ6MIRSOnew · submitted 2026-06-18 · 📊 stat.ML · cs.LG· stat.ME

Adversarial observations in probabilistic State-Space Models for robust Reinforcement Learning

M. Santos-Pascual , D. R\'ios Insua This is my paper

Pith reviewed 2026-06-26 15:03 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME

keywords adversarial attacksprobabilistic state-space modelsreinforcement learningrobust RLlatent statesobservation perturbationslikelihood constraintsrobotics safety

0 comments

The pith

Adversarial observation shifts that remain model-consistent change latent states and policies in linear state-space RL models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes adversarial attacks on linear probabilistic state-space models used in reinforcement learning, where perturbations to observations must satisfy likelihood constraints to appear realistic. These shifts are shown to propagate to the estimated latent states and subsequently influence the agent's policy decisions. Understanding this mechanism matters for developing reinforcement learning agents that can handle sensor noise or deliberate attacks without failing. The analysis points toward methods for improving robustness in applications like robotics.

Core claim

Adversarial yet realistic observation shifts influence the latent state and influence policy decisions in linear probabilistic state-space models for reinforcement learning. This perspective provides a principled pathway toward building more robust reinforcement learning systems, with direct relevance to safety-critical domains such as robotics.

What carries the argument

Likelihood-constrained adversarial perturbations on observations in linear probabilistic state-space models, which affect latent state inference and policy decisions.

If this is right

Such attacks can mislead the inference of the environment's latent state.
Policy decisions become vulnerable to these consistent observation changes.
Robustness in RL can be improved by considering these adversarial effects.
Applications in robotics require accounting for sensor noise and attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending this analysis to nonlinear or deep state-space models could reveal broader vulnerabilities.
Empirical tests on physical robotic platforms would validate the influence on real policies.
Designing RL training procedures that simulate such attacks might yield inherently more robust policies.

Load-bearing premise

The attacker alters observations under likelihood constraints that ensure the perturbations remain consistent with the model.

What would settle it

Demonstrating that likelihood-constrained observation perturbations in a linear SSM for RL do not change the inferred latent state or the chosen policy would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.20880 by D. R\'ios Insua, M. Santos-Pascual.

**Figure 2.** Figure 2: Impact of the adversarial perturbation at time [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: Effect of the confidence level ϵ and of the attacked time step t on the adversarial perturbation and its impact on state estimation. stronger perturbations also tend to spread more visibly to neighboring state estimates. In addition, changing ϵ slightly alter its perturb direction o adv t − oˆt as Figure 3a illustrates. Consequently, the direction of the most disruptive attack also depends on the likeliho… view at source ↗

**Figure 4.** Figure 4: Impact of adversarial perturbation at time [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗

**Figure 5.** Figure 5: Estimated density of the most disruptive directions in a three-dimensional observation [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗

**Figure 6.** Figure 6: Cumulative reward over 2000 episodes for the three evaluation settings: ( [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗

**Figure 7.** Figure 7: Example trajectory from a single episode under the three considered scenarios. The [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗

read the original abstract

Decision-making under partial or adversarial observability requires accurate inference of the environment's latent state and its associated uncertainty. This work analyses adversarial attacks on linear probabilistic state-space models, commonly integrated within reinforcement learning architectures, where the attacker alters observations under likelihood constraints that ensure the perturbations remains consistent. We analyze how such adversarial yet realistic observation shifts influence the latent state and influence policy decisions. This perspective provides a principled pathway toward building more robust reinforcement learning systems, with direct relevance to safety-critical domains such as robotics, where reliable operation under sensor noise, partial failures, and adversarial conditions is essential.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper traces how likelihood-constrained adversarial observations shift latent states and policies in linear probabilistic SSMs for RL, with a clean argument but mostly diagnostic rather than constructive.

read the letter

The main takeaway is that this work follows the effect of observation perturbations that stay inside the model's likelihood through the filtering equations and into the policy in linear probabilistic state-space models used for reinforcement learning.

It does a clean job of defining realistic attacks via the likelihood constraint and showing why that matters for safety-critical settings such as robotics. The stress-test note is right that the argument structure has no obvious internal contradiction or unsupported leap.

The soft spot is that the piece stays at the level of analysis. The abstract and available description give no new derivations, closed-form bounds, or experiments that quantify how large the policy shift actually is under typical conditions. Without those, it is hard to judge whether the identified vulnerability is a minor nuisance or a load-bearing problem in practice.

This is for people already working on probabilistic models in RL who want to think about observation attacks. A reader looking for new robust algorithms or reproducible empirical results will not find much here.

It shows honest engagement with the literature and a clear question, so it deserves a serious referee even if the final verdict is that more concrete evidence is needed.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that in linear probabilistic state-space models integrated with reinforcement learning, an attacker can alter observations subject to likelihood constraints (ensuring perturbations remain consistent with the model) and that these realistic adversarial shifts still influence the inferred latent state and downstream policy decisions. The work positions this analysis as a pathway to more robust RL systems, particularly for safety-critical domains.

Significance. The topic addresses an important intersection of adversarial robustness, state estimation, and RL. If the central claim were supported by explicit derivations through the filtering recursions and by reproducible experiments showing policy degradation under likelihood-constrained attacks, it would be relevant to safety-critical applications. However, the manuscript supplies no such derivations, experiments, or quantitative results, so the significance cannot be evaluated.

major comments (2)

[Abstract] Abstract and entire manuscript: the central claim—that likelihood-constrained observation perturbations shift the latent state and alter policy decisions—is stated but never demonstrated. No filtering equations, attack formulation, or propagation analysis is provided, rendering the claim unsupported.
No section, table, or equation supplies the linear SSM dynamics, the observation likelihood used for the constraint, the filtering update, or the policy mapping. Without these, it is impossible to verify whether the perturbations remain inside the model or produce the claimed downstream effect.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly note that the submitted manuscript lacks explicit equations, derivations, and experimental demonstrations of the central claims. We will revise the paper to address these gaps by adding the required mathematical details and supporting analyses.

read point-by-point responses

Referee: [Abstract] Abstract and entire manuscript: the central claim—that likelihood-constrained observation perturbations shift the latent state and alter policy decisions—is stated but never demonstrated. No filtering equations, attack formulation, or propagation analysis is provided, rendering the claim unsupported.

Authors: We agree that the current version does not demonstrate the claim through explicit derivations. In the revised manuscript we will add a dedicated technical section deriving the linear probabilistic SSM, formulating the likelihood-constrained adversarial observation shift, showing the filtering recursion updates, and tracing the propagation to the posterior latent state and the downstream policy. This will make the central claim explicit and verifiable. revision: yes
Referee: [—] No section, table, or equation supplies the linear SSM dynamics, the observation likelihood used for the constraint, the filtering update, or the policy mapping. Without these, it is impossible to verify whether the perturbations remain inside the model or produce the claimed downstream effect.

Authors: This observation is accurate for the submitted draft. The revision will include the precise linear SSM state and observation equations, the form of the observation likelihood used to enforce the constraint, the filtering update equations, and the mapping from filtered latent states to policy actions. These additions will allow direct verification that the perturbations stay model-consistent and affect the policy. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and provided context describe an analysis that defines 'realistic' adversarial perturbations via explicit likelihood constraints and then traces their effects on latent states through standard filtering equations in linear probabilistic SSMs. No equations, derivations, self-citations, or fitted parameters are shown that reduce the central claim to its own inputs by construction. The setup is self-contained against the model dynamics without renaming known results or smuggling ansatzes. This matches the default expectation of no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5627 in / 857 out tokens · 17689 ms · 2026-06-26T15:03:27.897198+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 14 canonical work pages · 1 internal anchor

[1]

BARRENO, M., NELSON, B., JOSEPH, A. D. and TYGAR, J. D. (2006). Can Machine Learning Be Secure? In Proceedings of the ACM Symposium on Information, Computer and Communications Security (ASIACCS)

2006
[2]

and MUNIR, A

BEHZADAN, V. and MUNIR, A. (2017). Vulnerability of Deep Reinforcement Learning to Policy Induction At- tacks.arXiv preprint

2017
[3]

Learning long-term dependencies with gradient descent is diﬀicult

BENGIO, Y., SIMARD, P. and FRASCONI, P. (1994). Learning Long-Term Dependencies with Gradient Descent Is Difficult.IEEE Transactions on Neural Networks5157–166. https://doi.org/10.1109/72.279181

work page doi:10.1109/72.279181 1994
[4]

and ROLI, F

BIGGIO, B. and ROLI, F. (2018). Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning. Pattern Recognition84317–331. https://doi.org/10.1016/j.patcog.2018.07.023

work page doi:10.1016/j.patcog.2018.07.023 2018
[5]

and VANDENBERGHE, L

BOYD, S. and VANDENBERGHE, L. (2004).Convex Optimization. Cambridge University Press, Cambridge, UK

2004
[6]

W., WILSON, D

CAMERON, F., BEQUETTE, B. W., WILSON, D. M., BUCKINGHAM, B. A., LEE, H. and NIEMEYER, G. (2011). A Closed-Loop Artificial Pancreas Based on Risk Management.Journal of Diabetes Science and Technology 5368–379. https://doi.org/10.1177/193229681100500226

work page doi:10.1177/193229681100500226 2011
[7]

R., GOULD, N

CONN, A. R., GOULD, N. I. M. and TOINT, P. L. (2000).Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA

2000
[8]

ABDOLMALEKI, A.,DELASCASAS, D. et al. (2022). Magnetic control of tokamak plasmas through deep reinforcement learning.Nature602414–419. 28

2022
[9]

and VYATKIN, V

DENG, J., SIERLA, S., SUN, J. and VYATKIN, V. (2023). Offline Reinforcement Learning for Industrial Process Control: A Case Study from Steel Industry.Information Sciences632221–231. https://doi.org/10.1016/j.ins. 2023.03.019

work page doi:10.1016/j.ins 2023
[10]

and MURPHY, K

KNOBLAUCH, J., JONES, M., BRIOL, F.-X. and MURPHY, K. P. (2024). Outlier-Robust Kalman Fil- tering through Generalised Bayes. InProceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research23512138–12171

2024
[11]

DURAN-MARTIN, G., SÁNCHEZ-BETANCOURT, L., SHESTOPALOFF, A. Y. and MURPHY, K. P. (2025). A Uni- fying Framework for Generalised Bayesian Online Learning in Non-Stationary Environments.Transactions on Machine Learning Research

2025
[12]

and ZHENG, W

FANG, C., QI, Y., CHEN, J., TAN, R. and ZHENG, W. X. (2020). Stealthy Actuator Signal Attacks in Stochas- tic Control Systems: Performance and Limitations.IEEE Transactions on Automatic Control653927–3934. https://doi.org/10.1109/TAC.2019.2950072

work page doi:10.1109/tac.2019.2950072 2020
[13]

W., KOLLMAN, C., WOODALL, W

RIA, L., SWANSON, V., LUM, J. W., KOLLMAN, C., WOODALL, W. and BECK, R. W. (2018). Predictive Low-Glucose Suspend Reduces Hypoglycemia in Adults, Adolescents, and Children With Type 1 Diabetes in an At-Home Randomized Crossover Study: Results of the PROLOG Trial.Diabetes Care412155–2161. https://doi.org/10.2337/dc18-0771 GARCÍA, J. and FERNÁNDEZ, F. (2015...

work page doi:10.2337/dc18-0771 2018
[14]

Gaudet, R

GAUDET, B., LINARES, R. and FURFARO, R. (2020). Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Landing.Advances in Space Research651723–1741. https://doi.org/10.1016/j.asr.2019.12.030

work page doi:10.1016/j.asr.2019.12.030 2020
[15]

and RUSSELL, S

GLEAVE, A., DENNIS, M., WILD, C., KANT, N., LEVINE, S. and RUSSELL, S. (2020). Adversarial Policies: Attacking Deep Reinforcement Learning. InInternational Conference on Learning Representations (ICLR)

2020
[16]

J., SHLENS, J

GOODFELLOW, I. J., SHLENS, J. and SZEGEDY, C. (2015). Explaining and Harnessing Adversarial Examples. InInternational Conference on Learning Representations (ICLR)

2015
[17]

and DAO, T

GU, A. and DAO, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces.arXiv preprint

2023
[18]

and RÉ, C

GU, A., GOEL, K. and RÉ, C. (2022). Efficiently Modeling Long Sequences with Structured State Spaces. In International Conference on Learning Representations (ICLR)

2022
[19]

and NOROUZI, M

HAFNER, D., LILLICRAP, T., BA, J. and NOROUZI, M. (2020). Dream to Control: Learning Behaviors by Latent Imagination. InInternational Conference on Learning Representations (ICLR)

2020
[20]

and WEST, M

HARRISON, J. and WEST, M. (1991). Dynamic Linear Model Diagnostics.Biometrika78797–808. https://doi. org/10.1093/biomet/78.4.797

work page doi:10.1093/biomet/78.4.797 1991
[21]

HINTON, G. E. (2002). Training Products of Experts by Minimizing Contrastive Divergence.Neural Computation 141771–1800

2002
[22]

and ABBEEL, P

HUANG, S., PAPERNOT, N., GOODFELLOW, I., DUAN, Y. and ABBEEL, P. (2017). Adversarial Attacks on Neural Network Policies.arXiv preprint

2017
[23]

P., LITTMAN, M

KAELBLING, L. P., LITTMAN, M. L. and CASSANDRA, A. R. (1998). Planning and Acting in Partially Observ- able Stochastic Domains.Artificial Intelligence10199–134

1998
[24]

and HASSIBI, B

KARGIN, T., HAJAR, J., MALIK, V. and HASSIBI, B. (2024). Distributionally Robust Kalman Filtering over Finite and Infinite Horizon

2024
[25]

and SCARAMUZZA, D

KAUFMANN, E., BAUERSFELD, L., LOQUERCIO, A., MÜLLER, M., KOLTUN, V. and SCARAMUZZA, D. (2023). Champion-level drone racing using deep reinforcement learning.Nature620982–987

2023
[26]

and LI, W

KIOURTI, P., WARDEGA, K., JHA, S. and LI, W. (2020). TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning. InProceedings of the 57th ACM/IEEE Design Automation Conference1–6. IEEE

2020
[27]

and CLOSAS, P

LI, H., MEDINA, D., VILÀ-VALLS, J. and CLOSAS, P. (2021). Robust Variational-Based Kalman Filter for Outlier Rejection With Correlated Measurements.IEEE Transactions on Signal Processing69357–369. https: //doi.org/10.1109/TSP.2020.3042944

work page doi:10.1109/tsp.2020.3042944 2021
[28]

and BEHBAHANI, F

LU, C., SCHROECKER, Y., GU, A., PARISOTTO, E., FOERSTER, J., SINGH, S. and BEHBAHANI, F. (2023). Structured State Space Models for In-Context Reinforcement Learning. InAdvances in Neural Information Processing Systems3647016–47031

2023
[29]

E., BOTTERO, A

LUIS, C. E., BOTTERO, A. G., VINOGRADSKA, J., BERKENKAMP, F. and PETERS, J. (2024). Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability.arXiv preprint

2024
[30]

and VLADU, A

MADRY, A., MAKELOV, A., SCHMIDT, L., TSIPRAS, D. and VLADU, A. (2018). Towards Deep Learning Mod- els Resistant to Adversarial Attacks. InInternational Conference on Learning Representations (ICLR)

2018
[31]

MURPHY, K. P. (2023).Probabilistic Machine Learning: Advanced Topics. MIT Press. 29

2023
[32]

and WRIGHT, S

NOCEDAL, J. and WRIGHT, S. J. (2006).Numerical Optimization, 2 ed. Springer, New York, NY

2006
[33]

and CAMPAGNOLI, P

PETRIS, G., PETRONE, S. and CAMPAGNOLI, P. (2009).Dynamic Linear Models with R. Springer Science & Business Media

2009
[34]

and SUKTHANKAR, R

PINTO, L., DAVIDSON, J. and SUKTHANKAR, R. (2017). Robust Adversarial Reinforcement Learning. InInter- national Conference on Machine Learning (ICML). QUIÑONERO-CANDELA, J., SUGIYAMA, M., SCHWAIGHOFER, A. and LAWRENCE, N. D., eds. (2009).Dataset Shift in Machine Learning. MIT Press

2017
[35]

and AMATO, C

RATHBUN, E., OPREA, A. and AMATO, C. (2025). Adversarial Inception Backdoor Attacks against Reinforce- ment Learning. InProceedings of the 42nd International Conference on Machine Learning.Proceedings of Machine Learning Research26751273–51296. PMLR

2025
[36]

E., TUNG, F

RAUCH, H. E., TUNG, F. and STRIEBEL, C. T. (1965). Maximum likelihood estimates of linear dynamic sys- tems.AIAA Journal31445–1450. https://doi.org/10.2514/3.3166

work page doi:10.2514/3.3166 1965
[37]

Robust Bayesian Filtering and Smoothing Using Student's t Distribution

ROTH, M., ARDESHIRI, T., ÖZKAN, E. and GUSTAFSSON, F. (2017). Robust Bayesian Filtering and Smoothing Using Student’s t Distribution. https://doi.org/10.48550/arXiv.1703.02428 SÄRKKÄ, S. and HARTIKAINEN, J. (2013). Variational Bayesian Adaptation of Noise Covariances in Non-Linear Kalman Filtering. https://doi.org/10.48550/arXiv.1302.0681 SÄRKKÄ, S. and N...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1703.02428 2017
[38]

A., KUHN, D

SHAFIEEZADEH-ABADEH, S., NGUYEN, V. A., KUHN, D. and MOHAJERINESFAHANI, P. (2018). Wasserstein Distributionally Robust Kalman Filtering. InAdvances in Neural Information Processing Systems31

2018
[39]

(2005).Stochastic Volatility: Selected Readings

SHEPHARD, N., ed. (2005).Stochastic Volatility: Selected Readings. Oxford University Press

2005
[40]

TABUADA, P. (2017). Secure State Estimation for Cyber-Physical Systems under Sensor Attacks: A Satis- fiability Modulo Theory Approach.IEEE Transactions on Automatic Control624917–4932. https://doi.org/ 10.1109/TAC.2017.2650223

work page doi:10.1109/tac.2017.2650223 2017
[41]

and LINDERMAN, S

SMITH, J., WARRINGTON, A. and LINDERMAN, S. W. (2023). Simplified State Space Layers for Sequence Modeling. InInternational Conference on Learning Representations (ICLR)

2023
[42]

M., MIMI, M

SOMVANSHI, S., ISLAM, M. M., MIMI, M. S., POLOCK, S. B. B., CHHETRI, G., DUTTA, A., RAFE, A. and DAS, S. (2025). Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State-Space Architectures from S4 to Mamba.arXiv preprint arXiv:2503.18970

Pith/arXiv arXiv 2025
[43]

VASSILEV, A., OPREA, A. et al. (2024). Adversarial Machine Learning: A Taxonomy and Terminology of At- tacks and Mitigations NIST AI Report No. NIST.AI.100-2e2023, National Institute of Standards and Technol- ogy (NIST). https://doi.org/10.6028/NIST.AI.100-2e2023

work page doi:10.6028/nist.ai.100-2e2023 2024
[44]

POLOSUKHIN, I. (2017). Attention Is All You Need. InAdvances in Neural Information Processing Systems (NeurIPS)

2017
[45]

and WANG, H

WANG, H., LI, H., FANG, J. and WANG, H. (2018). Robust Gaussian Kalman Filter With Outlier Detection. IEEE Signal Processing Letters251236–1240. https://doi.org/10.1109/LSP.2018.2851156

work page doi:10.1109/lsp.2018.2851156 2018
[46]

and HARRISON, J

WEST, M. and HARRISON, J. (1997).Bayesian Forecasting and Dynamic Models. Springer Science & Business Media

1997

[1] [1]

BARRENO, M., NELSON, B., JOSEPH, A. D. and TYGAR, J. D. (2006). Can Machine Learning Be Secure? In Proceedings of the ACM Symposium on Information, Computer and Communications Security (ASIACCS)

2006

[2] [2]

and MUNIR, A

BEHZADAN, V. and MUNIR, A. (2017). Vulnerability of Deep Reinforcement Learning to Policy Induction At- tacks.arXiv preprint

2017

[3] [3]

Learning long-term dependencies with gradient descent is diﬀicult

BENGIO, Y., SIMARD, P. and FRASCONI, P. (1994). Learning Long-Term Dependencies with Gradient Descent Is Difficult.IEEE Transactions on Neural Networks5157–166. https://doi.org/10.1109/72.279181

work page doi:10.1109/72.279181 1994

[4] [4]

and ROLI, F

BIGGIO, B. and ROLI, F. (2018). Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning. Pattern Recognition84317–331. https://doi.org/10.1016/j.patcog.2018.07.023

work page doi:10.1016/j.patcog.2018.07.023 2018

[5] [5]

and VANDENBERGHE, L

BOYD, S. and VANDENBERGHE, L. (2004).Convex Optimization. Cambridge University Press, Cambridge, UK

2004

[6] [6]

W., WILSON, D

CAMERON, F., BEQUETTE, B. W., WILSON, D. M., BUCKINGHAM, B. A., LEE, H. and NIEMEYER, G. (2011). A Closed-Loop Artificial Pancreas Based on Risk Management.Journal of Diabetes Science and Technology 5368–379. https://doi.org/10.1177/193229681100500226

work page doi:10.1177/193229681100500226 2011

[7] [7]

R., GOULD, N

CONN, A. R., GOULD, N. I. M. and TOINT, P. L. (2000).Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA

2000

[8] [8]

ABDOLMALEKI, A.,DELASCASAS, D. et al. (2022). Magnetic control of tokamak plasmas through deep reinforcement learning.Nature602414–419. 28

2022

[9] [9]

and VYATKIN, V

DENG, J., SIERLA, S., SUN, J. and VYATKIN, V. (2023). Offline Reinforcement Learning for Industrial Process Control: A Case Study from Steel Industry.Information Sciences632221–231. https://doi.org/10.1016/j.ins. 2023.03.019

work page doi:10.1016/j.ins 2023

[10] [10]

and MURPHY, K

KNOBLAUCH, J., JONES, M., BRIOL, F.-X. and MURPHY, K. P. (2024). Outlier-Robust Kalman Fil- tering through Generalised Bayes. InProceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research23512138–12171

2024

[11] [11]

DURAN-MARTIN, G., SÁNCHEZ-BETANCOURT, L., SHESTOPALOFF, A. Y. and MURPHY, K. P. (2025). A Uni- fying Framework for Generalised Bayesian Online Learning in Non-Stationary Environments.Transactions on Machine Learning Research

2025

[12] [12]

and ZHENG, W

FANG, C., QI, Y., CHEN, J., TAN, R. and ZHENG, W. X. (2020). Stealthy Actuator Signal Attacks in Stochas- tic Control Systems: Performance and Limitations.IEEE Transactions on Automatic Control653927–3934. https://doi.org/10.1109/TAC.2019.2950072

work page doi:10.1109/tac.2019.2950072 2020

[13] [13]

W., KOLLMAN, C., WOODALL, W

RIA, L., SWANSON, V., LUM, J. W., KOLLMAN, C., WOODALL, W. and BECK, R. W. (2018). Predictive Low-Glucose Suspend Reduces Hypoglycemia in Adults, Adolescents, and Children With Type 1 Diabetes in an At-Home Randomized Crossover Study: Results of the PROLOG Trial.Diabetes Care412155–2161. https://doi.org/10.2337/dc18-0771 GARCÍA, J. and FERNÁNDEZ, F. (2015...

work page doi:10.2337/dc18-0771 2018

[14] [14]

Gaudet, R

GAUDET, B., LINARES, R. and FURFARO, R. (2020). Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Landing.Advances in Space Research651723–1741. https://doi.org/10.1016/j.asr.2019.12.030

work page doi:10.1016/j.asr.2019.12.030 2020

[15] [15]

and RUSSELL, S

GLEAVE, A., DENNIS, M., WILD, C., KANT, N., LEVINE, S. and RUSSELL, S. (2020). Adversarial Policies: Attacking Deep Reinforcement Learning. InInternational Conference on Learning Representations (ICLR)

2020

[16] [16]

J., SHLENS, J

GOODFELLOW, I. J., SHLENS, J. and SZEGEDY, C. (2015). Explaining and Harnessing Adversarial Examples. InInternational Conference on Learning Representations (ICLR)

2015

[17] [17]

and DAO, T

GU, A. and DAO, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces.arXiv preprint

2023

[18] [18]

and RÉ, C

GU, A., GOEL, K. and RÉ, C. (2022). Efficiently Modeling Long Sequences with Structured State Spaces. In International Conference on Learning Representations (ICLR)

2022

[19] [19]

and NOROUZI, M

HAFNER, D., LILLICRAP, T., BA, J. and NOROUZI, M. (2020). Dream to Control: Learning Behaviors by Latent Imagination. InInternational Conference on Learning Representations (ICLR)

2020

[20] [20]

and WEST, M

HARRISON, J. and WEST, M. (1991). Dynamic Linear Model Diagnostics.Biometrika78797–808. https://doi. org/10.1093/biomet/78.4.797

work page doi:10.1093/biomet/78.4.797 1991

[21] [21]

HINTON, G. E. (2002). Training Products of Experts by Minimizing Contrastive Divergence.Neural Computation 141771–1800

2002

[22] [22]

and ABBEEL, P

HUANG, S., PAPERNOT, N., GOODFELLOW, I., DUAN, Y. and ABBEEL, P. (2017). Adversarial Attacks on Neural Network Policies.arXiv preprint

2017

[23] [23]

P., LITTMAN, M

KAELBLING, L. P., LITTMAN, M. L. and CASSANDRA, A. R. (1998). Planning and Acting in Partially Observ- able Stochastic Domains.Artificial Intelligence10199–134

1998

[24] [24]

and HASSIBI, B

KARGIN, T., HAJAR, J., MALIK, V. and HASSIBI, B. (2024). Distributionally Robust Kalman Filtering over Finite and Infinite Horizon

2024

[25] [25]

and SCARAMUZZA, D

KAUFMANN, E., BAUERSFELD, L., LOQUERCIO, A., MÜLLER, M., KOLTUN, V. and SCARAMUZZA, D. (2023). Champion-level drone racing using deep reinforcement learning.Nature620982–987

2023

[26] [26]

and LI, W

KIOURTI, P., WARDEGA, K., JHA, S. and LI, W. (2020). TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning. InProceedings of the 57th ACM/IEEE Design Automation Conference1–6. IEEE

2020

[27] [27]

and CLOSAS, P

LI, H., MEDINA, D., VILÀ-VALLS, J. and CLOSAS, P. (2021). Robust Variational-Based Kalman Filter for Outlier Rejection With Correlated Measurements.IEEE Transactions on Signal Processing69357–369. https: //doi.org/10.1109/TSP.2020.3042944

work page doi:10.1109/tsp.2020.3042944 2021

[28] [28]

and BEHBAHANI, F

LU, C., SCHROECKER, Y., GU, A., PARISOTTO, E., FOERSTER, J., SINGH, S. and BEHBAHANI, F. (2023). Structured State Space Models for In-Context Reinforcement Learning. InAdvances in Neural Information Processing Systems3647016–47031

2023

[29] [29]

E., BOTTERO, A

LUIS, C. E., BOTTERO, A. G., VINOGRADSKA, J., BERKENKAMP, F. and PETERS, J. (2024). Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability.arXiv preprint

2024

[30] [30]

and VLADU, A

MADRY, A., MAKELOV, A., SCHMIDT, L., TSIPRAS, D. and VLADU, A. (2018). Towards Deep Learning Mod- els Resistant to Adversarial Attacks. InInternational Conference on Learning Representations (ICLR)

2018

[31] [31]

MURPHY, K. P. (2023).Probabilistic Machine Learning: Advanced Topics. MIT Press. 29

2023

[32] [32]

and WRIGHT, S

NOCEDAL, J. and WRIGHT, S. J. (2006).Numerical Optimization, 2 ed. Springer, New York, NY

2006

[33] [33]

and CAMPAGNOLI, P

PETRIS, G., PETRONE, S. and CAMPAGNOLI, P. (2009).Dynamic Linear Models with R. Springer Science & Business Media

2009

[34] [34]

and SUKTHANKAR, R

PINTO, L., DAVIDSON, J. and SUKTHANKAR, R. (2017). Robust Adversarial Reinforcement Learning. InInter- national Conference on Machine Learning (ICML). QUIÑONERO-CANDELA, J., SUGIYAMA, M., SCHWAIGHOFER, A. and LAWRENCE, N. D., eds. (2009).Dataset Shift in Machine Learning. MIT Press

2017

[35] [35]

and AMATO, C

RATHBUN, E., OPREA, A. and AMATO, C. (2025). Adversarial Inception Backdoor Attacks against Reinforce- ment Learning. InProceedings of the 42nd International Conference on Machine Learning.Proceedings of Machine Learning Research26751273–51296. PMLR

2025

[36] [36]

E., TUNG, F

RAUCH, H. E., TUNG, F. and STRIEBEL, C. T. (1965). Maximum likelihood estimates of linear dynamic sys- tems.AIAA Journal31445–1450. https://doi.org/10.2514/3.3166

work page doi:10.2514/3.3166 1965

[37] [37]

Robust Bayesian Filtering and Smoothing Using Student's t Distribution

ROTH, M., ARDESHIRI, T., ÖZKAN, E. and GUSTAFSSON, F. (2017). Robust Bayesian Filtering and Smoothing Using Student’s t Distribution. https://doi.org/10.48550/arXiv.1703.02428 SÄRKKÄ, S. and HARTIKAINEN, J. (2013). Variational Bayesian Adaptation of Noise Covariances in Non-Linear Kalman Filtering. https://doi.org/10.48550/arXiv.1302.0681 SÄRKKÄ, S. and N...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1703.02428 2017

[38] [38]

A., KUHN, D

SHAFIEEZADEH-ABADEH, S., NGUYEN, V. A., KUHN, D. and MOHAJERINESFAHANI, P. (2018). Wasserstein Distributionally Robust Kalman Filtering. InAdvances in Neural Information Processing Systems31

2018

[39] [39]

(2005).Stochastic Volatility: Selected Readings

SHEPHARD, N., ed. (2005).Stochastic Volatility: Selected Readings. Oxford University Press

2005

[40] [40]

TABUADA, P. (2017). Secure State Estimation for Cyber-Physical Systems under Sensor Attacks: A Satis- fiability Modulo Theory Approach.IEEE Transactions on Automatic Control624917–4932. https://doi.org/ 10.1109/TAC.2017.2650223

work page doi:10.1109/tac.2017.2650223 2017

[41] [41]

and LINDERMAN, S

SMITH, J., WARRINGTON, A. and LINDERMAN, S. W. (2023). Simplified State Space Layers for Sequence Modeling. InInternational Conference on Learning Representations (ICLR)

2023

[42] [42]

M., MIMI, M

SOMVANSHI, S., ISLAM, M. M., MIMI, M. S., POLOCK, S. B. B., CHHETRI, G., DUTTA, A., RAFE, A. and DAS, S. (2025). Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State-Space Architectures from S4 to Mamba.arXiv preprint arXiv:2503.18970

Pith/arXiv arXiv 2025

[43] [43]

VASSILEV, A., OPREA, A. et al. (2024). Adversarial Machine Learning: A Taxonomy and Terminology of At- tacks and Mitigations NIST AI Report No. NIST.AI.100-2e2023, National Institute of Standards and Technol- ogy (NIST). https://doi.org/10.6028/NIST.AI.100-2e2023

work page doi:10.6028/nist.ai.100-2e2023 2024

[44] [44]

POLOSUKHIN, I. (2017). Attention Is All You Need. InAdvances in Neural Information Processing Systems (NeurIPS)

2017

[45] [45]

and WANG, H

WANG, H., LI, H., FANG, J. and WANG, H. (2018). Robust Gaussian Kalman Filter With Outlier Detection. IEEE Signal Processing Letters251236–1240. https://doi.org/10.1109/LSP.2018.2851156

work page doi:10.1109/lsp.2018.2851156 2018

[46] [46]

and HARRISON, J

WEST, M. and HARRISON, J. (1997).Bayesian Forecasting and Dynamic Models. Springer Science & Business Media

1997