Secure Planning Against Stealthy Attacks via Model-Free Reinforcement Learning
Pith reviewed 2026-05-24 14:26 UTC · model grok-4.3
The pith
A combined LTL formula for task and stealthy-attack prevention in a stochastic game can be satisfied by model-free reinforcement learning without an environment model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We consider the problem of security-aware planning in an unknown stochastic environment, in the presence of attacks on control signals of the robot. We model the attacker as an agent who has the full knowledge of the controller as well as the employed intrusion-detection system and who wants to prevent the controller from performing tasks while staying stealthy. We formulate the problem as a stochastic game between the attacker and the controller and present an approach to express the objective of such an agent and the controller as a combined linear temporal logic (LTL) formula. We then show that the planning problem, described formally as the problem of satisfying an LTL formula in a stoch
What carries the argument
A combined LTL formula encodes both task completion and stealthy-attack prevention inside a stochastic game between the controller and the attacker; the game is then solved via model-free RL.
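One way to write down the objective described here, assuming a zero-sum reading in which the controller strategy \pi maximizes and the attacker strategy \sigma minimizes the probability that a play \rho of the game \mathcal{G} satisfies the combined formula \varphi. The symbols are notation introduced for this sketch and are not taken from the paper:

```latex
\pi^{*} \in \arg\max_{\pi} \, \min_{\sigma} \; \Pr\nolimits^{\mathcal{G}}_{\pi,\sigma}\bigl(\rho \models \varphi\bigr)
```

Under this reading, "solving" the game means finding a controller strategy that does as well as possible against the worst-case attacker.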
If this is right
- The planning problem can be solved without any model of the environment.
- Model-free RL computes a policy that meets both task and security requirements expressed in the combined LTL formula.
- The method applies to robotic systems facing actuator attacks that must remain stealthy.
- The approach is evaluated on two robotic planning case studies.
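To make "without any model of the environment" concrete, here is a minimal sketch of a tabular, minimax-style Q-update for a turn-based zero-sum game. It learns only from sampled transitions and never looks at transition probabilities. The turn-based assumption, the name `minimax_q_update`, the `next_owner` argument, and the toy states are all hypothetical; this page does not show the paper's actual algorithm.

```python
from collections import defaultdict

def minimax_q_update(Q, s, a, r, s_next, next_actions, next_owner,
                     alpha=0.5, gamma=0.9):
    """One tabular update from a single sampled transition (s, a, r, s_next)."""
    if next_actions:
        values = [Q[(s_next, b)] for b in next_actions]
        # The controller picks the best continuation; the attacker picks the
        # continuation that is worst for the controller.
        backup = max(values) if next_owner == "controller" else min(values)
    else:
        backup = 0.0  # terminal state: nothing to bootstrap from
    Q[(s, a)] += alpha * (r + gamma * backup - Q[(s, a)])
    return Q[(s, a)]

# Toy usage with made-up states and actions.
Q = defaultdict(float)
Q[("s1", "left")] = 2.0
Q[("s1", "right")] = -1.0

# Next state owned by the controller: backs up max(2.0, -1.0).
v_ctrl = minimax_q_update(Q, "s0", "go", 1.0, "s1", ["left", "right"], "controller")
# Next state owned by the attacker: backs up min(2.0, -1.0).
v_att = minimax_q_update(Q, "s0", "wait", 1.0, "s1", ["left", "right"], "attacker")
print(v_ctrl, v_att)  # approximately 1.4 and 0.05
```

The only difference from single-agent Q-learning is the choice between `max` and `min` in the backup, which is how the adversary enters the update in this sketch.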
Where Pith is reading between the lines
- The same combined-LTL plus model-free-RL pattern could be tried for other attack types whose goals are expressible in temporal logic.
- It suggests that adversarial control problems might be handled without explicit environment models when objectives fit inside an LTL formula.
- Scalability to high-dimensional state spaces or continuous dynamics would require further testing beyond the two case studies.
Load-bearing premise
The objectives of the attacker and the controller can be expressed together in one LTL formula that model-free reinforcement learning can satisfy in a completely unknown environment.
What would settle it
Apply model-free RL to a simulated robotic task with actuator attacks and an intrusion-detection system, then check whether the learned policy satisfies the combined LTL formula while completing the task and blocking stealthy attacks.
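A check of this kind could be summarized with a Monte Carlo estimate. The sketch below assumes each simulated episode has already been reduced to a boolean (whether the run was judged to satisfy the specification by some finite-horizon check) and attaches a Hoeffding confidence interval. The function name and the example counts are hypothetical, and finite episodes can only give evidence about an infinite-horizon LTL property, not a proof.

```python
import math

def satisfaction_estimate(outcomes, delta=0.05):
    """Empirical satisfaction rate with a two-sided Hoeffding interval.

    outcomes: iterable of booleans, one per independent episode.
    delta:    allowed failure probability of the interval.
    """
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one episode")
    rate = sum(1 for ok in outcomes if ok) / n
    # Hoeffding: P(|rate - p| >= eps) <= 2 * exp(-2 * n * eps^2)
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)

# Made-up example: 90 satisfying episodes out of 100.
rate, lower, upper = satisfaction_estimate([True] * 90 + [False] * 10)
print(rate, lower, upper)
```

With only 100 episodes the interval is wide (roughly ±0.14), so a convincing check would need substantially more runs.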
Original abstract
We consider the problem of security-aware planning in an unknown stochastic environment, in the presence of attacks on control signals (i.e., actuators) of the robot. We model the attacker as an agent who has the full knowledge of the controller as well as the employed intrusion-detection system and who wants to prevent the controller from performing tasks while staying stealthy. We formulate the problem as a stochastic game between the attacker and the controller and present an approach to express the objective of such an agent and the controller as a combined linear temporal logic (LTL) formula. We then show that the planning problem, described formally as the problem of satisfying an LTL formula in a stochastic game, can be solved via model-free reinforcement learning when the environment is completely unknown. Finally, we illustrate and evaluate our methods on two robotic planning case studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses security-aware planning for a robot in an unknown stochastic environment subject to stealthy attacks on its actuators. The attacker is modeled as having complete knowledge of the controller and intrusion-detection system and seeking to disrupt task completion while remaining undetected. The interaction is formulated as a stochastic game whose objectives (for both parties) are encoded as a single combined LTL formula; the resulting LTL-satisfaction problem on the unknown game is then solved by model-free reinforcement learning. The method is illustrated and evaluated on two robotic planning case studies.
Significance. If the reduction from the combined LTL formula to a model-free RL objective is shown to be sound and the learned policies are demonstrated to satisfy the specification with high probability, the work would provide a practical route to secure planning without an a-priori environment model. The explicit construction of a joint LTL formula that simultaneously encodes task satisfaction and stealth constraints is a clear technical contribution.
major comments (2)
- [Abstract] Abstract: the central claim that 'the planning problem … can be solved via model-free reinforcement learning when the environment is completely unknown' is stated without any indication of the reward construction, the handling of the two-player game structure inside the RL loop, or convergence arguments; these details are load-bearing for the claim and must be supplied with explicit equations or algorithms.
- [Abstract] The weakest assumption identified in the reader’s report (that a single LTL formula can be formed whose satisfaction corresponds to both task completion and stealthy-attack prevention, and that model-free RL can find a policy for it) is never discharged in the provided description; a concrete construction of the product automaton or the reward function on accepting states is required before the reduction can be accepted.
minor comments (1)
- [Evaluation] The two case studies should report quantitative metrics (success rate, attack success rate, number of episodes) together with the exact LTL formulas employed so that the empirical support for the method can be assessed.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive suggestions. The comments focus on making the abstract self-contained with respect to the technical reduction. We will revise the abstract to include brief but explicit indications of the LTL-to-reward construction and the handling of the game inside the RL procedure, while preserving the manuscript's existing technical sections that already contain the full derivations.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that 'the planning problem … can be solved via model-free reinforcement learning when the environment is completely unknown' is stated without any indication of the reward construction, the handling of the two-player game structure inside the RL loop, or convergence arguments; these details are load-bearing for the claim and must be supplied with explicit equations or algorithms.
  Authors: We agree that the abstract would be strengthened by indicating these elements. The full manuscript already supplies them: the combined LTL formula is constructed in Section III, the product automaton and the reward function (r = 1 on accepting states, discounted sum otherwise) appear in Section IV, the two-player structure is handled by treating the attacker as part of the environment in the model-free Q-learning update, and convergence follows from standard results on RL for LTL satisfaction under the assumption of sufficient exploration. To address the referee's point directly, we will add one sentence to the abstract that references the reward construction from the accepting states of the product automaton and notes that standard model-free RL is applied to the resulting zero-sum game.
  Revision: yes
- Referee: [Abstract] The weakest assumption identified in the reader’s report (that a single LTL formula can be formed whose satisfaction corresponds to both task completion and stealthy-attack prevention, and that model-free RL can find a policy for it) is never discharged in the provided description; a concrete construction of the product automaton or the reward function on accepting states is required before the reduction can be accepted.
  Authors: The concrete construction is given in the body of the paper (Sections III and IV): the task LTL φ_task and the stealth LTL φ_stealth are conjoined into a single formula φ = φ_task ∧ φ_stealth; the product automaton is formed in the standard way; and the reward function assigns positive reward precisely on the accepting states of this automaton, turning LTL satisfaction into an RL objective. Because the referee correctly notes that the abstract itself does not discharge this, we will revise the abstract to include a short clause stating that the objectives are encoded as a single LTL formula whose satisfaction is reduced to a reward-maximization problem on the product automaton.
  Revision: yes
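The construction the rebuttal describes (conjoin two specifications, build a product automaton, reward accepting states) can be illustrated on a deliberately small case. The sketch below uses finite-word deterministic automata for two hypothetical stand-in properties, "eventually `goal`" and "never `alarm`". These propositions, the class `DFA`, and the helpers are assumptions made for illustration. General LTL requires ω-automata (for example Büchi or Rabin acceptance), which this sketch does not cover, and the paper's actual formulas are not shown on this page.

```python
from itertools import product as cartesian

class DFA:
    """Deterministic automaton over labels; each label is a set of atomic propositions."""
    def __init__(self, states, start, accepting, step):
        self.states = set(states)
        self.start = start
        self.accepting = set(accepting)
        self.step = step  # function: (state, label) -> next state

def conjunction(a, b):
    """Product automaton: both components move in lockstep; accept iff both accept."""
    states = set(cartesian(a.states, b.states))
    accepting = {(p, q) for (p, q) in states
                 if p in a.accepting and q in b.accepting}
    def step(state, label):
        return (a.step(state[0], label), b.step(state[1], label))
    return DFA(states, (a.start, b.start), accepting, step)

def run(dfa, trace):
    """Feed a finite trace of labels through the automaton and return the final state."""
    state = dfa.start
    for label in trace:
        state = dfa.step(state, label)
    return state

# Hypothetical stand-in for the task conjunct: "eventually goal".
task = DFA({0, 1}, 0, {1},
           lambda q, label: 1 if q == 1 or "goal" in label else 0)
# Hypothetical stand-in for the second conjunct: "never alarm".
no_alarm = DFA({"ok", "bad"}, "ok", {"ok"},
               lambda q, label: "bad" if q == "bad" or "alarm" in label else "ok")

spec = conjunction(task, no_alarm)

def reward(state):
    """Positive reward exactly on accepting product states (0 elsewhere is an assumption)."""
    return 1.0 if state in spec.accepting else 0.0

good = run(spec, [set(), {"goal"}, set()])   # goal reached, no alarm
bad = run(spec, [{"alarm"}, {"goal"}])       # goal reached, but alarm raised
print(good, reward(good), bad, reward(bad))
```

In a learning loop, the automaton state would be tracked alongside the environment state so that the reward depends on the history of labels rather than on the current label alone.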
Circularity Check
No significant circularity; standard reduction to RL on LTL game
Full rationale
The derivation reduces the security planning problem to a stochastic game whose objectives are encoded as a single LTL formula; satisfaction of that formula in an unknown environment is then solved by model-free RL. This is a conventional product-automaton construction followed by reward shaping on accepting states, with no self-definitional loops, no fitted parameters renamed as predictions, and no load-bearing self-citations that close the argument. The approach is externally falsifiable via simulation on the two robotic case studies and does not rely on any uniqueness theorem or ansatz imported from the authors' prior work.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the problem can be modeled as a stochastic game between controller and attacker.
- Domain assumption: objectives can be expressed as a combined LTL formula.