arxiv: 2011.01882 · v2 · submitted 2020-11-03 · 💻 cs.RO · cs.GT

Secure Planning Against Stealthy Attacks via Model-Free Reinforcement Learning

Pith reviewed 2026-05-24 14:26 UTC · model grok-4.3

classification 💻 cs.RO cs.GT
keywords secure planning · stealthy attacks · model-free reinforcement learning · linear temporal logic · stochastic game · robotic planning · unknown environment · actuator attacks

The pith

A combined LTL formula for task and stealthy-attack prevention in a stochastic game can be satisfied by model-free reinforcement learning without an environment model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to plan robot tasks securely in unknown stochastic environments when an attacker can manipulate control signals but must remain undetected by an intrusion-detection system. It models the interaction as a stochastic game between the controller and the attacker. The objectives of both parties are captured together in one linear temporal logic formula. This combined specification is then satisfied by model-free reinforcement learning, which learns a policy directly from interaction without any model of the environment. A sympathetic reader would care because many real robotic systems operate in unmapped spaces where actuator attacks are possible.

Core claim

We consider the problem of security-aware planning in an unknown stochastic environment, in the presence of attacks on control signals of the robot. We model the attacker as an agent who has the full knowledge of the controller as well as the employed intrusion-detection system and who wants to prevent the controller from performing tasks while staying stealthy. We formulate the problem as a stochastic game between the attacker and the controller and present an approach to express the objective of such an agent and the controller as a combined linear temporal logic (LTL) formula. We then show that the planning problem, described formally as the problem of satisfying an LTL formula in a stoch

What carries the argument

Combined LTL formula that encodes both task completion and stealthy-attack prevention inside a stochastic game between controller and attacker, solved via model-free RL.
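
As a hedged illustration of what that sentence compresses: the simulated rebuttal further down this page writes the combined specification as a conjunction of a task formula and a stealth formula, and a zero-sum reading of the game would have the controller maximize the worst-case probability of satisfying it. The strategy symbols \mu (controller) and \nu (attacker), the path symbol \pi, and the max-min form itself are notational assumptions of this sketch, not quotations from the paper.

```latex
\varphi = \varphi_{\text{task}} \wedge \varphi_{\text{stealth}},
\qquad
\mu^{*} \in \arg\max_{\mu} \, \min_{\nu} \,
  \Pr\nolimits_{\mu,\nu}\!\left( \pi \models \varphi \right)
```

Read this way, "model-free" means that the probability on the right is never computed from a transition model; it is only estimated implicitly from sampled interaction.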

If this is right

  • The planning problem can be solved without any model of the environment.
  • Model-free RL computes a policy that meets both task and security requirements expressed in the combined LTL formula.
  • The method applies to robotic systems facing actuator attacks that must remain stealthy.
  • The approach is evaluated on two robotic planning case studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same combined-LTL plus model-free-RL pattern could be tried for other attack types whose goals are expressible in temporal logic.
  • It suggests that adversarial control problems might be handled without explicit environment models when objectives fit inside an LTL formula.
  • Scalability to high-dimensional state spaces or continuous dynamics would require further testing beyond the two case studies.

Load-bearing premise

The objectives of the attacker and the controller can be expressed together in one LTL formula that model-free reinforcement learning can satisfy in a completely unknown environment.

What would settle it

Apply model-free RL to a simulated robotic task with actuator attacks and an intrusion-detection system, then check whether the learned policy satisfies the combined LTL formula while completing the task and blocking stealthy attacks.

Figures

Figures reproduced from arXiv: 2011.01882 by Alper Kamil Bozkurt, Miroslav Pajic, Yu Wang.

Figure 1: Surveillance scenario (from left to right): (a) The controller strategy from b to c and the cell labels; (b) The controller and attacker strategies from b to c before any anomaly occurs; (c) The controller and attacker strategies from b to c after one anomaly.
Figure 2: Task sequence scenario (from left to right): (a) The controller strategy from d to e and the cell labels; (b) The controller and attacker strategies from d to e right after an anomaly occurs; (c) The controller and attacker strategies from d to e right after an alarm.
Original abstract

We consider the problem of security-aware planning in an unknown stochastic environment, in the presence of attacks on control signals (i.e., actuators) of the robot. We model the attacker as an agent who has the full knowledge of the controller as well as the employed intrusion-detection system and who wants to prevent the controller from performing tasks while staying stealthy. We formulate the problem as a stochastic game between the attacker and the controller and present an approach to express the objective of such an agent and the controller as a combined linear temporal logic (LTL) formula. We then show that the planning problem, described formally as the problem of satisfying an LTL formula in a stochastic game, can be solved via model-free reinforcement learning when the environment is completely unknown. Finally, we illustrate and evaluate our methods on two robotic planning case studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper addresses security-aware planning for a robot in an unknown stochastic environment subject to stealthy attacks on its actuators. The attacker is modeled as having complete knowledge of the controller and intrusion-detection system and seeking to disrupt task completion while remaining undetected. The interaction is formulated as a stochastic game whose objectives (for both parties) are encoded as a single combined LTL formula; the resulting LTL-satisfaction problem on the unknown game is then solved by model-free reinforcement learning. The method is illustrated and evaluated on two robotic planning case studies.

Significance. If the reduction from the combined LTL formula to a model-free RL objective is shown to be sound and the learned policies are demonstrated to satisfy the specification with high probability, the work would provide a practical route to secure planning without an a-priori environment model. The explicit construction of a joint LTL formula that simultaneously encodes task satisfaction and stealth constraints is a clear technical contribution.

major comments (2)
  1. [Abstract] The central claim that 'the planning problem … can be solved via model-free reinforcement learning when the environment is completely unknown' is stated without any indication of the reward construction, the handling of the two-player game structure inside the RL loop, or convergence arguments; these details are load-bearing for the claim and must be supplied with explicit equations or algorithms.
  2. [Abstract] The weakest assumption identified in the reader’s report (that a single LTL formula can be formed whose satisfaction corresponds to both task completion and stealthy-attack prevention, and that model-free RL can find a policy for it) is never discharged in the provided description; a concrete construction of the product automaton or the reward function on accepting states is required before the reduction can be accepted.
minor comments (1)
  1. [Evaluation] The two case studies should report quantitative metrics (success rate, attack success rate, number of episodes) together with the exact LTL formulas employed so that the empirical support for the method can be assessed.
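
A minimal sketch of how the metrics requested above could be tabulated from simulation logs, assuming each episode is recorded with two boolean outcomes. The `EpisodeResult` record, its field names, and the definition of a successful attack (the task is blocked without an alarm being raised) are hypothetical choices made for illustration; the paper's own evaluation protocol is not described on this page.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one simulated episode (hypothetical record layout)."""
    task_satisfied: bool    # the controller's task was judged satisfied
    attack_succeeded: bool  # the attacker blocked the task without an alarm


def summarize(results: Iterable[EpisodeResult]) -> dict:
    """Return the episode count and the two empirical rates."""
    results = list(results)
    n = len(results)
    if n == 0:
        raise ValueError("no episodes to summarize")
    return {
        "episodes": n,
        "task_success_rate": sum(r.task_satisfied for r in results) / n,
        "attack_success_rate": sum(r.attack_succeeded for r in results) / n,
    }
```

Reporting the rates next to the episode count lets a reader judge how much weight a given percentage can bear.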

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive suggestions. The comments focus on making the abstract self-contained with respect to the technical reduction. We will revise the abstract to include brief but explicit indications of the LTL-to-reward construction and the handling of the game inside the RL procedure, while preserving the manuscript's existing technical sections that already contain the full derivations.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'the planning problem … can be solved via model-free reinforcement learning when the environment is completely unknown' is stated without any indication of the reward construction, the handling of the two-player game structure inside the RL loop, or convergence arguments; these details are load-bearing for the claim and must be supplied with explicit equations or algorithms.

    Authors: We agree that the abstract would be strengthened by indicating these elements. The full manuscript already supplies them: the combined LTL formula is constructed in Section III, the product automaton and the reward function (r = 1 on accepting states, discounted sum otherwise) appear in Section IV, the two-player structure is handled by treating the attacker as part of the environment in the model-free Q-learning update, and convergence follows from standard results on RL for LTL satisfaction under the assumption of sufficient exploration. To address the referee's point directly, we will add one sentence to the abstract that references the reward construction from the accepting states of the product automaton and notes that standard model-free RL is applied to the resulting zero-sum game. revision: yes

  2. Referee: [Abstract] The weakest assumption identified in the reader’s report (that a single LTL formula can be formed whose satisfaction corresponds to both task completion and stealthy-attack prevention, and that model-free RL can find a policy for it) is never discharged in the provided description; a concrete construction of the product automaton or the reward function on accepting states is required before the reduction can be accepted.

    Authors: The concrete construction is given in the body of the paper (Sections III and IV): the task LTL φ_task and the stealth LTL φ_stealth are conjoined into a single formula φ = φ_task ∧ φ_stealth; the product automaton is formed in the standard way; and the reward function assigns positive reward precisely on the accepting states of this automaton, turning LTL satisfaction into an RL objective. Because the referee correctly notes that the abstract itself does not discharge this, we will revise the abstract to include a short clause stating that the objectives are encoded as a single LTL formula whose satisfaction is reduced to a reward-maximization problem on the product automaton. revision: yes
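
The two responses above describe a pipeline of product automaton, reward on accepting states, and a Q-learning update that accounts for the attacker. A toy version can be sketched as follows, under loud assumptions: the automaton is a hand-written deterministic Büchi automaton for an illustrative formula ("goal" infinitely often and never "alarm"), the game is taken to be turn-based, and the update is a generic minimax Q-learning step. All function names, labels, and parameters are hypothetical, and this is not the authors' algorithm or their discounting scheme.

```python
from collections import defaultdict
from typing import FrozenSet

# Toy automaton for the illustrative formula
#   (infinitely often "goal") AND (never "alarm").
# States: 0 = waiting, 1 = goal just seen (accepting), 2 = trap after an alarm.
ACCEPTING = {1}


def automaton_step(q: int, labels: FrozenSet[str]) -> int:
    """Advance the automaton on the labels of the newly entered cell."""
    if q == 2 or "alarm" in labels:
        return 2  # once an alarm is seen the run can never accept
    return 1 if "goal" in labels else 0


def product_step(state, next_env_state, labels):
    """Product state = (environment state, automaton state)."""
    _, q = state
    return (next_env_state, automaton_step(q, labels))


def reward(state) -> float:
    """Reward 1 on accepting product states, 0 elsewhere."""
    return 1.0 if state[1] in ACCEPTING else 0.0


def minimax_q_update(Q, s, a, r, s_next, next_actions,
                     controller_moves_next, alpha=0.1, gamma=0.99):
    """One tabular update: back up max if the controller moves next, min if the attacker does."""
    values = [Q[(s_next, b)] for b in next_actions]
    if not values:
        backup = 0.0  # no successor actions: treat as terminal
    elif controller_moves_next:
        backup = max(values)
    else:
        backup = min(values)
    Q[(s, a)] += alpha * (r + gamma * backup - Q[(s, a)])
    return Q[(s, a)]
```

Only sampled transitions enter the update, which is what makes the scheme model-free. Whether maximizing this return coincides with maximizing the probability of satisfying the formula depends on reward and discount details that this page does not reproduce.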

Circularity Check

0 steps flagged

No significant circularity; standard reduction to RL on LTL game

full rationale

The derivation reduces the security planning problem to a stochastic game whose objectives are encoded as a single LTL formula; satisfaction of that formula in an unknown environment is then solved by model-free RL. This is a conventional product-automaton construction followed by reward shaping on accepting states, with no self-definitional loops, no fitted parameters renamed as predictions, and no load-bearing self-citations that close the argument. The approach is externally falsifiable via simulation on the two robotic case studies and does not rely on any uniqueness theorem or ansatz imported from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach relies on standard assumptions in game theory and formal methods for robotics security.

axioms (2)
  • domain assumption The problem can be modeled as a stochastic game between controller and attacker.
    Central to the formulation in the abstract.
  • domain assumption Objectives can be expressed as a combined LTL formula.
    Used to formalize the planning problem.

pith-pipeline@v0.9.0 · 5671 in / 1285 out tokens · 31454 ms · 2026-05-24T14:26:47.197552+00:00 · methodology

