pith. sign in

arxiv: 2601.14104 · v2 · pith:PNBRIR36new · submitted 2026-01-20 · 💻 cs.RO · cs.CV

When Backdoors Meet Partial Observability: Attacking Real-World Reinforcement Learning

Pith reviewed 2026-05-16 12:35 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords backdoor attackreinforcement learningpartial observabilitydiffusion modelreal-world roboticsvisual triggerTurtleBot3adversarial attack
0
0 comments X p. Extension
pith:PNBRIR36 Add to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{PNBRIR36}

Prints a linked pith:PNBRIR36 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

A diffusion model learns visual triggers that activate backdoors in real robot RL policies despite varying uncontrollable sensors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real-world reinforcement learning policies often rely on multimodal observations where attackers can manipulate only visual inputs through physical patches while LiDAR and odometry signals remain beyond control and fluctuate across trajectories. Existing backdoor attacks assume full observation control and break down under these partial-observability conditions. The paper introduces DGBA, which trains a conditional diffusion model to generate a stochastic distribution of effective visual triggers and pairs it with advantage-based poisoning that targets only critical decision states. This combination keeps the policy behaving normally on clean inputs yet reliably executes the malicious behavior when the trigger patch appears. Physical experiments on a TurtleBot3 platform confirm higher attack success rates than prior methods while preserving standard task performance.

Core claim

Backdoor attacks remain viable for real-world RL under partial observability when a conditional diffusion model learns a stochastic trigger distribution over small printable visual patches and an advantage-based strategy poisons only high-value training states, producing consistent attack activation across varying uncontrollable auxiliary observations such as LiDAR and odometry readings.

What carries the argument

The diffusion-guided backdoor attack (DGBA) framework whose core component is a conditional diffusion model that captures and samples from the distribution of visual triggers effective under fluctuating auxiliary states.

If this is right

  • Backdoor attacks become practical for physical robot deployments that use cameras alongside other sensors.
  • Attackers require control over only visual observations to achieve reliable triggers.
  • Normal policy behavior on clean inputs is preserved after the attack is embedded.
  • Advantage-based poisoning concentrates the attack on fewer, more effective training states.
  • Attack success rates exceed those of prior RL backdoor methods in real-world tests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same diffusion-based trigger learning could extend to other multimodal real-world RL settings such as autonomous navigation where only camera images are patchable.
  • Detection methods might focus on identifying training trajectories whose visual inputs follow learned diffusion distributions rather than natural statistics.
  • Physical deployment costs remain low because the triggers are small printable patches.
  • Scaling the approach to environments with moving or changing triggers would test whether the stochastic distribution generalizes beyond static patches.

Load-bearing premise

A conditional diffusion model can learn a stochastic distribution of visual triggers that produces consistent backdoor activation regardless of how the uncontrollable auxiliary states such as LiDAR and odometry vary.

What would settle it

Deploy the backdoored TurtleBot3 policy repeatedly with the visual patch present across many different LiDAR scans and odometry trajectories and measure whether the malicious behavior activates reliably in every case or fails in a substantial fraction of them.

Figures

Figures reproduced from arXiv: 2601.14104 by Haibo Hu, Jiawei Lian, Qingqing Ye, Tairan Huang, Yaxin Xiao, Yi Wang, Yulin Jin.

Figure 1
Figure 1. Figure 1: TurtleBot3 Burger Recent studies show that such attacks remain highly effec￾tive on RL agents in simula￾tion, even when only a small fraction of training data is poi￾soned [Kiourti et al., 2019; Cui et al., 2024; Rathbun et al., 2024]. However, these attacks typically assume that policy outputs are executed directly in the environment, an assumption that holds in simulation but does not fully reflect real-… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of DGBA backdoor training and deployment pipeline. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Real-world demonstrations on TurtleBot3. Top row: clean navigation without trigger. Middle row: trigger-activated right-turn [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
read the original abstract

Backdoor attacks can cause reinforcement learning (RL) policies to behave normally under clean inputs while executing malicious behaviors when triggers are present. Existing RL backdoor attacks are primarily studied in simulation and often assume that attackers can reliably manipulate the observations driving policy decisions. This assumption becomes fragile in real-world deployment, where RL policies commonly rely on multimodal observations. Attackers can manipulate visual inputs through physical triggers, but auxiliary states such as LiDAR and odometry signals remain uncontrollable and vary across trajectories. We study this overlooked challenge and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. DGBA uses small printable visual patches as triggers and learns a stochastic trigger distribution via conditional diffusion to maintain consistent attack activation under varying uncontrollable states. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. Experiments on a physical TurtleBot3 platform show that DGBA consistently outperforms prior RL backdoor attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DGBA, a diffusion-guided backdoor attack framework for real-world RL under partial observability. It employs small printable visual patches as triggers and trains a conditional diffusion model to generate a stochastic trigger distribution that aims to ensure consistent attack activation despite varying uncontrollable auxiliary observations (e.g., LiDAR, odometry). An advantage-based poisoning strategy injects triggers only at decision-critical states during training. Physical experiments on a TurtleBot3 platform are reported to show that DGBA outperforms prior RL backdoor attacks while preserving nominal task performance.

Significance. If the physical-robot results are reproducible and the diffusion conditioning proves robust, the work would demonstrate a concrete attack vector on deployed multimodal RL policies, where attackers control only visual channels. The combination of diffusion-based stochastic triggers and advantage poisoning addresses a realistic partial-observability gap left by prior simulation-only studies. The availability of demo videos and code is a positive for reproducibility.

major comments (2)
  1. [Experimental Results] Experimental Results (physical TurtleBot3 evaluation): The central claim that DGBA 'consistently outperforms prior RL backdoor attacks' rests on assertions of superior attack success rate and preserved task performance, yet the manuscript provides no quantitative metrics (e.g., mean attack success rate ± std, number of trials, statistical significance tests) or ablation studies on auxiliary-state variance. Without these, the reported outperformance cannot be verified and the weakest assumption—that the conditional diffusion maintains stable activation probability outside the training distribution of LiDAR/odometry—remains untested.
  2. [§4.2] §4.2 (Diffusion trigger distribution): The claim that the conditional diffusion model 'learns a stochastic trigger distribution to maintain consistent attack activation under varying uncontrollable states' is load-bearing for the novelty argument, but no coverage analysis, out-of-distribution auxiliary-state success rates, or sensitivity plots are supplied. If the learned distribution collapses for unseen LiDAR/odometry combinations, the advantage-poisoning component alone would not suffice to explain the reported gains over baselines.
minor comments (2)
  1. [§3.2] Notation for the conditional diffusion model (Eq. 3–5) uses p_θ(x_t | s_aux) without explicitly defining the auxiliary-state conditioning variable s_aux in the main text; a short clarification would improve readability.
  2. [Figure 3] Figure 3 (attack activation curves) lacks error bars or per-trajectory variance, making it hard to judge stability across runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and will revise the manuscript to incorporate additional quantitative metrics, statistical analysis, and supporting evaluations of the diffusion component.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental Results (physical TurtleBot3 evaluation): The central claim that DGBA 'consistently outperforms prior RL backdoor attacks' rests on assertions of superior attack success rate and preserved task performance, yet the manuscript provides no quantitative metrics (e.g., mean attack success rate ± std, number of trials, statistical significance tests) or ablation studies on auxiliary-state variance. Without these, the reported outperformance cannot be verified and the weakest assumption—that the conditional diffusion maintains stable activation probability outside the training distribution of LiDAR/odometry—remains untested.

    Authors: We agree that the physical results section requires more rigorous quantitative support. In the revised manuscript we will add a table reporting mean attack success rate ± standard deviation across 30 independent physical trials per method, together with p-values from paired statistical tests against baselines. We will also include an ablation that systematically varies auxiliary-state noise (LiDAR and odometry) to quantify activation stability outside the training distribution. revision: yes

  2. Referee: [§4.2] §4.2 (Diffusion trigger distribution): The claim that the conditional diffusion model 'learns a stochastic trigger distribution to maintain consistent attack activation under varying uncontrollable states' is load-bearing for the novelty argument, but no coverage analysis, out-of-distribution auxiliary-state success rates, or sensitivity plots are supplied. If the learned distribution collapses for unseen LiDAR/odometry combinations, the advantage-poisoning component alone would not suffice to explain the reported gains over baselines.

    Authors: We acknowledge the need for explicit validation of the diffusion model's contribution. The revision will add (i) coverage statistics of the learned trigger distribution, (ii) attack success rates measured on held-out out-of-distribution LiDAR/odometry combinations, and (iii) sensitivity plots showing attack performance as a function of auxiliary-state deviation. These additions will demonstrate that the stochastic trigger distribution provides measurable gains beyond advantage-based poisoning. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical DGBA framework

full rationale

The paper introduces a new diffusion-guided backdoor attack (DGBA) framework for real-world RL under partial observability, using conditional diffusion to learn a stochastic trigger distribution and an advantage-based poisoning strategy. These are novel components trained on the target task rather than reductions of prior fitted parameters or self-citations. The central claims rest on physical TurtleBot3 experiments comparing against prior attacks, with no load-bearing derivation that collapses to its own inputs by construction. The approach is self-contained and empirical.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Central claim depends on standard RL advantage estimation and diffusion model training; no new entities postulated and no ad-hoc constants beyond typical model hyperparameters.

free parameters (1)
  • diffusion conditioning parameters
    Hyperparameters of the conditional diffusion model fitted to produce the stochastic trigger distribution.
axioms (1)
  • domain assumption Advantage function reliably identifies decision-critical states for targeted poisoning
    Used to decide when to inject triggers during training.

pith-pipeline@v0.9.0 · 5494 in / 1095 out tokens · 46460 ms · 2026-05-16T12:35:43.305276+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 3 internal anchors

  1. [1]

    Turtlebot 3 as a robotics education platform

    [Amsters and Slaets, 2019] Robin Amsters and Peter Slaets. Turtlebot 3 as a robotics education platform. In Mu- nir Merdan, Wilfried Lepuschitz, Gottfried Koppensteiner, Richard Balogh, and David Obdrz´alek, editors,Robotics in Education - Current Research and Innovations, Proceed- ings of the 10th RiE, Vienna, Austria, April 10-12, 2019, volume 1023 ofAd...

  2. [2]

    Stepping locomotion for a walking excavator robot using hierarchical reinforcement learning and action masking

    [Babu and Kirchner, 2025] Ajish Babu and Frank Kirchner. Stepping locomotion for a walking excavator robot using hierarchical reinforcement learning and action masking. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, October 19- 25, 2025, pages 4917–4923. IEEE,

  3. [3]

    Multi-agent inverse reinforcement learning in real world unstructured pedestrian crowds

    [Chandraet al., 2025 ] Rohan Chandra, Haresh Karnan, Ne- gar Mehr, Peter Stone, and Joydeep Biswas. Multi-agent inverse reinforcement learning in real world unstructured pedestrian crowds. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, October 19-25, 2025, pages 18668–18675. IEEE,

  4. [4]

    Dames, and Mac Schwager

    [Chenet al., 2025 ] Timothy Chen, Ola Shorinwa, Joseph Bruno, Aiden Swann, Javier Yu, Weijia Zeng, Keiko Nagami, Philip M. Dames, and Mac Schwager. Splat-nav: Safe real-time robot navigation in gaussian splatting maps. IEEE Trans. Robotics, 41:2765–2784,

  5. [5]

    Choi, Fernando Casta˜neda, Won- suhk Jung, Bike Zhang, Claire J

    [Choiet al., 2025 ] Jason J. Choi, Fernando Casta˜neda, Won- suhk Jung, Bike Zhang, Claire J. Tomlin, and Koushil Sreenath. Constraint-guided online data selection for scal- able data-driven safety filters in uncertain robotic systems. IEEE Trans. Robotics, 41:3779–3798,

  6. [6]

    Badrl: Sparse targeted back- door attack against reinforcement learning

    [Cuiet al., 2024 ] Jing Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, and Junge Zhang. Badrl: Sparse targeted back- door attack against reinforcement learning. In Michael J. Wooldridge, Jennifer G. Dy, and Sriraam Natarajan, edi- tors,Thirty-Eighth AAAI Conference on Artificial Intelli- gence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Art...

  7. [7]

    Ferreira Filho, David F

    [Filhoet al., 2025 ] Edson B. Ferreira Filho, David F. Brochero Giraldo, Arthur H. D. Nunes, and Lu- ciano C. A. Pimenta. Safe radial segregation algorithm for swarms of dubins-like robots. InIEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, GA, USA, May 19-23, 2025, pages 3044–3050. IEEE,

  8. [8]

    Denoising diffusion probabilistic models

    [Hoet al., 2020 ] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria- Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Con- ference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-1...

  9. [9]

    Im- pact of static friction on sim2real in robotic reinforcement learning

    [Huet al., 2025 ] Xiaoyi Hu, Qiao Sun, Bailin He, Haojie Liu, Xueyi Zhang, Chunpeng lu, and Jiangwei Zhong. Im- pact of static friction on sim2real in robotic reinforcement learning. InIEEE/RSJ International Conference on Intel- ligent Robots and Systems, IROS 2025, Hangzhou, China, October 19-25, 2025, pages 17107–17114. IEEE,

  10. [10]

    TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

    [Kiourtiet al., 2019 ] Panagiota Kiourti, Kacper Wardega, Susmit Jha, and Wenchao Li. Trojdrl: Trojan at- tacks on deep reinforcement learning agents.CoRR, abs/1903.06638,

  11. [11]

    Koenig and Andrew Howard

    [Koenig and Howard, 2004] Nathan P. Koenig and Andrew Howard. Design and use paradigms for gazebo, an open- source multi-robot simulator. In2004 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems, Sendai, Japan, September 28 - October 2, 2004, pages 2149–2154. IEEE,

  12. [12]

    Weakly-supervised 3d spatial rea- soning for text-based visual question answering.IEEE Trans

    [Liet al., 2023 ] Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, and Jie Chen. Weakly-supervised 3d spatial rea- soning for text-based visual question answering.IEEE Trans. Image Process., 32:3367–3382,

  13. [13]

    Safe reinforcement learning on the constraint manifold: Theory and applications.IEEE Trans

    [Liuet al., 2025 ] Puze Liu, Haitham Bou-Ammar, Jan Pe- ters, and Davide Tateo. Safe reinforcement learning on the constraint manifold: Theory and applications.IEEE Trans. Robotics, 41:3442–3461,

  14. [14]

    Rusu, Joel Veness, Marc G

    [Mnihet al., 2015 ] V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Belle- mare, Alex Graves, Martin A. Riedmiller, Andreas Fid- jeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Has- sabis. Human-level control t...

  15. [15]

    Visual-based forklift learning system enabling zero-shot sim2real without real-world data

    [Oishiet al., 2025 ] Koshi Oishi, Teruki Kato, Hiroya Makino, and Seigo Ito. Visual-based forklift learning system enabling zero-shot sim2real without real-world data. InIEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, GA, USA, May 19-23, 2025, pages 4915–4921. IEEE,

  16. [16]

    Sleepernets: Universal backdoor poi- soning attacks against reinforcement learning agents

    [Rathbunet al., 2024 ] Ethan Rathbun, Christopher Amato, and Alina Oprea. Sleepernets: Universal backdoor poi- soning attacks against reinforcement learning agents. In Amir Globersons, Lester Mackey, Danielle Belgrave, An- gela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Process- ing Systems 38: Annual Con...

  17. [17]

    Enhanced meta reinforcement learning using demonstrations in sparse reward environments.CoRR, abs/2209.13048,

    [Rengarajanet al., 2022a ] Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, and Srinivas Shakkottai. Enhanced meta reinforcement learning using demonstrations in sparse reward environments.CoRR, abs/2209.13048,

  18. [18]

    Kalathil, and Srinivas Shakkottai

    [Rengarajanet al., 2022b ] Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep M. Kalathil, and Srinivas Shakkottai. Reinforcement learning with sparse rewards using guidance from offline demonstration. InThe Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29,

  19. [19]

    Federated ensemble-directed offline reinforcement learning

    [Rengarajanet al., 2024 ] Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, and Srinivas Shakkot- tai. Federated ensemble-directed offline reinforcement learning. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Infor- mation Processing Systems 38: Annual Conf...

  20. [20]

    U-net: Convolutional networks for biomedical image segmentation

    [Ronnebergeret al., 2015 ] Olaf Ronneberger, Philipp Fis- cher, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells III, and Alejan- dro F. Frangi, editors,Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th In- ternational Conference Munich, Ger...

  21. [21]

    Jordan, and Philipp Moritz

    [Schulmanet al., 2015 ] John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. Trust region policy optimization. In Francis R. Bach and David M. Blei, editors,Proceedings of the 32nd Inter- national Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 ofJMLR Work- shop and Conference Proceedings,...

  22. [22]

    Proximal Policy Optimization Algorithms

    [Schulmanet al., 2017 ] John Schulman, Filip Wolski, Pra- fulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.CoRR, abs/1707.06347,

  23. [23]

    Denoising Diffusion Implicit Models

    [Songet al., 2020 ] Jiaming Song, Chenlin Meng, and Ste- fano Ermon. Denoising diffusion implicit models.CoRR, abs/2010.02502,

  24. [24]

    Deep reinforcement learning for robotics: A survey of real-world successes

    [Tanget al., 2025 ] Chen Tang, Ben Abbatematteo, Jiaheng Hu, Rohan Chandra, Roberto Mart ´ın-Mart´ın, and Peter Stone. Deep reinforcement learning for robotics: A survey of real-world successes. In Toby Walsh, Julie Shah, and Zico Kolter, editors,AAAI-25, Sponsored by the Associa- tion for the Advancement of Artificial Intelligence, Febru- ary 25 - March ...

  25. [25]

    The coachai badminton environment: Bridging the gap be- tween a reinforcement learning environment and real- world badminton games

    [Wanget al., 2024 ] Kuang-Da Wang, Yu-Tse Chen, Yu- Heng Lin, Wei-Yao Wang, and Wen-Chih Peng. The coachai badminton environment: Bridging the gap be- tween a reinforcement learning environment and real- world badminton games. In Michael J. Wooldridge, Jen- nifer G. Dy, and Sriraam Natarajan, editors,Thirty-Eighth AAAI Conference on Artificial Intelligenc...

  26. [26]

    Adding conditional control to text-to-image dif- fusion models

    [Zhanget al., 2023 ] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image dif- fusion models. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 3813–3824. IEEE,

  27. [27]

    Safe corridor-based MPC for follow-ahead and obstacle avoidance of mobile robot in cluttered environ- ments

    [Zhanget al., 2025 ] Yikun Zhang, Xinxing Chen, and Jian Huang. Safe corridor-based MPC for follow-ahead and obstacle avoidance of mobile robot in cluttered environ- ments. InIEEE/RSJ International Conference on Intelli- gent Robots and Systems, IROS 2025, Hangzhou, China, October 19-25, 2025, pages 5045–5052. IEEE,

  28. [28]

    [Zhouet al., 2023 ] Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, and Chao Tian. Natural actor-critic for robust reinforcement learning with function approxi- mation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference o...

  29. [29]

    The ingredients of real world robotic reinforcement learning

    [Zhuet al., 2020 ] Henry Zhu, Justin Yu, Abhishek Gupta, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Ku- mar, and Sergey Levine. The ingredients of real world robotic reinforcement learning. In8th International Con- ference on Learning Representations, ICLR 2020, Ad- dis Ababa, Ethiopia, April 26-30,

  30. [30]

    OpenReview.net, 2020