
arxiv: 2502.03698 · v4 · submitted 2025-02-06 · cs.LG · cs.CR · cs.RO

How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies

Pith reviewed 2026-05-23 03:29 UTC · model grok-4.3

classification cs.LG · cs.CR · cs.RO
keywords adversarial attacks, behavior cloning, imitation learning, universal adversarial perturbations, black-box attacks, policy vulnerability, transfer attacks

The pith

Modern behavior cloning policies are highly vulnerable to universal adversarial perturbation attacks, including black-box transfers across algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs the first systematic study of adversarial attacks on a range of imitation learning algorithms used for behavior cloning. It evaluates methods including Vanilla Behavior Cloning, LSTM-GMM, Implicit Behavior Cloning, Diffusion Policy, and Vector-Quantized Behavior Transformer under white-box, grey-box, and black-box universal adversarial perturbations. Experiments show that these policies are highly vulnerable, with attacks transferring successfully even without direct access to the target model. A reader would care because learning from demonstrations is a common way to train AI agents, and undetected fragility could affect reliability in deployment settings.

Core claim

The central claim is that most existing imitation learning algorithms for behavior cloning are highly vulnerable to universal adversarial perturbations. This vulnerability appears in white-box settings where full model access is available as well as in black-box transfer attacks where perturbations crafted on one algorithm affect others. The study compares vulnerabilities across classic and recent methods and concludes that these algorithms share common weaknesses to such attacks.

What carries the argument

Universal adversarial perturbation attacks applied to the input observations of behavior cloning policies, tested in white-box, grey-box, and black-box transfer settings across multiple algorithms.
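As a rough illustration of what a universal perturbation on observations means, the sketch below crafts one perturbation shared by every observation, bounded in the L-infinity norm by ε, using projected sign-gradient ascent. Everything in it is an assumption made for illustration: the toy tanh policy, the action-shift objective, the step size, and the names `policy`, `mean_action_shift`, and `craft_uap` are hypothetical and are not taken from the paper, whose actual attack objective and optimizer may differ.

```python
import math
import random


def policy(W, b, obs):
    """Toy deterministic policy (hypothetical): action = tanh(W @ obs + b)."""
    return [math.tanh(sum(w * o for w, o in zip(row, obs)) + bias)
            for row, bias in zip(W, b)]


def mean_action_shift(W, b, observations, delta):
    """Mean squared change in action when delta is added to every observation."""
    total = 0.0
    for obs in observations:
        clean = policy(W, b, obs)
        attacked = policy(W, b, [o + d for o, d in zip(obs, delta)])
        total += sum((a - c) ** 2 for a, c in zip(attacked, clean))
    return total / len(observations)


def craft_uap(W, b, observations, eps, steps=50, seed=0):
    """Projected sign-gradient ascent on one perturbation shared by all observations."""
    rng = random.Random(seed)
    dim = len(observations[0])
    step_size = eps / 10.0  # assumed step size, not a value from the paper
    # Random start: the gradient of the shift objective is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in range(dim)]
    for _ in range(steps):
        grad = [0.0] * dim
        for obs in observations:
            clean = policy(W, b, obs)
            attacked = policy(W, b, [o + d for o, d in zip(obs, delta)])
            for row, a, c in zip(W, attacked, clean):
                # Chain rule through tanh: d/dz (tanh(z) - c)^2 = 2 (a - c) (1 - a^2).
                g = 2.0 * (a - c) * (1.0 - a * a)
                for j, w in enumerate(row):
                    grad[j] += g * w
        # Ascent step along the gradient sign, then clip back into the L-infinity ball.
        delta = [max(-eps, min(eps, d + step_size * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
    b = [0.0, 0.0]
    observations = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    delta = craft_uap(W, b, observations, eps=0.1)
    print("max |delta|:", max(abs(d) for d in delta))
    print("mean action shift:", mean_action_shift(W, b, observations, delta))
```

The point of the sketch is the "universal" part: the gradient is summed over the whole set of observations before each step, so a single `delta` is reused at every timestep instead of being recomputed per input.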

If this is right

  • Vulnerability holds for both white-box and black-box attacks across a range of imitation learning algorithms.
  • Black-box transfer attacks succeed, allowing perturbations to move between different algorithms without model access.
  • Current imitation learning methods share limitations that make them susceptible to input perturbations.
  • The findings point to the need for new approaches to improve robustness in behavior cloning policies.
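The white-box versus black-box distinction in the points above can be organized as a source-by-target matrix: each perturbation is crafted on one policy and then applied to every policy. The sketch below shows only that bookkeeping, under loud assumptions: the two linear policies, the hand-written perturbations, and the action-shift metric are toy placeholders (the paper reports task success rates, not action shift), and the names `mean_action_shift` and `transfer_matrix` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def mean_action_shift(policy: Policy,
                      observations: Sequence[Sequence[float]],
                      delta: Sequence[float]) -> float:
    """Mean squared change in action when delta is added to every observation."""
    total = 0.0
    for obs in observations:
        clean = policy(obs)
        attacked = policy([o + d for o, d in zip(obs, delta)])
        total += sum((a - c) ** 2 for a, c in zip(attacked, clean))
    return total / len(observations)


def transfer_matrix(policies: Dict[str, Policy],
                    perturbations: Dict[str, Sequence[float]],
                    observations: Sequence[Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """matrix[source][target]: effect on `target` of the perturbation crafted on `source`.

    Diagonal entries correspond to white-box attacks, off-diagonal entries to transfer.
    """
    return {
        source: {target: mean_action_shift(pol, observations, delta)
                 for target, pol in policies.items()}
        for source, delta in perturbations.items()
    }


# Toy stand-ins (assumptions for illustration only, not models from the paper).
policies: Dict[str, Policy] = {
    "A": lambda o: [o[0] + o[1]],
    "B": lambda o: [o[0] - 0.5 * o[1]],
}
eps = 0.1
perturbations = {
    "A": [eps, eps],    # worst-case L-infinity direction for policy A
    "B": [eps, -eps],   # worst-case L-infinity direction for policy B
}
observations = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]

matrix = transfer_matrix(policies, perturbations, observations)
for source, row in matrix.items():
    print(source, {target: round(value, 4) for target, value in row.items()})
```

In this toy setup the perturbation crafted on A still moves B a little, while the one crafted on B leaves A essentially unchanged, which is why the off-diagonal entries have to be measured rather than assumed.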

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the claim holds, adversarial robustness testing should become a standard part of evaluating any new behavior cloning method.
  • This raises the possibility that sensor noise or small environmental changes in real-world settings could act like these attacks and disrupt deployed policies.
  • The results connect to larger questions about how learned policies handle input variations that were not present in training demonstrations.

Load-bearing premise

The tested algorithms and attack setups are representative of modern behavior cloning policies and realistic threats in their intended deployment domains.

What would settle it

New experiments on additional imitation learning algorithms or in physical robot deployments that show low attack success rates would falsify the claim of widespread high vulnerability.

Figures

Figures reproduced from arXiv: 2502.03698 by Akansha Kalra, Basavasagar Patil, Daniel S. Brown, Guanhong Tao.

Figure 1
Figure 1: Environments used for crafting and evaluating Universal Adversarial Perturbation attacks to study adversarial robustness of modern behavior cloning algorithms. (a)-(c) are from RoboMimic [10] and (d) is from [11].
Adjacent text excerpt: [...] increases complexity by requiring precise alignment and complex insertion dynamics. The nut's initial pose is randomized with z-axis rotation within a square region on the table surface. Pu…
Figure 2
Figure 2: Task Success Rates under decreasing attack strength (ε) of different behavior cloning algorithms, demonstrating their sensitivity to even small adversarial inputs. The steep drop in performance of all BC algorithms except IBC, which suffers a minimal drop, emphasizes the lack of robustness across algorithms.
Adjacent text excerpt: D. How sensitive are attacks to the range of adversarial perturbation? We systematically vary attac…
Abstract

Learning from demonstrations is a popular approach to train AI models; however, their vulnerability to adversarial attacks remains underexplored. We present the first systematic study of adversarial attacks, across a range of both classic and recently proposed imitation learning algorithms, including Vanilla Behavior Cloning (Vanilla BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to both white-box, grey-box and black-box adversarial perturbations. Our experiments reveal that most existing methods are highly vulnerable to these attacks, including black-box transfer attacks that transfer across algorithms. To the best of our knowledge, we are the first to study and compare the vulnerabilities of different popular imitation learning algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern imitation learning algorithms, paving the way for future work in addressing such limitations. Videos and code are available at https://sites.google.com/view/uap-attacks-on-bc.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper conducts the first systematic empirical study of universal adversarial perturbation (UAP) attacks on behavior cloning policies. It evaluates five imitation learning algorithms—Vanilla BC, LSTM-GMM, IBC, Diffusion Policy (DP), and VQ-BET—under white-box, grey-box, and black-box attack settings, reporting that most are highly vulnerable with successful black-box transfer across algorithms. The work positions itself as the initial comparison of these vulnerabilities and releases code and videos.

Significance. If the experimental results hold under broader conditions, the findings would establish a concrete security limitation for deployed imitation learning systems, particularly in robotics and control where BC is common. The cross-algorithm black-box transfer result, if robust, would be especially notable as it suggests shared vulnerabilities rather than algorithm-specific weaknesses. The public release of code supports reproducibility and follow-on work on defenses.

major comments (3)
  1. [§5] §5 (Experiments) and Table 2: The headline claim that 'most existing methods are highly vulnerable' rests on results from only five algorithms. No explicit justification or coverage argument is given for why Vanilla BC, LSTM-GMM, IBC, DP, and VQ-BET are representative of the current diversity of BC methods (e.g., newer transformer-based or flow-matching variants trained on large heterogeneous datasets). Without this, the transferability and vulnerability conclusions cannot be generalized beyond the tested suite.
  2. [§4.3] §4.3 and §5.2: The black-box transfer attack protocol is described at a high level, but the manuscript does not report the precise success-rate thresholds, number of source-target pairs, or statistical significance tests used to declare 'transfer across algorithms.' This detail is load-bearing for the central empirical claim.
  3. [§5.1] §5.1, Table 1: The environments and observation spaces used (e.g., dimensionality, horizon length, presence of safety constraints) are not compared against typical real-world BC deployment settings. If the chosen tasks are low-dimensional or lack the complexity of modern applications, the reported attack success rates may not indicate vulnerability under realistic threat models.
minor comments (2)
  1. [Introduction] The abstract states the study covers 'classic and recently proposed' algorithms, but the introduction does not cite the original papers for each of the five methods with publication years; adding these would improve context.
  2. Figure 3 (attack visualization) caption does not specify the perturbation magnitude (ε) or the exact policy being visualized; this reduces clarity for readers reproducing the results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below with clarifications and indicate planned revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [§5] §5 (Experiments) and Table 2: The headline claim that 'most existing methods are highly vulnerable' rests on results from only five algorithms. No explicit justification or coverage argument is given for why Vanilla BC, LSTM-GMM, IBC, DP, and VQ-BET are representative of the current diversity of BC methods (e.g., newer transformer-based or flow-matching variants trained on large heterogeneous datasets). Without this, the transferability and vulnerability conclusions cannot be generalized beyond the tested suite.

    Authors: These five algorithms were deliberately selected to span classical (Vanilla BC, LSTM-GMM) to modern (IBC, Diffusion Policy, VQ-BET) approaches, covering feedforward, recurrent, implicit, diffusion, and transformer-based paradigms that dominate recent BC literature. VQ-BET specifically addresses transformer-based methods. We will add an explicit justification subsection in §5 discussing selection criteria, prevalence in the field, and scope limitations (e.g., excluding certain flow-matching variants). This will support the claims without overgeneralization. revision: yes

  2. Referee: [§4.3] §4.3 and §5.2: The black-box transfer attack protocol is described at a high level, but the manuscript does not report the precise success-rate thresholds, number of source-target pairs, or statistical significance tests used to declare 'transfer across algorithms.' This detail is load-bearing for the central empirical claim.

    Authors: The referee correctly notes that these implementation details are not fully reported in the current manuscript. We will revise §4.3 and §5.2 to explicitly state the success-rate thresholds, the number of source-target pairs evaluated, and any statistical significance tests used, ensuring the transfer results are fully reproducible and supported. revision: yes

  3. Referee: [§5.1] §5.1, Table 1: The environments and observation spaces used (e.g., dimensionality, horizon length, presence of safety constraints) are not compared against typical real-world BC deployment settings. If the chosen tasks are low-dimensional or lack the complexity of modern applications, the reported attack success rates may not indicate vulnerability under realistic threat models.

    Authors: The environments were chosen as standard benchmarks from the source papers of each method to enable fair comparisons. We agree a direct mapping to real-world settings is absent. In the revision we will expand §5.1 with a discussion comparing observation dimensionality, horizon lengths, and constraints to typical robotic deployments, plus an explicit limitations paragraph on the gap to more complex real-world scenarios. revision: partial
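For the transfer-protocol point (response 2), one possible way to report a source-target result with uncertainty is a percentile bootstrap over per-episode success outcomes. The sketch below is only an assumed reporting scheme: the function names, the 0/1 outcome lists, the number of resamples, and the example counts are hypothetical and do not come from the manuscript.

```python
import random


def success_rate(outcomes):
    """Fraction of successful episodes, with outcomes given as 0/1 values."""
    return sum(outcomes) / len(outcomes)


def bootstrap_drop_ci(clean, attacked, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for (clean success rate - attacked success rate)."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n_boot):
        # Resample episodes with replacement, independently for each condition.
        clean_sample = [rng.choice(clean) for _ in clean]
        attacked_sample = [rng.choice(attacked) for _ in attacked]
        drops.append(success_rate(clean_sample) - success_rate(attacked_sample))
    drops.sort()
    lower = drops[int((alpha / 2) * n_boot)]
    upper = drops[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up outcomes for one source-target pair: 50 clean and 50 attacked rollouts.
    clean = [1] * 45 + [0] * 5
    attacked = [1] * 10 + [0] * 40
    point = success_rate(clean) - success_rate(attacked)
    lower, upper = bootstrap_drop_ci(clean, attacked)
    print(f"success-rate drop: {point:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

A pair could then be counted as a successful transfer only when the lower end of the interval exceeds a threshold stated in advance, which makes the declared threshold and the number of evaluated pairs explicit.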

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation of existing algorithms

full rationale

The paper conducts an experimental comparison of adversarial vulnerability across five imitation learning methods (Vanilla BC, LSTM-GMM, IBC, DP, VQ-BET) using white-box, grey-box, and black-box attacks. It contains no derivation chain, no equations, no fitted parameters renamed as predictions, and no self-citations that bear the load of any central claim. Results are reported directly from the described experiments on the chosen tasks and algorithms without reduction to prior definitions or ansatzes. The representativeness concern raised by the referee is a question of external validity, not circularity per the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This empirical study applies existing adversarial attack techniques to imitation learning; it introduces no new free parameters, mathematical axioms beyond standard ML assumptions, or invented entities.

axioms (1)
  • domain assumption: Imitation learning policies can be subjected to gradient-based or transfer-based adversarial perturbations using techniques from supervised learning.
    Required to apply universal adversarial perturbations to the listed behavior cloning methods.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

    cs.CV 2026-04 unverdicted novelty 7.0

    Immune2V immunizes images against dual-stream I2V generation by enforcing temporally balanced latent divergence and aligning generative features to a precomputed collapse trajectory, yielding stronger persistent degra...

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    Intriguing properties of neural networks

    C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” CoRR, vol. abs/1312.6199, 2013

  2. [2]

    Explaining and Harnessing Adversarial Examples

    I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” CoRR, vol. abs/1412.6572, 2014

  3. [3]

    Threat of adversarial attacks on deep learning in computer vision: A survey

    N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning in computer vision: A survey,” IEEE Access, 2018

  4. [4]

    Adversarial attacks on deep-learning models in natural language processing: A survey

    W. E. Zhang, Q. Z. Sheng, A. Alhazmi, and C. Li, “Adversarial attacks on deep-learning models in natural language processing: A survey,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 11, no. 3, pp. 1–41, 2020

  5. [5]

    A survey on adversarial attacks and defences

    A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay, “A survey on adversarial attacks and defences,” CAAI Transactions on Intelligence Technology, 2021

  6. [6]

    Studying adversarial attacks on behavioral cloning dynamics

    G. Hall, A. Das, J. Quarles, and P. Rad, “Studying adversarial attacks on behavioral cloning dynamics,” in 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2020, pp. 452–459

  7. [7]

    Adversarial driving: Attacking end-to-end autonomous driving

    H. Wu, S. Yunas, S. Rowlands, W. Ruan, and J. Wahlström, “Adversarial driving: Attacking end-to-end autonomous driving,” in 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023, pp. 1–7

  8. [8]

    Simple physical adversarial examples against end-to-end autonomous driving models

    A. Boloor, X. He, C. Gill, Y. Vorobeychik, and X. Zhang, “Simple physical adversarial examples against end-to-end autonomous driving models,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2019, pp. 1–7

  9. [9]

    Diffusion policy attacker: Crafting adversarial attacks for diffusion-based policies

    Y. Chen, H. Xue, and Y. Chen, “Diffusion policy attacker: Crafting adversarial attacks for diffusion-based policies,” ArXiv, vol. abs/2405.19424, 2024

  10. [10]

    What matters in learning from offline human demonstrations for robot manipulation

    A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín, “What matters in learning from offline human demonstrations for robot manipulation,” in Conference on Robot Learning, 2021

  11. [11]

    Implicit behavioral cloning

    P. Florence, C. Lynch, A. Zeng, O. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson, “Implicit behavioral cloning,” Conference on Robot Learning (CoRL), 2021

  12. [12]

    Diffusion policy: Visuomotor policy learning via action diffusion

    C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” in Proceedings of Robotics: Science and Systems (RSS), 2023

  13. [13]

    Behavior generation with latent actions

    S. Lee, Y. Wang, H. Etukuru, H. J. Kim, N. Muhammad, M. Shafiullah, and L. Pinto, “Behavior generation with latent actions,” ArXiv, vol. abs/2403.03181, 2024

  14. [14]

    Universal adversarial perturbations

    S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  15. [15]

    (certified!!) adversarial robustness for free!

    N. Carlini, F. Tramèr, K. D. Dvijotham, L. Rice, M. Sun, and J. Z. Kolter, “(certified!!) adversarial robustness for free!” in The Eleventh International Conference on Learning Representations. OpenReview, 2023

  16. [16]

    Physical adversarial attack on a robotic arm

    Y. Jia, C. M. Poskitt, J. Sun, and S. Chattopadhyay, “Physical adversarial attack on a robotic arm,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9334–9341, 2022

  17. [17]

    Quantifying assistive robustness via the natural-adversarial frontier

    J. Z.-Y. He, D. S. Brown, Z. Erickson, and A. Dragan, “Quantifying assistive robustness via the natural-adversarial frontier,” in Conference on Robot Learning. PMLR, 2023, pp. 1865–1886

  18. [18]

    Preventing imitation learning with adversarial policy ensembles

    A. Zhan, S. Tiomkin, and P. Abbeel, “Preventing imitation learning with adversarial policy ensembles,” arXiv preprint arXiv:2002.01059, 2020

  19. [19]

    Rethinking the intermediate features in adversarial attacks: Misleading robotic models via adversarial distillation

    K. Zhao, H. Huang, M. Li, and Y. Wu, “Rethinking the intermediate features in adversarial attacks: Misleading robotic models via adversarial distillation,” arXiv preprint arXiv:2411.15222, 2024

  20. [20]

    Attacking deep reinforcement learning with decoupled adversarial policy

    K. Mo, W. Tang, J. Li, and X. Yuan, “Attacking deep reinforcement learning with decoupled adversarial policy,” IEEE Transactions on Dependable and Secure Computing, vol. 20, pp. 758–768, 2023

  21. [21]

    Stealthy and efficient adversarial attacks against deep reinforcement learning

    J. Sun, T. Zhang, X. Xie, L. Ma, Y. Zheng, K. Chen, and Y. Liu, “Stealthy and efficient adversarial attacks against deep reinforcement learning,” in AAAI Conference on Artificial Intelligence, 2020

  22. [22]

    Robust deep reinforcement learning with adversarial attacks

    A. Pattanaik, Z. Tang, S. Liu, G. Bommannan, and G. V. Chowdhary, “Robust deep reinforcement learning with adversarial attacks,” in Adaptive Agents and Multi-Agent Systems, 2017

  23. [23]

    Tactics of adversarial attack on deep reinforcement learning agents

    Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun, “Tactics of adversarial attack on deep reinforcement learning agents,” in International Joint Conference on Artificial Intelligence, 2017

  24. [24]

    Adversarial policies: Attacking deep reinforcement learning

    A. Gleave, M. Dennis, N. Kant, C. Wild, S. Levine, and S. J. Russell, “Adversarial policies: Attacking deep reinforcement learning,” ArXiv, vol. abs/1905.10615, 2019

  25. [25]

    Robust deep reinforcement learning against adversarial perturbations on state observations

    H. Zhang, H. Chen, C. Xiao, B. Li, M. Liu, D. Boning, and C.-J. Hsieh, “Robust deep reinforcement learning against adversarial perturbations on state observations,” Advances in Neural Information Processing Systems, vol. 33, pp. 21024–21037, 2020

  26. [26]

    Robust reinforcement learning on state observations with learned optimal adversary

    H. Zhang, H. Chen, D. Boning, and C.-J. Hsieh, “Robust reinforcement learning on state observations with learned optimal adversary,” in International Conference on Learning Representations (ICLR), 2021

  27. [27]

    Robust reinforcement learning: A review of foundations and recent advances

    J. Moos, K. Hansel, H. Abdulsamad, S. Stark, D. Clever, and J. Peters, “Robust reinforcement learning: A review of foundations and recent advances,” Machine Learning and Knowledge Extraction, 2022

  28. [28]

    Adversarial Attacks on Neural Network Policies

    S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, “Adversarial attacks on neural network policies,” arXiv preprint arXiv:1702.02284, 2017

  29. [29]

    White-box adversarial policies in deep reinforcement learning

    S. Casper, T. Killian, G. Kreiman, and D. Hadfield-Menell, “White-box adversarial policies in deep reinforcement learning,” arXiv preprint arXiv:2209.02167, 2022

  30. [30]

    Bird: generalizable backdoor detection and removal for deep reinforcement learning

    X. Chen, W. Guo, G. Tao, X. Zhang, and D. Song, “Bird: generalizable backdoor detection and removal for deep reinforcement learning,” Advances in Neural Information Processing Systems, 2023

  31. [31]

    A framework for behavioural cloning

    M. Bain and C. Sammut, “A framework for behavioural cloning.” in Machine Intelligence 15, 1995, pp. 103–129

  32. [32]

    Behavioral cloning from observation

    F. Torabi, G. Warnell, and P. Stone, “Behavioral cloning from observation,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 4950–4957

  33. [33]

    A reduction of imitation learning and structured prediction to no-regret online learning

    S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011

  34. [34]

    Long short-term memory

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735–1780, 1997

  35. [35]

    G. J. McLachlan and D. Peel, Finite mixture models. John Wiley & Sons, 2000

  36. [36]

    Denoising Diffusion Probabilistic Models

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” ArXiv, vol. abs/2006.11239, 2020

  37. [37]

    Towards deep learning models resistant to adversarial attacks

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018