WMAttack: Automated Attack Search for Adversarial Evaluation of World-Model Agents
Pith reviewed 2026-05-25 05:12 UTC · model grok-4.3
The pith
WMAttack automates search over attack configurations to find stronger adversarial evaluations for world-model agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WMAttack formulates robustness evaluation as finite-budget search over attack configurations. Self-Correcting Attack Search refines the proposal distribution using feedback from reward degradation, action instability, runtime cost, and rollout variability. Representation-Guided Attack Retrieval retrieves effective historical configurations from representation-similar tasks to warm-start new environments. Across Atari and DeepMind Control (DMC) tasks, WMAttack discovers stronger attacks than the evaluated baselines, raising normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC.
What carries the argument
Self-Correcting Attack Search (SCAS), which updates the attack proposal distribution from rollout feedback signals, combined with Representation-Guided Attack Retrieval (RGAR), which supplies warm-start configurations from representation-similar tasks.
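As a rough illustration of the idea, and not the authors' implementation, the loop below keeps a categorical proposal distribution over a handful of hypothetical attack configurations, samples from it under a fixed budget, and reweights it with a simple multiplicative (exponential-weights style) update driven by observed utility. The configuration tuples, the `toy_utility` stand-in for closed-loop rollouts, and the learning rate `eta` are all assumptions made for this sketch.

```python
import math
import random

# Hypothetical attack configurations: (family, perturbation budget, steps).
CONFIGS = [
    ("fgsm", 2 / 255, 1),
    ("pgd", 4 / 255, 10),
    ("pgd", 8 / 255, 20),
    ("random_noise", 8 / 255, 1),
]


def toy_utility(config, rng):
    """Stand-in for a closed-loop rollout; returns a noisy utility score.

    A real evaluation would run the agent under attack. Here utility simply
    grows with budget and steps so that the sketch is runnable.
    """
    family, eps, steps = config
    scale = 0.2 if family == "random_noise" else 1.0
    base = (eps * 255 / 8) * scale
    return base * (1 + math.log(steps + 1)) / 4 + rng.gauss(0, 0.05)


def normalize(log_w):
    """Turn log-weights into a probability vector (numerically stable)."""
    m = max(log_w)
    weights = [math.exp(w - m) for w in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def self_correcting_search(configs, evaluate, budget, eta=2.0, seed=0):
    """Finite-budget search with a multiplicative proposal update."""
    rng = random.Random(seed)
    log_w = [0.0] * len(configs)  # uniform initial proposal
    best_cfg, best_u = None, -math.inf
    for _ in range(budget):
        # Sample one configuration from the current proposal distribution.
        probs = normalize(log_w)
        idx = rng.choices(range(len(configs)), weights=probs, k=1)[0]
        u = evaluate(configs[idx], rng)
        # Feedback step: shift mass toward configurations that scored well.
        log_w[idx] += eta * u
        if u > best_u:
            best_cfg, best_u = configs[idx], u
    return best_cfg, best_u, normalize(log_w)


best_cfg, best_u, final_probs = self_correcting_search(
    CONFIGS, toy_utility, budget=40
)
```

Because the update only sees configurations that were actually sampled, an early lucky draw can pull mass away from a better configuration that was never tried, which is the kind of selection effect the load-bearing premise has to rule out.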
If this is right
- WMAttack yields higher normalized reward drops than evaluated baselines on both Atari and DMC suites.
- SCAS improves final attack utility when evaluation budgets are fixed.
- RGAR raises the quality of initial candidate attacks for new tasks.
- The framework supplies a theoretical condition under which proposal refinement improves finite-budget search accuracy.
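One simple way to see why such a condition is plausible, offered as an illustration rather than as the paper's actual theorem: suppose each of $B$ independent draws from a proposal $q$ lands in a high-utility set $\mathcal{A}^\star$ with probability $p_q$. The symbols $B$, $q$, $\mathcal{A}^\star$, and $p_q$ are notation assumed for this sketch.

```latex
\Pr\bigl[\text{at least one of } B \text{ draws lies in } \mathcal{A}^\star\bigr]
  = 1 - (1 - p_q)^{B},
\qquad
p_q = \sum_{a \in \mathcal{A}^\star} q(a).
```

For fixed $B \ge 1$ this probability is increasing in $p_q$, so a refined proposal $q'$ with $p_{q'} > p_q$ is more likely to find a high-utility attack within the same budget. The converse also holds: refinement that lowers $p_q$ makes the search worse, which is why the condition matters.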
Where Pith is reading between the lines
- The same search machinery could be applied to evaluate robustness of other latent-dynamics agents beyond the tested DreamerV3 instances.
- If the feedback loop generalizes, it offers a route to make attack search itself part of an iterative defense loop for world-model training.
- Representation similarity used for retrieval could be replaced by other task embeddings if the current metric proves brittle on out-of-distribution environments.
Load-bearing premise
Feedback signals from reward degradation, action instability, runtime cost, and rollout variability can reliably shift the attack proposal distribution toward higher-utility configurations without introducing selection bias or overfitting.
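To make the premise concrete, here is one way the four signals might be folded into a single utility score. The linear form, the weights, and the field names are assumptions for illustration; the material summarized here does not state how WMAttack combines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutFeedback:
    """Hypothetical per-configuration feedback summary."""

    reward_drop: float         # normalized reward degradation (higher = stronger)
    action_instability: float  # fraction of steps where the attacked action differs
    runtime_cost: float        # cost relative to the cheapest configuration
    rollout_std: float         # spread of reward drop across rollouts


def utility(fb, w_drop=1.0, w_instab=0.25, w_cost=0.1, w_var=0.5):
    """Reward strong, destabilizing attacks; penalize cost and noisy estimates."""
    return (
        w_drop * fb.reward_drop
        + w_instab * fb.action_instability
        - w_cost * fb.runtime_cost
        - w_var * fb.rollout_std
    )


# Two illustrative (made-up) feedback summaries.
strong_but_noisy = RolloutFeedback(0.9, 0.6, 3.0, 0.8)
moderate_stable = RolloutFeedback(0.7, 0.5, 1.0, 0.1)

u_noisy = utility(strong_but_noisy)
u_stable = utility(moderate_stable)
```

With these weights the moderate but stable configuration scores higher than the stronger but noisy and costly one, so the choice of weights can itself steer which attacks the search favors.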
What would settle it
A held-out environment on which, under the same fixed evaluation budget, manually tuned or random attacks achieve a normalized reward drop equal to or higher than that of the attacks WMAttack returns.
Original abstract
Despite the growing use of world models as decision-making agents, their adversarial robustness remains underexplored due to the lack of dedicated automated evaluation methods. A key obstacle is that attack evaluation must be both accurate and efficient: weak manually tuned attacks can overestimate robustness, while exhaustive hyperparameter search is prohibitively expensive because each candidate requires closed-loop rollouts through learned latent dynamics. We introduce WMAttack, an automated attack-search framework for adversarial evaluation of world-model agents. WMAttack formulates robustness evaluation as a finite-budget search over attack configurations, including attack families, perturbation budgets, optimization steps, restarts, and allocation rules. To improve search accuracy, Self-Correcting Attack Search (SCAS) refines the attack proposal distribution using feedback from reward degradation, action instability, runtime cost, and rollout variability. To improve search efficiency, Representation-Guided Attack Retrieval (RGAR) retrieves effective historical configurations from representation-similar tasks, providing a warm start for unseen environments. We provide a theoretical explanation showing that proposal refinement improves finite-budget search when it shifts probability mass toward high-utility attacks. Across Atari and DeepMind Control tasks, WMAttack consistently discovers stronger attacks than the evaluated baselines, improving normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC. Ablations further show that RGAR improves initial candidate quality and SCAS improves final attack utility under fixed evaluation budgets.
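The retrieval step described in the abstract can be sketched as a nearest-neighbor lookup over task embeddings. This is a minimal illustration under stated assumptions: the archive contents, the three-dimensional embeddings, the use of cosine similarity, and the function names are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical archive: task name -> (task embedding, best known attack config).
ARCHIVE = {
    "pong": ([0.9, 0.1, 0.0], {"family": "pgd", "eps": 4 / 255, "steps": 10}),
    "breakout": ([0.8, 0.2, 0.1], {"family": "pgd", "eps": 8 / 255, "steps": 20}),
    "walker_walk": ([0.1, 0.9, 0.3], {"family": "fgsm", "eps": 0.05, "steps": 1}),
}


def retrieve_warm_start(query_embedding, archive, k=2):
    """Return (task, config) pairs for the k archived tasks closest to the query."""
    ranked = sorted(
        archive.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1][0]),
        reverse=True,
    )
    return [(name, config) for name, (_, config) in ranked[:k]]


# A new task whose embedding sits close to the two Atari-style entries.
warm_start = retrieve_warm_start([0.85, 0.15, 0.05], ARCHIVE, k=2)
warm_start_tasks = {name for name, _ in warm_start}
```

The retrieved configurations would then seed the initial proposal instead of a uniform start; how well that works depends on whether embedding similarity actually tracks attack transferability.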
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces WMAttack, an automated finite-budget search framework for finding adversarial attacks on world-model agents. It proposes Self-Correcting Attack Search (SCAS), which iteratively refines the proposal distribution over attack configurations (families, budgets, steps, restarts, allocations) using feedback signals from reward degradation, action instability, runtime cost, and rollout variability, together with Representation-Guided Attack Retrieval (RGAR) that warm-starts from representation-similar tasks. A theoretical argument is supplied showing that such refinement improves search when it shifts mass toward high-utility attacks. On DreamerV3 Atari and DeepMind Control tasks the method reports normalized reward drops rising from 0.497 to 1.034 and from 0.319 to 0.682 respectively, with ablations indicating that RGAR improves initial candidates and SCAS improves final utility under fixed budgets.
Significance. If the empirical gains are robust, the work supplies a concrete, automated procedure for more reliable adversarial evaluation of world-model agents, an area the abstract correctly identifies as underexplored. The combination of a finite-budget search formulation, an explicit (if conditional) theoretical justification for proposal refinement, and ablations that isolate the contribution of SCAS and RGAR constitutes a clear methodological advance over purely manual or exhaustive tuning. The reported improvements on two standard benchmarks (Atari, DMC) would be directly usable by practitioners evaluating robustness of latent-dynamics agents.
major comments (3)
- [Abstract] Abstract and Experiments section: the headline gains (0.497→1.034 on DreamerV3 Atari; 0.319→0.682 on DMC) are stated without error bars, standard deviations, number of independent runs, or statistical tests, so it is impossible to determine whether the reported differences exceed run-to-run variability.
- [Theoretical explanation] Theoretical explanation paragraph: the argument establishes improvement only conditional on the four feedback signals correctly shifting probability mass to genuinely higher-utility attacks; it supplies no bound or analysis showing that reward degradation, action instability, runtime cost, and rollout variability computed on the identical evaluation environments are free of selection bias or environment-specific artifacts (latent dynamics or reward-predictor errors) that would not generalize.
- [Ablations] Ablations paragraph: the reported SCAS ablations demonstrate higher attack utility under fixed budget on the search environments, but do not include held-out tasks, held-out seeds, or cross-environment transfer tests that would isolate whether the utility gain survives outside the in-distribution search used to compute the feedback signals.
minor comments (1)
- [Abstract] The manuscript should clarify the precise definition of “normalized reward drop” and the exact set of baseline attack configurations against which the 0.497 and 0.319 figures were obtained.
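Since the definition is not given in the material summarized here, the sketch below uses one common convention as a placeholder: the drop in return relative to the clean return. The function name and the formula are assumptions, not the paper's definition, and the inputs are made-up numbers.

```python
def normalized_reward_drop(clean_return, attacked_return):
    """Relative return drop: (clean - attacked) / |clean|.

    Assumed convention for illustration only; the paper may normalize
    differently (for example against a random-policy baseline).
    """
    if clean_return == 0:
        raise ValueError("clean_return must be nonzero to normalize")
    return (clean_return - attacked_return) / abs(clean_return)


# Made-up returns: half the clean return lost, and a return pushed below zero.
half_lost = normalized_reward_drop(clean_return=200.0, attacked_return=100.0)
below_zero = normalized_reward_drop(clean_return=20.0, attacked_return=-1.0)
```

Under this convention, with a positive clean return, a per-task value above 1 means the attacked return fell below zero. Whether that is the right reading of the reported 1.034 depends on the definition the authors actually use, which is why the referee's request matters.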
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, indicating where revisions will be made to the manuscript.
- Referee: [Abstract] Abstract and Experiments section: the headline gains (0.497→1.034 on DreamerV3 Atari; 0.319→0.682 on DMC) are stated without error bars, standard deviations, number of independent runs, or statistical tests, so it is impossible to determine whether the reported differences exceed run-to-run variability.
  Authors: We agree that the abstract and experiments section should report error bars, the number of independent runs, and any statistical tests to allow assessment of variability. The experiments were run with 5 independent seeds per task, with reported values as means; standard deviations were computed internally. We will revise both the abstract and experiments section to include these details (error bars as standard deviations, explicit run count, and significance tests where applicable).
  revision: yes
- Referee: [Theoretical explanation] Theoretical explanation paragraph: the argument establishes improvement only conditional on the four feedback signals correctly shifting probability mass to genuinely higher-utility attacks; it supplies no bound or analysis showing that reward degradation, action instability, runtime cost, and rollout variability computed on the identical evaluation environments are free of selection bias or environment-specific artifacts (latent dynamics or reward-predictor errors) that would not generalize.
  Authors: The theoretical argument is explicitly conditional on the feedback signals shifting mass toward higher-utility attacks, as described in the manuscript. We do not provide a formal bound or analysis proving the signals are free of selection bias or environment-specific artifacts, since the signals are derived from the same evaluation environments. This is a limitation of the current analysis. We will revise the theoretical section to more clearly articulate the assumptions and note the potential for bias in the feedback signals.
  revision: yes
- Referee: [Ablations] Ablations paragraph: the reported SCAS ablations demonstrate higher attack utility under fixed budget on the search environments, but do not include held-out tasks, held-out seeds, or cross-environment transfer tests that would isolate whether the utility gain survives outside the in-distribution search used to compute the feedback signals.
  Authors: The ablations are intentionally conducted on the search environments to isolate the effect of SCAS and RGAR on attack utility under the finite-budget formulation. We acknowledge that the absence of held-out tasks, additional seeds, or cross-environment transfer limits claims about generalization beyond the in-distribution setting. We will add an explicit discussion of this scope limitation in the ablations section; space permitting, we will also include results on a small set of held-out configurations.
  revision: partial
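The reporting promised in the first response (mean, standard deviation across seeds, and a significance test) could look roughly like the sketch below. The per-seed values are synthetic placeholders and are not results from the paper; the exact paired sign-flip permutation test is one possible choice of test, picked here because it needs only the standard library.

```python
import itertools
import statistics


def mean_std(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Enumerate every assignment of signs to the paired differences.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            count += 1
    return count / total


# Synthetic placeholder values for five seeds; NOT results from the paper.
baseline_drop = [0.40, 0.55, 0.48, 0.52, 0.45]
searched_drop = [0.90, 1.10, 0.95, 1.05, 1.00]

base_mean, base_std = mean_std(baseline_drop)
search_mean, search_std = mean_std(searched_drop)
p_value = paired_sign_flip_pvalue(searched_drop, baseline_drop)
```

With five seeds this exact two-sided test cannot return a p-value below 2/32 = 0.0625, even when every seed favors the same method, so clearing a conventional 0.05 threshold would need more seeds or a different test.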
Circularity Check
No circularity: empirical gains are direct comparisons; theory is conditional and non-reductive.
The paper reports normalized reward-drop improvements (0.497→1.034 Atari, 0.319→0.682 DMC) via direct experimental comparison of WMAttack against baselines under fixed budgets. SCAS and RGAR are algorithmic components whose utility is measured empirically rather than derived from fitted parameters. The supplied theoretical argument states only that refinement improves search conditional on shifting mass toward higher-utility attacks; this is a standard conditional guarantee and does not reduce the reported attack strengths or the empirical deltas to the inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Proposal refinement improves finite-budget search when it shifts probability mass toward high-utility attacks.
Appendix A: Victim and Threat Model Details
This appendix details the closed-loop evaluation process under clean and attacked observations. The victim is a fixed world-model agent Mθ = (ϕ...