WMAttack: Automated Attack Search for Adversarial Evaluation of World-Model Agents

Andras Balogh; Cheng Guo; Dacheng Tao; Mark Jelasity; Shi Fu; Siyuan Liang; Zhixiang Guo

arXiv: 2605.23220 · v1 · pith:SAOKXUIR · submitted 2026-05-22 · 💻 cs.LG


Zhixiang Guo, Siyuan Liang, Shi Fu, Cheng Guo, Andras Balogh, Mark Jelasity, Dacheng Tao

Pith reviewed 2026-05-25 05:12 UTC · model grok-4.3

classification 💻 cs.LG

keywords: adversarial robustness · world models · reinforcement learning · attack search · robustness evaluation · Atari · DeepMind Control


The pith

WMAttack automates search over attack configurations to find stronger adversarial evaluations for world-model agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WMAttack as a finite-budget search framework that treats attack selection as an optimization problem over attack families, perturbation budgets, optimization steps, restarts, and allocation rules. It introduces Self-Correcting Attack Search (SCAS), which shifts proposal probabilities toward higher-utility attacks using signals from reward drops, action changes, runtime cost, and rollout variability, and Representation-Guided Attack Retrieval (RGAR), which reuses effective settings from similar prior tasks. Experiments on DreamerV3 Atari and DeepMind Control show that the method yields higher normalized reward drops than the evaluated baselines. A supporting argument shows that such refinement improves search quality under a limited evaluation budget, provided it moves probability mass toward higher-utility attacks. The result matters because weak manual attacks can make an agent look more robust than it actually is.

Core claim

WMAttack formulates robustness evaluation as finite-budget search over attack configurations. Self-Correcting Attack Search refines the proposal distribution using feedback from reward degradation, action instability, runtime cost, and rollout variability. Representation-Guided Attack Retrieval retrieves effective historical configurations from representation-similar tasks to warm-start new environments. Across Atari and DMC tasks this discovers stronger attacks, raising normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC.
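To make the shape of such a search concrete, here is a minimal sketch, not the paper's SCAS implementation. It assumes a small discrete set of configurations, a hypothetical linear `utility` that combines the four feedback signals with arbitrary weights, and a softmax proposal whose logits are nudged by each observed utility minus a running-mean baseline. `AttackConfig`, `Feedback`, `utility`, and `refine_search` are illustrative names only.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackConfig:
    """Hypothetical point in the search space named in the paper."""
    family: str        # attack family label, e.g. "pgd"
    epsilon: float     # perturbation budget
    steps: int         # optimization steps
    restarts: int      # random restarts
    allocation: str    # how the perturbation budget is spread over an episode


@dataclass
class Feedback:
    """Hypothetical per-configuration signals measured on evaluation rollouts."""
    reward_drop: float
    action_change: float
    cost: float
    variability: float


def utility(fb: Feedback, w=(1.0, 0.5, 0.1, 0.5)) -> float:
    # Assumed linear score: reward damage and action flips, penalize cost and noise.
    return (w[0] * fb.reward_drop + w[1] * fb.action_change
            - w[2] * fb.cost - w[3] * fb.variability)


def refine_search(configs, evaluate, budget, lr=2.0, seed=0):
    """Spend `budget` evaluations while shifting a softmax proposal toward high utility."""
    rng = random.Random(seed)
    logits = {c: 0.0 for c in configs}
    observed = []  # (utility, config) for every evaluation spent

    def record(cfg):
        u = utility(evaluate(cfg))
        # Mean of earlier utilities is the baseline: above-average configs gain
        # proposal mass, below-average configs lose it.
        baseline = sum(v for v, _ in observed) / len(observed) if observed else u
        logits[cfg] += lr * (u - baseline)
        observed.append((u, cfg))

    for cfg in configs[:budget]:            # one exploratory sweep
        record(cfg)
    for _ in range(budget - len(configs)):  # refinement rounds (none if budget is small)
        top = max(logits.values())
        weights = [math.exp(logits[c] - top) for c in configs]
        record(rng.choices(configs, weights=weights, k=1)[0])

    top = max(logits.values())
    z = sum(math.exp(v - top) for v in logits.values())
    probs = {c: math.exp(v - top) / z for c, v in logits.items()}
    best_u, best = max(observed, key=lambda t: t[0])
    return best, best_u, probs
```

Scoring every configuration once before refining is a choice made here so that nothing is ruled out unscored; with a large space one would instead draw the first round from a prior, which is where a retrieval warm start would plug in.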

What carries the argument

Self-Correcting Attack Search (SCAS) that updates proposal distribution from runtime feedback signals, combined with Representation-Guided Attack Retrieval (RGAR) that supplies warm-start configurations from similar tasks.
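RGAR is described here only as retrieving configurations from representation-similar tasks. A minimal sketch of one way that could work, assuming each task is summarized by a fixed-length embedding vector (hypothetical, for example an averaged world-model latent), that similarity is cosine similarity, and that retrieved configurations are mixed with a uniform prior so no configuration gets zero initial mass. `cosine` and `warm_start_proposal` are illustrative names, not the paper's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def warm_start_proposal(target_emb, history, configs, k=2, mix=0.5):
    """Initial proposal over `configs`, biased toward configs that worked on similar tasks.

    history: list of (task_embedding, best_config) pairs from earlier tasks.
    mix: share of probability mass reserved for the retrieved configs.
    """
    probs = {c: 1.0 / len(configs) for c in configs}
    ranked = sorted(history, key=lambda h: cosine(target_emb, h[0]), reverse=True)
    retrieved = [cfg for _, cfg in ranked[:k] if cfg in probs]
    if not retrieved:
        return probs  # nothing usable to retrieve: fall back to uniform
    probs = {c: (1.0 - mix) / len(configs) for c in configs}
    for cfg in retrieved:
        probs[cfg] += mix / len(retrieved)  # total retrieved mass is exactly `mix`
    return probs
```

Keeping a uniform floor is a hedge against the brittleness concern raised above: if the similarity metric retrieves the wrong neighbors, the search can still reach the right configuration, only more slowly.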

If this is right

WMAttack yields higher normalized reward drops than evaluated baselines on both Atari and DMC suites.
SCAS improves final attack utility when evaluation budgets are fixed.
RGAR raises the quality of initial candidate attacks for new tasks.
The framework supplies a theoretical condition under which proposal refinement improves finite-budget search accuracy.
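The theoretical condition can be illustrated in its simplest form, assuming B independent draws from a categorical proposal over configurations with known utilities (the paper's own statement may be more general). If F is the proposal's utility CDF, the best of B draws satisfies P(best ≤ v) = F(v)^B. A refined proposal that puts at least as much mass on every set {u ≥ τ} lowers F everywhere, hence lowers F^B, so the expected best utility cannot decrease; moving mass the other way can make it worse. The helper `expected_best_of_b` is a hypothetical name and the numbers are toy values.

```python
def expected_best_of_b(utils, probs, budget):
    """Exact E[max utility] over `budget` i.i.d. draws from a categorical proposal.

    Uses P(best <= v) = F(v) ** budget, where F is the proposal's utility CDF.
    """
    expected, prev = 0.0, 0.0
    for v in sorted(set(utils)):
        cdf = sum(p for u, p in zip(utils, probs) if u <= v)
        cur = cdf ** budget
        expected += v * (cur - prev)  # P(best == v) = F(v)^B - F(previous level)^B
        prev = cur
    return expected


utils = [0.2, 0.5, 1.0]
uniform = expected_best_of_b(utils, [1 / 3, 1 / 3, 1 / 3], budget=3)
shifted_up = expected_best_of_b(utils, [0.1, 0.3, 0.6], budget=3)
shifted_down = expected_best_of_b(utils, [0.6, 0.3, 0.1], budget=3)
print(round(uniform, 4), round(shifted_up, 4), round(shifted_down, 4))  # 0.8407 0.9677 0.5707
```

The condition is on the direction of the shift, not on refinement as such, which is why the referee's question below about whether the feedback signals point in the right direction carries so much weight.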

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same search machinery could be applied to evaluate robustness of other latent-dynamics agents beyond the tested DreamerV3 instances.
If the feedback loop generalizes, it offers a route to make attack search itself part of an iterative defense loop for world-model training.
Representation similarity used for retrieval could be replaced by other task embeddings if the current metric proves brittle on out-of-distribution environments.

Load-bearing premise

Feedback signals from reward degradation, action instability, runtime cost, and rollout variability can reliably shift the attack proposal distribution toward higher-utility configurations without introducing selection bias or overfitting.
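A toy simulation shows how this premise can fail. It assumes Gaussian rollout noise and twenty configurations that are all equally strong in truth. Picking the winner by its noisy search score and then re-scoring it on fresh rollouts exposes a systematic gap that is pure selection bias (the "winner's curse"). The function names and all parameters are illustrative, not taken from the paper.

```python
import random
import statistics


def select_then_recheck(true_utils, noise_sd, n_rollouts, seed):
    """Pick the config with the best noisy search score, then re-score it on fresh rollouts."""
    rng = random.Random(seed)

    def score(u):
        # Mean of n_rollouts noisy returns stands in for a rollout-based estimate.
        return statistics.fmean(rng.gauss(u, noise_sd) for _ in range(n_rollouts))

    search_scores = [score(u) for u in true_utils]
    pick = max(range(len(true_utils)), key=lambda i: search_scores[i])
    return search_scores[pick], score(true_utils[pick])


def average_optimism(trials=200, n_configs=20, noise_sd=0.3, n_rollouts=3):
    """Mean (search score - fresh score) when every config has the same true utility."""
    gaps = []
    for t in range(trials):
        searched, fresh = select_then_recheck([0.5] * n_configs, noise_sd, n_rollouts, seed=t)
        gaps.append(searched - fresh)
    return statistics.fmean(gaps)
```

Because every configuration is equally strong here, any positive `average_optimism()` is inflation created by selecting on the same noisy rollouts. Re-scoring the selected attack on rollouts or seeds not used during search is the cheapest guard against it.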

What would settle it

If, on a held-out environment evaluated under the same fixed budget, manually tuned or random attacks achieve equal or higher normalized reward drop than WMAttack's outputs, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2605.23220 by Andras Balogh, Cheng Guo, Dacheng Tao, Mark Jelasity, Shi Fu, Siyuan Liang, Zhixiang Guo.

**Figure 2.** Overview of WMAttack. RGAR initializes the attack proposal distribution from …

**Figure 3.** Task-level reward-drop distribution and per-task improvement on DreamerV3 Atari and …

**Figure 4.** Attack-specific search efficiency on DreamerV3 Atari and DMC. WMAttack generally …

**Figure 5.** Design analysis of WMAttack on DreamerV3 Atari. RGAR improves first-round candidate …

Original abstract

Despite the growing use of world models as decision-making agents, their adversarial robustness remains underexplored due to the lack of dedicated automated evaluation methods. A key obstacle is that attack evaluation must be both accurate and efficient: weak manually tuned attacks can overestimate robustness, while exhaustive hyperparameter search is prohibitively expensive because each candidate requires closed-loop rollouts through learned latent dynamics. We introduce WMAttack, an automated attack-search framework for adversarial evaluation of world-model agents. WMAttack formulates robustness evaluation as a finite-budget search over attack configurations, including attack families, perturbation budgets, optimization steps, restarts, and allocation rules. To improve search accuracy, Self-Correcting Attack Search (SCAS) refines the attack proposal distribution using feedback from reward degradation, action instability, runtime cost, and rollout variability. To improve search efficiency, Representation-Guided Attack Retrieval (RGAR) retrieves effective historical configurations from representation-similar tasks, providing a warm start for unseen environments. We provide a theoretical explanation showing that proposal refinement improves finite-budget search when it shifts probability mass toward high-utility attacks. Across Atari and DeepMind Control tasks, WMAttack consistently discovers stronger attacks than the evaluated baselines, improving normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC. Ablations further show that RGAR improves initial candidate quality and SCAS improves final attack utility under fixed evaluation budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

WMAttack supplies a workable search loop for stronger attacks on world-model agents, with gains on Atari and DMC, but the feedback signals come from the same rollouts that are scored, so the improvements may not travel.


The paper's main contribution is a finite-budget search method, WMAttack, that combines two new components: SCAS, which updates attack proposals using reward drop, action instability, runtime cost, and rollout variability, and RGAR, which retrieves prior configurations from representation-similar tasks as a warm start. The authors show that it beats the baselines they compare against, lifting normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC. The theory section gives a straightforward argument that shifting probability mass toward higher-utility attacks improves search under a fixed budget, which is reasonable on its face. The experiments include ablations that separate the contribution of each component under the same budget, and framing the method as an automated evaluation tool fills a gap that people working on model-based RL have noted. That part is useful and cleanly executed.

The soft spot is exactly the one the stress test flags. All four feedback signals in SCAS are computed on the same environments and world models being attacked, so the search can favor configurations that exploit latent-dynamics or reward-predictor errors specific to those runs. RGAR only changes the starting point: the final attacks are still evaluated on the target task, so the reported numbers are the result of in-distribution tuning. The paper does not show results on held-out tasks, different random seeds, or transfer to new world models, which leaves the practical value of the discovered attacks unclear. The theory only conditions on the signals being unbiased; it does not bound environment-specific artifacts.

This is the kind of work that belongs in a robustness or RL evaluation venue. Readers who need concrete attack generators for testing world-model agents will find the method and the code useful once the overfitting question is addressed. It is coherent on its own terms and engages the right literature, so it should go to peer review rather than be desk-rejected. The referees can ask for the extra controls that would make the gains more convincing.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces WMAttack, an automated finite-budget search framework for finding adversarial attacks on world-model agents. It proposes Self-Correcting Attack Search (SCAS), which iteratively refines the proposal distribution over attack configurations (families, budgets, steps, restarts, allocations) using feedback signals from reward degradation, action instability, runtime cost, and rollout variability, together with Representation-Guided Attack Retrieval (RGAR) that warm-starts from representation-similar tasks. A theoretical argument is supplied showing that such refinement improves search when it shifts mass toward high-utility attacks. On DreamerV3 Atari and DeepMind Control tasks the method reports normalized reward drops rising from 0.497 to 1.034 and from 0.319 to 0.682 respectively, with ablations indicating that RGAR improves initial candidates and SCAS improves final utility under fixed budgets.

Significance. If the empirical gains are robust, the work supplies a concrete, automated procedure for more reliable adversarial evaluation of world-model agents, an area the abstract correctly identifies as underexplored. The combination of a finite-budget search formulation, an explicit (if conditional) theoretical justification for proposal refinement, and ablations that isolate the contribution of SCAS and RGAR constitutes a clear methodological advance over purely manual or exhaustive tuning. The reported improvements on two standard benchmarks (Atari, DMC) would be directly usable by practitioners evaluating robustness of latent-dynamics agents.

major comments (3)

[Abstract] Abstract and Experiments section: the headline gains (0.497→1.034 on DreamerV3 Atari; 0.319→0.682 on DMC) are stated without error bars, standard deviations, number of independent runs, or statistical tests, so it is impossible to determine whether the reported differences exceed run-to-run variability.
[Theoretical explanation] Theoretical explanation paragraph: the argument establishes improvement only conditional on the four feedback signals correctly shifting probability mass to genuinely higher-utility attacks; it supplies no bound or analysis showing that reward degradation, action instability, runtime cost, and rollout variability computed on the identical evaluation environments are free of selection bias or environment-specific artifacts (latent dynamics or reward-predictor errors) that would not generalize.
[Ablations] Ablations paragraph: the reported SCAS ablations demonstrate higher attack utility under fixed budget on the search environments, but do not include held-out tasks, held-out seeds, or cross-environment transfer tests that would isolate whether the utility gain survives outside the in-distribution search used to compute the feedback signals.
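One way to run the control this comment asks for, as a sketch: choose the configuration using only search tasks and search seeds, freeze it, and report it on tasks and seeds the search never touched. `evaluate(config, task, seed)` is a hypothetical callable returning normalized reward drop, and `held_out_check` is an illustrative name, not part of the paper.

```python
def held_out_check(configs, evaluate, search_tasks, held_out_tasks, search_seeds, fresh_seeds):
    """Select on search tasks/seeds, then report the frozen choice on unseen tasks and seeds.

    Returns (chosen config, in-distribution mean drop, held-out mean drop).
    """
    if set(search_tasks) & set(held_out_tasks):
        raise ValueError("held-out tasks must not overlap the search tasks")
    if set(search_seeds) & set(fresh_seeds):
        raise ValueError("fresh seeds must not overlap the search seeds")

    def mean_drop(cfg, tasks, seeds):
        drops = [evaluate(cfg, task, seed) for task in tasks for seed in seeds]
        return sum(drops) / len(drops)

    best = max(configs, key=lambda c: mean_drop(c, search_tasks, search_seeds))
    return best, mean_drop(best, search_tasks, search_seeds), mean_drop(best, held_out_tasks, fresh_seeds)
```

A large gap between the second and third return values is the signature of the in-distribution tuning the referee is concerned about; a small gap is evidence that the discovered attacks transfer.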

minor comments (1)

[Abstract] The manuscript should clarify the precise definition of “normalized reward drop” and the exact set of baseline attack configurations against which the 0.497 and 0.319 figures were obtained.
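The page does not state the definition, so the following is only one plausible convention, in the spirit of human-normalized Atari scores, which subtract a random-policy baseline: express the drop as a fraction of the clean agent's margin over a random policy. Under that assumption, values above 1.0 mean the attacked agent scores below random play, so an aggregate such as the reported 1.034 would be unremarkable; if the drop were instead normalized by clean return alone, exceeding 1.0 would require attacked returns below zero on some tasks. Both readings are assumptions to be checked against the paper, and the function names are hypothetical.

```python
def normalized_reward_drop(clean_return, attacked_return, random_return):
    """Reward drop as a fraction of the clean agent's margin over a random policy.

    Hypothetical convention: 0 = no effect, 1 = attacked agent scores like a
    random policy, > 1 = attacked agent scores below random.
    """
    margin = clean_return - random_return
    if margin <= 0:
        raise ValueError("clean return must exceed the random-policy return")
    return (clean_return - attacked_return) / margin


def mean_normalized_drop(per_task):
    """Average over tasks; per_task is a list of (clean, attacked, random) triples."""
    drops = [normalized_reward_drop(c, a, r) for c, a, r in per_task]
    return sum(drops) / len(drops)
```

Whether the paper averages per-task ratios (as here) or normalizes aggregated returns also changes the headline number, which is part of what this comment asks the authors to pin down.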

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, indicating where revisions will be made to the manuscript.


Referee: [Abstract] Abstract and Experiments section: the headline gains (0.497→1.034 on DreamerV3 Atari; 0.319→0.682 on DMC) are stated without error bars, standard deviations, number of independent runs, or statistical tests, so it is impossible to determine whether the reported differences exceed run-to-run variability.

Authors: We agree that the abstract and experiments section should report error bars, the number of independent runs, and any statistical tests to allow assessment of variability. The experiments were run with 5 independent seeds per task, with reported values as means; standard deviations were computed internally. We will revise both the abstract and experiments section to include these details (error bars as standard deviations, explicit run count, and significance tests where applicable). revision: yes
Referee: [Theoretical explanation] Theoretical explanation paragraph: the argument establishes improvement only conditional on the four feedback signals correctly shifting probability mass to genuinely higher-utility attacks; it supplies no bound or analysis showing that reward degradation, action instability, runtime cost, and rollout variability computed on the identical evaluation environments are free of selection bias or environment-specific artifacts (latent dynamics or reward-predictor errors) that would not generalize.

Authors: The theoretical argument is explicitly conditional on the feedback signals shifting mass toward higher-utility attacks, as described in the manuscript. We do not provide a formal bound or analysis proving the signals are free of selection bias or environment-specific artifacts, since the signals are derived from the same evaluation environments. This is a limitation of the current analysis. We will revise the theoretical section to more clearly articulate the assumptions and note the potential for bias in the feedback signals. revision: yes
Referee: [Ablations] Ablations paragraph: the reported SCAS ablations demonstrate higher attack utility under fixed budget on the search environments, but do not include held-out tasks, held-out seeds, or cross-environment transfer tests that would isolate whether the utility gain survives outside the in-distribution search used to compute the feedback signals.

Authors: The ablations are intentionally conducted on the search environments to isolate the effect of SCAS and RGAR on attack utility under the finite-budget formulation. We acknowledge that the absence of held-out tasks, additional seeds, or cross-environment transfer limits claims about generalization beyond the in-distribution setting. We will add an explicit discussion of this scope limitation in the ablations section; space permitting, we will also include results on a small set of held-out configurations. revision: partial
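The first response commits to error bars over 5 seeds per task and to significance tests. A minimal sketch of that reporting, assuming per-seed normalized reward drops for WMAttack and a baseline are available as plain lists; `summarize` and `welch_t` are illustrative helpers built on Python's `statistics` module. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test; if both methods share the same seeds, a paired test such as `scipy.stats.ttest_rel` is the better fit.

```python
import math
import statistics


def summarize(runs):
    """Mean and sample standard deviation over independent seeds."""
    return statistics.fmean(runs), statistics.stdev(runs)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With 5 seeds per method the degrees of freedom are at most 8, so only fairly large, consistent gaps will register as significant; reporting the per-seed spread alongside the mean is informative either way.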

Circularity Check

0 steps flagged

No circularity: empirical gains are direct comparisons; theory is conditional and non-reductive.


The paper reports normalized reward-drop improvements (0.497→1.034 Atari, 0.319→0.682 DMC) via direct experimental comparison of WMAttack against baselines under fixed budgets. SCAS and RGAR are algorithmic components whose utility is measured empirically rather than derived from fitted parameters. The supplied theoretical argument states only that refinement improves search conditional on shifting mass toward higher-utility attacks; this is a standard conditional guarantee and does not reduce the reported attack strengths or the empirical deltas to the inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields minimal ledger entries; the main unverified premise is the effectiveness of the proposed refinement and retrieval mechanisms.

axioms (1)

domain assumption: proposal refinement improves finite-budget search when it shifts probability mass toward high-utility attacks.
Invoked to justify SCAS; location: abstract theoretical explanation paragraph.

pith-pipeline@v0.9.0 · 5815 in / 1154 out tokens · 20979 ms · 2026-05-25T05:12:57.873209+00:00 · methodology


Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 8 internal anchors

[1]

Diffusion for world modeling: Visual details matter in atari

Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in atari. InAdvances in Neural Information Processing Systems, 2024

work page 2024
[2]

Square at- tack: A query-efficient black-box adversarial attack via random search

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square at- tack: A query-efficient black-box adversarial attack via random search. InEuropean Conference on Computer Vision, 2020

work page 2020
[3]

Safe exploration using bayesian world models and log-barrier optimization.arXiv preprint arXiv:2405.05890, 2024

Yarden As, Bhavya Sukhija, and Andreas Krause. Safe exploration using bayesian world models and log-barrier optimization.arXiv preprint arXiv:2405.05890, 2024

work page arXiv 2024
[4]

Verification of the implicit world model in a generative model via adversarial sequences

András Balogh and Márk Jelasity. Verification of the implicit world model in a generative model via adversarial sequences. InInternational Conference on Learning Representations,

work page
[5]

URLhttps://openreview.net/forum?id=BLOIB8CwBI

work page
[6]

On the robustness of deep reinforcement learning to adversarial attacks

Vahid Behzadan and Arslan Munir. On the robustness of deep reinforcement learning to adversarial attacks. InMachine Learning and Data Mining in Pattern Recognition: 13th International Conference, MLDM 2017, pages 262–274. Springer, 2017

work page 2017
[7]

Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling

Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents.Journal of Artificial Intelligence Research, 47:253–279, 2013

work page 2013
[8]

Rt-2: Vision-language-action models transfer web knowledge to robotic control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choro- manski, Tianli Ding, Danny Driess, Avinash Fu, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. InProceedings of the 7th Annual Conference on Robot Learning (CoRL), 2023. 10

work page 2023
[9]

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. InInternational Conference on Machine Learning, 2020

work page 2020
[10]

Minimally distorted adversarial examples with a fast adaptive boundary attack

Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. InInternational Conference on Machine Learning, 2020

work page 2020
[11]

RobustBench: A standardized adversarial robustness benchmark

Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. RobustBench: A standardized adversarial robustness benchmark. InAdvances in Neural Information Processing Systems Datasets and Benchmarks Track, 2021

work page 2021
[12]

TRAP: Tail-aware Ranking Attack for World-Model Planning

Siyuan Duan, Ke Zhang, and Xizhao Luo. Trap: Tail-aware ranking attack for world-model planning.arXiv preprint arXiv:2605.01950, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[13]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024. URL https://arxiv.org/abs/2407. 21783

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

iworld-bench: A benchmark for interactive world models with a unified action generation framework

Jianjie Fang, Yingshan Lei, Qin Wan, Ziyou Wang, Yuchao Huang, Yongyan Xu, Baining Zhao, Weichen Zhang, Chen Gao, Xinlei Chen, and Yong Li. iworld-bench: A benchmark for interactive world models with a unified action generation framework. InProceedings of the 43rd International Conference on Machine Learning (ICML), 2026. Originally appeared as arXiv prep...

work page 2026
[15]

Adversarial policies: Attacking deep reinforcement learning

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning. InInternational Conference on Learning Representations, 2020

work page 2020
[16]

Goodfellow, Jonathon Shlens, and Christian Szegedy

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver- sarial examples. InInternational Conference on Learning Representations, 2015

work page 2015
[17]

Copyrightshield: Enhancing diffu- sion model security against copyright infringement attacks

Zhixiang Guo, Siyuan Liang, Aishan Liu, and Dacheng Tao. Copyrightshield: Enhancing diffu- sion model security against copyright infringement attacks. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 19417–19426, 2025

work page 2025
[18]

When world models dream wrong: Physical-conditioned adversarial attacks against world models, 2026

Zhixiang Guo, Siyuan Liang, Andras Balogh, Noah Lunberry, Rong-Cheng Tu, Mark Jelasity, and Dacheng Tao. When world models dream wrong: Physical-conditioned adversarial attacks against world models, 2026. URLhttps://arxiv.org/abs/2602.18739

work page arXiv 2026
[19]

Recurrent world models facilitate policy evolution

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems, 2018

work page 2018
[20]

Learning latent dynamics for planning from pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InInternational Conference on Machine Learning, 2019

work page 2019
[21]

Dream to control: Learning behaviors by latent imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representa- tions, 2020

work page 2020
[22]

Mastering atari with discrete world models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models. InInternational Conference on Learning Representations, 2021

work page 2021
[23]

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[24]

TD-MPC2: Scalable, robust world models for continuous control

Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. InInternational Conference on Learning Representations, 2024

work page 2024
[25]

GAIA-1: A Generative World Model for Autonomous Driving

Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. GAIA-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023. 11

work page internal anchor Pith review Pith/arXiv arXiv 2023
[26]

Adversarial attacks on neural network policies

Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. InInternational Conference on Learning Representations Workshop, 2017

work page 2017
[27]

Safedreamer: Safe reinforcement learning with world models

Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, and Yaodong Yang. Safedreamer: Safe reinforcement learning with world models. InInternational Conference on Learning Representations, 2024

work page 2024
[28]

Lira: Light-robust adversary for model-based reinforcement learning in real world.Robotics and Autonomous Systems, 2025

Taisuke Kobayashi. Lira: Light-robust adversary for model-based reinforcement learning in real world.Robotics and Autonomous Systems, 2025

work page 2025
[29]

Universal camouflage attack on vision-language models for autonomous driving.arXiv preprint arXiv:2509.20196, 2025

Dehong Kong, Sifan Yu, Siyuan Liang, Jiawei Liang, Jianhou Gan, Aishan Liu, and Wenqi Ren. Universal camouflage attack on vision-language models for autonomous driving.arXiv preprint arXiv:2509.20196, 2025

work page arXiv 2025
[30]

Hard-label black-box adversarial attacks for implicit scene interactions.IEEE Transactions on Information Forensics and Security, 20:10346–10360, 2025

Muxue Liang, Chuan Wang, Siyuan Liang, Aishan Liu, Yanan Cao, Qingyong Li, Zeming Liu, Liang Yang, and Xiaochun Cao. Hard-label black-box adversarial attacks for implicit scene interactions.IEEE Transactions on Information Forensics and Security, 20:10346–10360, 2025

work page 2025
[31]

A large-scale multiple-objective method for black-box attack against object detection

Siyuan Liang, Longkang Li, Yanbo Fan, Xiaojun Jia, Jingzhi Li, Baoyuan Wu, and Xiaochun Cao. A large-scale multiple-objective method for black-box attack against object detection. In European Conference on Computer Vision, 2022

work page 2022
[32]

Parallel rect- angle flip attack: A query-based black-box attack against object detection.arXiv preprint arXiv:2201.08970, 2022

Siyuan Liang, Baoyuan Wu, Yanbo Fan, Xingxing Wei, and Xiaochun Cao. Parallel rect- angle flip attack: A query-based black-box attack against object detection.arXiv preprint arXiv:2201.08970, 2022

work page arXiv 2022
[33]

Object detectors in the open environment: Challenges, solutions, and outlook

Siyuan Liang, Wei Wang, Ruoyu Chen, Aishan Liu, Boxi Wu, Ee-Chien Chang, Xiaochun Cao, and Dacheng Tao. Object detectors in the open environment: Challenges, solutions, and outlook. arXiv preprint arXiv:2403.16271, 2024

work page arXiv 2024
[34]

Tactics of adversarial attack on deep reinforcement learning agents

Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, and Min Sun. Tactics of adversarial attack on deep reinforcement learning agents. InInternational Joint Conference on Artificial Intelligence, 2017

work page 2017
[35]

Metadv: A unified and interactive adversarial testing platform for autonomous driving

Aishan Liu, Jiakai Wang, Tianyuan Zhang, Hainan Li, Jiangfan Liu, Siyuan Liang, Yilong Ren, Xianglong Liu, and Dacheng Tao. Metadv: A unified and interactive adversarial testing platform for autonomous driving. InProceedings of the 33rd ACM International Conference on Multimedia, pages 13474–13476, 2025

work page 2025
[36]

Improving adversarial transferability by stable diffusion.arXiv preprint arXiv:2311.11017, 2023

Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, and Ee- Chien Chang. Improving adversarial transferability by stable diffusion.arXiv preprint arXiv:2311.11017, 2023

work page arXiv 2023
[37]

Bridging the task gap: Multi-task adversarial transferability in clip and its derivatives

Kuanrong Liu, Siyuan Liang, Cheng Qian, Ming Zhang, and Xiaochun Cao. Bridging the task gap: Multi-task adversarial transferability in clip and its derivatives. InChinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 152–166. Springer, 2025

work page 2025
[38]

Transformers are sample-efficient world models

Vincent Micheli, Eloi Alonso, and François Fleuret. Transformers are sample-efficient world models. InInternational Conference on Learning Representations, 2023

work page 2023
[39]

Claudini: Autoresearch discovers state-of-the-art adversarial attack algorithms for llms.arXiv preprint arXiv:2603.24511, 2026

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, and Maksym Andriushchenko. Claudini: Autoresearch discovers state-of-the-art adversarial attack algorithms for llms.arXiv preprint arXiv:2603.24511, 2026

work page arXiv 2026
[40]

How hard is it to confuse a world model?arXiv preprint arXiv:2510.21232, 2025

Waris Radji and Odalric-Ambrym Maillard. How hard is it to confuse a world model?arXiv preprint arXiv:2510.21232, 2025

work page arXiv 2025
[41]

Uncertainty-aware latent safety filters for avoiding out-of-distribution failures.arXiv preprint arXiv:2505.00779, 2025

Junwon Seo, Kensuke Nakamura, and Andrea Bajcsy. Uncertainty-aware latent safety filters for avoiding out-of-distribution failures.arXiv preprint arXiv:2505.00779, 2025

work page arXiv 2025
[42]

Learning latent dynamic robust representations for world models.arXiv preprint arXiv:2405.06263, 2024

Ruixiang Sun, Hongyu Zang, Xin Li, and Riashat Islam. Learning latent dynamic robust representations for world models.arXiv preprint arXiv:2405.06263, 2024. 12

work page arXiv 2024
[43]

DeepMind Control Suite

Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. Deepmind control suite.arXiv preprint arXiv:1801.00690, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[44]

Qwen2.5 Technical Report

Qwen Team. Qwen2.5 technical report.arXiv preprint arXiv:2412.15115, 2024. URL https: //arxiv.org/abs/2412.15115

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

de Melo, and Achuta Kadambi

Rishi Upadhyay, Howard Zhang, Jim Solomon, Ayush Agrawal, Pranay Boreddy, Shruti Satya Narayana, Yunhao Ba, Alex Wong, Celso M. de Melo, and Achuta Kadambi. World- Bench: Disambiguating physics for diagnostic evaluation of world models.arXiv preprint arXiv:2601.21282, 2026

work page arXiv 2026
[46]

Chen, Ashesh Rambachan, Jon Kleinberg, and Sendhil Mullainathan

Keyon Vafa, Justin Y . Chen, Ashesh Rambachan, Jon Kleinberg, and Sendhil Mullainathan. Evaluating the world model implicit in a generative model. InAdvances in Neural Information Processing Systems, volume 37, 2024

work page 2024
[47]

Black-box adversarial attack on vision language models for autonomous driving.arXiv preprint arXiv:2501.13563, 2025

Lu Wang, Tianyuan Zhang, Yang Qu, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu, and Dacheng Tao. Black-box adversarial attack on vision language models for autonomous driving.arXiv preprint arXiv:2501.13563, 2025

work page arXiv 2025
[48]

Text adversarial attacks with dynamic outputs.arXiv preprint arXiv:2509.22393, 2025

Wenqiang Wang, Siyuan Liang, Xiao Yan, and Xiaochun Cao. Text adversarial attacks with dynamic outputs.arXiv preprint arXiv:2509.22393, 2025

work page arXiv 2025
[49]

Diversifying the high-level features for better adversarial transferability.arXiv preprint arXiv:2304.10136, 2023

Zhiyuan Wang, Zeliang Zhang, Siyuan Liang, and Xiaosen Wang. Diversifying the high-level features for better adversarial transferability.arXiv preprint arXiv:2304.10136, 2023

work page arXiv 2023
[50]

Xingxing Wei, Siyuan Liang, Ning Chen, and Xiaochun Cao. Transferable adversarial attacks for image and video object detection. arXiv preprint arXiv:1811.12641, 2018
[51]

Shuhan Xu, Siyuan Liang, Hongling Zheng, Yong Luo, Han Hu, Lefei Zhang, and Dacheng Tao. Ctrlattack: A unified attack on world-model control in diffusion models. arXiv preprint arXiv:2603.13435, 2026
[52]

Sherry Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, and Pieter Abbeel. Learning interactive real-world simulators. In International Conference on Learning Representations, 2024
[53]

Chenlu Ye, Jiafan He, Quanquan Gu, and Tong Zhang. Towards robust model-based reinforcement learning against adversarial corruption. arXiv preprint arXiv:2402.08991, 2024
[54]

Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. In International Conference on Learning Representations (ICLR), 2021
[55]

Harry Zhang. Safe deep model-based reinforcement learning with Lyapunov functions. arXiv preprint arXiv:2405.16184, 2024
[56]

Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Duane Boning, and Cho-Jui Hsieh. Robust deep reinforcement learning against adversarial perturbations on state observations. In Advances in Neural Information Processing Systems, 2020
[57]

Tianyuan Zhang, Lu Wang, Xinwei Zhang, Yitong Zhang, Boyi Jia, Siyuan Liang, Shengshan Hu, Qiang Fu, Aishan Liu, and Xianglong Liu. Visual adversarial attack on vision-language models for autonomous driving. arXiv preprint arXiv:2411.18275, 2024
[58]

Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, and Gao Huang. STORM: Efficient stochastic transformer based world models for reinforcement learning. arXiv preprint arXiv:2310.09615, 2024
[59]

Geigh Zollicoffer, Tanush Chopra, Mingkuan Yan, Xiaoxu Ma, Kenneth Eaton, and Mark Riedl. World model robustness via surprise recognition. arXiv preprint arXiv:2512.01119, 2025

A Victim and Threat Model Details

This appendix details the closed-loop evaluation process under clean and attacked observations. The victim is a fixed world-model agent Mθ = (ϕ...
