Robust Deep Reinforcement Learning with Adversarial Attacks

Robust deep reinforcement learning with adversarial attacks · 2017 · cs.LG · arXiv 1712.03632

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

This paper proposes adversarial attacks for Reinforcement Learning (RL) and then uses these attacks to improve the robustness of Deep Reinforcement Learning (DRL) algorithms to parameter uncertainties. We show that even a naively engineered attack successfully degrades the performance of a DRL algorithm. We further improve the attack using gradient information of an engineered loss function, which leads to further degradation in performance. These attacks are then leveraged during training to improve the robustness of RL within a robust control framework. We show that this adversarial training of DRL algorithms such as Deep Double Q-learning and Deep Deterministic Policy Gradients leads to a significant increase in robustness to parameter variations for RL benchmarks such as the Cart-pole, Mountain Car, Hopper and Half Cheetah environments.

representative citing papers

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

cs.LG · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

cs.LG · 2025-02-05 · unverdicted · novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.

citing papers explorer

Showing 3 of 3 citing papers.

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning · cs.LG · 2026-05-18 · unverdicted · none · ref 9 · 2 links · internal anchor
Efficient Preference Poisoning Attack on Offline RLHF · cs.LG · 2026-05-04 · unverdicted · none · ref 128
Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning · cs.LG · 2025-02-05 · unverdicted · none · ref 11 · internal anchor
