
arXiv: 1906.12061 · v1 · pith:GLGF2AN5new · submitted 2019-06-28 · 💻 cs.LG · cs.CR · stat.ML

Learning to Cope with Adversarial Attacks

Pith reviewed 2026-05-25 13:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · stat.ML
keywords deep reinforcement learning · adversarial attacks · meta-learning · hierarchical policies · robustness · online adaptation · cyber-physical systems

The pith

A meta-learned hierarchical RL agent adapts its master and sub-policies online to sustain nominal rewards under adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the Meta-Learned Advantage Hierarchy agent as a way for deep reinforcement learning systems to handle adversarial intrusions in real-world settings such as cyber-physical systems. It tests the agent's responses to varied attack patterns and shows that the meta-learning structure lets the agent switch between policies to keep performance steady. Results also indicate that wider gaps between attacks allow the agent to reach better overall reward levels, even as variability rises. The work focuses on whether these adaptation patterns stem directly from the hierarchical meta-learning design.

Core claim

The MLAH agent exhibits interesting coping behaviors when subjected to different adversarial attacks to maintain a nominal reward. Additionally, the framework exhibits a hierarchical coping capability, based on the adaptability of the Master policy and sub-policies themselves. From empirical results, we also observed that as the interval of adversarial attacks increases, the MLAH agent can maintain a higher distribution of rewards, though at the cost of higher instabilities.

What carries the argument

The Meta-Learned Advantage Hierarchy (MLAH) agent, a meta-learning framework that learns robust policies online through a master policy and adaptable sub-policies.
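
The master/sub-policy split can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two sub-policies are fixed placeholders, the advantage estimates are supplied by hand, and the master is reduced to a greedy argmax, whereas the paper's Master policy is learned. All names (`select_sub_policy`, `nominal_policy`, `adversarial_policy`, `act`) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Observation = Tuple[int, int]  # hypothetical (x, y) grid position
Action = str
SubPolicy = Callable[[Observation], Action]


def nominal_policy(obs: Observation) -> Action:
    """Placeholder sub-policy intended for unperturbed observations."""
    return "left"


def adversarial_policy(obs: Observation) -> Action:
    """Placeholder sub-policy intended for perturbed observations."""
    return "right"


def select_sub_policy(advantages: Sequence[float]) -> int:
    """Greedy stand-in for the master: index of the largest advantage."""
    return max(range(len(advantages)), key=lambda i: advantages[i])


def act(obs: Observation,
        advantages: Sequence[float],
        sub_policies: Sequence[SubPolicy]) -> Tuple[int, Action]:
    """Pick a sub-policy from the advantages, then let it act on obs."""
    idx = select_sub_policy(advantages)
    return idx, sub_policies[idx](obs)


policies = [nominal_policy, adversarial_policy]
print(act((3, 0), [0.5, -0.2], policies))  # first sub-policy has the larger advantage
print(act((3, 0), [0.1, 0.7], policies))   # second sub-policy has the larger advantage
```

The point of the sketch is only the control flow: the selection depends on per-sub-policy advantages, and the chosen sub-policy alone maps the observation to an action.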

If this is right

  • The agent can adjust its master policy for broad responses and sub-policies for finer adjustments during attacks.
  • Longer intervals between attacks support higher average rewards across trials.
  • Increased attack spacing also raises instability in the observed reward distribution.
  • Online adaptation occurs without requiring offline retraining after each attack.
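
One way to make the reward-level and instability claims above measurable is to summarize the cumulative rewards collected at each attack interval. The sketch below is an assumption about how such a summary could be computed, not the paper's analysis: it uses the median for the reward level, the interquartile range for spread, and the common 1.5 × IQR rule to count outlying trials. The function name `summarize` and the sample values are hypothetical placeholders, not results from the paper.

```python
import statistics
from typing import Dict, Sequence


def summarize(rewards: Sequence[float]) -> Dict[str, float]:
    """Median, interquartile range, and number of 1.5*IQR outliers."""
    q1, median, q3 = statistics.quantiles(rewards, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sum(1 for r in rewards if r < low or r > high)
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Placeholder numbers for illustration only; they encode no experimental result.
placeholder_rewards = [10, 11, 12, 13, 14, 15, 16, 100]
print(summarize(placeholder_rewards))
```

Running the same summary on the per-interval reward samples would put the level and the number of unstable trials side by side for each attack spacing.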

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hierarchical meta-learning structures might be tested on other reinforcement learning vulnerabilities beyond the specific attacks used here.
  • Deployment in actual physical systems would clarify whether the simulated attack effects match real threats.
  • The instability trade-off could be measured against simpler non-hierarchical agents to isolate the source of the effect.

Load-bearing premise

The coping behaviors and reward patterns arise specifically from the MLAH meta-learning and hierarchical design rather than from other details of the training or attack setup.

What would settle it

Run the same attack experiments on a non-meta-learned hierarchical RL agent and compare whether the reward-maintenance and adaptation patterns disappear.
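
A matched comparison of this kind could be organized along the following lines. This is a hedged sketch: `attack_steps`, `run_matched`, and the two stub agents are hypothetical, the stubs return fixed placeholder rewards rather than learned behavior, and "interval" is assumed here to mean the number of steps between attack onsets, which may differ from the paper's definition.

```python
import random
from typing import Callable, Dict, List

StepFn = Callable[[int, bool, random.Random], float]


def attack_steps(total_steps: int, interval: int, duration: int) -> List[bool]:
    """Mask that is True on steps where the attack is active."""
    return [(t % interval) < duration for t in range(total_steps)]


def run_matched(agents: Dict[str, StepFn],
                schedule: List[bool],
                seeds: List[int]) -> Dict[str, List[float]]:
    """Run every agent on the same schedule and seeds; return cumulative rewards."""
    results: Dict[str, List[float]] = {}
    for name, step_fn in agents.items():
        totals = []
        for seed in seeds:
            rng = random.Random(seed)  # same seed sequence for every agent
            totals.append(sum(step_fn(t, attacked, rng)
                              for t, attacked in enumerate(schedule)))
        results[name] = totals
    return results


def stub_a(t: int, attacked: bool, rng: random.Random) -> float:
    """Placeholder agent: fixed per-step reward, chosen arbitrarily."""
    return 0.0 if attacked else 1.0


def stub_b(t: int, attacked: bool, rng: random.Random) -> float:
    """Placeholder agent: a different arbitrary per-step reward."""
    return 0.5 if attacked else 1.0


schedule = attack_steps(total_steps=10, interval=5, duration=2)
print(run_matched({"stub_a": stub_a, "stub_b": stub_b}, schedule, seeds=[0, 1]))
```

Fixing the schedule and seeds across agents is what makes the comparison matched: any difference in the reward lists then comes from the agents, not from the attack timing.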

Figures

Figures reproduced from arXiv: 1906.12061 by Aaron Havens, Girish Chowdhary, Soumik Sarkar, Xian Yeow Lee.

Figure 1: Illustration of the MLAH framework. The Master policy observes the advantages of each sub-policy and decides the optimal sub-policy to employ. The selected sub-policy then acts on the observation from the environment. Note that both the Master policy and the selected sub-policy receive the same reward signal from the environment.
… observation as the surrogate observation when adversaries are detected. Furtherm…
Figure 2: Illustration of a symmetric mirror attack on the RL agent about the center vertical axis. Under this attack, the optimal policy changes and the resulting action isn't just sub-optimal but is instead directly leading the agent away from the goal.
… two sub-policies for nominal/adversary conditions. Each sub-policy consists of another separate network with 2 dense layers and 32 hidden units each. The sub-polic…
Figure 3: Comparison of a nominal agent with just one policy with the MLAH agent across multiple adversary attacks. The performance of the nominal agent is shown in red and the rewards clearly show a periodic presence of adversarial attacks. Performance of the MLAH agent (across different random seeds) is shown in cyan and there is a clear trend that the MLAH agent is able to cope against the adversarial attacks to main…
Figure 4: Illustration of the MLAH agent's coping behaviour under a symmetrical mirror attack about the y-axis. The MLAH agent learns to use a different sub-policy that maps the adversarial observation to an optimal action that leads it to the goal.
… of 10,000 steps. As observed in the first sub-plot of each graph, the agent is able to reach the goal under nominal conditions in less than 10 iterations and consistently rec…
Figure 5: Cumulative reward plots of the MLAH agent subjected to different intervals of adversarial mirror attacks. A noticeable trend is that as the intervals get smaller, the agent becomes more stable, though at a cost of a lower distribution of rewards with greater variance, as shown in …
Figure 6: Distribution of cumulative rewards for the MLAH agent subjected to adversarial mirror attacks with different intervals. Under long intervals of attacks, the MLAH agent has a higher distribution of rewards, albeit with several outlying points attributed to the Master agent's instability. As attack intervals decrease, there are fewer instabilities, as evidenced by fewer outliers, although the distribution of rewar…
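
The mirror attack described in the Figure 2 and Figure 4 captions can be illustrated with a small observation transform. The sketch assumes the observation is an integer `(x, y)` position on a grid with columns `0 .. width - 1`; the paper's actual observation format is not given in the text above, and `mirror_observation` is a hypothetical name.

```python
from typing import Tuple


def mirror_observation(obs: Tuple[int, int], width: int) -> Tuple[int, int]:
    """Reflect the x-coordinate about the grid's center vertical axis."""
    x, y = obs
    if not 0 <= x < width:
        raise ValueError("x must lie inside the grid")
    return (width - 1 - x, y)


clean = (1, 3)
attacked = mirror_observation(clean, width=10)
print(attacked)                                 # column flipped, row unchanged
print(mirror_observation(attacked, width=10))   # mirroring twice restores the input
```

Because the reflection is its own inverse, a policy that acts on the mirrored observation as if it were reflected back would recover the nominal behavior, which is in the spirit of the coping behavior the Figure 4 caption describes.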
Original abstract

The security of Deep Reinforcement Learning (Deep RL) algorithms deployed in real-life applications is of primary concern. In particular, the robustness of RL agents in cyber-physical systems against adversarial attacks is especially vital, since the cost of a malevolent intrusion can be extremely high. Studies have shown that Deep Neural Networks (DNNs), which form the core decision-making unit in most modern RL algorithms, are easily subjected to adversarial attacks. Hence, it is imperative that RL agents deployed in real-life applications have the capability to detect and mitigate adversarial attacks in an online fashion. An example of such a framework is the Meta-Learned Advantage Hierarchy (MLAH) agent, which utilizes a meta-learning framework to learn policies robustly online. Since the mechanisms of this framework are still not fully explored, we conducted multiple experiments to better understand the framework's capabilities and limitations. Our results show that the MLAH agent exhibits interesting coping behaviors when subjected to different adversarial attacks to maintain a nominal reward. Additionally, the framework exhibits a hierarchical coping capability, based on the adaptability of the Master policy and sub-policies themselves. From empirical results, we also observed that as the interval of adversarial attacks increases, the MLAH agent can maintain a higher distribution of rewards, though at the cost of higher instabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper examines the Meta-Learned Advantage Hierarchy (MLAH) agent in deep reinforcement learning for robustness against adversarial attacks in cyber-physical systems. It reports that MLAH exhibits coping behaviors to sustain nominal rewards under varying attacks, demonstrates hierarchical adaptability via the master policy and sub-policies, and achieves higher reward distributions as attack intervals lengthen, at the expense of increased instability. These outcomes are presented as empirical findings from multiple experiments exploring the framework's capabilities and limitations.

Significance. If the reported behaviors can be causally attributed to the meta-learning and hierarchical structure rather than generic RL adaptation, the work could inform design of online-robust agents for high-stakes applications. However, the absence of isolating controls means the significance remains provisional; the manuscript does not yet deliver reproducible evidence that the observed coping or interval effects are architecture-specific.

major comments (1)
  1. [Abstract / Experiments section] Abstract and experimental claims: the central attribution—that coping behaviors, hierarchical adaptability, and reward distributions arise from MLAH's meta-learning and Master/sub-policy structure—lacks supporting controls. No comparisons to flat meta-learners, non-meta hierarchical agents, or standard DQN/PPO under identical attack schedules are described, leaving the causal link to the architecture unsecured (see reader's weakest_assumption and skeptic note).
minor comments (1)
  1. [Abstract] The abstract states results but provides no quantitative details (e.g., specific reward values, attack types, interval lengths, or statistical measures), making it impossible to assess reproducibility or effect sizes.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The major comment correctly identifies that our experiments do not include control comparisons to other agents, which limits strong causal claims about the source of the observed behaviors. We respond point by point below.

Point-by-point responses
  1. Referee: [Abstract / Experiments section] Abstract and experimental claims: the central attribution—that coping behaviors, hierarchical adaptability, and reward distributions arise from MLAH's meta-learning and Master/sub-policy structure—lacks supporting controls. No comparisons to flat meta-learners, non-meta hierarchical agents, or standard DQN/PPO under identical attack schedules are described, leaving the causal link to the architecture unsecured (see reader's weakest_assumption and skeptic note).

    Authors: We agree that the manuscript contains no ablation or baseline comparisons (flat meta-learners, non-meta hierarchies, or standard DQN/PPO) under matched attack schedules. The work is an observational study of behaviors exhibited by the MLAH agent; the abstract and text attribute the reported coping and interval effects to the MLAH framework as implemented, without claiming these effects are absent in other architectures. Because new comparative experiments were not performed, we cannot supply the requested isolating controls. We will therefore revise the abstract and add an explicit limitations paragraph stating that the findings are specific to MLAH and that causal isolation of meta-learning versus hierarchy remains future work. This is a partial revision consisting of textual clarification rather than new experiments.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical observations only

full rationale

The paper reports experimental results on MLAH agent behavior under adversarial attacks, with all claims (coping behaviors, hierarchical adaptability, reward distributions) explicitly tied to simulation outcomes rather than any derivation, prediction, or first-principles argument. No equations, fitted parameters presented as predictions, or self-citation load-bearing steps appear in the provided text. The analysis is therefore self-contained against external benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard domain assumptions about DNN vulnerability to adversarial attacks in RL; no free parameters, new entities, or ad-hoc axioms are mentioned.

axioms (1)
  • domain assumption: Deep neural networks forming the core of RL agents are vulnerable to adversarial attacks
    Stated as background motivation in the abstract.



