arxiv: 1611.01236 · v2 · pith:EDMCJCZ3new · submitted 2016-11-04 · 💻 cs.CV · cs.CR · cs.LG · stat.ML

Adversarial Machine Learning at Scale

classification 💻 cs.CV cs.CR cs.LG stat.ML
keywords adversarial training attack examples model attacks methods models
Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black-box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research, we apply adversarial training to ImageNet. Our contributions include: (1) recommendations for how to successfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, so single-step attacks are the best for mounting black-box attacks, and (4) resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process.
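
To make the abstract's terms concrete, here is a minimal sketch of a single-step attack, assuming the fast gradient sign method (FGSM) as the single-step method and a toy linear softmax classifier in place of an ImageNet model. The names `fgsm`, `predict`, and `input_gradient`, the weights, and the input range [0, 1] are all illustrative assumptions and do not come from the paper. When `label` is omitted, the sketch uses the model's own prediction in place of the true label, which is one plausible way to keep the true label out of the construction process. It is not presented as the paper's exact recipe.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # Linear model: one row of weights and one bias per class.
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def input_gradient(W, b, x, label):
    # Gradient of the cross-entropy loss w.r.t. the input x.
    # For a linear softmax model this is W^T (p - onehot(label)).
    p = softmax(logits(W, b, x))
    d = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return [sum(d[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(W, b, x, eps, label=None):
    # Single step of size eps along the sign of the loss gradient.
    # Assumption: with label=None the model's predicted class is used,
    # so the true label never enters the construction.
    if label is None:
        label = predict(W, b, x)
    g = input_gradient(W, b, x, label)
    # Clip to the assumed valid input range [0, 1].
    return [min(1.0, max(0.0, xj + eps * sign(gj))) for xj, gj in zip(x, g)]

# Hypothetical two-class, two-feature model and input.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
x_adv = fgsm(W, b, x, eps=0.2)
print(predict(W, b, x), predict(W, b, x_adv))
```

Under these assumptions, adversarial training would generate `x_adv` for part of each minibatch and include it in the training loss alongside the clean examples. Passing the true label as `label` reproduces the setting in which the abstract's "label leaking" effect can arise.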

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment

    cs.LG 2025-05 conditional novelty 7.0

    Standard model inversion evaluation counts many adversarial false positives as successes; MLLM-based evaluation reveals consistently high false-positive rates across 27 attack setups.

  2. Hard-Label Black-Box Attacks on 3D Point Clouds

    cs.CV 2024-11 unverdicted novelty 7.0

    A spectrum-aware decision boundary algorithm enables effective hard-label black-box adversarial attacks on 3D point cloud models by fusing spectral information across classes and performing curvature-aware iterative o...

  3. MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

    cs.CV 2024-06 unverdicted novelty 7.0

    MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

  4. Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models

    quant-ph 2026-04 unverdicted novelty 6.0

    Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.

  5. Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

    cs.CR 2025-12 unverdicted novelty 6.0

    A meta-prompt and hierarchical detection framework automates LLM red-teaming, achieving 3.9 times higher vulnerability discovery rate than manual methods with 89% accuracy on GPT-OSS-20B.

  6. Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

    cs.CV 2024-11 unverdicted novelty 6.0

    Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generali...

  7. Fooling a Real Car with Adversarial Traffic Signs

    cs.CR 2019-06 unverdicted novelty 6.0

    A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.

  8. UniAda: Universal Adaptive Multi-objective Adversarial Attack for End-to-End Autonomous Driving Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    UniAda introduces a white-box multi-objective attack using adaptive weighting to generate perturbations that jointly affect steering and speed in E2E ADS, outperforming benchmarks with average deviations of 3.54-29 de...

  9. Latent Adversarial Defence with Boundary-guided Generation

    cs.LG 2019-07 unverdicted novelty 5.0

    LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

  10. Why Blocking Targeted Adversarial Perturbations Impairs the Ability to Learn

    cs.LG 2019-07 unverdicted novelty 5.0

    Defensive distillation blocks non-targeted adversarial attacks but cannot block targeted ones without preventing the network from learning via its input gradient.

  11. Learning to Cope with Adversarial Attacks

    cs.LG 2019-06 unverdicted novelty 5.0

    MLAH agent in deep RL demonstrates hierarchical coping mechanisms and improved reward maintenance under spaced adversarial attacks, at the expense of stability.

  12. Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models

    cs.CV 2026-04 unverdicted novelty 4.0

    Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.

  13. Brain MR Image Segmentation in Small Dataset with Adversarial Defense and Task Reorganization

    eess.IV 2019-06 unverdicted novelty 4.0

    The method reaches 84.46% Dice score on brain MR segmentation of gray matter, white matter and major regions using only seven training subjects via adversarial defense and hierarchical task reorganization.

  14. Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods

    cs.LG 2026-05 unverdicted novelty 1.0

    A survey of quantum adversarial machine learning covering attacks, countermeasures, theoretical underpinnings, trends, and challenges.