arXiv preprint arXiv:1710.11342

Towards natural, robust adversarial examples · 2017 · cs.LG · arXiv 1710.11342

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers for a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
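The search the abstract describes can be illustrated with a minimal sketch: start from the latent code of an input, sample nearby latent codes in shells of growing radius, decode each one, and stop at the closest code whose decoded sample changes the black-box label. Everything below is a hypothetical stand-in rather than the paper's implementation: `generator` is a toy function in place of a trained GAN generator, `black_box_classifier` is a toy label oracle, and `latent_search` with its `step`, `max_radius`, and `samples` parameters is an assumed random-search strategy chosen for illustration.

```python
import math
import random


def generator(z):
    """Hypothetical stand-in for a trained generator: 2-D latent code -> 2-D input."""
    return (math.tanh(z[0]), math.tanh(z[1]))


def black_box_classifier(x):
    """Hypothetical black-box model: only the predicted label is observable."""
    return 1 if x[0] + x[1] > 0.0 else 0


def latent_search(z0, classify, generate, step=0.1, max_radius=3.0,
                  samples=200, seed=0):
    """Random search in latent space for a nearby code that flips the label.

    Samples perturbations in shells of increasing radius around z0 and
    returns (z_adv, distance) for the closest label-flipping sample found
    in the first shell that contains one, or (None, None) if none is found.
    """
    rng = random.Random(seed)
    original_label = classify(generate(z0))
    n_shells = round(max_radius / step)
    for k in range(1, n_shells + 1):
        outer = k * step
        inner = outer - step
        best = None
        for _ in range(samples):
            # Draw a perturbation whose norm lies in [inner, outer].
            r = rng.uniform(inner, outer)
            angle = rng.uniform(0.0, 2.0 * math.pi)
            z = (z0[0] + r * math.cos(angle), z0[1] + r * math.sin(angle))
            # Query the classifier on the decoded sample only (black-box access).
            if classify(generate(z)) != original_label:
                if best is None or r < best[1]:
                    best = (z, r)
        if best is not None:
            return best
    return None, None


if __name__ == "__main__":
    z_start = (0.5, 0.3)
    z_adv, dist = latent_search(z_start, black_box_classifier, generator)
    print("adversarial latent code:", z_adv)
    print("latent distance:", dist)
    print("decoded adversarial input:", generator(z_adv))
```

Because the perturbation is applied to the latent code and then decoded, every candidate is an output of the generator, which is the sense in which such a search stays on the learned data manifold. The sketch needs only label queries, so it makes no use of classifier gradients.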

representative citing papers

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · novelty 8.0

FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

Variational Autoencoder-Based Black-Box Adversarial Attack on Collaborative DNN Inference

cs.CR · 2025-08-01 · unverdicted · novelty 6.0

AdVAR-DNN employs a variational autoencoder to create untraceable adversarial samples that compromise black-box collaborative DNN inference by exploiting model partitioning information exchange, achieving high misclassification success on CIFAR-100 with low detection probability.

Insider Attacks in Multi-Agent LLM Consensus Systems

cs.MA · 2026-05-08 · unverdicted · novelty 5.0

A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.

citing papers explorer

Showing 3 of 3 citing papers.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · none · ref 25 · internal anchor

Variational Autoencoder-Based Black-Box Adversarial Attack on Collaborative DNN Inference

cs.CR · 2025-08-01 · unverdicted · none · ref 16 · internal anchor

Insider Attacks in Multi-Agent LLM Consensus Systems

cs.MA · 2026-05-08 · unverdicted · none · ref 127
