Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

2016 · cs.LG · arXiv 1610.00768

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software. Section 2 presents the core functionalities of the library: namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.

representative citing papers

Discriminative Active Learning

cs.LG · 2019-07-15 · unverdicted · novelty 6.0

DAL poses batch active learning as a binary classification task between labeled and unlabeled data to select informative examples for labeling.

Latent Adversarial Defence with Boundary-guided Generation

cs.LG · 2019-07-16 · unverdicted · novelty 5.0

LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

Defending Adversarial Attacks by Correcting logits

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

A two-layer network trained on mixed clean and perturbed logits recovers original predictions for a range of adversarial attacks without needing image data.

LLM-Safety Evaluations Lack Robustness

cs.CR · 2025-03-04 · unverdicted · novelty 4.0

LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.

citing papers explorer

Discriminative Active Learning cs.LG · 2019-07-15 · unverdicted · none · ref 8 · internal anchor
Latent Adversarial Defence with Boundary-guided Generation cs.LG · 2019-07-16 · unverdicted · none · ref 31 · internal anchor
Defending Adversarial Attacks by Correcting logits cs.LG · 2019-06-26 · unverdicted · none · ref 23 · internal anchor
LLM-Safety Evaluations Lack Robustness cs.CR · 2025-03-04 · unverdicted · none · ref 42 · internal anchor
