Attacks Meet Interpretability (AmI) Evaluation and Findings

Qian Ma; Shagufta Mehnaz; Ziping Ye

arxiv: 2310.08808 · v4 · submitted 2023-10-13 · 💻 cs.CR

Pith reviewed 2026-05-24 06:41 UTC · model grok-4.3

keywords adversarial examples · model interpretability · hyperparameter sensitivity · reproduction study · defense evaluation · attribute-steered detection

The pith

Reproducing AmI shows its adversarial detection depends heavily on hyperparameter selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reproduces two prior studies on using model interpretability to detect adversarial examples. It finds that the Attacks Meet Interpretability (AmI) method's results change substantially based on hyperparameter choices. With different settings, AmI can detect Nicholas Carlini's attack, contrary to some earlier reports. The authors identify this sensitivity as a core limitation and suggest guidelines for evaluating similar defense techniques in the future.

Core claim

AmI is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack.

What carries the argument

Hyperparameter selection in the attribute-steered interpretability detection procedure of AmI.

If this is right

AmI performance on adversarial detection varies with hyperparameter tuning.
Prior claims that AmI cannot detect certain attacks may depend on the specific parameter settings used.
Evaluation of interpretability-based defenses requires explicit reporting of hyperparameter search procedures.
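
The reporting practice described above can be sketched as a small sensitivity sweep. This is a minimal illustration, not AmI itself: it assumes a generic detector that assigns each input a score and flags it as adversarial when the score exceeds a threshold. The function names, the threshold hyperparameter, and the scores are all hypothetical; the page does not name AmI's actual hyperparameters.

```python
from typing import Dict, List, Sequence


def detection_rates(benign: Sequence[float],
                    adversarial: Sequence[float],
                    threshold: float) -> Dict[str, float]:
    """Compute detection rates for one threshold setting.

    An input is flagged as adversarial when its score is strictly
    greater than `threshold` (an assumed decision rule).
    """
    tpr = sum(s > threshold for s in adversarial) / len(adversarial)
    fpr = sum(s > threshold for s in benign) / len(benign)
    return {"threshold": threshold, "tpr": tpr, "fpr": fpr}


def sweep(benign: Sequence[float],
          adversarial: Sequence[float],
          thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """Evaluate every candidate threshold and keep all results,
    so the full search can be reported rather than only the best row."""
    return [detection_rates(benign, adversarial, t) for t in thresholds]


# Made-up scores for illustration only; they are not results from the paper.
benign_scores = [0.10, 0.20, 0.25, 0.30, 0.40]
adversarial_scores = [0.35, 0.45, 0.50, 0.60, 0.70]

report = sweep(benign_scores, adversarial_scores, [0.2, 0.33, 0.5, 0.65])
for row in report:
    print(row)
```

On these made-up scores the true-positive rate falls from 1.0 to 0.2 as the threshold rises, which is the kind of swing that makes a single reported setting hard to interpret without the rest of the sweep.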

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized protocols for hyperparameter selection could reduce apparent contradictions across studies of such detectors.
The same sensitivity might appear in other explanation-based defenses, warranting similar reproduction checks.
Automated or data-driven methods for choosing AmI hyperparameters could be tested as an extension.
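
One way a data-driven choice might look, sketched under strong assumptions: the hyperparameter is a single score threshold, and it is picked from benign validation scores alone so that the false-positive rate stays within a stated budget. The function `threshold_for_fpr` and the example scores are hypothetical and are not taken from the paper or from AmI.

```python
import math
from typing import Sequence


def threshold_for_fpr(benign_validation: Sequence[float],
                      target_fpr: float) -> float:
    """Return the smallest observed score t such that the fraction of
    benign validation scores strictly above t is at most `target_fpr`.

    Assumes the detector flags an input when its score is > t.
    """
    if not benign_validation:
        raise ValueError("need at least one benign validation score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(benign_validation)
    n = len(ordered)
    # Number of benign inputs that may be flagged without exceeding the budget.
    allowed = math.floor(target_fpr * n)
    return ordered[n - allowed - 1]


# Hypothetical benign validation scores.
validation_scores = [float(i) for i in range(1, 11)]  # 1.0 .. 10.0

chosen = threshold_for_fpr(validation_scores, 0.1)
flagged = sum(s > chosen for s in validation_scores) / len(validation_scores)
print(chosen, flagged)
```

Because the threshold is fixed before any adversarial example is seen, the selection cannot be tuned toward a particular attack, which is one property a fair hyperparameter protocol would want.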

Load-bearing premise

The authors' reproduction of the original AmI methods is accurate and their alternative hyperparameter choices fairly test the general behavior of the technique.

What would settle it

A controlled test on the same datasets in which AmI, run with the new hyperparameters, fails to reach the detection rates against Carlini's attack that the reproduction reports.

Original abstract

To investigate the effectiveness of the model explanation in detecting adversarial examples, we reproduce the results of two papers, Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples and Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples. And then conduct experiments and case studies to identify the limitations of both works. We find that Attacks Meet Interpretability(AmI) is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack. Finally, we propose recommendations for future work on the evaluation of defense techniques such as AmI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The reproduction flags hyperparameter sensitivity in AmI detection, but only the abstract is available, so the evidence can't be checked.

The main point is that this reproduction of two prior AmI papers finds the detection method depends heavily on hyperparameter choices, so a different setting still catches Nicholas Carlini's attack where the originals apparently did not. That observation is new relative to the cited works and could matter for how people test interpretability-based defenses going forward.

The paper reproduces the earlier results, runs its own experiments and case studies to surface the limitation, and closes with recommendations on evaluating such techniques. That is a straightforward and useful extension in an area where evaluation standards are still settling.

The soft spot is obvious and central: only the abstract exists here. No methods, hyperparameter values, datasets, attack implementations, or reproduction steps are provided, so there is no way to judge whether the reproduction matches the originals or whether the alternative hyperparameters were chosen fairly rather than to produce the desired outcome. The claim rests entirely on empirical work that remains invisible.

This kind of paper is for researchers already working on adversarial machine learning and interpretability defenses who care about making evaluations more reliable. A reader focused on improving testing practices might extract some value from the recommendations, but anyone wanting to rely on the specific finding would need the full experimental record first. It deserves a serious referee if the complete version supplies enough detail to let others verify the reproduction and the hyperparameter tests; without that, the contribution stays too provisional to invest referee time in.

Referee Report

1 major / 0 minor

Summary. The manuscript reproduces results from two prior papers on Attacks Meet Interpretability (AmI) for detecting adversarial examples, performs additional experiments and case studies to identify limitations, concludes that AmI is highly dependent on hyperparameter selection, shows that alternative hyperparameter choices still allow AmI to detect Nicholas Carlini's attack, and offers recommendations for future evaluations of such defense techniques.

Significance. If the claimed reproductions and hyperparameter-sensitivity results hold with accurate implementations and fair alternative settings, the work would usefully highlight a practical limitation in interpretability-based adversarial detection methods, encouraging more rigorous sensitivity testing in defense evaluations. The abstract provides no supporting data, code, or details to assess whether this contribution is realized.

major comments (1)

Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We appreciate the referee's feedback regarding the abstract. We address this comment in detail below and propose revisions to improve clarity and verifiability.

Referee: Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.

Authors: We agree that the abstract, as currently written, does not include the specific details of the reproductions, datasets, hyperparameters, or protocols. The full manuscript contains these in the sections describing the reproduction of the two prior papers and the additional experiments. To make the contribution more verifiable from the abstract, we will revise the abstract to include a brief mention of the key experimental settings and findings, and we will ensure that all hyperparameters and protocols are explicitly listed in the main text with code availability. This addresses the concern about verifiability.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The provided text consists only of an abstract with no equations, derivations, parameters, or self-citations. Claims rest on empirical reproduction of external prior work and hyperparameter experiments; no load-bearing step reduces by construction to the paper's own inputs or definitions. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the accuracy of the authors' reproduction of the two prior AmI papers and on the representativeness of their hyperparameter experiments; no further free parameters, axioms, or invented entities are stated in the abstract.

free parameters (1)

AmI hyperparameters
The paper identifies strong dependence on hyperparameter selection as the key limitation, though no specific values are given in the abstract.

axioms (1)

domain assumption: The reproduction faithfully implements the original AmI methods from the two cited papers.
Required for the identified limitations to be attributed to the technique itself.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

AmI is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.