
arxiv: 2310.08808 · v4 · submitted 2023-10-13 · 💻 cs.CR

Attacks Meet Interpretability (AmI) Evaluation and Findings

Pith reviewed 2026-05-24 06:41 UTC · model grok-4.3

classification 💻 cs.CR
keywords adversarial examples · model interpretability · hyperparameter sensitivity · reproduction study · defense evaluation · attribute-steered detection

The pith

Reproducing AmI shows its adversarial detection depends heavily on hyperparameter selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reproduces two prior studies on using model interpretability to detect adversarial examples. It finds that the Attacks Meet Interpretability (AmI) method's results change substantially based on hyperparameter choices. With different settings, AmI can detect Nicholas Carlini's attack, contrary to some earlier reports. The authors identify this sensitivity as a core limitation and suggest guidelines for evaluating similar defense techniques in the future.

Core claim

AmI is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack.

What carries the argument

Hyperparameter selection in the attribute-steered interpretability detection procedure of AmI.

If this is right

  • AmI performance on adversarial detection varies with hyperparameter tuning.
  • Prior claims that AmI cannot detect certain attacks may depend on the specific parameter settings used.
  • Evaluation of interpretability-based defenses requires explicit reporting of hyperparameter search procedures.
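The sensitivity described above can be illustrated with a generic threshold sweep. This is a minimal sketch, not the AmI procedure: the scores are synthetic Gaussian draws, and `detection_rates`, `sweep`, and the threshold values are hypothetical names and settings chosen only for illustration. It shows how a single detector hyperparameter can move both the detection rate and the false-positive rate, which is why reporting the full sweep matters more than reporting one operating point.

```python
import random


def detection_rates(benign, adversarial, threshold):
    """Flag an input as adversarial when its score exceeds the threshold.

    Returns (true-positive rate, false-positive rate).
    """
    tpr = sum(s > threshold for s in adversarial) / len(adversarial)
    fpr = sum(s > threshold for s in benign) / len(benign)
    return tpr, fpr


def sweep(benign, adversarial, thresholds):
    """Evaluate the detector at every threshold and keep all results."""
    return [(t, *detection_rates(benign, adversarial, t)) for t in thresholds]


# Synthetic detector scores (an assumption for illustration, not AmI outputs).
rng = random.Random(0)
benign = [rng.gauss(0.2, 0.1) for _ in range(1000)]
adversarial = [rng.gauss(0.5, 0.1) for _ in range(1000)]

results = sweep(benign, adversarial, [0.1, 0.3, 0.5, 0.7])
for t, tpr, fpr in results:
    print(f"threshold={t:.1f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Reporting every row of such a sweep, rather than only the best one, lets a reader judge whether a reported success or failure reflects the method or a particular setting.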

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standardized protocols for hyperparameter selection could reduce apparent contradictions across studies of such detectors.
  • The same sensitivity might appear in other explanation-based defenses, warranting similar reproduction checks.
  • Automated or data-driven methods for choosing AmI hyperparameters could be tested as an extension.

Load-bearing premise

The authors' reproduction of the original AmI methods is accurate and their alternative hyperparameter choices fairly test the general behavior of the technique.

What would settle it

A controlled test on the same datasets in which AmI, run with the new hyperparameters, fails to detect Carlini's attack at the rates reported in the reproduction.

Original abstract

To investigate the effectiveness of model explanation in detecting adversarial examples, we reproduce the results of two papers, "Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples" and "Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples". We then conduct experiments and case studies to identify the limitations of both works. We find that Attacks Meet Interpretability (AmI) is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack. Finally, we propose recommendations for future work on the evaluation of defense techniques such as AmI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript reproduces results from two prior papers on Attacks Meet Interpretability (AmI) for detecting adversarial examples, performs additional experiments and case studies to identify limitations, concludes that AmI is highly dependent on hyperparameter selection, shows that alternative hyperparameter choices still allow AmI to detect Nicholas Carlini's attack, and offers recommendations for future evaluations of such defense techniques.

Significance. If the claimed reproductions and hyperparameter-sensitivity results hold with accurate implementations and fair alternative settings, the work would usefully highlight a practical limitation in interpretability-based adversarial detection methods, encouraging more rigorous sensitivity testing in defense evaluations. The abstract provides no supporting data, code, or details to assess whether this contribution is realized.

major comments (1)
  1. Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We appreciate the referee's feedback regarding the abstract. We address this comment in detail below and propose revisions to improve clarity and verifiability.

point-by-point responses
  1. Referee: Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.

    Authors: We agree that the abstract, as currently written, does not include the specific details of the reproductions, datasets, hyperparameters, or protocols. The full manuscript contains these in the sections describing the reproduction of the two prior papers and the additional experiments. To make the contribution more verifiable from the abstract, we will revise the abstract to include a brief mention of the key experimental settings and findings, and we will ensure that all hyperparameters and protocols are explicitly listed in the main text with code availability. This addresses the concern about verifiability. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The provided text consists only of an abstract with no equations, derivations, parameters, or self-citations. Claims rest on empirical reproduction of external prior work and hyperparameter experiments; no load-bearing step reduces by construction to the paper's own inputs or definitions. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the accuracy of the authors' reproduction of the two prior AmI papers and on the representativeness of their hyperparameter experiments; no further free parameters, axioms, or invented entities are stated in the abstract.

free parameters (1)
  • AmI hyperparameters
    The paper identifies strong dependence on hyperparameter selection as the key limitation, though no specific values are given in the abstract.
axioms (1)
  • domain assumption: The reproduction faithfully implements the original AmI methods from the two cited papers
    Required for the identified limitations to be attributed to the technique itself.

pith-pipeline@v0.9.0 · 5601 in / 1129 out tokens · 28468 ms · 2026-05-24T06:41:06.647660+00:00 · methodology

