Attacks Meet Interpretability (AmI) Evaluation and Findings
Pith reviewed 2026-05-24 06:41 UTC · model grok-4.3
The pith
Reproducing AmI shows its adversarial detection depends heavily on hyperparameter selection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AmI is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack.
What carries the argument
Hyperparameter selection in the attribute-steered interpretability detection procedure of AmI.
If this is right
- AmI performance on adversarial detection varies with hyperparameter tuning.
- Prior claims that AmI cannot detect certain attacks may depend on the specific parameter settings used.
- Evaluation of interpretability-based defenses requires explicit reporting of hyperparameter search procedures.
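The sensitivity these points describe can be illustrated with a toy threshold sweep. This is a minimal sketch, not AmI itself: the scores, the single `threshold` hyperparameter, and the helpers `flag_rate` and `sweep` are hypothetical stand-ins for whatever detector scores and hyperparameters a real evaluation would use. It only shows how one detector can look strong or weak depending on the setting, and why the full search grid should be reported.

```python
from typing import List, NamedTuple, Sequence


class SweepRow(NamedTuple):
    threshold: float
    detection_rate: float       # fraction of adversarial inputs flagged
    false_positive_rate: float  # fraction of benign inputs flagged


def flag_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold (i.e. flagged)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s > threshold for s in scores) / len(scores)


def sweep(benign: Sequence[float],
          adversarial: Sequence[float],
          thresholds: Sequence[float]) -> List[SweepRow]:
    """Evaluate every threshold in the grid and keep all rows for reporting."""
    return [
        SweepRow(t, flag_rate(adversarial, t), flag_rate(benign, t))
        for t in thresholds
    ]


# Hypothetical detector scores, invented for illustration only.
benign_scores = [0.05, 0.10, 0.20, 0.30, 0.45]
adversarial_scores = [0.25, 0.40, 0.55, 0.70, 0.90]

for row in sweep(benign_scores, adversarial_scores, [0.2, 0.5, 0.8]):
    print(f"threshold={row.threshold:.1f} "
          f"detection={row.detection_rate:.2f} "
          f"false_positive={row.false_positive_rate:.2f}")
```

Reporting every row of the sweep, rather than only the best one, is what lets a reader judge whether a positive or negative detection result is specific to one setting.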
Where Pith is reading between the lines
- Standardized protocols for hyperparameter selection could reduce apparent contradictions across studies of such detectors.
- The same sensitivity might appear in other explanation-based defenses, warranting similar reproduction checks.
- Automated or data-driven methods for choosing AmI hyperparameters could be tested as an extension.
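One simple form a data-driven selection rule could take is sketched below. It is an assumption for illustration, not a method from the paper: a single hypothetical score threshold is chosen from held-out benign scores so that the false-positive rate stays at or below a target. The function name `threshold_for_fpr` and the scores are invented.

```python
import math
from typing import Sequence


def threshold_for_fpr(benign_scores: Sequence[float], max_fpr: float) -> float:
    """Pick the lowest threshold, taken from the observed benign scores,
    such that the fraction of benign scores strictly above it is at most
    max_fpr. Inputs scoring above the threshold count as flagged."""
    if not benign_scores:
        raise ValueError("benign_scores must be non-empty")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ordered = sorted(benign_scores, reverse=True)
    # Number of benign inputs we are allowed to flag.
    allowed = math.floor(max_fpr * len(ordered))
    # At most `allowed` scores are strictly greater than ordered[allowed].
    return ordered[allowed]


# Hypothetical held-out benign scores, invented for illustration only.
held_out_benign = [0.05, 0.10, 0.20, 0.30, 0.45]
chosen = threshold_for_fpr(held_out_benign, max_fpr=0.2)
observed_fpr = sum(s > chosen for s in held_out_benign) / len(held_out_benign)
print(f"chosen threshold={chosen}, benign false-positive rate={observed_fpr}")
```

Fixing the rule before looking at adversarial inputs keeps the choice independent of the attack being evaluated, which is the property a standardized protocol would need.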
Load-bearing premise
The authors' reproduction of the original AmI methods is accurate and their alternative hyperparameter choices fairly test the general behavior of the technique.
What would settle it
A controlled test on the same datasets where AmI with the new hyperparameters fails to detect Carlini's attack at rates reported in the reproduction.
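A check of this kind could be sketched as follows, assuming detection outcomes are independent per sample and using a Wilson score interval as the comparison. The counts and the reported rate below are hypothetical placeholders, not figures from the paper, and `wilson_interval` and `consistent_with` are illustrative names.

```python
import math
from typing import Tuple


def wilson_interval(detected: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a detection rate (z=1.96 is roughly 95%)."""
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("need total > 0 and 0 <= detected <= total")
    p = detected / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def consistent_with(reported_rate: float, detected: int, total: int,
                    z: float = 1.96) -> bool:
    """True if the reported rate lies inside the interval around the new result."""
    low, high = wilson_interval(detected, total, z)
    return low <= reported_rate <= high


# Hypothetical numbers: 50 of 100 adversarial inputs detected in a rerun,
# compared against a hypothetical reported detection rate of 0.90.
low, high = wilson_interval(50, 100)
print(f"rerun interval: [{low:.3f}, {high:.3f}]")
print("consistent with reported rate:", consistent_with(0.90, 50, 100))
```

A rerun whose interval excludes the reported rate would count against the reproduction; one whose interval contains it would not settle the question by itself.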
Original abstract
To investigate the effectiveness of the model explanation in detecting adversarial examples, we reproduce the results of two papers, Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples and Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples, and then conduct experiments and case studies to identify the limitations of both works. We find that Attacks Meet Interpretability (AmI) is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack. Finally, we propose recommendations for future work on the evaluation of defense techniques such as AmI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reproduces results from two prior papers on Attacks Meet Interpretability (AmI) for detecting adversarial examples, performs additional experiments and case studies to identify limitations, concludes that AmI is highly dependent on hyperparameter selection, shows that alternative hyperparameter choices still allow AmI to detect Nicholas Carlini's attack, and offers recommendations for future evaluations of such defense techniques.
Significance. If the claimed reproductions and hyperparameter-sensitivity results hold with accurate implementations and fair alternative settings, the work would usefully highlight a practical limitation in interpretability-based adversarial detection methods, encouraging more rigorous sensitivity testing in defense evaluations. The abstract provides no supporting data, code, or details to assess whether this contribution is realized.
major comments (1)
- Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.
Simulated Author's Rebuttal
We appreciate the referee's feedback regarding the abstract. We address this comment in detail below and propose revisions to improve clarity and verifiability.
point-by-point responses
- Referee: Abstract: The central claim that AmI is highly dependent on hyperparameters and that alternative choices enable detection of Carlini's attack rests entirely on unreported reproductions, datasets, exact hyperparameter values, and experimental protocols; without these, the soundness of the reproduction and the fairness of the alternative settings cannot be verified, making the hyperparameter-dependence conclusion unevaluable.
  Authors: We agree that the abstract, as currently written, does not include the specific details of the reproductions, datasets, hyperparameters, or protocols. The full manuscript contains these in the sections describing the reproduction of the two prior papers and the additional experiments. To make the contribution more verifiable from the abstract, we will revise the abstract to include a brief mention of the key experimental settings and findings, and we will ensure that all hyperparameters and protocols are explicitly listed in the main text with code availability. This addresses the concern about verifiability.
  Revision: yes
Circularity Check
No significant circularity detected
The provided text consists only of an abstract with no equations, derivations, parameters, or self-citations. Claims rest on empirical reproduction of external prior work and hyperparameter experiments; no load-bearing step reduces by construction to the paper's own inputs or definitions. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- AmI hyperparameters
axioms (1)
- domain assumption: The reproduction faithfully implements the original AmI methods from the two cited papers.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "AmI is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.