pith.

arxiv: 1907.06291 · v1 · pith:L3VXMIBU · submitted 2019-07-14 · 💻 cs.LG · cs.CR · cs.CV · stat.ML

Measuring the Transferability of Adversarial Examples

Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · cs.CV · stat.ML
keywords adversarial examples · transferability · FGSM · BIM · Carlini-Wagner · SSIM metric · black-box attacks · model ensembles

The pith

Adversarial example transferability is measured more accurately with strong attack parameters, L-infinity clipping, and the SSIM metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how well adversarial examples generated by FGSM, BIM, and Carlini-Wagner attacks transfer from source models to target models drawn from the VGG class and the Inception class, both within each class and across the two. It identifies problems in how prior work assessed such transfer and attempts to correct them through stronger attack settings, L-infinity norm clipping of perturbations, and replacement of pixel-difference metrics with SSIM. A reader would care because accurate transfer rates determine how feasible black-box attacks are against models that differ from the ones used to craft the examples. If the amendments are sound, earlier reported transfer rates may have been distorted by weak parameters or mismatched success criteria.

Core claim

Current assessments of adversarial transferability suffer from weak attack parameters and unsuitable evaluation metrics. By selecting strong parameters for the Fast Gradient Sign Method, Basic Iterative Method, and Carlini & Wagner attacks, applying L-Infinity clipping, and using the SSIM metric, transferability can be measured more accurately between the VGG class of models and the Inception class of models, including their ensembles.
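To make the attack names concrete: FGSM takes a single step of size ε along the sign of the loss gradient with respect to the input, and BIM (also called I-FGSM) repeats smaller steps of size α, projecting back into the L-infinity ball of radius ε around the original image after each step. The sketch below illustrates both on a toy logistic-regression classifier whose input gradient has a closed form, so it needs only NumPy. The model, the helper names `loss_grad`, `fgsm` and `bim`, and the values of ε, α and the step count are assumptions for illustration; they are not the paper's "strong" settings or its implementation.

```python
import numpy as np

def loss_grad(x, y, w, b):
    """Input gradient of the binary cross-entropy loss of a toy
    logistic-regression classifier p = sigmoid(w . x + b), label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """FGSM: a single step of size eps along the sign of the input gradient."""
    x_adv = x + eps * np.sign(loss_grad(x, y, w, b))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in [0, 1]

def bim(x, y, w, b, eps, alpha, steps):
    """BIM: repeated signed-gradient steps of size alpha, each followed by
    projection back onto the L-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep pixels in [0, 1]
    return x_adv

# Toy usage: a random 16-pixel "image" in [0, 1] with true label 1.
rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0
x, y = rng.uniform(size=16), 1
x_fgsm = fgsm(x, y, w, b, eps=8 / 255)
x_bim = bim(x, y, w, b, eps=8 / 255, alpha=1 / 255, steps=10)
```

With a real network, the closed-form `loss_grad` would be replaced by automatic differentiation of the model's loss with respect to its input.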

What carries the argument

The amended evaluation protocol that applies strong parameters to FGSM, BIM, and C&W attacks, clips perturbations under the L-infinity norm, and scores transfer success with the SSIM metric across VGG-class and Inception-class models.
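One plausible reading of the clipping step, sketched here under assumptions: an attack first produces an adversarial image, and its perturbation is then cut back elementwise so that no pixel differs from the original by more than a chosen clip range; sweeping the range trades visibility of the perturbation against transfer. The code assumes 8-bit images and integer clip ranges on the 0-255 scale (the appendix panel titles list ranges such as 10, 20 and 30); `linf_clip` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def linf_clip(x_orig, x_adv, clip_range):
    """Limit every pixel of x_adv to within clip_range of x_orig
    (L-infinity clipping of the perturbation) on a 0-255 scale."""
    orig = x_orig.astype(np.int16)  # signed type avoids uint8 wrap-around
    delta = np.clip(x_adv.astype(np.int16) - orig, -clip_range, clip_range)
    return np.clip(orig + delta, 0, 255).astype(np.uint8)

# Toy usage: sweep clip ranges over a random image and a random "adversarial" one.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
x_adv = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
clipped = {r: linf_clip(x, x_adv, r) for r in (10, 20, 30, 40, 50, 60, 70)}
```

Clipping after the attack, rather than constraining the attack itself, lets one set of adversarial images be re-evaluated at many perturbation budgets.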

If this is right

  • Transfer rates obtained under the new protocol are more representative of black-box attack success.
  • Differences in transferability between attack methods become visible without confounding from weak parameter choices.
  • Within-class transfer remains higher than cross-class transfer once evaluation bias is reduced.
  • Ensemble models within each class provide a stricter test of whether perturbations survive model variation.
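The bookkeeping behind these comparisons can be sketched as a source-by-target matrix. Figures 4 and 6 report the defensive accuracy of each target classifier against adversarial images crafted on each of the seven source classifiers; the sketch takes the transfer rate to be one minus that accuracy and represents an ensemble by averaging its members' class probabilities. Both are assumptions, since this page does not say exactly how transfer is scored or how the VGG and Inception ensembles combine their members, and every name in the code is hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(members, x):
    """Hypothetical ensemble rule: average the members' class probabilities."""
    return np.mean([m(x) for m in members], axis=0)

def defensive_accuracy(model, x_adv, y_true):
    """Fraction of adversarial inputs the target model still labels correctly."""
    return float(np.mean(model(x_adv).argmax(axis=1) == y_true))

def transfer_matrix(models, adv_by_source, y_true):
    """Row = source model that crafted the examples, column = target model.
    Entry = assumed transfer rate, i.e. 1 - defensive accuracy of the target."""
    names = list(models)
    rates = np.zeros((len(names), len(names)))
    for i, src in enumerate(names):
        for j, tgt in enumerate(names):
            rates[i, j] = 1.0 - defensive_accuracy(models[tgt], adv_by_source[src], y_true)
    return names, rates

# Toy usage: two random linear "classifiers" and their ensemble.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))

def model_a(x):
    return softmax(x @ W1)

def model_b(x):
    return softmax(x @ W2)

def model_ab(x):
    return ensemble_predict([model_a, model_b], x)

models = {"a": model_a, "b": model_b, "a+b": model_ab}
y_true = rng.integers(0, 3, size=50)
# Random placeholders standing in for adversarial images crafted on each source.
adv_by_source = {name: rng.normal(size=(50, 8)) for name in models}
names, rates = transfer_matrix(models, adv_by_source, y_true)
```

Diagonal entries are white-box success on the source itself; off-diagonal entries within and across model classes are what the within-class versus cross-class comparison reads off.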

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same protocol could be applied to additional architectures to test whether the observed within-class versus cross-class pattern holds more generally.
  • If SSIM better captures human-noticeable changes than pixel norms, the protocol might also improve evaluation of other perturbation methods beyond the three studied here.
  • Lower measured transfer under the corrected protocol would imply that defenses focused on model diversity gain more practical value than previously estimated.

Load-bearing premise

That selecting strong attack parameters, applying L-infinity clipping, and switching to SSIM for evaluation produces a meaningfully improved and less biased measurement of transferability than prior practices.

What would settle it

Obtaining the same transfer rates on the same model pairs when repeating the experiments with weak parameters and pixel-difference metrics instead of the proposed protocol would show the amendments do not change the measured transferability.
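Because such a rerun would score the same adversarial images on the same model pairs under both protocols, the outcomes are paired, and a paired test such as McNemar's is one reasonable way to decide whether the measured transfer rate really changed. The exact two-sided version below uses only the Python standard library; the choice of test and the name `mcnemar_exact` are illustrative assumptions, not something the paper or this review prescribes.

```python
from math import comb

def mcnemar_exact(fooled_a, fooled_b):
    """Exact two-sided McNemar test on paired binary outcomes.

    fooled_a[i] / fooled_b[i]: whether adversarial image i fooled the target
    under protocol A / protocol B. Only discordant pairs are informative."""
    only_a = sum(1 for a, b in zip(fooled_a, fooled_b) if a and not b)
    only_b = sum(1 for a, b in zip(fooled_a, fooled_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    # Under the null hypothesis the discordant counts follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy usage: 100 paired outcomes; protocol A fools the target 30 times, B 10 times.
fooled_a = [True] * 30 + [False] * 70
fooled_b = [True] * 10 + [False] * 90
p_value = mcnemar_exact(fooled_a, fooled_b)
```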

Figures

Figures reproduced from arXiv: 1907.06291 by Deyan Petrov, Timothy M. Hospedales.

Figure 1. VGG Ensemble.
Figure 2. Inception Ensemble.
Figure 4. Performance of every classifier when attacked by each of the 7 classifiers, for all 3 attacks, under the L-Infinity metric. For example, (a) shows the defensive accuracy of VGG16 when attacked by clipped adversarial images created by all 7 classifiers using the FGSM attack (the title of each plot gives the defending classifier and the attack used).
Figure 3. The three visual metrics considered for reorganizing the results: Inception Score of the adversarial images at varying clip strengths (top), average mean absolute distance of those images from the originals (middle), and average Structural Similarity Index (SSIM) of those images relative to the originals (bottom).
Figure 5. Comparison of the transferability of the three attacks under the chosen parameters.
Figure 6. Performance of every classifier when attacked by each of the 7 classifiers, for all 3 attacks, using SSIM to calibrate the x-axis as the measure of visual perturbation. For example, (a) shows the defensive accuracy of VGG16 when attacked by clipped adversarial images created by all 7 classifiers using the FGSM attack (the title of each plot gives the defending classifier and the attack used).
Figure 7. Images produced by the FGSM attack on the VGG ensemble at different L-Infinity clipping ranges, with the average transferability (rounded) of each range across all models, including the source.
Figure 8. Images produced by the I-FGSM attack on the VGG19 network at different L-Infinity clipping ranges, with the average transferability (rounded) of each range across all models, including the source.
Figure 9. Images produced by the C&W attack with LR=7 on the VGG19 network at different L-Infinity clipping ranges, with the average transferability (rounded) of each range across all models, including the source.
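Figure 3 compares three ways of placing clipped adversarial images on a common visual scale. The two that compare an adversarial image with its original, mean absolute distance and SSIM, can be sketched in NumPy as follows. `global_ssim` evaluates the SSIM formula of Wang et al. once over the whole image, which is a simplification: the standard index averages the same statistic over local windows, as scikit-image's `skimage.metrics.structural_similarity` does. Function names are hypothetical, and the page does not say which SSIM implementation the paper used.

```python
import numpy as np

def mean_absolute_distance(x, y):
    """Average absolute pixel difference between two images."""
    return float(np.mean(np.abs(x.astype(np.float64) - y.astype(np.float64))))

def global_ssim(x, y, data_range=255.0):
    """SSIM of Wang et al. evaluated once over the whole image.
    The standard index averages this statistic over local windows instead."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilising constants with K1 = 0.01, K2 = 0.03
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

# Toy usage: compare an image with a noisy copy of itself.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
noise = rng.integers(-20, 21, size=x.shape)
x_noisy = np.clip(x.astype(np.int64) + noise, 0, 255).astype(np.uint8)
print(mean_absolute_distance(x, x_noisy), global_ssim(x, x_noisy))
```

An identical pair scores an SSIM of 1 and a distance of 0; added noise lowers SSIM and raises the distance, but SSIM weights structural changes rather than raw pixel differences.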
Original abstract

Adversarial examples are of wide concern due to their impact on the reliability of contemporary machine learning systems. Effective adversarial examples are mostly found via white-box attacks. However, in some cases they can be transferred across models, thus enabling them to attack black-box models. In this work we evaluate the transferability of three adversarial attacks (the Fast Gradient Sign Method, the Basic Iterative Method, and the Carlini & Wagner method) across two classes of models: the VGG class (using VGG16, VGG19, and an ensemble of VGG16 and VGG19) and the Inception class (Inception V3, Xception, Inception Resnet V2, and an ensemble of the three). We also outline the problems with the assessment of transferability in the current body of research and attempt to amend them by picking specific "strong" parameters for the attacks, and by using an L-Infinity clipping technique and the SSIM metric for the final evaluation of the attack transferability.
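For reference, a commonly used untargeted form of the Carlini & Wagner L2 attack is shown below, where Z denotes the logits, y the true label, c a trade-off constant, and κ a confidence margin; the tanh change of variables keeps pixel values in [0, 1]. Which C&W variant the paper runs, and which values of c, κ and the optimizer settings it treats as "strong", are not stated on this page, so this is the standard formulation rather than the authors' exact configuration.

```latex
\begin{aligned}
&\min_{w}\; \lVert x' - x \rVert_2^2 + c \cdot f(x'),
  \qquad x' = \tfrac{1}{2}\bigl(\tanh(w) + 1\bigr),\\
&f(x') = \max\Bigl( Z(x')_{y} - \max_{i \neq y} Z(x')_{i},\; -\kappa \Bigr).
\end{aligned}
```

Minimizing f pushes the true-class logit below the largest competing logit by at least κ, while the first term keeps the perturbation small in L2; larger κ typically yields examples that are more confidently misclassified on the source model.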

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates transferability of FGSM, BIM, and C&W attacks across VGG-family (VGG16, VGG19, ensemble) and Inception-family (InceptionV3, Xception, InceptionResNetV2, ensemble) ImageNet models. It identifies flaws in prior transferability assessment and proposes fixes via selection of 'strong' attack hyperparameters, L∞ clipping of perturbations, and replacement of pixel-norm distance measures with the SSIM metric for final evaluation.

Significance. If the proposed methodological changes demonstrably reduce bias or alter measured transfer rates relative to standard practices, the work could help standardize more reliable empirical protocols in adversarial ML. The evaluation itself adds data on cross-family transfer between VGG and Inception architectures, but its value hinges on validation of the fixes.

major comments (2)
  1. [Abstract and § on proposed fixes] The central methodological claim—that 'strong' parameters, L∞ clipping, and SSIM produce a meaningfully improved and less biased measurement of transferability—lacks load-bearing evidence. No before/after comparison, sensitivity study, or external validation is provided showing that these choices change transfer rates or better predict black-box success than the L∞/L2 norms already used in the cited attacks.
  2. [Experimental results section] The evaluation reports transfer rates for the three attacks across the two model classes, but without tables or figures showing quantitative results, error bars, or statistical tests, it is impossible to assess whether the observed transferability supports any claim about relative attack strength or the effect of the proposed fixes.
minor comments (2)
  1. Notation for model ensembles and attack parameters should be defined explicitly in a table or early section rather than inline.
  2. The abstract states the intent to amend assessment problems but supplies no numerical results; the results section should open with a direct comparison to prior transferability numbers from the cited literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to strengthen the evidence for the proposed methodological changes and improve the presentation of quantitative results.

Point-by-point responses
  1. Referee: [Abstract and § on proposed fixes] The central methodological claim—that 'strong' parameters, L∞ clipping, and SSIM produce a meaningfully improved and less biased measurement of transferability—lacks load-bearing evidence. No before/after comparison, sensitivity study, or external validation is provided showing that these choices change transfer rates or better predict black-box success than the L∞/L2 norms already used in the cited attacks.

    Authors: We agree that direct comparative evidence would strengthen the central claim. The manuscript identifies specific issues with prior transferability assessments and motivates the choices of strong attack parameters, L∞ clipping, and SSIM on that basis, but does not include explicit before/after or sensitivity experiments. In the revision we will add such comparisons (standard vs. proposed protocols) and a sensitivity analysis on the attack hyperparameters to demonstrate their effect on measured transfer rates. revision: yes

  2. Referee: [Experimental results section] The evaluation reports transfer rates for the three attacks across the two model classes, but without tables or figures showing quantitative results, error bars, or statistical tests, it is impossible to assess whether the observed transferability supports any claim about relative attack strength or the effect of the proposed fixes.

    Authors: The transfer rates are currently described in the text. We acknowledge that this makes quantitative assessment difficult. In the revised manuscript we will include summary tables of the transfer rates across attacks and model families, figures with error bars (derived from multiple runs or model variations where applicable), and appropriate statistical notes or tests to support claims about relative attack strength and the impact of the proposed fixes. revision: yes
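If each transfer rate is the fraction of n adversarial images that fool a target, a binomial confidence interval is one simple way to attach the promised error bars to a single rate. The Wilson score interval below is a minimal standard-library sketch; the interval choice, the helper name and the counts in the usage line are hypothetical, and variation across runs or model choices would need a separate treatment.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion, e.g. the fraction of
    n adversarial images that fooled a target model; z=1.96 gives ~95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Toy usage with hypothetical counts: 490 of 1000 images transferred.
low, high = wilson_interval(490, 1000)
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for rates near 0% or 100%, which matter when clip ranges are very small or very large.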

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations or self-referential reductions

full rationale

The paper conducts an empirical study measuring transfer rates of FGSM, BIM, and C&W attacks between VGG-family and Inception-family models. It proposes three methodological choices (strong attack hyperparameters, L-inf clipping, SSIM metric) to address perceived flaws in prior work, but supplies no equations, no fitted parameters renamed as predictions, and no derivation chain. These choices are presented as experimental decisions rather than results derived from the paper's own inputs. No self-citation is used to justify a uniqueness theorem or ansatz that would close a loop. The evaluation is performed against external, pre-trained models and is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical choice of 'strong' attack parameters and the domain assumption that the selected VGG and Inception models adequately represent transferability behavior across architecture classes.

free parameters (1)
  • strong attack parameters
    Specific parameter settings for FGSM, BIM, and C&W described as 'strong' to address assessment problems; values not supplied in abstract.
axioms (1)
  • domain assumption: VGG and Inception model classes are representative for studying transferability across architectures
    Paper partitions models into these two classes and treats results as informative about transferability between families.

pith-pipeline@v0.9.0 · 5704 in / 1454 out tokens · 29012 ms · 2026-05-24T21:25:22.021749+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 13 internal anchors

  1. [1]

    Towards Evaluating the Robustness of Neural Networks

    Carlini, N. and Wagner, D. A. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644.

  2. [3]

    Xception: Deep Learning with Depthwise Separable Convolutions

    URL http://arxiv.org/abs/1610.02357.

  3. [4]

    Adversarial Attacks Against Medical Deep Learning Systems

    Finlayson, S. G., Kohane, I. S., and Beam, A. L. Adversarial attacks against medical deep learning systems. CoRR, abs/1804.05296. URL http://arxiv.org/abs/1804.05296.

  4. [6]

    Deep Residual Learning for Image Recognition

    URL http://arxiv.org/abs/1512.03385.

  5. [7]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167. URL http://arxiv.org/abs/1502.03167.

  6. [8]

    Adversarial examples in the physical world

    Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. CoRR, abs/1607.02533. URL http://arxiv.org/abs/1607.02533.

  7. [9]

    Object Recognition with Gradient-Based Learning

    LeCun, Y., Haffner, P., Bottou, L., and Bengio, Y. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pp. 319–, London, UK, UK. Springer-Verlag. ISBN 3-540-66722-9. URL http://dl.acm.org/citation.cfm?id=646469.691875.

  8. [10]

    Network In Network

    Lin, M., Chen, Q., and Yan, S. Network in network. CoRR, abs/1312.4400. URL http://arxiv.org/abs/1312.4400.

  9. [12]

    URL http://arxiv.org/abs/1807.01069.

  10. [13]

    Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

    Papernot, N., McDaniel, P. D., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. CoRR, abs/1511.04508. URL http://arxiv.org/abs/1511.04508.

  11. [14]

    Improved Techniques for Training GANs

    Salimans, T., Goodfellow, I. J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. CoRR, abs/1606.03498. URL http://arxiv.org/abs/1606.03498.

  12. [15]

    Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models

    Su, D., Zhang, H., Chen, H., Yi, J., Chen, P., and Gao, Y. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. CoRR, abs/1808.01688.

  13. [17]

    URL http://arxiv.org/abs/1710.08864.

  14. [19]

    Going Deeper with Convolutions

    URL http://arxiv.org/abs/1409.4842.

  15. [20]

    Rethinking the Inception Architecture for Computer Vision

    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567. URL http://arxiv.org/abs/1512.00567.

  16. [21]

    Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

    Szegedy, C., Ioffe, S., and Vanhoucke, V. Inception-v4, inception-resnet and the impact of residual connections on learning. CoRR, abs/1602.07261. URL http://arxiv.org/abs/1602.07261.

  17. [22]

    Image Quality Assessment: From Error Visibility to Structural Similarity

    Wang, Z., Bovik, A., Rahim Sheikh, H., and Simoncelli, E. Image quality assessment: From error visibility to structural similarity. Image Processing, IEEE Transactions on, 13:600–612, 05. doi: 10.1109/TIP.2003.819861.

Further references from the paper's bibliography:

  • Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.
  • Liu, Y., Chen, X., Liu, C., and Song, D. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770.
  • Simonyan, K. and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv e-prints, September.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. CoRR, abs/1312.6199.

Appendix image panel titles: Original images · Clip range 10, avg. transferability 49% · Clip range 20, avg. transferability 63% · Clip range 30, avg. transferability 71% · Clip range 40, avg. transferability 77% · Clip range 50, avg. transferability 83% · Clip range 60, avg. transferability 87% · Clip range 70, avg. transferability ...