Measuring the Transferability of Adversarial Examples
Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3
The pith
Adversarial example transferability is measured more accurately with strong attack parameters, L-infinity clipping, and the SSIM metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current assessments of adversarial transferability suffer from weak attack parameters and unsuitable evaluation metrics. By selecting strong parameters for the Fast Gradient Sign Method, Basic Iterative Method, and Carlini & Wagner attacks, applying L-infinity clipping, and evaluating with the SSIM metric, transferability can be measured more accurately between the VGG and Inception classes of models, including their ensembles.
What carries the argument
The amended evaluation protocol that applies strong parameters to FGSM, BIM, and C&W attacks, clips perturbations under the L-infinity norm, and scores transfer success with the SSIM metric across VGG-class and Inception-class models.
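A minimal sketch of the L∞ clipping step follows. It assumes an image is a flat list of pixel values on a 0 to 255 scale, and that clipping means projecting each adversarial pixel back into [x - eps, x + eps] around the original pixel x and then into the valid pixel range. The function names `linf_clip` and `linf_distance` and these conventions are illustrative assumptions; the page does not give the paper's exact clipping procedure.

```python
def linf_clip(x_adv, x_orig, eps, lo=0.0, hi=255.0):
    """Project x_adv onto the L-infinity ball of radius eps around x_orig,
    then clamp every pixel to the valid range [lo, hi]."""
    clipped = []
    for a, o in zip(x_adv, x_orig):
        a = min(max(a, o - eps), o + eps)  # enforce |a - o| <= eps per pixel
        a = min(max(a, lo), hi)            # keep the result a valid image
        clipped.append(a)
    return clipped


def linf_distance(x, y):
    """Largest absolute per-pixel difference, i.e. the L-infinity norm of x - y."""
    return max(abs(a - b) for a, b in zip(x, y))
```

For example, `linf_clip([130.0, 270.0, -20.0], [100.0, 250.0, 3.0], 10)` returns `[110.0, 255.0, 0.0]`: the first pixel is pulled back onto the eps-ball, and the second and third are additionally clamped to the valid range. Because the original image already lies in [lo, hi], the second clamp never pushes a pixel outside the eps-ball.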
If this is right
- Transfer rates obtained under the new protocol are more representative of black-box attack success.
- Differences in transferability between attack methods become visible without confounding from weak parameter choices.
- Within-class transfer remains higher than cross-class transfer once evaluation bias is reduced.
- Ensemble models within each class provide a stricter test of whether perturbations survive model variation.
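The page does not say how the VGG16 + VGG19 ensemble or the three-model Inception ensemble combines its members. One common choice, assumed here purely for illustration, is to average the members' softmax probabilities and take the argmax. In this sketch each model is a hypothetical callable that maps an input to a list of logits.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models, x):
    """Average the members' softmax outputs and return the index of the top class."""
    probs = [softmax(model(x)) for model in models]
    n_classes = len(probs[0])
    avg = [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)
```

Under this combination rule a perturbation only fools the ensemble if it shifts the averaged probabilities, which is one reason an ensemble can be a stricter transfer target than any single member.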
Where Pith is reading between the lines
- The same protocol could be applied to additional architectures to test whether the observed within-class versus cross-class pattern holds more generally.
- If SSIM better captures human-noticeable changes than pixel norms, the protocol might also improve evaluation of other perturbation methods beyond the three studied here.
- Lower measured transfer under the corrected protocol would imply that defenses focused on model diversity gain more practical value than previously estimated.
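For reference, the SSIM index of Wang et al. compares two images x and y through their means, variances, and covariance: SSIM = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·L)² and C2 = (0.03·L)², where L is the pixel data range. The sketch below is a simplification: Wang et al. compute this over local (Gaussian) windows and average the resulting map, whereas here the whole image is one window, so values will differ from library implementations such as scikit-image's `structural_similarity`. `data_range=255` assumes 8-bit pixels, and how the paper turns SSIM scores into a transferability judgement is not described on this page.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length pixel lists.

    Simplification: the whole image is treated as one window instead of
    averaging SSIM over local windows as in Wang et al.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n  # population variance of x
    vy = sum((b - my) * (b - my) for b in y) / n  # population variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den
```

Identical inputs score 1, and in this global version any change to the image lowers the score, so the index can act as a structural-similarity check on a perturbation rather than a raw pixel-norm budget.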
Load-bearing premise
That selecting strong attack parameters, applying L-infinity clipping, and switching to SSIM for evaluation produces a meaningfully improved and less biased measurement of transferability than prior practices.
What would settle it
Obtaining the same transfer rates on the same model pairs when repeating the experiments with weak parameters and pixel-difference metrics instead of the proposed protocol would show the amendments do not change the measured transferability.
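Comparing transfer rates under two protocols requires a fixed definition of the transfer rate. A common convention, assumed here and not confirmed for this paper, treats an untargeted adversarial example as transferred when it fools the target model, counted only over examples that already fool the source model they were crafted on. The helper names `fooled` and `transfer_rate` are hypothetical.

```python
def fooled(predicted, true_labels):
    """Untargeted success: the model's predicted label differs from the true label."""
    return [p != y for p, y in zip(predicted, true_labels)]


def transfer_rate(source_fooled, target_fooled):
    """Fraction of adversarial examples that fool the target model, counted only
    over examples that already fool the source (white-box) model."""
    hits = [t for s, t in zip(source_fooled, target_fooled) if s]
    if not hits:
        return 0.0  # no successful source examples, so nothing can transfer
    return sum(hits) / len(hits)
```

Running the same computation on adversarial examples generated with weak parameters and with the proposed protocol yields the two rates whose agreement or disagreement the test above turns on.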
Original abstract
Adversarial examples are of wide concern due to their impact on the reliability of contemporary machine learning systems. Effective adversarial examples are mostly found via white-box attacks. However, in some cases they can be transferred across models, thus enabling them to attack black-box models. In this work we evaluate the transferability of three adversarial attacks (the Fast Gradient Sign Method, the Basic Iterative Method, and the Carlini & Wagner method) across two classes of models: the VGG class (VGG16, VGG19, and an ensemble of VGG16 and VGG19) and the Inception class (Inception V3, Xception, Inception ResNet V2, and an ensemble of the three). We also outline the problems with the assessment of transferability in the current body of research and attempt to amend them by picking specific "strong" parameters for the attacks, and by using an L-infinity clipping technique and the SSIM metric for the final evaluation of the attack transferability.
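To make the two gradient-sign attacks concrete, here is a sketch on a toy logistic-regression classifier, a hypothetical stand-in for VGG or Inception chosen because its input gradient has a closed form: for binary cross-entropy, the gradient with respect to the input is (p - y)·w. FGSM takes one step of size eps along the sign of that gradient; BIM takes several smaller steps of size alpha and re-clips to the L∞ ball after each one. Pixel values are assumed to lie in [0, 1], and eps, alpha, and the step count are placeholders, not the paper's "strong" parameters. The optimization-based Carlini & Wagner attack is not sketched.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x for a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def clip_linf(x_adv, x, eps, lo=0.0, hi=1.0):
    """Clamp each pixel to [max(x - eps, lo), min(x + eps, hi)]."""
    return [min(max(a, xi - eps, lo), xi + eps, hi) for a, xi in zip(x_adv, x)]


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    g = loss_grad(w, b, x, y)
    return clip_linf([xi + eps * sign(gi) for xi, gi in zip(x, g)], x, eps)


def bim(w, b, x, y, eps, alpha, steps):
    """Basic Iterative Method: repeated steps of size alpha, clipped every step."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad(w, b, x_adv, y)
        x_adv = [a + alpha * sign(gi) for a, gi in zip(x_adv, g)]
        x_adv = clip_linf(x_adv, x, eps)
    return x_adv
```

With `w=[2.0, -1.0]`, `b=0.0`, `x=[0.5, 0.5]`, and true label `y=1`, `fgsm(w, b, x, 1, 0.1)` moves the input to roughly `[0.4, 0.6]`, lowering the logit for the true class from 0.5 to about 0.2. BIM with `alpha=0.03` reaches the same corner of the eps-ball after four steps and stays there.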
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates transferability of FGSM, BIM, and C&W attacks across VGG-family (VGG16, VGG19, ensemble) and Inception-family (InceptionV3, Xception, InceptionResNetV2, ensemble) ImageNet models. It identifies flaws in prior transferability assessment and proposes fixes via selection of 'strong' attack hyperparameters, L∞ clipping of perturbations, and replacement of pixel-norm success rates with the SSIM metric for final evaluation.
Significance. If the proposed methodological changes demonstrably reduce bias or alter measured transfer rates relative to standard practices, the work could help standardize more reliable empirical protocols in adversarial ML. The evaluation itself adds data on cross-family transfer between VGG and Inception architectures, but its value hinges on validation of the fixes.
major comments (2)
- [Abstract and § on proposed fixes] The central methodological claim—that 'strong' parameters, L∞ clipping, and SSIM produce a meaningfully improved and less biased measurement of transferability—lacks load-bearing evidence. No before/after comparison, sensitivity study, or external validation is provided showing that these choices change transfer rates or better predict black-box success than the L∞/L2 norms already used in the cited attacks.
- [Experimental results section] The evaluation reports transfer rates for the three attacks across the two model classes, but without tables or figures showing quantitative results, error bars, or statistical tests, it is impossible to assess whether the observed transferability supports any claim about relative attack strength or the effect of the proposed fixes.
minor comments (2)
- Notation for model ensembles and attack parameters should be defined explicitly in a table or early section rather than inline.
- The abstract states the intent to amend assessment problems but supplies no numerical results; the results section should open with a direct comparison to prior transferability numbers from the cited literature.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to strengthen the evidence for the proposed methodological changes and improve the presentation of quantitative results.
Point-by-point responses
- Referee: [Abstract and § on proposed fixes] The central methodological claim—that 'strong' parameters, L∞ clipping, and SSIM produce a meaningfully improved and less biased measurement of transferability—lacks load-bearing evidence. No before/after comparison, sensitivity study, or external validation is provided showing that these choices change transfer rates or better predict black-box success than the L∞/L2 norms already used in the cited attacks.
Authors: We agree that direct comparative evidence would strengthen the central claim. The manuscript identifies specific issues with prior transferability assessments and motivates the choices of strong attack parameters, L∞ clipping, and SSIM on that basis, but does not include explicit before/after or sensitivity experiments. In the revision we will add such comparisons (standard vs. proposed protocols) and a sensitivity analysis on the attack hyperparameters to demonstrate their effect on measured transfer rates. revision: yes
- Referee: [Experimental results section] The evaluation reports transfer rates for the three attacks across the two model classes, but without tables or figures showing quantitative results, error bars, or statistical tests, it is impossible to assess whether the observed transferability supports any claim about relative attack strength or the effect of the proposed fixes.
Authors: The transfer rates are currently described in the text. We acknowledge that this makes quantitative assessment difficult. In the revised manuscript we will include summary tables of the transfer rates across attacks and model families, figures with error bars (derived from multiple runs or model variations where applicable), and appropriate statistical notes or tests to support claims about relative attack strength and the impact of the proposed fixes. revision: yes
Circularity Check
No circularity: purely empirical evaluation with no derivations or self-referential reductions
full rationale
The paper conducts an empirical study measuring transfer rates of FGSM, BIM, and C&W attacks between VGG-family and Inception-family models. It proposes three methodological choices (strong attack hyperparameters, L-inf clipping, SSIM metric) to address perceived flaws in prior work, but supplies no equations, no fitted parameters renamed as predictions, and no derivation chain. These choices are presented as experimental decisions rather than results derived from the paper's own inputs. No self-citation is used to justify a uniqueness theorem or ansatz that would close a loop. The evaluation is performed against external, pre-trained models and is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- strong attack parameters
axioms (1)
- Domain assumption: the VGG and Inception model classes are representative enough to study transferability across architectures.
Reference graph
Works this paper leans on
- Carlini, N. and Wagner, D. A. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644.
- Xception: Deep learning with depthwise separable convolutions. URL http://arxiv.org/abs/1610.02357.
- Finlayson, S. G., Kohane, I. S., and Beam, A. L. Adversarial attacks against medical deep learning systems. CoRR, abs/1804.05296. URL http://arxiv.org/abs/1804.05296.
- Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.
- Deep residual learning for image recognition. URL http://arxiv.org/abs/1512.03385.
- Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167. URL http://arxiv.org/abs/1502.03167.
- Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. CoRR, abs/1607.02533. URL http://arxiv.org/abs/1607.02533.
- LeCun, Y., Haffner, P., Bottou, L., and Bengio, Y. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pp. 319–, London, UK. Springer-Verlag. ISBN 3-540-66722-9. URL http://dl.acm.org/citation.cfm?id=646469.691875.
- Lin, M., Chen, Q., and Yan, S. Network in network. CoRR, abs/1312.4400. URL http://arxiv.org/abs/1312.4400.
- Liu, Y., Chen, X., Liu, C., and Song, D. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770.
- URL http://arxiv.org/abs/1807.01069.
- Papernot, N., McDaniel, P. D., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. CoRR, abs/1511.04508. URL http://arxiv.org/abs/1511.04508.
- Salimans, T., Goodfellow, I. J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. CoRR, abs/1606.03498. URL http://arxiv.org/abs/1606.03498.
- Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. ArXiv e-prints, September.
- Su, D., Zhang, H., Chen, H., Yi, J., Chen, P., and Gao, Y. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. CoRR, abs/1808.01688.
- URL http://arxiv.org/abs/1710.08864.
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. CoRR, abs/1312.6199.
- Going deeper with convolutions. URL http://arxiv.org/abs/1409.4842.
- Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the Inception architecture for computer vision. CoRR, abs/1512.00567. URL http://arxiv.org/abs/1512.00567.
- Szegedy, C., Ioffe, S., and Vanhoucke, V. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261. URL http://arxiv.org/abs/1602.07261.
- Wang, Z., Bovik, A., Rahim Sheikh, H., and Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600–612. doi: 10.1109/TIP.2003.819861.
Appendix
- Original images
- Clip range 10, avg. transferability 49%
- Clip range 20, avg. transferability 63%
- Clip range 30, avg. transferability 71%
- Clip range 40, avg. transferability 77%
- Clip range 50, avg. transferability 83%
- Clip range 60, avg. transferability 87%
- Clip range 70, avg. transferability ...