S⁴ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
Pith reviewed 2026-05-23 19:20 UTC · model grok-4.3
The pith
Attacking simple scaling transformations uniquely enhances targeted transferability under strict black-box conditions
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Attacking simple scaling transformations uniquely enhances targeted transferability, outperforming other basic transformations and rivaling leading complex methods. Geometric and color transformations exhibit high internal redundancy despite weak inter-category correlations. The S⁴ST method integrates dimensionally consistent scaling, complementary low-redundancy transformations, and block-wise operations to deliver a state-of-the-art effectiveness-efficiency balance in data-free settings. Scaling's effectiveness stems from visual data's multi-scale nature and ubiquitous scale augmentation during training, rendering such augmentation a double-edged sword. The framework generalizes to medical imaging and face verification.
What carries the argument
Blind estimation measures of self-alignment and self-transferability that rank per-transformation effectiveness and cross-transformation correlations without any victim-model feedback or data
If this is right
- S⁴ST achieves state-of-the-art targeted transferability without relying on victim data or feedback
- Scaling's advantage arises directly from the multi-scale structure of visual data and common training augmentations
- Geometric and color transformations largely overlap with each other and add little new value when combined
- The same S⁴ST design transfers effectively to medical imaging and face verification tasks
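As a rough illustration of what a scale transformation looks like inside a transformation-based attack, the sketch below zooms an image about its centre by a random factor while keeping the output the same size as the input. It is not the paper's implementation: the helper names `scale_image` and `random_scale`, the nearest-neighbour sampling, the constant fill, and the scale range are all assumptions made for this example, and the "dimensionally consistent" scaling used in S⁴ST may be defined differently.

```python
import math
import random


def scale_image(img, s, fill=0.0):
    """Zoom a 2-D image (list of rows) about its centre by factor s.

    The output keeps the input's height and width. Nearest-neighbour
    sampling is used; pixels that map outside the source get `fill`.
    Hypothetical helper for illustration only.
    """
    if s <= 0:
        raise ValueError("scale factor must be positive")
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Inverse mapping: find the source pixel for this output pixel.
            iy = math.floor(cy + (y - cy) / s + 0.5)
            ix = math.floor(cx + (x - cx) / s + 0.5)
            inside = 0 <= iy < h and 0 <= ix < w
            row.append(img[iy][ix] if inside else fill)
        out.append(row)
    return out


def random_scale(img, low=0.5, high=1.5, rng=random):
    """Draw a scale factor uniformly from [low, high] and apply it."""
    return scale_image(img, rng.uniform(low, high))
```

In an input-transformation attack, the surrogate gradient would typically be taken on `random_scale(x_adv)` instead of `x_adv` at each iteration; the gradient step itself is left out of this sketch.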
Where Pith is reading between the lines
- Other common training augmentations such as rotation or brightness shifts may also create exploitable transfer gaps if examined with the same blind measures
- Model trainers could reduce vulnerability by deliberately varying scale augmentation strength or omitting it for certain layers
- The redundancy pattern among transformations offers a general rule for pruning transformation sets in any future attack pipeline
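The pruning idea in the last bullet can be sketched as a greedy selection: walk the transformations from most to least effective and keep one only if it is not strongly correlated with anything already kept. The function name `prune_transformations`, the threshold, and every number below are hypothetical and chosen only to make the example concrete; none of them come from the paper.

```python
def prune_transformations(scores, redundancy, threshold=0.5):
    """Greedily keep high-scoring transformations that are not redundant.

    scores: name -> effectiveness estimate (higher is better).
    redundancy: frozenset({a, b}) -> correlation estimate in [0, 1];
        missing pairs are treated as 0 (uncorrelated).
    """
    kept = []
    for name in sorted(scores, key=scores.get, reverse=True):
        overlaps = (redundancy.get(frozenset((name, k)), 0.0) for k in kept)
        if all(r < threshold for r in overlaps):
            kept.append(name)
    return kept


# Made-up numbers for illustration; they are not results from the paper.
scores = {"scale": 0.9, "rotate": 0.5, "shear": 0.45,
          "brightness": 0.4, "contrast": 0.35}
redundancy = {
    frozenset(("rotate", "shear")): 0.8,          # geometric pair
    frozenset(("brightness", "contrast")): 0.85,  # color pair
    frozenset(("rotate", "brightness")): 0.1,     # cross-category pair
}
print(prune_transformations(scores, redundancy))
# ['scale', 'rotate', 'brightness']
```

With high redundancy inside each category and low redundancy across categories, the greedy pass keeps roughly one representative per category, which matches the pattern the bullet describes.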
Load-bearing premise
The blind estimation measures of self-alignment and self-transferability accurately reflect per-transformation effectiveness and cross-transformation correlations under strict black-box constraints without any victim-model feedback or data
What would settle it
A controlled experiment that applies the S4ST attack to a new set of victim models never seen during measure development and checks whether scaling-based attacks actually transfer at higher rates than attacks using other basic transformations
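A minimal sketch of the bookkeeping such an experiment needs is given below, assuming each attack run yields the victim's predicted labels for the adversarial examples alongside the intended target labels. The function names and the `"scale"` key are hypothetical; a real study would also report confidence intervals rather than a bare comparison.

```python
def targeted_success_rate(predictions, targets):
    """Fraction of adversarial examples classified as their target label."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


def scaling_beats_all(rates, scaling="scale"):
    """True if `scaling` has the strictly highest rate on every victim.

    rates: victim name -> {transformation name -> targeted success rate}.
    """
    return all(
        per_victim[scaling] > rate
        for per_victim in rates.values()
        for name, rate in per_victim.items()
        if name != scaling
    )
```

The check is deliberately strict: a single held-out victim on which another basic transformation matches or exceeds scaling makes `scaling_beats_all` return `False`.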
Original abstract
Transferable Targeted Attacks (TTAs) face significant challenges due to severe overfitting to surrogate models. Recent breakthroughs heavily rely on large-scale training data of victim models, while data-free solutions, i.e., image transformation-involved gradient optimization, often depend on black-box feedback for method design and tuning. These dependencies violate black-box transfer settings and compromise threat evaluation fairness. In this paper, we propose two blind estimation measures, self-alignment and self-transferability, to analyze per-transformation effectiveness and cross-transformation correlations under strict black-box constraints. Our findings challenge conventional assumptions: (1) Attacking simple scaling transformations uniquely enhances targeted transferability, outperforming other basic transformations and rivaling leading complex methods; (2) Geometric and color transformations exhibit high internal redundancy despite weak inter-category correlations. These insights drive the design and tuning of S⁴ST (Strong, Self-transferable, faSt, Simple Scale Transformation), which integrates dimensionally consistent scaling, complementary low-redundancy transformations, and block-wise operations. Extensive evaluations across diverse architectures, training distributions, and tasks show that S⁴ST achieves state-of-the-art effectiveness-efficiency balance without data dependency. We reveal that scaling's effectiveness stems from visual data's multi-scale nature and ubiquitous scale augmentation during training, rendering such augmentation a double-edged sword. Further validations on medical imaging and face verification confirm the framework's strong generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces two blind estimation measures—self-alignment and self-transferability—computed solely from surrogate gradients and image statistics under strict no-feedback black-box constraints. These measures are used to analyze per-transformation effectiveness and cross-transformation correlations, leading to the claim that simple scaling transformations uniquely enhance targeted transferability (outperforming other basic transformations and rivaling complex methods). This insight drives the design of S⁴ST, which combines dimensionally consistent scaling, low-redundancy transformations, and block-wise operations. The method is reported to achieve SOTA effectiveness-efficiency balance across architectures, distributions, and tasks (including medical imaging and face verification) without data dependency or victim-model feedback. The paper attributes scaling's effectiveness to the multi-scale nature of visual data and ubiquitous scale augmentation in training.
Significance. If the blind measures prove predictive of actual black-box targeted transfer rates, the work would provide a data-free, feedback-free framework for designing and tuning TTAs, challenging reliance on complex methods or large victim-model datasets. The paper deserves credit for reproducible empirical evaluations across diverse settings and for identifying a simple transformation (scaling) that rivals leading approaches. The generalization experiments on non-standard domains strengthen the case for broader applicability.
major comments (2)
- [Abstract / measure definitions] Abstract and the sections introducing the measures (around the proposed self-alignment/self-transferability definitions): the central claim that scaling uniquely boosts targeted transferability rests on these two measures correctly ranking transformations and revealing correlations. However, no experiment directly correlates the blind measure rankings with measured black-box targeted success rates on held-out victim models; the measures are used both to discover the scaling insight and to tune S⁴ST, creating a potential circularity if the measures are only weakly correlated with true transferability.
- [Evaluation sections (S⁴ST results)] The experimental sections reporting S⁴ST performance: because S⁴ST hyperparameters and transformation choices are selected via the unvalidated blind measures, the reported SOTA results on victim models may not demonstrate that the scaling insight generalizes beyond the surrogate used to compute the measures.
minor comments (2)
- [Title] The title contains inconsistent capitalization (faSt); consider standardizing to 'S⁴ST: A Strong, Self-transferable, Fast, and Simple Scale Transformation for Transferable Targeted Attack'.
- [Method section] Notation for the two new measures should be introduced with explicit formulas or pseudocode in the main text rather than relying solely on prose descriptions.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the significance of the blind measures and the generalization experiments. We address the two major comments point-by-point below and outline the revisions we will make.
-
Referee: [Abstract / measure definitions] Abstract and the sections introducing the measures (around the proposed self-alignment/self-transferability definitions): the central claim that scaling uniquely boosts targeted transferability rests on these two measures correctly ranking transformations and revealing correlations. However, no experiment directly correlates the blind measure rankings with measured black-box targeted success rates on held-out victim models; the measures are used both to discover the scaling insight and to tune S⁴ST, creating a potential circularity if the measures are only weakly correlated with true transferability.
Authors: We agree that a direct correlation study between the blind measure rankings and actual black-box targeted transfer rates on held-out victim models is absent from the current manuscript and would strengthen the claims. The measures are strictly computed from surrogate gradients and image statistics under no-feedback constraints, and the final S⁴ST results on diverse victim models provide supporting evidence; however, this does not replace an explicit validation of the measures' predictive power. In the revision we will add a dedicated experiment that ranks transformations by the two measures on the surrogate and directly compares those rankings against measured targeted success rates on multiple held-out victim models. This addition will address the circularity concern.
revision: yes
-
Referee: [Evaluation sections (S⁴ST results)] The experimental sections reporting S⁴ST performance: because S⁴ST hyperparameters and transformation choices are selected via the unvalidated blind measures, the reported SOTA results on victim models may not demonstrate that the scaling insight generalizes beyond the surrogate used to compute the measures.
Authors: We acknowledge the concern that hyperparameter and transformation selection via the surrogate-derived measures could limit claims of generalization. While S⁴ST is evaluated on held-out victim models across architectures, distributions, and tasks (including medical imaging and face verification), the selection process itself was not cross-validated against victim performance. In the revised manuscript we will expand the evaluation sections with (i) an explicit correlation analysis between measure rankings and victim transfer rates and (ii) additional results obtained by re-selecting transformations using an alternative surrogate, thereby demonstrating that the scaling insight is not surrogate-specific.
revision: yes
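The correlation analysis promised in both responses could take the form of a Spearman rank correlation between the blind-measure scores and the measured targeted success rates, computed per held-out victim. The sketch below is one way to do that with the standard library only; it is an assumption about how the comparison might be run, not a description of the authors' planned experiment, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(measure_scores, success_rates):
    """Spearman rho: Pearson correlation computed on the ranks."""
    if len(measure_scores) != len(success_rates):
        raise ValueError("sequences must have equal length")
    return pearson(average_ranks(measure_scores), average_ranks(success_rates))
```

A rho close to 1 on victims that played no part in designing the measures would support their predictive value; a rho near 0 would leave the circularity concern standing.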
Circularity Check
No significant circularity; measures presented as independent blind estimators
The paper introduces self-alignment and self-transferability as new blind estimation measures computed from surrogate gradients and image statistics under strict black-box constraints. These are used to analyze transformations and derive the scaling insight that then informs S⁴ST design. No equations, definitions, or self-citations are shown that reduce the measures to fitted parameters, self-referential predictions, or load-bearing prior results by the same authors. The derivation chain remains self-contained against external benchmarks, with the measures functioning as independent analysis tools rather than quantities defined in terms of the target transferability outcomes.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Visual data possesses a multi-scale nature and ubiquitous scale augmentation is used during model training
invented entities (2)
- self-alignment measure: no independent evidence
- self-transferability measure: no independent evidence
Forward citations
Cited by 1 Pith paper
- Light-ResKAN: A Parameter-Sharing Lightweight KAN with Gram Polynomials for Efficient SAR Image Recognition
  Light-ResKAN reaches 99.09% accuracy on MSTAR SAR images with 82.9 times fewer FLOPs and 163.78 times fewer parameters than VGG16 by combining KAN convolutions, Gram polynomials, and channel-wise parameter sharing.
Reference graph
Works this paper leans on
-
[1]
Intriguing properties of neural networks,
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in International Conference on Learning Representations, 2014. 1
work page 2014
-
[2]
Towards deep learning models resistant to adversarial attacks,
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018. 1, 3, 11, 14
work page 2018
-
[3]
Ensemble adversarial training: Attacks and defenses,
F. Tram `er, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P . McDaniel, “Ensemble adversarial training: Attacks and defenses,” in International Conference on Learning Representations, 2018. 1, 11
work page 2018
-
[4]
Revisiting auc-oriented adversarial training with loss-agnostic perturbations,
Z. Yang, Q. Xu, W. Hou, S. Bao, Y. He, X. Cao, and Q. Huang, “Revisiting auc-oriented adversarial training with loss-agnostic perturbations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 1
work page 2023
-
[5]
Improving fast adversarial training with prior-guided knowledge,
X. Jia, Y. Zhang, X. Wei, B. Wu, K. Ma, J. Wang, and X. Cao, “Improving fast adversarial training with prior-guided knowledge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 1
work page 2024
-
[6]
Explaining and harnessing adversarial examples,
I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations), 2015. 1
work page 2015
-
[7]
A survey on transferability of adversarial examples across deep neural networks,
J. Gu, X. Jia, P . de Jorge, W. Yu, X. Liu, A. Ma, Y. Xun, A. Hu, A. Khakzar, Z. Li et al., “A survey on transferability of adversarial examples across deep neural networks,” TMLR, 2024. 1
work page 2024
-
[8]
Boosting adversarial attacks with momentum,
Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2018. 1, 3, 10, 11
work page 2018
-
[9]
Evading defenses to transferable adversarial examples by translation-invariant attacks,
Y. Dong, T. Pang, H. Su, and J. Zhu, “Evading defenses to transferable adversarial examples by translation-invariant attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4312–4321. 1, 3, 4, 10, 11
work page 2019
-
[10]
Improving transferability of adversarial examples with input diversity,
C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2730–2739. 1, 2, 3, 4, 8, 9, 10
work page 2019
-
[11]
https://github.com/pytorch/vision/tree/main/torchvision/ transforms SUBMITTED TO IEEE TPAMI 15
-
[12]
Structure invariant transformation for better adversarial transferability,
X. Wang, Z. Zhang, and J. Zhang, “Structure invariant transformation for better adversarial transferability,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2023, pp. 4607–
work page 2023
-
[13]
1, 4, 7, 8, 9, 10, 11
-
[14]
Boosting adversarial transferability by block shuffle and rotation,
K. Wang, X. He, W. Wang, and X. Wang, “Boosting adversarial transferability by block shuffle and rotation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
-
[15]
Boosting adversarial transferability across model genus by deformation-constrained warping,
Q. Lin, C. Luo, Z. Niu, X. He, W. Xie, Y. Hou, L. Shen, and S. Song, “Boosting adversarial transferability across model genus by deformation-constrained warping,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 4, pp. 3459–3467, Mar
-
[16]
Understanding adversarial examples from the mutual influence of images and perturbations,
C. Zhang, P . Benz, T. Imtiaz, and I. S. Kweon, “Understanding adversarial examples from the mutual influence of images and perturbations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020. 1, 3, 9, 11, 13
work page 2020
-
[17]
On generating transferable targeted perturbations,
M. Naseer, S. Khan, M. Hayat, F. S. Khan, and F. Porikli, “On generating transferable targeted perturbations,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2021, pp. 7708–
work page 2021
-
[18]
Exploring non-target knowledge for improving ensemble universal adversarial attacks,
J. Weng, Z. Luo, Z. Zhong, D. Lin, and S. Li, “Exploring non-target knowledge for improving ensemble universal adversarial attacks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, 2023, pp. 2768–2775. 1, 3
work page 2023
-
[19]
Towards transferable targeted adversarial examples,
Z. Wang, H. Yang, Y. Feng, P . Sun, H. Guo, Z. Zhang, and K. Ren, “Towards transferable targeted adversarial examples,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 20 534–20 543. 1
work page 2023
-
[20]
Minimizing maximum model discrepancy for transferable black-box targeted attacks,
A. Zhao, T. Chu, Y. Liu, W. Li, J. Li, and L. Duan, “Minimizing maximum model discrepancy for transferable black-box targeted attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 8153–8162. 1, 4, 9, 10, 11, 12
work page 2023
-
[21]
Rethinking adversarial transferability from a data distribution perspective,
Y. Zhu, J. Sun, and Z. Li, “Rethinking adversarial transferability from a data distribution perspective,” in International Conference on Learning Representations, 2022. 1, 3, 9, 11
work page 2022
-
[22]
Toward understanding and boosting adversarial transferability from a distribution perspective,
Y. Zhu, Y. Chen, X. Li, K. Chen, Y. He, X. Tian, B. Zheng, Y. Chen, and Q. Huang, “Toward understanding and boosting adversarial transferability from a distribution perspective,” IEEE Transactions on Image Processing, vol. 31, pp. 6487–6501, 2022. 1, 3, 9, 11
work page 2022
-
[23]
Improving transferable targeted adversarial attacks with model self-enhancement,
H. Wu, G. Ou, W. Wu, and Z. Zheng, “Improving transferable targeted adversarial attacks with model self-enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 24 615–24 624. 1, 9, 11, 12
work page 2024
-
[24]
Delving into transferable adversarial examples and black-box attacks,
Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” in International Conference on Learning Representations, 2017. 1
work page 2017
-
[25]
Towards transferable targeted attack,
M. Li, C. Deng, T. Li, J. Yan, X. Gao, and H. Huang, “Towards transferable targeted attack,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020. 2, 3
work page 2020
-
[26]
On success and simplicity: A second look at transferable targeted attacks,
Z. Zhao, Z. Liu, and M. Larson, “On success and simplicity: A second look at transferable targeted attacks,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P . Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 6115–6128. 2, 3, 4, 8, 13
work page 2021
-
[27]
Logit margin matters: Improving transferable targeted adversarial attack by logit calibration,
J. Weng, Z. Luo, S. Li, N. Sebe, and Z. Zhong, “Logit margin matters: Improving transferable targeted adversarial attack by logit calibration,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 3561–3574, 2023. 2, 3, 4
work page 2023
-
[28]
On single- model transferable targeted attacks: A closer look at decision-level optimization,
X. Sun, G. Cheng, H. Li, L. Pei, and J. Han, “On single- model transferable targeted attacks: A closer look at decision-level optimization,” IEEE Transactions on Image Processing, vol. 32, pp. 2972–2984, 2023. 2, 3, 4
work page 2023
-
[29]
Rethinking data augmentation for improving transferable targeted attacks,
Z. Wei, J. Chen, Z. Wu, and Y.-G. Jiang, “Rethinking data augmentation for improving transferable targeted attacks,” 2023. [Online]. Available: https://openreview.net/forum?id=go0P5gsBE2 2, 3, 4, 9, 10, 11
work page 2023
-
[30]
J. Zou, Z. Pan, J. Qiu, X. Liu, T. Rui, and W. Li, “Improving the transferability of adversarial examples with resized-diverse-inputs, diversity-ensemble and region fitting,” in European Conference on Computer Vision. Springer International Publishing, 2020, pp. 563–
work page 2020
-
[31]
J. Byun, M.-J. Kwon, S. Cho, Y. Kim, and C. Kim, “Introducing competition to boost the transferability of targeted adversarial examples through clean feature mixup,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 24 648–24 657. 2, 3, 9, 11, 12
work page 2023
-
[32]
Adversarial examples in the physical world,
A. Kurakin, I. Goodfellow, S. Bengio et al., “Adversarial examples in the physical world,” in International Conference on Learning Representations, 2017. 3
work page 2017
-
[33]
Randaugment: Practical automated data augmentation with a reduced search space,
E. D. Cubuk, B. Zoph, J. Shlens, and Q. V . Le, “Randaugment: Practical automated data augmentation with a reduced search space,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 702–703. 3
work page 2020
-
[34]
Improving the transferability of targeted adversarial examples through object- based diverse input,
J. Byun, S. Cho, M.-J. Kwon, H.-S. Kim, and C. Kim, “Improving the transferability of targeted adversarial examples through object- based diverse input,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2022, pp. 15 244–15 253. 3, 4, 8, 9, 10, 11, 14
work page 2022
-
[35]
Enhancing the self-universality for transferable targeted attacks,
Z. Wei, J. Chen, Z. Wu, and Y.-G. Jiang, “Enhancing the self-universality for transferable targeted attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 12 281–12 290. 3, 4, 8, 9, 11
work page 2023
-
[36]
N. Inkawhich, K. Liang, B. Wang, M. Inkawhich, L. Carin, and Y. Chen, “Perturbing across the feature hierarchy to improve standard and strict blackbox attack transferability,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 20 791– 20 801. 3
work page 2020
-
[37]
A little robustness goes a long way: Leveraging robust features for targeted transfer attacks,
J. Springer, M. Mitchell, and G. Kenyon, “A little robustness goes a long way: Leveraging robust features for targeted transfer attacks,” Advances in Neural Information Processing Systems, vol. 34, pp. 9759– 9773, 2021. 3, 9, 11
work page 2021
-
[38]
Sharpness-aware minimization for efficiently improving generalization,
P . Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, “Sharpness-aware minimization for efficiently improving generalization,” in International Conference on Learning Representations, 2021. 3
work page 2021
-
[39]
Boosting transferability of targeted adversarial examples via hierarchical generative networks,
X. Yang, Y. Dong, T. Pang, H. Su, and J. Zhu, “Boosting transferability of targeted adversarial examples via hierarchical generative networks,” in European Conference on Computer Vision. Springer, 2022, pp. 725–742. 4, 9, 11
work page 2022
-
[40]
A survey on image data augmentation for deep learning,
C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of big data, vol. 6, no. 1, pp. 1–48, 2019. 4
work page 2019
-
[41]
Data augmentation for improving deep learning in image classification problem,
A. Mikołajczyk and M. Grochowski, “Data augmentation for improving deep learning in image classification problem,” in 2018 international interdisciplinary PhD workshop (IIPhDW). IEEE, 2018, pp. 117–122. 4
work page 2018
-
[42]
Improving deep learning with generic data augmentation,
L. Taylor and G. Nitschke, “Improving deep learning with generic data augmentation,” in 2018 IEEE symposium series on computational intelligence (SSCI). IEEE, 2018, pp. 1542–1547. 4
work page 2018
-
[43]
Automa: Towards automatic model augmentation for transferable adversarial attacks,
H. Yuan, Q. Chu, F. Zhu, R. Zhao, B. Liu, and N. Yu, “Automa: Towards automatic model augmentation for transferable adversarial attacks,” IEEE Transactions on Multimedia, vol. 25, pp. 203–213, 2021. 4
work page 2021
-
[44]
Adaptive image transformations for transfer-based adversarial attack,
Z. Yuan, J. Zhang, and S. Shan, “Adaptive image transformations for transfer-based adversarial attack,” in European Conference on Computer Vision. Springer, 2022, pp. 1–17. 4
work page 2022
-
[45]
Learning to transform dynamically for better adversarial transferability,
R. Zhu, Z. Zhang, S. Liang, Z. Liu, and C. Xu, “Learning to transform dynamically for better adversarial transferability,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 24 273–24 283. 4
work page 2024
-
[46]
Nesterov accelerated gradient and scale invariance for adversarial attacks,
J. Lin, C. Song, K. He, L. Wang, and J. E. Hopcroft, “Nesterov accelerated gradient and scale invariance for adversarial attacks,” in International Conference on Learning Representations, 2020. 4, 9, 10
work page 2020
-
[47]
Admix: Enhancing the transferability of adversarial attacks,
X. Wang, X. He, J. Wang, and K. He, “Admix: Enhancing the transferability of adversarial attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2021, pp. 16 158– 16 167. 4, 9, 10
work page 2021
-
[48]
Frequency domain model augmentation for adversarial attack,
Y. Long, Q. Zhang, B. Zeng, L. Gao, X. Liu, J. Zhang, and J. Song, “Frequency domain model augmentation for adversarial attack,” in European Conference on Computer Vision. Springer International Publishing, 2022, pp. 549–566. 4, 7, 9, 10
work page 2022
-
[49]
Universal adversarial attack on attention and the resulting dataset damagenet,
S. Chen, Z. He, C. Sun, J. Yang, and X. Huang, “Universal adversarial attack on attention and the resulting dataset damagenet,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 2188–2197, 2020. 4
work page 2020
-
[50]
Feature importance-aware transferable adversarial attacks,
Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 7639–7648. 4
work page 2021
-
[51]
Grad-cam: Visual explanations from deep networks via gradient-based localization,
R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626. 4
work page 2017
-
[52]
Analysis of representations for domain adaptation,
S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” NeurIPS, 2006. 5
work page 2006
-
[53]
Position: The platonic representation hypothesis,
M. Huh, B. Cheung, T. Wang, and P . Isola, “Position: The platonic representation hypothesis,” in Forty-first International Conference on Machine Learning. 6
-
[54]
Learning transferable adversarial examples via ghost networks,
Y. Li, S. Bai, Y. Zhou, C. Xie, Z. Zhang, and A. Yuille, “Learning transferable adversarial examples via ghost networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 11 458–11 465, Apr. 2020. 6 SUBMITTED TO IEEE TPAMI 16
work page 2020
-
[55]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2016. 8
work page 2016
-
[56]
Mobilenetv2: Inverted residuals and linear bottlenecks,
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2018. 9
work page 2018
-
[57]
EfficientNet: Rethinking model scaling for convolutional neural networks,
M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the International Conference on Machine Learning, vol. 97. PMLR, 09–15 Jun 2019, pp. 6105–6114. 9
work page 2019
-
[58]
Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2022, pp. 11 976–11 986. 9
work page 2022
-
[59]
Rethinking the inception architecture for computer vision,
C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June
-
[60]
Inception-v4, inception-resnet and the impact of residual connections on learning,
C. Szegedy, S. Ioffe, V . Vanhoucke, and A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, Feb. 2017. 9
work page 2017
-
[61]
Xception: Deep learning with depthwise separable convolutions,
F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017. 9
work page 2017
-
[62]
An image is worth 16x16 words: Transformers for image recognition at scale,
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021. 9, 10
work page 2021
-
[63]
Swin transformer: Hierarchical vision transformer using shifted windows,
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2021, pp. 10 012–10 022. 9
work page 2021
-
[64]
Maxvit: Multi-axis vision transformer,
Z. Tu, H. Talebi, H. Zhang, F. Yang, P . Milanfar, A. Bovik, and Y. Li, “Maxvit: Multi-axis vision transformer,” in European Conference on Computer Vision. Springer International Publishing, 2022, pp. 459–
work page 2022
-
[65]
Twins: Revisiting the design of spatial attention in vision transformers,
X. Chu, Z. Tian, Y. Wang, B. Zhang, H. Ren, X. Wei, H. Xia, and C. Shen, “Twins: Revisiting the design of spatial attention in vision transformers,” in Advances in Neural Information Processing Systems, vol. 34. Curran Associates, Inc., 2021, pp. 9355–9366. 9
work page 2021
-
[66]
Rethinking spatial dimensions of vision transformers,
B. Heo, S. Yun, D. Han, S. Chun, J. Choe, and S. J. Oh, “Rethinking spatial dimensions of vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2021, pp. 11 936–11 945. 9
work page 2021
-
[67]
K. Han, A. Xiao, E. Wu, J. Guo, C. XU, and Y. Wang, “Transformer in transformer,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P . Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 15 908–15 919. 9
-
[68]
Training data-efficient image transformers & distillation through attention,
H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jegou, “Training data-efficient image transformers & distillation through attention,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 139. PMLR, July 2021, pp. 10 347–10 357. 9
-
[69]
R. Wightman, “Pytorch image models,” https://github.com/rwightman/pytorch-image-models, 2019. 9
-
[70]
Pytorch: An imperative style, high-performance deep learning library,
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing ...
work page 2019
-
[71]
Densely connected convolutional networks,
G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July
-
[72]
Very deep convolutional networks for large-scale image recognition,
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015. 10
-
[73]
Augmix: A simple method to improve robustness and uncertainty under data shift,
D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan, “Augmix: A simple method to improve robustness and uncertainty under data shift,” in International Conference on Learning Representations, 2020. 10, 11
-
[74]
R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness,” in International Conference on Learning Representations, 2019. 11
-
[75]
Do adversarially robust imagenet models transfer better?
H. Salman, A. Ilyas, L. Engstrom, A. Kapoor, and A. Madry, “Do adversarially robust imagenet models transfer better?” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 3533–3545. 11
-
[76]
CogVLM2: Visual Language Models for Image and Video Understanding
W. Hong, W. Wang, M. Ding, W. Yu, Q. Lv, Y. Wang, Y. Cheng, S. Huang, J. Ji, Z. Xue et al., “Cogvlm2: Visual language models for image and video understanding,” arXiv preprint arXiv:2408.16500, 2024. 12
-
[77]
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
G. Team, P. Georgiev, V. I. Lei, R. Burnell, L. Bai, A. Gulati, G. Tanzer, D. Vincent, Z. Pan, S. Wang et al., “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context,” arXiv preprint arXiv:2403.05530, 2024. 12
-
[78]
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
X. Chen, Z. Wu, X. Liu, Z. Pan, W. Liu, Z. Xie, X. Yu, and C. Ruan, “Janus-pro: Unified multimodal understanding and generation with data and model scaling,” arXiv preprint arXiv:2501.17811, 2025. 12, 13
-
[79]
G. Ortiz-Jiménez, A. Modas, S.-M. Moosavi-Dezfooli, and P. Frossard, “Optimism in the face of adversity: Understanding and improving deep learning through adversarial robustness,” Proceedings of the IEEE, vol. 109, no. 5, pp. 635–659, 2021. 14
-
[80]
Adversarial examples make strong poisons,
L. Fowl, M. Goldblum, P.-y. Chiang, J. Geiping, W. Czaja, and T. Goldstein, “Adversarial examples make strong poisons,” Advances in Neural Information Processing Systems, vol. 34, pp. 30 339–30 351,