pith. machine review for the scientific record.

arxiv: 2605.03405 · v1 · submitted 2026-05-05 · 💻 cs.CV · cs.LG

TsallisPGD: Adaptive Gradient Weighting for Adversarial Attacks on Semantic Segmentation

Pith reviewed 2026-05-08 01:22 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords adversarial attack · semantic segmentation · Tsallis entropy · projected gradient descent · dynamic schedule · gradient weighting · model robustness

The pith

TsallisPGD improves adversarial attacks on semantic segmentation by using a dynamic q schedule in Tsallis cross-entropy to adaptively weight pixel gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Attacking a semantic segmentation model requires flipping thousands of pixel predictions at once, which makes it harder than attacking an image classifier. Standard cross-entropy loss overemphasizes already-misclassified pixels, which slows optimization. Tsallis cross-entropy generalizes this loss with a parameter q that controls how gradients concentrate across pixels of different confidence levels. Sweeping q dynamically during optimization lets the attack steer toward the most useful pixels at each step. A single schedule chosen once on validation data delivers the best average rank and larger reductions in accuracy and mean IoU than several prior PGD variants, on both standard and robust models across three datasets.

Core claim

The paper claims that no fixed value of q is optimal across datasets, architectures, and budgets, but that a dynamic schedule varying q over the course of projected gradient descent steps produces a more effective attack. TsallisPGD, built on this loss and schedule with one validation-selected q trajectory, reduces accuracy and mIoU more than CEPGD, SegPGD, CosPGD, JSPGD, and MaskedPGD on Cityscapes, Pascal VOC, and ADE20K for both standard and adversarially robust models.

What carries the argument

Tsallis cross-entropy loss parameterized by q, combined with a dynamic schedule that sweeps q during PGD steps to reshape gradient concentration across pixels.
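
To make the role of q concrete, the following is a short derivation under an assumed convention. The paper's exact definition is not reproduced on this page, so the q-logarithm form, the symbol $z_y$ (logit of the ground-truth class), and the symbol $p_y$ (its softmax probability) are assumptions for illustration, not a quotation of the authors' equations.

```latex
\begin{align}
\ln_q(x) &= \frac{x^{1-q} - 1}{1 - q},
  \qquad \lim_{q \to 1} \ln_q(x) = \ln x, \\
\mathcal{L}_q(p_y) &= -\ln_q(p_y) = \frac{1 - p_y^{\,1-q}}{1 - q}, \\
\frac{\partial \mathcal{L}_q}{\partial z_y}
  &= \frac{\partial \mathcal{L}_q}{\partial p_y}\,
     \frac{\partial p_y}{\partial z_y}
   = -p_y^{-q} \cdot p_y (1 - p_y)
   = -p_y^{\,1-q} (1 - p_y), \\
p_y^{\star} &= \arg\max_{p_y \in [0,1]} \; p_y^{\,1-q} (1 - p_y)
   = \frac{1 - q}{2 - q} \qquad (q < 1).
\end{align}
```

Under this convention, q = 1 gives the ordinary cross-entropy weight 1 - p_y, which is largest on pixels that are already misclassified, while q = 0 moves the peak to p_y = 1/2. This is one way to read the statement that varying q reshapes gradient concentration across pixels.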

If this is right

  • The attack achieves the highest average rank in lowering accuracy and mIoU across all tested settings.
  • It works on both standard and robust models without per-instance or per-model retuning of the schedule.
  • Varying q steers the attack toward pixels at different confidence levels, avoiding over-focus on already-wrong pixels.
  • The method outperforms multiple existing segmentation-specific PGD variants on three standard datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive weighting idea could apply to other dense prediction tasks that use pixel-wise losses.
  • Robustness benchmarks may need to test against dynamically scheduled attacks rather than only fixed-loss variants.
  • Defenses trained against standard cross-entropy attacks might remain vulnerable to this form of gradient reshaping.
  • If the schedule generalizes further, it could simplify attack evaluation by removing the need for many separate hyperparameter searches.

Load-bearing premise

A single q schedule chosen on validation data will stay effective on new models, datasets, and perturbation budgets without further retuning.

What would settle it

A new segmentation model or dataset on which the fixed validation schedule produces weaker accuracy and mIoU reductions than a standard cross-entropy PGD attack.

Figures

Figures reproduced from arXiv: 2605.03405 by Alexander Matyasko, Indriyati Atmosukarto, Wei Zhang, Xin Lou.

Figure 1: Comparison of per-pixel gradient weightings. Best viewed in color. We compare per-pixel gradient weightings for different attacks as a function of the predicted probability p_y of the ground-truth class. Varying q in Tsallis cross-entropy shifts the location of the gradient peak, enabling targeted optimization over pixels at different confidence levels. in Section V-C, we fix this validation-selected schedu…
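
The peak shift described in the caption can be checked numerically with a minimal sketch. It assumes the per-pixel weight on the true-class logit is p_y^(1-q) * (1 - p_y), which follows from one common q-logarithm convention and may differ from the paper's exact form. The function names `grad_weight` and `peak_location` are hypothetical.

```python
def grad_weight(p_y, q):
    """Assumed magnitude of the loss gradient w.r.t. the true-class logit."""
    return p_y ** (1.0 - q) * (1.0 - p_y)


def peak_location(q, grid=10001):
    """Grid search for the p_y in (0, 1) that maximizes grad_weight."""
    candidates = [(i + 0.5) / grid for i in range(grid)]
    return max(candidates, key=lambda p: grad_weight(p, q))


if __name__ == "__main__":
    # For q < 1 the analytic peak under this assumption is (1 - q) / (2 - q).
    for q in (-1.0, 0.0, 0.5, 1.0):
        print(f"q={q:+.1f}  peak p_y ~ {peak_location(q):.3f}")
```

With this weighting, lowering q below 1 moves the peak away from p_y = 0 toward higher-confidence pixels, which matches the qualitative behavior the caption attributes to Tsallis cross-entropy.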
Original abstract

Attacking semantic segmentation models is significantly harder than image classification models because an attacker must flip thousands of pixel predictions simultaneously. Standard pixel-wise cross-entropy (CE) is ill-suited to this setting: it tends to overemphasize already-misclassified pixels, which slows optimization and overstates model robustness. To address these issues, we introduce TsallisPGD, an adversarial attack built on the Tsallis cross-entropy, a generalization of CE parameterized by $q$, which adaptively reshapes the gradient landscape by controlling gradient concentration across pixels. By varying $q$, we steer the attack toward pixels at different confidence levels. We first show that no single fixed-$q$ is universally optimal, as its effectiveness depends on the dataset, model architecture, and perturbation budget. Motivated by this, we propose a dynamic $q$-schedule that sweeps $q$ during optimization. Extensive experiments on Cityscapes, Pascal VOC, and ADE20K show that TsallisPGD, using a single validation-selected schedule, achieves the best average attack rank across all evaluated settings and improves over CEPGD, SegPGD, CosPGD, JSPGD, and MaskedPGD in reducing accuracy and mIoU on both standard and robust models.
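
The idea of sweeping q during optimization can be sketched on a toy problem. Everything below is an assumption for illustration: the linear schedule `linear_q`, the q-logarithm convention in `q_log_loss`, and the toy model in which each "pixel" i has true-class probability sigmoid(w_i · (x + delta)). It is not the authors' schedule, loss implementation, or network.

```python
import math


def q_log_loss(p_y, q):
    """Assumed Tsallis cross-entropy for a one-hot label: -ln_q(p_y)."""
    if abs(q - 1.0) < 1e-8:
        return -math.log(p_y)  # q = 1 recovers ordinary cross-entropy
    return (1.0 - p_y ** (1.0 - q)) / (1.0 - q)


def linear_q(step, num_steps, q_start, q_end):
    """Hypothetical schedule: move q linearly from q_start to q_end."""
    if num_steps <= 1:
        return q_start
    return q_start + (q_end - q_start) * step / (num_steps - 1)


def true_class_probs(x, delta, weights):
    """Toy model: pixel i has probability sigmoid(w_i . (x + delta))."""
    probs = []
    for w in weights:
        z = sum(wj * (xj + dj) for wj, xj, dj in zip(w, x, delta))
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


def total_loss(x, delta, weights, q):
    """Sum of per-pixel losses for a shared perturbation delta."""
    return sum(q_log_loss(p, q) for p in true_class_probs(x, delta, weights))


def scheduled_pgd(x, weights, eps, alpha, num_steps, q_start, q_end):
    """L-infinity PGD whose loss parameter q follows linear_q."""
    delta = [0.0] * len(x)
    for t in range(num_steps):
        q = linear_q(t, num_steps, q_start, q_end)
        grad = [0.0] * len(x)
        for w, p in zip(weights, true_class_probs(x, delta, weights)):
            # Chain rule for p = sigmoid(z): dL/dz = -p^(1-q) * (1 - p).
            dz = -(p ** (1.0 - q)) * (1.0 - p)
            for j, wj in enumerate(w):
                grad[j] += dz * wj
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            # Ascend the loss, then project back onto the eps-ball.
            delta[j] = max(-eps, min(eps, delta[j] + alpha * sign))
    return delta


if __name__ == "__main__":
    weights = [[1.0, -0.5, 0.2], [0.3, 0.8, -0.6], [-0.4, 0.1, 0.9]]
    x = [0.5, 0.5, 0.5]
    delta = scheduled_pgd(x, weights, eps=0.1, alpha=0.02,
                          num_steps=10, q_start=0.0, q_end=1.0)
    print("perturbation:", delta)
```

Because the perturbation is shared across all pixels, the per-pixel weights decide which pixels dominate the update direction at each step. That coupling is why changing q over time can change the attack trajectory, even though this toy says nothing about how large the effect is on real models.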

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TsallisPGD, an adversarial attack for semantic segmentation that replaces pixel-wise cross-entropy with Tsallis cross-entropy parameterized by q. By dynamically sweeping q during PGD optimization according to a single schedule chosen once on a validation set, the method claims to adaptively concentrate gradients on pixels at varying confidence levels. Experiments on Cityscapes, Pascal VOC, and ADE20K demonstrate that this validation-selected schedule yields the best average attack rank, outperforming CEPGD, SegPGD, CosPGD, JSPGD, and MaskedPGD in accuracy and mIoU reduction on both standard and robust models.

Significance. If the reported gains hold under rigorous statistical controls and the schedule generalizes without per-dataset retuning, the work would supply a practical, parameter-light improvement to robustness evaluation for dense-prediction tasks where standard CE is known to be suboptimal. The explicit demonstration that no fixed q is universally optimal is a useful empirical contribution.

major comments (3)
  1. [Experiments / Results] Experimental protocol (results section): the selection procedure for the single validation-chosen q-schedule is not described with sufficient detail (e.g., exact validation-set size, whether multiple candidate schedules were evaluated, or whether any test-set statistics influenced the final choice). This information is load-bearing for the central claim that one fixed dynamic schedule works across all three datasets and both model types without implicit overfitting.
  2. [Abstract and Results tables] Quantitative reporting (abstract and results tables): no standard deviations, number of random seeds, or statistical significance tests accompany the reported accuracy/mIoU reductions or average-rank improvements. Without these, it is impossible to determine whether the claimed outperformance over the five baselines is robust or could be explained by run-to-run variance.
  3. [Method and Ablations] Ablation on fixed-q versus dynamic schedule (method/ablations subsection): the paper must clarify whether the dynamic schedule's advantage is isolated from the particular functional form of the sweep chosen on validation; otherwise the comparison risks conflating the benefit of sweeping q with the benefit of the specific sweep that was selected.
minor comments (2)
  1. [Abstract] The abstract states that q 'steers the attack toward pixels at different confidence levels' but does not give the explicit functional dependence of the gradient weighting on q; a short equation or reference to the Tsallis loss definition would improve clarity.
  2. [Method] Notation for the dynamic schedule (e.g., how q(t) evolves with iteration t) should be introduced once in the method section and used consistently in all figures and tables.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's insightful comments, which have helped us identify areas for improvement in the manuscript. We provide point-by-point responses below and plan to incorporate revisions to address the concerns raised.

  1. Referee: [Experiments / Results] Experimental protocol (results section): the selection procedure for the single validation-chosen q-schedule is not described with sufficient detail (e.g., exact validation-set size, whether multiple candidate schedules were evaluated, or whether any test-set statistics influenced the final choice). This information is load-bearing for the central claim that one fixed dynamic schedule works across all three datasets and both model types without implicit overfitting.

    Authors: We agree that additional details on the schedule selection process are necessary to support our claims. In the revised manuscript, we will provide a comprehensive description of the validation procedure, including the sizes of the validation sets used for each dataset, the specific candidate schedules that were evaluated, and explicit confirmation that the final choice was made without any access to or influence from test-set statistics. This will ensure transparency and address concerns about potential overfitting. revision: yes

  2. Referee: [Abstract and Results tables] Quantitative reporting (abstract and results tables): no standard deviations, number of random seeds, or statistical significance tests accompany the reported accuracy/mIoU reductions or average-rank improvements. Without these, it is impossible to determine whether the claimed outperformance over the five baselines is robust or could be explained by run-to-run variance.

    Authors: We acknowledge that the current version lacks measures of variability and statistical testing. We will update the abstract and all results tables to include standard deviations computed over multiple independent runs with different random seeds. Additionally, we will perform and report statistical significance tests (such as Wilcoxon signed-rank tests) to quantify the robustness of the observed improvements over the baselines. revision: yes

  3. Referee: [Method and Ablations] Ablation on fixed-q versus dynamic schedule (method/ablations subsection): the paper must clarify whether the dynamic schedule's advantage is isolated from the particular functional form of the sweep chosen on validation; otherwise the comparison risks conflating the benefit of sweeping q with the benefit of the specific sweep that was selected.

    Authors: To address this, we will revise the ablations section to include a more explicit comparison. Specifically, we will report results for the dynamic schedule against not only a range of fixed q values but also against the best-performing fixed q selected via the same validation procedure. This will help isolate the benefit of the dynamic nature of the schedule from the particular form of the sweep. We believe this additional analysis will clarify that the advantage stems from the adaptivity rather than the specific parameterization. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method and claims are self-contained

The paper defines TsallisPGD via the Tsallis cross-entropy (a known generalization) with a dynamic q-schedule chosen once on a held-out validation set and evaluated on separate test data across datasets and models. No equations reduce the reported attack improvements to fitted constants or prior outputs by construction. The observation that 'no single fixed-q is optimal' is an empirical finding used only to motivate the dynamic schedule; it does not feed back into the reported gains. No self-citations are load-bearing for the central empirical result, and the validation/test split prevents the 'prediction' from being the input by design. The derivation chain therefore remains independent of the final performance numbers.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the mathematical properties of Tsallis entropy for reshaping gradients and on the empirical observation that no fixed q is optimal. No new physical entities are introduced.

free parameters (1)
  • q schedule
    A dynamic sequence of q values is selected on a validation set; the specific sequence and selection criterion are not detailed in the abstract.
axioms (1)
  • domain assumption: Tsallis cross-entropy generalizes standard cross-entropy and can be used to control gradient concentration across pixels
    Invoked in the abstract to justify adaptive weighting; treated as a known property of the Tsallis family.

pith-pipeline@v0.9.0 · 5530 in / 1188 out tokens · 53569 ms · 2026-05-08T01:22:58.353976+00:00 · methodology

