
arxiv: 2604.10715 · v1 · submitted 2026-04-12 · 💻 cs.CV

Defending against Patch-Based and Texture-Based Adversarial Attacks with Spectral Decomposition

Pith reviewed 2026-05-10 15:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords adversarial defense · patch attacks · texture attacks · discrete wavelet transform · spectral decomposition · adversarial training · robustness · physical-world attacks

The pith

Spectral decomposition via wavelets plus adversarial training defends vision models such as person detectors from patch and texture attacks, even when the attacker adapts to the defense.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a defense called Adversarial Spectrum Defense (ASD) that decomposes input images with the Discrete Wavelet Transform (DWT) to examine patterns at multiple frequency scales and spatial locations. This targets both localized patch attacks and broader texture attacks, which can be physically printed and placed to fool detectors in real settings such as surveillance or autonomous systems. Pairing the spectral analysis with an off-the-shelf adversarially trained model creates a combined strategy that, according to the paper, holds up when the attacker designs perturbations specifically to bypass the defense. A reader would care because these attacks are realizable in the physical world and threaten security-critical applications. If the approach works, it provides a way to strengthen existing models by adding frequency-based checks without changing their core architectures.

Core claim

ASD uses the multi-resolution and localization properties of the Discrete Wavelet Transform to capture both high-frequency fine-grained perturbations and low-frequency spatially pervasive perturbations from patch-based and texture-based attacks. When integrated with an off-the-shelf adversarial training model, this spectral analysis yields a comprehensive defense that achieves state-of-the-art robustness against strong adaptive adversaries specifically designed to counter the method.
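To make the high-frequency versus low-frequency distinction concrete, here is a minimal sketch of a single-level 2D Haar DWT in plain Python. The choice of the Haar wavelet, the single decomposition level, and the names `haar_dwt2` and `energy` are assumptions made for illustration; this summary does not say which wavelet or how many levels ASD uses. In the sketch, a uniform image-wide offset lands entirely in the approximation band, while a pixel-level checkerboard lands entirely in the diagonal detail band.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT (illustrative assumption).

    img is a list of equal-length rows with even height and width.
    Returns four half-size bands: (approx, horiz, vert, diag).
    """
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # local average: low frequency
            rows[1].append((a - b + c - d) / 2)  # left-right contrast
            rows[2].append((a + b - c - d) / 2)  # top-bottom contrast
            rows[3].append((a - b - c + d) / 2)  # diagonal contrast
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag


def energy(band):
    """Sum of squared coefficients in one band."""
    return sum(v * v for row in band for v in row)


# A uniform offset: a stand-in for a spatially pervasive, low-frequency change.
smooth = [[0.5] * 4 for _ in range(4)]
# A pixel-level checkerboard: a stand-in for a fine-grained perturbation.
checker = [[0.5 if (i + j) % 2 == 0 else -0.5 for j in range(4)]
           for i in range(4)]

for name, img in (("smooth", smooth), ("checker", checker)):
    print(name, [energy(band) for band in haar_dwt2(img)])
```

Both toy images carry the same total energy; only the band in which that energy appears differs, which is the property the core claim relies on.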

What carries the argument

The Adversarial Spectrum Defense (ASD) applies Discrete Wavelet Transform (DWT) spectral decomposition to input images to isolate adversarial perturbations across frequency scales, then feeds the result to an adversarially trained model.
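The decompose, filter, then predict flow can be sketched as follows. This is a hypothetical stand-in, not the paper's masking module: the Haar wavelet, the hard amplitude threshold, and the names `haar_dwt2`, `haar_idwt2`, and `suppress_large_details` are all assumptions introduced for illustration. The sketch zeroes detail coefficients whose magnitude exceeds a threshold and reconstructs the image that a downstream model would then receive.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT -> (approx, horiz, vert, diag)."""
    bands = ([], [], [], [])
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def haar_idwt2(approx, horiz, vert, diag):
    """Exact inverse of haar_dwt2."""
    img = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, x, y, z = approx[i][j], horiz[i][j], vert[i][j], diag[i][j]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        img += [top, bottom]
    return img


def suppress_large_details(img, threshold):
    """Hypothetical filter: zero detail coefficients with |value| > threshold."""
    approx, *details = haar_dwt2(img)
    cleaned = [
        [[0.0 if abs(v) > threshold else v for v in row] for row in band]
        for band in details
    ]
    return haar_idwt2(approx, *cleaned)


# Flat gray image with one high-contrast 2x2 "patch" in the top-left corner.
image = [[0.5] * 4 for _ in range(4)]
image[0][0], image[0][1], image[1][0], image[1][1] = 1.0, 0.0, 0.0, 1.0

# `filtered` is what a downstream adversarially trained model would receive.
filtered = suppress_large_details(image, threshold=0.5)
```

The filter removes the high-contrast block here only because it sits on a flat background. Natural edges also produce large detail coefficients, so a fixed threshold like this one would damage clean content in real images.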

If this is right

  • Models gain improved resistance to physically realizable attacks in applications such as person detection for surveillance and autonomous systems.
  • The single framework addresses both localized high-frequency perturbations and spatially extensive low-frequency changes.
  • Under adaptive attack, ASD+AT outperforms prior defense methods; the abstract reports a 21.73% margin in AP.
  • Off-the-shelf adversarially trained models can be augmented with spectral decomposition without requiring new network architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The localization property of the wavelet decomposition could be used to identify the spatial location of a patch attack within an image.
  • Similar frequency-domain checks might help against other input-manipulation threats where perturbations have characteristic scale signatures.
  • Combining spectral analysis with training-based robustness suggests that hybrid defenses can compound protection when each component targets different aspects of the attack.
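The first extension in this list, using wavelet localization to find where a patch sits, can be illustrated with a small sketch. It assumes a single-level Haar decomposition and a simple "highest detail energy wins" rule; both are illustrative choices rather than anything the paper is stated to do, and `detail_energy_map` and `strongest_block` are hypothetical names.

```python
def detail_energy_map(img):
    """Detail energy of each 2x2 block under a single-level Haar DWT.

    img is a list of equal-length rows with even height and width.
    """
    emap = []
    for i in range(0, len(img), 2):
        row = []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            horiz = (a - b + c - d) / 2
            vert = (a + b - c - d) / 2
            diag = (a - b - c + d) / 2
            row.append(horiz * horiz + vert * vert + diag * diag)
        emap.append(row)
    return emap


def strongest_block(emap):
    """Pixel coordinates (row, col) of the block with the most detail energy."""
    _, i, j = max(
        (e, i, j) for i, row in enumerate(emap) for j, e in enumerate(row)
    )
    return 2 * i, 2 * j


# Flat 8x8 image with a high-contrast 2x2 block at rows 4-5, columns 2-3.
img = [[0.5] * 8 for _ in range(8)]
img[4][2], img[4][3], img[5][2], img[5][3] = 1.0, 0.0, 0.0, 1.0

top_left = strongest_block(detail_energy_map(img))
```

Each wavelet coefficient is tied to a fixed pixel neighborhood, so the position of a large coefficient maps directly back to image coordinates. A global Fourier spectrum would not give this.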

Load-bearing premise

The spectral signatures of perturbations from patch and texture attacks differ enough from natural image content that DWT decomposition can separate them reliably, and this separation survives even when the attacker knows the defense and optimizes against it.

What would settle it

An adaptive attack that produces adversarial images whose DWT coefficients closely match those of clean images yet still cause the defended model to misclassify would show that the spectral separation does not hold.
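One way to make "closely match" precise, under the assumption that the wavelet basis is orthonormal (the wavelet ASD uses is not stated in this summary): write $W$ for the DWT, $x$ for a clean image, and $\delta$ for the perturbation. Linearity and Parseval's identity then give

```latex
\[
  \lVert W(x+\delta) - Wx \rVert_2
  \;=\; \lVert W\delta \rVert_2
  \;=\; \lVert \delta \rVert_2 .
\]
```

Read this way, matching every coefficient individually is the same as keeping the pixel-space perturbation small. A more informative version of the test would therefore be an attack that matches the clean coefficients in aggregate, for example in their per-subband amplitude statistics, while $\lVert \delta \rVert_2$ stays large. That reading is an interpretation offered here, not a criterion taken from the paper.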

Figures

Figures reproduced from arXiv: 2604.10715 by Wei Zhang, Xiao Li, Xiaolin Hu, Xinyu Chang, Yiming Zhu.

Figure 1: Spectral analysis of different physically realizable adversarial attacks
Figure 2: Illustration of the DWT decomposition. Left: The 1D hierarchical …
Figure 3: Overview of the pipeline of the proposed ASD method. ASD has a masking module, called …
Figure 4: Visualization of adversarial examples with patch-based or texture-based …
Figure 5: The DWT amplitude of different physically realizable adversarial …
Figure 6: Frequency-amplitude division of ASD and AT.
Original abstract

Adversarial examples present significant challenges to the security of Deep Neural Network (DNN) applications. Specifically, there are patch-based and texture-based attacks that are usually used to craft physical-world adversarial examples, posing real threats to security-critical applications such as person detection in surveillance and autonomous systems, because those attacks are physically realizable. Existing defense mechanisms face challenges in the adaptive attack setting, i.e., the attacks are specifically designed against them. In this paper, we propose Adversarial Spectrum Defense (ASD), a defense mechanism that leverages spectral decomposition via Discrete Wavelet Transform (DWT) to analyze adversarial patterns across multiple frequency scales. The multi-resolution and localization capability of DWT enables ASD to capture both high-frequency (fine-grained) and low-frequency (spatially pervasive) perturbations. By integrating this spectral analysis with the off-the-shelf Adversarial Training (AT) model, ASD provides a comprehensive defense strategy against both patch-based and texture-based adversarial attacks. Extensive experiments demonstrate that ASD+AT achieved state-of-the-art (SOTA) performance against various attacks, outperforming the APs of previous defense methods by 21.73%, in the face of strong adaptive adversaries specifically designed against ASD. Code available at https://github.com/weiz0823/adv-spectral-defense .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Adversarial Spectrum Defense (ASD), which applies Discrete Wavelet Transform (DWT) to decompose images across multiple frequency scales and localize both high-frequency fine-grained and low-frequency pervasive perturbations from patch-based and texture-based adversarial attacks. ASD is integrated with off-the-shelf adversarial training (AT) to form a combined defense, with the abstract claiming state-of-the-art robustness that outperforms prior methods by 21.73% under strong adaptive adversaries specifically designed against ASD. Code is made available.

Significance. If the empirical results hold under properly constructed adaptive attacks that optimize against the full ASD+AT pipeline, the method could supply a practical, multi-resolution preprocessing step for defending against physically realizable attacks in applications such as surveillance and autonomous systems. The explicit release of code supports reproducibility, which is a clear strength for an empirical defense paper.

major comments (2)
  1. [Abstract] The headline claim of SOTA performance with a 21.73% gain over prior defenses is stated without any accompanying experimental details on datasets, attack implementations (including whether back-propagation occurs through DWT), evaluation metrics, baselines, or ablation studies. Because this claim is load-bearing for the paper's contribution, the omission leaves the central empirical result unverifiable from the abstract alone.
  2. [Abstract] The description of the adaptive adversaries does not specify the threat model (e.g., whether the attacker has white-box access to the DWT decomposition, whether DWT parameters are frozen, or how the combined loss is formulated). Without this, it is impossible to confirm that the reported robustness is not an artifact of an insufficiently strong adaptive attack that fails to target the spectral separation directly.
minor comments (1)
  1. [Abstract] The abstract refers to 'extensive experiments' yet supplies only a single aggregate percentage; a concise results summary or table reference would improve clarity for readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that the abstract requires expansion to better support the central claims and will revise it in the next version. Below we respond point by point to the major comments.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of SOTA performance with a 21.73% gain over prior defenses is stated without any accompanying experimental details on datasets, attack implementations (including whether back-propagation occurs through DWT), evaluation metrics, baselines, or ablation studies. Because this claim is load-bearing for the paper's contribution, the omission leaves the central empirical result unverifiable from the abstract alone.

    Authors: We acknowledge that the abstract, as a concise summary, omits the supporting experimental details, which are instead presented in the Experiments and Ablation sections of the manuscript. To address the concern directly, we will revise the abstract to include a brief summary of the setup: the datasets used, the evaluation metric (AP), the fact that attacks are implemented with back-propagation through the differentiable DWT, and a reference to the baselines and ablations. This change will make the SOTA claim more verifiable from the abstract itself while remaining within length constraints.
    revision: yes

  2. Referee: [Abstract] The description of the adaptive adversaries does not specify the threat model (e.g., whether the attacker has white-box access to the DWT decomposition, whether DWT parameters are frozen, or how the combined loss is formulated). Without this, it is impossible to confirm that the reported robustness is not an artifact of an insufficiently strong adaptive attack that fails to target the spectral separation directly.

    Authors: We agree that explicit specification of the threat model strengthens the paper. In the work, adaptive attacks are white-box with full access to the ASD+AT pipeline, gradients flow through the DWT (which is differentiable), DWT parameters are fixed and non-learnable, and the loss is the standard adversarial objective on the defended model. We will revise the abstract to state: 'under white-box adaptive adversaries optimized against the full ASD pipeline with back-propagation through DWT.' This clarification confirms that attacks target the spectral decomposition and will be incorporated in the revised manuscript.
    revision: yes
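Because the DWT is a fixed linear map, back-propagating through it amounts to applying its transpose, and for an orthonormal wavelet the transpose is the inverse transform. The sketch below checks this numerically for a single-level Haar DWT, which is an assumption made here for illustration (the wavelet and depth used in the paper are not stated in this exchange). The quadratic loss and the names `haar_dwt2`, `haar_idwt2`, `loss`, `grad_loss`, and `numeric_grad` are likewise hypothetical; the adaptive attack described in the rebuttal uses the defended model's adversarial objective instead.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT -> (approx, horiz, vert, diag)."""
    bands = ([], [], [], [])
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def haar_idwt2(approx, horiz, vert, diag):
    """Exact inverse of haar_dwt2 (also its transpose, by orthonormality)."""
    img = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, x, y, z = approx[i][j], horiz[i][j], vert[i][j], diag[i][j]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        img += [top, bottom]
    return img


def loss(img, target):
    """Toy objective: 0.5 * squared distance between DWT(img) and target."""
    return 0.5 * sum(
        (v - t) ** 2
        for band, tband in zip(haar_dwt2(img), target)
        for row, trow in zip(band, tband)
        for v, t in zip(row, trow)
    )


def grad_loss(img, target):
    """Analytic gradient W^T (W x - t), with W^T applied via the inverse DWT."""
    residual = [
        [[v - t for v, t in zip(row, trow)] for row, trow in zip(band, tband)]
        for band, tband in zip(haar_dwt2(img), target)
    ]
    return haar_idwt2(*residual)


def numeric_grad(img, target, eps=1e-4):
    """Central finite differences, one pixel at a time."""
    grad = [[0.0] * len(img[0]) for _ in img]
    for i in range(len(img)):
        for j in range(len(img[0])):
            orig = img[i][j]
            img[i][j] = orig + eps
            up = loss(img, target)
            img[i][j] = orig - eps
            down = loss(img, target)
            img[i][j] = orig  # restore the pixel before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad


x = [[0.1, 0.9, 0.3, 0.4], [0.8, 0.2, 0.6, 0.5],
     [0.7, 0.3, 0.2, 0.9], [0.4, 0.6, 0.1, 0.8]]
target = haar_dwt2([[0.5] * 4 for _ in range(4)])

analytic = grad_loss(x, target)
numeric = numeric_grad(x, target)
max_gap = max(abs(a - n)
              for ra, rn in zip(analytic, numeric)
              for a, n in zip(ra, rn))
```

The analytic and finite-difference gradients agree to within floating-point error, so a fixed transform of this kind gives the attacker exact gradients and does not obfuscate them.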

Circularity Check

0 steps flagged

No circularity: empirical integration of DWT with standard AT

Full rationale

The paper presents ASD as an empirical defense that applies an off-the-shelf Discrete Wavelet Transform for multi-resolution spectral analysis and combines it with standard Adversarial Training. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The SOTA claim rests on experimental results under adaptive attacks rather than on any self-referential reduction. The method is evaluated against external benchmarks, and no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unproven domain assumption that DWT decomposition separates adversarial signals from natural image content at multiple scales. No free parameters are introduced in the abstract, and the only new entity is the ASD method itself.

axioms (1)
  • domain assumption: Discrete Wavelet Transform provides multi-resolution localization that captures both high-frequency fine-grained and low-frequency spatially pervasive adversarial perturbations.
    Directly invoked to justify why DWT is suitable for analyzing patch-based and texture-based attacks.
invented entities (1)
  • Adversarial Spectrum Defense (ASD): no independent evidence
    purpose: Defense mechanism that integrates DWT spectral analysis with adversarial training.
    Newly named method proposed in the paper; no independent evidence outside the claimed experiments is provided in the abstract.

pith-pipeline@v0.9.0 · 5537 in / 1289 out tokens · 37925 ms · 2026-05-10T15:40:04.444956+00:00 · methodology

