Unleashing the Representational Power of Fourier Shapes for Attacking Infrared Object Detection

Fan Li; Jian Wang; Lijun He; Ming Lei; Yixing Yong

arxiv: 2605.17822 · v1 · pith:VKMEKY6Dnew · submitted 2026-05-18 · 💻 cs.CV

Pith reviewed 2026-05-20 12:25 UTC · model grok-4.3

classification 💻 cs.CV

keywords infrared object detection · adversarial attacks · Fourier shapes · physical patches · winding number theorem · thermal signatures · object evasion

The pith

Compact Fourier shapes can be optimized to produce physical patches that robustly fool infrared object detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to resolve the trade-off in shape-based adversarial attacks on infrared detection by representing patch boundaries with a Fourier series. A small number of coefficients is mapped to a full pixel mask through an analytic function based on the winding number theorem, so gradients can flow back to the coefficients during optimization. If this leads to stronger attacks, it matters because it shows how a mathematical shape representation can strengthen physical-world adversarial capabilities against the thermal imaging systems used for surveillance and driving. The validation comes from digital and physical tests that report high success rates even at extended ranges.
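The coefficient-to-mask idea can be illustrated with a minimal sketch. It assumes a complex Fourier series for the contour and a plain polygon winding-number test per pixel; the function names, the coefficient values, and the grid size are hypothetical and are not taken from the paper or its code. The sketch produces a hard binary mask only, so it does not show how the paper obtains usable gradients.

```python
import cmath
import math

def fourier_contour(coeffs, n_samples=256):
    """Sample z(t) = sum_k c_k * exp(i*k*t) at n_samples points on [0, 2*pi).

    coeffs maps an integer frequency k to a complex coefficient c_k.
    """
    points = []
    for j in range(n_samples):
        t = 2.0 * math.pi * j / n_samples
        points.append(sum(c * cmath.exp(1j * k * t) for k, c in coeffs.items()))
    return points

def winding_number(points, p):
    """Winding number of the closed polygon `points` around the point p.

    Sums the signed angle each polygon edge subtends at p, then divides by 2*pi.
    """
    total = 0.0
    n = len(points)
    for j in range(n):
        a = points[j] - p
        b = points[(j + 1) % n] - p
        total += cmath.phase(b / a)  # signed angle between a and b
    return round(total / (2.0 * math.pi))

def rasterize(coeffs, size=32, extent=2.0):
    """Binary mask on a size x size grid of pixel centres covering [-extent, extent]^2."""
    pts = fourier_contour(coeffs)
    step = 2.0 * extent / size
    mask = []
    for row in range(size):
        y = extent - (row + 0.5) * step
        line = []
        for col in range(size):
            x = -extent + (col + 0.5) * step
            line.append(1 if winding_number(pts, complex(x, y)) != 0 else 0)
        mask.append(line)
    return mask

# Hypothetical coefficients: a unit circle (k = 1) with a three-lobed wobble (k = -2).
coeffs = {1: 1.0 + 0j, -2: 0.2 + 0j}
mask = rasterize(coeffs)
for line in mask:
    print("".join("#" if v else "." for v in line))
```

Two complex numbers describe the whole 32 x 32 mask here, which is the compactness the summary refers to; adding higher frequencies would add boundary detail without changing the rasterization step.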

Core claim

The authors claim that by learning Fourier coefficients to define shape boundaries and analytically converting them to masks, they can generate infrared adversarial patches that achieve over 88% success in evading detectors at distances exceeding 25 meters under diverse conditions.

What carries the argument

End-to-end differentiable mapping from Fourier coefficients to pixel masks using the winding number theorem

If this is right

The generated physical patches evade detectors effectively across diverse distances, angles, poses, and different individuals.
Over 88% attack success rate is achieved at distances greater than 25 meters at a detection confidence threshold of 0.5.
The end-to-end framework allows for more effective optimization compared to non-differentiable shape representations.
Both digital and physical experiments validate superior performance over existing shape-based infrared attack methods.
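To make the success-rate figures concrete, here is a minimal sketch of one common way to compute an attack success rate (ASR) at a confidence threshold. It assumes an attack counts as successful when the detector's highest person confidence in an image falls below the threshold; the paper's exact definition may differ, and the scores below are hypothetical.

```python
def attack_success_rate(max_confidences, threshold=0.5):
    """Fraction of attacked images whose highest detection confidence is below threshold."""
    if not max_confidences:
        raise ValueError("need at least one attacked image")
    evaded = sum(1 for c in max_confidences if c < threshold)
    return evaded / len(max_confidences)

# Hypothetical per-image maximum person confidences after applying the patch.
scores = [0.05, 0.62, 0.31, 0.48, 0.09, 0.77, 0.12, 0.40]

print(attack_success_rate(scores, threshold=0.5))  # 0.75
print(attack_success_rate(scores, threshold=0.1))  # 0.25
```

The same scores give very different rates at 0.5 and 0.1, which is why a reported ASR is only meaningful together with its threshold.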

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This approach might be adapted to optimize shapes for evading other types of sensors by changing the mask interpretation.
Additional factors like the thermal conductivity of the patch material could influence real-world performance beyond the shape alone.
Evaluating the attack on a wider range of infrared detector models would test its generalizability.

Load-bearing premise

The winding number theorem can accurately and differentiably translate a small set of Fourier coefficients into complex pixel masks suitable for real infrared attack scenarios.
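
For reference, the standard mathematics behind this premise can be written as follows. The symbols $c_k$, $K$, $w(p)$, and $M(p)$ are notation assumed here for illustration and may not match the paper's.

```latex
z(t) = \sum_{k=-K}^{K} c_k \, e^{i k t}, \qquad t \in [0, 2\pi)

w(p) = \frac{1}{2\pi i} \oint_{z} \frac{dz}{z - p}
     = \frac{1}{2\pi i} \int_{0}^{2\pi} \frac{z'(t)}{z(t) - p} \, dt

M(p) = \begin{cases} 1, & w(p) \neq 0 \\ 0, & w(p) = 0 \end{cases}
```

For a simple closed contour, $w(p)$ is $\pm 1$ for pixels inside the shape and $0$ outside. Because $w(p)$ is integer-valued, it is piecewise constant in the coefficients, so how a usable gradient is obtained on a pixel grid is the part of the premise that needs the most scrutiny.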

What would settle it

Observing that the optimized physical patches achieve less than 50% attack success rate when tested at distances over 25 meters in multiple outdoor trials with varying environmental conditions would indicate the method does not deliver the claimed robustness.
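
Whether an observed success rate really sits below 50% depends on how many trials back it. The sketch below computes a Wilson score interval for a binomial proportion, a standard statistical tool that the paper is not known to use; the trial counts are hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half

# Hypothetical outcomes: 15 of 50 evasions versus 44 of 50 evasions.
low_lo, low_hi = wilson_interval(15, 50)
high_lo, high_hi = wilson_interval(44, 50)
print(f"15/50: [{low_lo:.3f}, {low_hi:.3f}]")
print(f"44/50: [{high_lo:.3f}, {high_hi:.3f}]")
```

With these made-up counts the first interval lies entirely below 0.5 and the second entirely above it, so 50 trials would be enough to separate the two outcomes the test distinguishes.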

Figures

Figures reproduced from arXiv: 2605.17822 by Fan Li, Jian Wang, Lijun He, Ming Lei, Yixing Yong.

**Figure 1.** Physical Fourier shape attack against infrared detection. We optimize a Fourier shape, fabricate it from heat-blocking material, and apply it to a person. This adversarial shape renders the person invisible to the infrared detector, while the benign person is easily detected.

**Figure 2.** The comparison between different shape-based infrared attacks. Blue denotes the corresponding shape definition parameters in top-down approaches. (a)-(e) represent (Zhu et al., 2021), (Zhu et al., 2022), (Wei et al., 2023d), (Wei et al., 2023b), and (Chen et al., 2022), respectively.

**Figure 3.** The overall framework of the Fourier shape attack.

**Figure 4.** Comparison results with previous attacks on YOLOv3. (a) Visualizations of attack results. Area ratio is used to reflect the size of adversarial patches. The object confidence score here is set as 0.1. (b) ASR-Confidence curve. (c) Precision-Recall curve.

**Figure 5.** Attack results on different detectors. (a) ASR-Confidence curve; (b) P-R curve.

**Figure 6.** Ablation studies. The ASRs on YOLOv3 are presented. (a) Fourier terms K; (b) Scale ratio ρ.

**Figure 7.** The physical attack environment and the attack results across different distances, viewing angles, body poses, and individuals.

**Figure 8.** Analysis of the attack's generalizability and robustness. (a) Attacking multiple objects. (b) Attacking the visible light modality. (c) ASR sensitivity to patch gray-values. (d) Robustness against defenses. All results are against YOLOv3. Visualizations (a-c) use a 0.1 confidence threshold.

**Figure 9.** Results on adversarial augmentation between different methods and different detectors. (a) Attack performance of different methods before (dashed lines) and after (solid lines) adversarial augmentation on the YOLOv3 detector. Our method consistently demonstrates superior robustness under all confidence levels compared to other methods. (b) Attack performance of proposed adversarial shape across different detect…

**Figure 10.** Visualization results of patches with different geometric shapes and quantitative ASR-Conf results of the proposed adversarial attack against different adversarial augmentation strategies. When Fourier shapes are absent from defense priors, an augmentation strategy based on limited regular geometric shapes fails to defend effectively against the proposed attack method. In contrast, introducing optimized Fourier shapes as de…

**Figure 11.** Optimization with and without regularization loss Lreg. (a) Different shapes with and without Lreg. Adversarial shapes optimized with Lreg are significantly simpler than those without Lreg. (b) In some cases, Lreg can help adversarial patches converge faster. During training, we set a confidence loss threshold of 0.1 as the criterion for a successful attack and the stopping condition of optimization. …

read the original abstract

Infrared object detection is crucial for perception in autonomous driving and surveillance but remains vulnerable to physical adversarial attacks. Unlike in the RGB domain, where attacks rely on color texture, infrared attacks must manipulate thermal signatures, making the geometric shape of heat-blocking materials the primary adversarial information carrier. Current shape-based methods suffer from a fundamental trade-off between representational capability and optimization power, limiting their attack effectiveness. In this work, we overcome this dilemma by introducing learnable Fourier shapes to the infrared domain. We utilize an end-to-end differentiable framework where a compact set of Fourier coefficients, defining the shape boundary, is analytically mapped to a pixel-space mask via the winding number theorem. This enables efficient gradient-based optimization to generate potent shapes that cause human targets to evade detection. Extensive digital and physical experiments provide a comprehensive evaluation and validate our superior performance. Our resulting physical patch achieves striking robustness, successfully evading detectors across diverse distances, angles, poses, and individuals, and achieves over 88% attack success rate at distances greater than 25m (conf.=0.5). Code is available at https://github.com/Yongyx99/Fourier-shape-attack.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Fourier coefficients plus winding-number mapping give a cleaner way to optimize complex physical shapes for infrared attacks than earlier parameterizations, but the abstract leaves the performance edge unproven.

The main takeaway is that this work shows how to represent adversarial patches for infrared detectors with a compact set of Fourier coefficients whose boundary is turned into a mask through the winding number theorem. That mapping is analytic and differentiable, so the whole thing can be optimized end-to-end against a detector loss. They claim this sidesteps the usual trade-off between shape complexity and how well you can actually tune it with gradients, and they back it with both simulated and real-world tests that reach over 88 percent success at distances beyond 25 meters across angles and people. If those numbers are solid, the approach is a useful incremental step for anyone building physical attacks on thermal systems. The physical robustness they report is the part that stands out most; most prior shape attacks have been weaker once you move outside the lab. The method itself is straightforward gradient descent on the coefficients, which keeps it practical.

The soft spot is the missing comparison data. The abstract does not show head-to-head numbers against the shape-based baselines it criticizes, nor any ablation on coefficient count or on how the discrete winding-number rasterization affects gradient quality for wiggly boundaries. If the discretization introduces noticeable artifacts or topological slips on real pixel grids, the claimed representational gain shrinks. The stress-test note about gradient fidelity is worth checking in the methods section; if the paper already demonstrates clean optimization for the shapes they use, that concern is minor.

This paper is mainly for researchers who already work on physical adversarial examples in the infrared domain or on robustness for vehicle and surveillance thermal cameras. It is concrete enough and has enough real-world testing that a serious editor should send it out for review rather than desk-reject it. Referees can pressure the authors on the baselines and on any discretization details that are not yet clear.

Referee Report

2 major / 2 minor

Summary. The paper introduces learnable Fourier shapes for physical adversarial attacks on infrared object detectors. A compact vector of Fourier coefficients parameterizes the boundary of a heat-blocking patch; this is mapped analytically to a binary pixel mask via the winding-number theorem inside an end-to-end differentiable pipeline, enabling gradient-based optimization. Digital and physical experiments are reported to show that the resulting patches evade detectors across distances, angles, poses and individuals, with an attack success rate exceeding 88 % at ranges greater than 25 m.

Significance. If the central technical claim holds, the work would usefully extend shape-based physical attacks into the infrared domain by supplying a parameterization that is both compact and expressive while remaining amenable to gradient descent. The public release of code is a clear positive for reproducibility. The practical impact, however, hinges on whether the Fourier-plus-winding-number construction demonstrably outperforms prior shape parameterizations once proper baselines and ablations are supplied.

major comments (2)

[§3.2] Differentiable mask generation: the manuscript asserts that the winding-number evaluation on the discrete pixel lattice yields a sufficiently smooth and accurate gradient signal for high-frequency Fourier modes. No error analysis, gradient-norm plots, or comparison against a non-rasterized reference is provided; if discretization artifacts dominate for complex boundaries, the claimed representational advantage collapses.
[§5] Physical evaluation: the 88% success rate at >25 m is presented without quantitative comparison to the strongest prior shape-based infrared attacks or ablations on the number of Fourier coefficients. Without these controls it is impossible to attribute the reported robustness to the Fourier representation rather than to other experimental choices.
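
One simple form a comparison against a non-rasterized reference could take is an area check: the area of a rasterized mask is compared with the closed-form area of the contour. The sketch below assumes a simple counterclockwise contour z(t) = sum_k c_k exp(i*k*t), for which Green's theorem gives the area pi * sum_k k * |c_k|^2. The coefficients, grid sizes, and function names are hypothetical, and the check covers mask accuracy only, not gradient quality.

```python
import cmath
import math

def contour(coeffs, n=128):
    """Sample z(t) = sum_k c_k * exp(i*k*t) at n points on [0, 2*pi)."""
    return [sum(c * cmath.exp(1j * k * 2.0 * math.pi * j / n) for k, c in coeffs.items())
            for j in range(n)]

def inside(points, p):
    """True if the polygon's winding number around p is non-zero."""
    n = len(points)
    total = sum(cmath.phase((points[(j + 1) % n] - p) / (points[j] - p)) for j in range(n))
    return round(total / (2.0 * math.pi)) != 0

def raster_area(coeffs, size, extent=1.5):
    """Area estimate: count pixel centres inside the contour on a size x size grid."""
    pts = contour(coeffs)
    h = 2.0 * extent / size
    count = 0
    for r in range(size):
        for c in range(size):
            p = complex(-extent + (c + 0.5) * h, -extent + (r + 0.5) * h)
            if inside(pts, p):
                count += 1
    return count * h * h

def analytic_area(coeffs):
    """Closed-form signed area of the contour: pi * sum_k k * |c_k|^2."""
    return math.pi * sum(k * abs(c) ** 2 for k, c in coeffs.items())

# Hypothetical coefficients: unit circle plus a small three-lobed perturbation.
coeffs = {1: 1.0 + 0j, -2: 0.2 + 0j}
reference = analytic_area(coeffs)
errors = {}
for size in (16, 48):
    errors[size] = abs(raster_area(coeffs, size) - reference) / reference
    print(f"{size}x{size} grid: relative area error {errors[size]:.4f}")
```

Repeating the check with more high-frequency coefficients and at the resolution used in training would show whether the rasterized shape still tracks the analytic one for complex boundaries.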

minor comments (2)

[§3] Notation for the Fourier coefficient vector and the winding-number threshold should be introduced once in §3 and used consistently thereafter; occasional re-definition of symbols interrupts readability.
[Figure 4] Figure captions for the physical patch photographs should state the exact number of Fourier coefficients used and the camera model, distance, and weather conditions for each row.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the insightful comments that will help improve the clarity and rigor of our work. We address the major comments point by point below, indicating where revisions will be made to the manuscript.

Referee: [§3.2] Differentiable mask generation: the manuscript asserts that the winding-number evaluation on the discrete pixel lattice yields a sufficiently smooth and accurate gradient signal for high-frequency Fourier modes. No error analysis, gradient-norm plots, or comparison against a non-rasterized reference is provided; if discretization artifacts dominate for complex boundaries, the claimed representational advantage collapses.

Authors: We thank the referee for highlighting this important aspect of our differentiable pipeline. While the winding number theorem provides an analytical mapping that is differentiable in the continuous domain, we recognize that the discrete pixel lattice implementation may introduce approximation errors, particularly for high-frequency modes. In the revised version, we will add an error analysis section, including gradient-norm plots that compare the gradients obtained from our discrete implementation against a higher-resolution reference rasterization. This will quantify the smoothness and accuracy of the gradient signal and confirm that discretization artifacts do not dominate within the range of Fourier coefficients employed in our experiments. revision: yes

Referee: [§5] Physical evaluation: the 88% success rate at >25 m is presented without quantitative comparison to the strongest prior shape-based infrared attacks or ablations on the number of Fourier coefficients. Without these controls it is impossible to attribute the reported robustness to the Fourier representation rather than to other experimental choices.

Authors: We agree that additional controls are necessary to substantiate the advantages of the Fourier parameterization. In the revision, we will include an ablation study varying the number of Fourier coefficients (e.g., 4, 8, 12, 16) and report the corresponding attack success rates in both digital and physical settings. For comparisons to prior shape-based attacks, we note that most existing infrared attack methods focus on texture or material properties rather than pure shape parameterization; the closest shape-based works are primarily in the visible domain. We will add a discussion comparing our results to re-implemented or reported baselines from related literature where feasible, and explicitly discuss any limitations in direct comparability due to differences in detector models and experimental conditions. This will better isolate the contribution of the Fourier representation. revision: partial

Circularity Check

0 steps flagged

No circularity: Fourier-to-mask mapping uses external winding-number theorem and independent detector loss

The derivation introduces a compact Fourier coefficient vector that is mapped to a binary mask via the standard winding number theorem (an external mathematical fact, not defined by the paper). End-to-end gradient optimization then minimizes a detection loss measured on real infrared detectors. Attack success rate is evaluated empirically on held-out physical and digital test cases rather than being algebraically forced by the coefficient values themselves. No self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations appear in the provided abstract or claimed chain. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the differentiability and accuracy of the Fourier-to-mask mapping and on the assumption that gradient optimization of a modest number of coefficients can discover effective physical shapes. No new physical entities are postulated.

axioms (1)

standard math: The winding number theorem supplies an analytic, differentiable mapping from a compact set of Fourier coefficients to a binary pixel mask.
Explicitly invoked in the abstract to enable end-to-end gradient-based optimization.

pith-pipeline@v0.9.0 · 5737 in / 1292 out tokens · 49289 ms · 2026-05-20T12:25:39.749056+00:00 · methodology

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 1 internal anchor

[1]

K., Sarkar, S., and Das, T

Bustos, N., Mashhadi, M., Lai-Yuen, S. K., Sarkar, S., and Das, T. K. A systematic literature review on object detection using near infrared and thermal images. Neurocomputing, 560: 0 126804, 2023

work page 2023
[2]

Yolo-ms: Rethinking multi-scale representation learning for real-time object detection

Chen, Y., Yuan, X., Wang, J., Wu, R., Li, X., Hou, Q., and Cheng, M.-M. Yolo-ms: Rethinking multi-scale representation learning for real-time object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025
[3]

Shape matters: deformable patch attack

Chen, Z., Li, B., Wu, S., Xu, J., Ding, S., and Zhang, W. Shape matters: deformable patch attack. In European Conference on Computer Vision, pp.\ 529--548. Springer, 2022

work page 2022
[4]

Full-distance evasion of pedestrian detectors in the physical world

Cheng, Z., Hu, Z., Liu, Y., Li, J., Su, H., and Hu, X. Full-distance evasion of pedestrian detectors in the physical world. Advances in Neural Information Processing Systems, 37: 0 102366--102392, 2024

work page 2024
[5]

A., Alouani, I., and Shafique, M

Guesmi, A., Ding, R., Hanif, M. A., Alouani, I., and Shafique, M. Dap: A dynamic adversarial patch for evading person detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24595--24604, 2024

work page 2024
[6]

S., Chen, J.-C., Hua, K.-L., and Cheng, W.-H

Hu, Y.-C.-T., Kung, B.-H., Tan, D. S., Chen, J.-C., Hua, K.-L., and Cheng, W.-H. Naturalistic physical adversarial patch for object detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 7848--7857, 2021

work page 2021
[7]

Adversarial texture for fooling person detectors in the physical world

Hu, Z., Huang, S., Zhu, X., Sun, F., Zhang, B., and Hu, X. Adversarial texture for fooling person detectors in the physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 13307--13316, 2022

work page 2022
[8]

Physically realizable natural-looking clothing textures evade person detectors via 3d modeling

Hu, Z., Chu, W., Zhu, X., Zhang, H., Zhang, B., and Hu, X. Physically realizable natural-looking clothing textures evade person detectors via 3d modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 16975--16984, 2023

work page 2023
[9]

Llvip: A visible-infrared paired dataset for low-light vision

Jia, X., Zhu, C., Li, M., Tang, W., and Zhou, W. Llvip: A visible-infrared paired dataset for low-light vision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 3496--3504, 2021

work page 2021
[10]

Ultralytics YOLO , January 2023

Jocher, G., Qiu, J., and Chaurasia, A. Ultralytics YOLO , January 2023

work page 2023
[11]

C., Lo, W.-Y., et al

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 4015--4026, 2023

work page 2023
[12]

Focal loss for dense object detection

Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll \'a r, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp.\ 2980--2988, 2017

work page 2017
[13]

and Fu, K.-S

Persoon, E. and Fu, K.-S. Shape discrimination using fourier descriptors. IEEE Transactions on systems, man, and cybernetics, 7 0 (3): 0 170--179, 2007

work page 2007
[14]

YOLOv3: An Incremental Improvement

Redmon, J. and Farhadi, A. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Faster r-cnn: Towards real-time object detection with region proposal networks

Ren, S., He, K., Girshick, R., and Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence, 39 0 (6): 0 1137--1149, 2016

work page 2016
[16]

Fooling automated surveillance cameras: adversarial patches to attack person detection

Thys, S., Van Ranst, W., and Goedem \'e , T. Fooling automated surveillance cameras: adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp.\ 0--0, 2019

work page 2019
[17]

Adversarial obstacle generation against lidar-based 3d object detection

Wang, J., Li, F., Zhang, X., and Sun, H. Adversarial obstacle generation against lidar-based 3d object detection. IEEE Transactions on Multimedia, 26: 0 2686--2699, 2023

work page 2023
[18]

Toward robust lidar-camera fusion in bev space via mutual deformable attention and temporal aggregation

Wang, J., Li, F., An, Y., Zhang, X., and Sun, H. Toward robust lidar-camera fusion in bev space via mutual deformable attention and temporal aggregation. IEEE Transactions on Circuits and Systems for Video Technology, 34 0 (7): 0 5753--5764, 2024

work page 2024
[19]

Invisible triggers, visible threats! road-style adversarial creation attack for visual 3d detection in autonomous driving

Wang, J., He, L., Yong, Y., Bi, H., and Li, F. Invisible triggers, visible threats! road-style adversarial creation attack for visual 3d detection in autonomous driving. arXiv preprint arXiv:2511.08015, 2025 a

work page arXiv 2025
[20]

A unified framework for adversarial patch attacks against visual 3d object detection in autonomous driving

Wang, J., Li, F., and He, L. A unified framework for adversarial patch attacks against visual 3d object detection in autonomous driving. IEEE Transactions on Circuits and Systems for Video Technology, 2025 b

work page 2025
[21]

Physically realizable adversarial creating attack against vision-based bev space 3d object detection

Wang, J., Li, F., Lv, S., He, L., and Shen, C. Physically realizable adversarial creating attack against vision-based bev space 3d object detection. IEEE Transactions on Image Processing, 2025 c

work page 2025
[22]

Learning fourier shapes to probe the geometric world of deep neural networks

Wang, J., Yong, Y., Bi, H., He, L., and Li, F. Learning fourier shapes to probe the geometric world of deep neural networks. arXiv preprint arXiv:2511.04970, 2025 d

work page arXiv 2025
[23]

Badpatch: Diffusion-based generation of physical adversarial patches

Wang, Z., Ma, X., and Jiang, Y.-G. Badpatch: Diffusion-based generation of physical adversarial patches. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 6244--6254, 2025 e

work page 2025
[24]

Hotcold block: Fooling thermal infrared detectors with a novel wearable design

Wei, H., Wang, Z., Jia, X., Zheng, Y., Tang, H., Satoh, S., and Wang, Z. Hotcold block: Fooling thermal infrared detectors with a novel wearable design. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp.\ 15233--15241, 2023 a

work page 2023
[25]

Physical adversarial attack meets computer vision: A decade survey

Wei, H., Tang, H., Jia, X., Wang, Z., Yu, H., Li, Z., Satoh, S., Van Gool, L., and Wang, Z. Physical adversarial attack meets computer vision: A decade survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 0 (12): 0 9797--9817, 2024 a

work page 2024
[26]

Unified adversarial patch for cross-modal attacks in the physical world

Wei, X., Huang, Y., Sun, Y., and Yu, J. Unified adversarial patch for cross-modal attacks in the physical world. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 4445--4454, 2023 b

work page 2023
[27]

Unified adversarial patch for visible-infrared cross-modal attacks in the physical world

Wei, X., Huang, Y., Sun, Y., and Yu, J. Unified adversarial patch for visible-infrared cross-modal attacks in the physical world. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 0 (4): 0 2348--2363, 2023 c

work page 2023
[28]

Physically adversarial infrared patches with learnable shapes and locations

Wei, X., Yu, J., and Huang, Y. Physically adversarial infrared patches with learnable shapes and locations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 12334--12342, 2023 d

work page 2023
[29]

Infrared adversarial patches with learnable shapes and locations in the physical world

Wei, X., Yu, J., and Huang, Y. Infrared adversarial patches with learnable shapes and locations in the physical world. International Journal of Computer Vision, 132 0 (6): 0 1928--1944, 2024 b

work page 1928
[30]

Real-world adversarial defense against patch attacks based on diffusion model

Wei, X., Kang, C., Dong, Y., Wang, Z., Ruan, S., Chen, Y., and Su, H. Real-world adversarial defense against patch attacks based on diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025
[31]

Sam as the guide: mastering pseudo-label refinement in semi-supervised referring expression segmentation

Yang, D., Ji, J., Ma, Y., Guo, T., Wang, H., Sun, X., and Ji, R. Sam as the guide: mastering pseudo-label refinement in semi-supervised referring expression segmentation. In Proceedings of the 41st International Conference on Machine Learning, 2024

work page 2024
[32]

Resolution adaptive networks for efficient inference

Yang, L., Han, Y., Chen, X., Song, S., Dai, J., and Huang, G. Resolution adaptive networks for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 2369--2378, 2020

work page 2020
[33]

Condensenet v2: Sparse feature reactivation for deep networks

Yang, L., Jiang, H., Cai, R., Wang, Y., Song, S., Huang, G., and Tian, Q. Condensenet v2: Sparse feature reactivation for deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 3569--3578, 2021

work page 2021
[34]

A unified framework for generating diverse and stealthy adversarial patches against aerial object detection

Yong, Y., Wang, J., He, L., and Li, F. A unified framework for generating diverse and stealthy adversarial patches against aerial object detection. IEEE Transactions on Geoscience and Remote Sensing, 2025

work page 2025
[35]

Omni-angle assault: An invisible and powerful physical adversarial attack on face recognition

Yuan, S., Li, H., Zhang, R., Cao, H., Jiang, W., Ni, T., Fan, W., Zhao, Q., and Xu, G. Omni-angle assault: An invisible and powerful physical adversarial attack on face recognition. In International Conference on Machine Learning, 2025

work page 2025
[36]

A comparative study of fourier descriptors for shape representation and retrieval

Zhang, D., Lu, G., et al. A comparative study of fourier descriptors for shape representation and retrieval. In Proc. 5th Asian Conference on Computer Vision, pp.\ 35, 2002

work page 2002
[37]

Generalizable multi-camera 3d object detection from a single source via fourier cross-view learning

Zhao, X., Gu, Q., Wang, X., Zhou, C., and Ye, N. Generalizable multi-camera 3d object detection from a single source via fourier cross-view learning. In International Conference on Machine Learning, 2025

work page 2025
[38]

Image fusion via vision-language model

Zhao, Z., Deng, L., Bai, H., Cui, Y., Zhang, Z., Zhang, Y., Qin, H., Chen, D., Zhang, J., Wang, P., and Van Gool, L. Image fusion via vision-language model. In Proceedings of the 41st International Conference on Machine Learning, 2024

work page 2024
[39]

Physical 3d adversarial attacks against monocular depth estimation in autonomous driving

Zheng, J., Lin, C., Sun, J., Zhao, Z., Li, Q., and Shen, C. Physical 3d adversarial attacks against monocular depth estimation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24452--24461, 2024

work page 2024
[40]

Rauca: a novel physical adversarial attack on vehicle detectors via robust and accurate camouflage generation

Zhou, J., Lyu, L., He, D., and Li, Y. Rauca: a novel physical adversarial attack on vehicle detectors via robust and accurate camouflage generation. In Proceedings of the 41st International Conference on Machine Learning, 2024

work page 2024
[41]

Fooling thermal infrared pedestrian detectors in real world using small bulbs

Zhu, X., Li, X., Li, J., Wang, Z., and Hu, X. Fooling thermal infrared pedestrian detectors in real world using small bulbs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.\ 3616--3624, 2021

work page 2021
[42]

Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world

Zhu, X., Hu, Z., Huang, S., Li, J., and Hu, X. Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 13317--13326, 2022

[43]

Zhu, X., Liu, Y., Hu, Z., Li, J., and Hu, X. Infrared adversarial car stickers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24284–24293, 2024

[1]

Bustos, N., Mashhadi, M., Lai-Yuen, S. K., Sarkar, S., and Das, T. K. A systematic literature review on object detection using near infrared and thermal images. Neurocomputing, 560:126804, 2023

[2]

Chen, Y., Yuan, X., Wang, J., Wu, R., Li, X., Hou, Q., and Cheng, M.-M. YOLO-MS: Rethinking multi-scale representation learning for real-time object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

[3]

Chen, Z., Li, B., Wu, S., Xu, J., Ding, S., and Zhang, W. Shape matters: deformable patch attack. In European Conference on Computer Vision, pp. 529–548. Springer, 2022

[4]

Cheng, Z., Hu, Z., Liu, Y., Li, J., Su, H., and Hu, X. Full-distance evasion of pedestrian detectors in the physical world. Advances in Neural Information Processing Systems, 37:102366–102392, 2024

[5]

Guesmi, A., Ding, R., Hanif, M. A., Alouani, I., and Shafique, M. DAP: A dynamic adversarial patch for evading person detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24595–24604, 2024

[6]

Hu, Y.-C.-T., Kung, B.-H., Tan, D. S., Chen, J.-C., Hua, K.-L., and Cheng, W.-H. Naturalistic physical adversarial patch for object detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7848–7857, 2021

[7]

Hu, Z., Huang, S., Zhu, X., Sun, F., Zhang, B., and Hu, X. Adversarial texture for fooling person detectors in the physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13307–13316, 2022

[8]

Hu, Z., Chu, W., Zhu, X., Zhang, H., Zhang, B., and Hu, X. Physically realizable natural-looking clothing textures evade person detectors via 3D modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16975–16984, 2023

[9]

Jia, X., Zhu, C., Li, M., Tang, W., and Zhou, W. LLVIP: A visible-infrared paired dataset for low-light vision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3496–3504, 2021

[10]

Jocher, G., Qiu, J., and Chaurasia, A. Ultralytics YOLO, January 2023

[11]

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026, 2023

[12]

Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017

[13]

Persoon, E. and Fu, K.-S. Shape discrimination using Fourier descriptors. IEEE Transactions on Systems, Man, and Cybernetics, 7(3):170–179, 2007

[14]

Redmon, J. and Farhadi, A. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018

[15]

Ren, S., He, K., Girshick, R., and Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2016

[16]

Thys, S., Van Ranst, W., and Goedemé, T. Fooling automated surveillance cameras: adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019

[17]

Wang, J., Li, F., Zhang, X., and Sun, H. Adversarial obstacle generation against LiDAR-based 3D object detection. IEEE Transactions on Multimedia, 26:2686–2699, 2023

[18]

Wang, J., Li, F., An, Y., Zhang, X., and Sun, H. Toward robust LiDAR-camera fusion in BEV space via mutual deformable attention and temporal aggregation. IEEE Transactions on Circuits and Systems for Video Technology, 34(7):5753–5764, 2024

[19]

Wang, J., He, L., Yong, Y., Bi, H., and Li, F. Invisible triggers, visible threats! road-style adversarial creation attack for visual 3D detection in autonomous driving. arXiv preprint arXiv:2511.08015, 2025a

[20]

Wang, J., Li, F., and He, L. A unified framework for adversarial patch attacks against visual 3D object detection in autonomous driving. IEEE Transactions on Circuits and Systems for Video Technology, 2025b

[21]

Wang, J., Li, F., Lv, S., He, L., and Shen, C. Physically realizable adversarial creating attack against vision-based BEV space 3D object detection. IEEE Transactions on Image Processing, 2025c

[22]

Wang, J., Yong, Y., Bi, H., He, L., and Li, F. Learning Fourier shapes to probe the geometric world of deep neural networks. arXiv preprint arXiv:2511.04970, 2025d

[23]

Wang, Z., Ma, X., and Jiang, Y.-G. BadPatch: Diffusion-based generation of physical adversarial patches. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6244–6254, 2025e

[24]

Wei, H., Wang, Z., Jia, X., Zheng, Y., Tang, H., Satoh, S., and Wang, Z. HOTCOLD Block: Fooling thermal infrared detectors with a novel wearable design. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp. 15233–15241, 2023a

[25]

Wei, H., Tang, H., Jia, X., Wang, Z., Yu, H., Li, Z., Satoh, S., Van Gool, L., and Wang, Z. Physical adversarial attack meets computer vision: A decade survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9797–9817, 2024a

[26]

Wei, X., Huang, Y., Sun, Y., and Yu, J. Unified adversarial patch for cross-modal attacks in the physical world. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4445–4454, 2023b

[27]

Wei, X., Huang, Y., Sun, Y., and Yu, J. Unified adversarial patch for visible-infrared cross-modal attacks in the physical world. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(4):2348–2363, 2023c

[28]

Wei, X., Yu, J., and Huang, Y. Physically adversarial infrared patches with learnable shapes and locations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12334–12342, 2023d

[29]

Wei, X., Yu, J., and Huang, Y. Infrared adversarial patches with learnable shapes and locations in the physical world. International Journal of Computer Vision, 132(6):1928–1944, 2024b

[30]

Wei, X., Kang, C., Dong, Y., Wang, Z., Ruan, S., Chen, Y., and Su, H. Real-world adversarial defense against patch attacks based on diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

[31]

Yang, D., Ji, J., Ma, Y., Guo, T., Wang, H., Sun, X., and Ji, R. SAM as the guide: mastering pseudo-label refinement in semi-supervised referring expression segmentation. In Proceedings of the 41st International Conference on Machine Learning, 2024

[32]

Yang, L., Han, Y., Chen, X., Song, S., Dai, J., and Huang, G. Resolution adaptive networks for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2369–2378, 2020

[33]

Yang, L., Jiang, H., Cai, R., Wang, Y., Song, S., Huang, G., and Tian, Q. CondenseNet V2: Sparse feature reactivation for deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3569–3578, 2021

[34]

Yong, Y., Wang, J., He, L., and Li, F. A unified framework for generating diverse and stealthy adversarial patches against aerial object detection. IEEE Transactions on Geoscience and Remote Sensing, 2025

[35]

Yuan, S., Li, H., Zhang, R., Cao, H., Jiang, W., Ni, T., Fan, W., Zhao, Q., and Xu, G. Omni-angle assault: An invisible and powerful physical adversarial attack on face recognition. In International Conference on Machine Learning, 2025

[36]

Zhang, D., Lu, G., et al. A comparative study of Fourier descriptors for shape representation and retrieval. In Proc. 5th Asian Conference on Computer Vision, pp. 35, 2002

work page 2002

[37] [37]

Generalizable multi-camera 3d object detection from a single source via fourier cross-view learning

Zhao, X., Gu, Q., Wang, X., Zhou, C., and Ye, N. Generalizable multi-camera 3d object detection from a single source via fourier cross-view learning. In International Conference on Machine Learning, 2025

work page 2025

[38] [38]

Image fusion via vision-language model

Zhao, Z., Deng, L., Bai, H., Cui, Y., Zhang, Z., Zhang, Y., Qin, H., Chen, D., Zhang, J., Wang, P., and Van Gool, L. Image fusion via vision-language model. In Proceedings of the 41st International Conference on Machine Learning, 2024

work page 2024

[39] [39]

Physical 3d adversarial attacks against monocular depth estimation in autonomous driving

Zheng, J., Lin, C., Sun, J., Zhao, Z., Li, Q., and Shen, C. Physical 3d adversarial attacks against monocular depth estimation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24452--24461, 2024

work page 2024

[40] [40]

Rauca: a novel physical adversarial attack on vehicle detectors via robust and accurate camouflage generation

Zhou, J., Lyu, L., He, D., and Li, Y. Rauca: a novel physical adversarial attack on vehicle detectors via robust and accurate camouflage generation. In Proceedings of the 41st International Conference on Machine Learning, 2024

work page 2024

[41] [41]

Fooling thermal infrared pedestrian detectors in real world using small bulbs

Zhu, X., Li, X., Li, J., Wang, Z., and Hu, X. Fooling thermal infrared pedestrian detectors in real world using small bulbs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.\ 3616--3624, 2021

work page 2021

[42] [42]

Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world

Zhu, X., Hu, Z., Huang, S., Li, J., and Hu, X. Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 13317--13326, 2022

work page 2022

[43] [43]

Infrared adversarial car stickers

Zhu, X., Liu, Y., Hu, Z., Li, J., and Hu, X. Infrared adversarial car stickers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24284--24293, 2024

work page 2024