Unleashing the Representational Power of Fourier Shapes for Attacking Infrared Object Detection
Pith reviewed 2026-05-20 12:25 UTC · model grok-4.3
The pith
Compact Fourier shapes can be optimized to produce physical patches that robustly fool infrared object detectors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that by learning Fourier coefficients to define shape boundaries and analytically converting them to masks, they can generate infrared adversarial patches that achieve over 88% success in evading detectors at distances exceeding 25 meters under diverse conditions.
What carries the argument
End-to-end differentiable mapping from Fourier coefficients to pixel masks using the winding number theorem
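A minimal sketch of the forward half of that mapping, assuming a boundary of the form z(t) = sum_k c_k e^{ikt} and a polygonal approximation of the winding number. The function names, the grid extent, and the example coefficients are hypothetical and are not taken from the paper or its repository. The hard threshold used here is not differentiable; whatever smooth relaxation the authors use is not reproduced.

```python
import cmath
import math


def fourier_contour(coeffs, n_samples=256):
    """Sample z(t) = sum_k c_k * exp(i*k*t) on a uniform grid of t in [0, 2*pi)."""
    points = []
    for s in range(n_samples):
        t = 2.0 * math.pi * s / n_samples
        points.append(sum(c * cmath.exp(1j * k * t) for k, c in coeffs.items()))
    return points


def winding_number(point, contour):
    """Add up the signed angle each polygon edge subtends at `point`."""
    total = 0.0
    for s in range(len(contour)):
        a = contour[s] - point
        b = contour[(s + 1) % len(contour)] - point
        # phase(b * conj(a)) is the signed angle from a to b, in [-pi, pi]
        total += cmath.phase(b * a.conjugate())
    return total / (2.0 * math.pi)


def rasterize(coeffs, size=32, extent=2.0):
    """Binary mask sampled at pixel centres of a size x size grid over [-extent, extent]^2."""
    contour = fourier_contour(coeffs)
    step = 2.0 * extent / size
    mask = []
    for row in range(size):
        y = extent - (row + 0.5) * step
        line = []
        for col in range(size):
            x = -extent + (col + 0.5) * step
            w = winding_number(complex(x, y), contour)
            line.append(1 if round(w) != 0 else 0)  # hard threshold, not differentiable
        mask.append(line)
    return mask


# Hypothetical coefficients: a unit circle (k=1) with a three-lobed perturbation (k=-2).
coeffs = {1: 1.0 + 0j, -2: 0.25 + 0j}
mask = rasterize(coeffs)
for line in mask:
    print("".join("#" if v else "." for v in line))
```

Two coefficients already produce a non-circular outline here, which is the sense in which the parameterization is compact.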
If this is right
- The generated physical patches evade detectors effectively across diverse distances, angles, poses, and different individuals.
- Over 88% attack success rate is achieved at distances greater than 25 meters with a detection confidence of 0.5.
- The end-to-end framework allows for more effective optimization compared to non-differentiable shape representations.
- Both digital and physical experiments validate superior performance over existing shape-based infrared attack methods.
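The success-rate figure above depends on how a "successful" evasion is counted. One plausible reading, sketched below, treats a frame as evaded when the detector's best person score falls below the confidence threshold. This definition, the function name, and the example scores are assumptions for illustration; the source does not spell out the paper's exact protocol.

```python
def attack_success_rate(max_confidences, conf_threshold=0.5):
    """Fraction of frames whose best 'person' score is below the threshold."""
    if not max_confidences:
        raise ValueError("need at least one frame")
    evaded = sum(1 for c in max_confidences if c < conf_threshold)
    return evaded / len(max_confidences)


# Hypothetical per-frame scores, not data from the paper.
scores = [0.12, 0.48, 0.91, 0.05, 0.33, 0.50, 0.27, 0.61]
asr = attack_success_rate(scores)
print(f"ASR at conf=0.5: {asr:.3f}")  # 5 of 8 frames fall below 0.5
```

Under this reading a score exactly at the threshold still counts as a detection, so the choice of strict versus non-strict comparison can shift the reported rate slightly.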
Where Pith is reading between the lines
- This approach might be adapted to optimize shapes for evading other types of sensors by changing the mask interpretation.
- Additional factors like the thermal conductivity of the patch material could influence real-world performance beyond the shape alone.
- Evaluating the attack on a wider range of infrared detector models would test its generalizability.
Load-bearing premise
The winding number theorem can accurately and differentiably translate a small set of Fourier coefficients into complex pixel masks suitable for real infrared attack scenarios.
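For reference, the premise can be written out in standard notation. The symbols K, c_k, p, and M are chosen here for illustration and may not match the paper's notation, and the hard indicator in the last line is an assumption about how the mask is defined before any smoothing.

```latex
\begin{align}
  z(t) &= \sum_{k=-K}^{K} c_k \, e^{i k t}, \qquad t \in [0, 2\pi), \\
  n(\gamma, p) &= \frac{1}{2\pi i} \oint_{\gamma} \frac{dz}{z - p}
               = \frac{1}{2\pi i} \int_{0}^{2\pi} \frac{z'(t)}{z(t) - p} \, dt, \\
  M(p) &= \begin{cases} 1, & n(\gamma, p) \neq 0, \\ 0, & n(\gamma, p) = 0. \end{cases}
\end{align}
```

The integrand depends smoothly on the coefficients c_k wherever p is off the curve, which is what makes a gradient with respect to the shape parameters conceivable; the integer-valued n itself is piecewise constant in p.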
What would settle it
Observing that the optimized physical patches achieve less than 50% attack success rate when tested at distances over 25 meters in multiple outdoor trials with varying environmental conditions would indicate the method does not deliver the claimed robustness.
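Whether a set of outdoor trials falls below the 50% bar is a statistical question once the number of trials is small. A sketch of one way to check it, using a Wilson score interval for the success proportion, follows. The trial counts are hypothetical, and the choice of a Wilson interval with a roughly 95% level is an assumption, not something the source prescribes.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical trial counts, not results from the paper.
low, high = wilson_interval(successes=88, trials=100)
refuted = high < 0.5  # the whole interval sits below the 50% bar
print(f"interval: ({low:.3f}, {high:.3f}), refuted: {refuted}")
```

A result would count against the claim only when the upper end of the interval is below 0.5, which guards against reading too much into a handful of failed trials.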
Original abstract
Infrared object detection is crucial for perception in autonomous driving and surveillance but remains vulnerable to physical adversarial attacks. Unlike in the RGB domain, where attacks rely on color texture, infrared attacks must manipulate thermal signatures, making the geometric shape of heat-blocking materials the primary adversarial information carrier. Current shape-based methods suffer from a fundamental trade-off between representational capability and optimization power, limiting their attack effectiveness. In this work, we overcome this dilemma by introducing learnable Fourier shapes to the infrared domain. We utilize an end-to-end differentiable framework where a compact set of Fourier coefficients, defining the shape boundary, is analytically mapped to a pixel-space mask via the winding number theorem. This enables efficient gradient-based optimization to generate potent shapes that cause human targets to evade detection. Extensive digital and physical experiments provide a comprehensive evaluation and validate our superior performance. Our resulting physical patch achieves striking robustness, successfully evading detectors across diverse distances, angles, poses, and individuals, and achieves over 88% attack success rate at distances greater than 25m (conf.=0.5). Code is available at https://github.com/Yongyx99/Fourier-shape-attack.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces learnable Fourier shapes for physical adversarial attacks on infrared object detectors. A compact vector of Fourier coefficients parameterizes the boundary of a heat-blocking patch; this is mapped analytically to a binary pixel mask via the winding-number theorem inside an end-to-end differentiable pipeline, enabling gradient-based optimization. Digital and physical experiments are reported to show that the resulting patches evade detectors across distances, angles, poses and individuals, with an attack success rate exceeding 88 % at ranges greater than 25 m.
Significance. If the central technical claim holds, the work would usefully extend shape-based physical attacks into the infrared domain by supplying a parameterization that is both compact and expressive while remaining amenable to gradient descent. The public release of code is a clear positive for reproducibility. The practical impact, however, hinges on whether the Fourier-plus-winding-number construction demonstrably outperforms prior shape parameterizations once proper baselines and ablations are supplied.
major comments (2)
- [§3.2] §3.2 (Differentiable mask generation): the manuscript asserts that the winding-number evaluation on the discrete pixel lattice yields a sufficiently smooth and accurate gradient signal for high-frequency Fourier modes. No error analysis, gradient-norm plots, or comparison against a non-rasterized reference is provided; if discretization artifacts dominate for complex boundaries, the claimed representational advantage collapses.
- [§5] §5 (Physical evaluation): the 88 % success rate at >25 m is presented without quantitative comparison to the strongest prior shape-based infrared attacks or ablations on the number of Fourier coefficients. Without these controls it is impossible to attribute the reported robustness to the Fourier representation rather than to other experimental choices.
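The §3.2 concern about discretization can be probed with a reference that never touches the pixel grid. For a boundary z(t) = sum_k c_k e^{ikt}, the enclosed signed area has the closed form pi * sum_k k |c_k|^2, so a rasterized mask's area can be compared against it at several resolutions. The sketch below does this for forward-pass area only; it says nothing about gradient quality, and the shape, grid extent, and function names are hypothetical.

```python
import cmath
import math


def analytic_area(coeffs):
    """Signed area enclosed by z(t) = sum_k c_k * exp(i*k*t): pi * sum_k k * |c_k|^2."""
    return math.pi * sum(k * abs(c) ** 2 for k, c in coeffs.items())


def rasterized_area(coeffs, size, extent=2.0, n_samples=128):
    """Area of the pixel-centre mask on a size x size grid over [-extent, extent]^2."""
    contour = [
        sum(c * cmath.exp(2j * math.pi * k * s / n_samples) for k, c in coeffs.items())
        for s in range(n_samples)
    ]
    step = 2.0 * extent / size
    inside = 0
    for row in range(size):
        for col in range(size):
            p = complex(-extent + (col + 0.5) * step, -extent + (row + 0.5) * step)
            # Polygonal winding number: total signed angle swept around p.
            angle = sum(
                cmath.phase((contour[(s + 1) % n_samples] - p) * (contour[s] - p).conjugate())
                for s in range(n_samples)
            )
            if round(angle / (2.0 * math.pi)) != 0:
                inside += 1
    return inside * step * step


coeffs = {1: 1.0, -2: 0.25}  # hypothetical shape, not from the paper
reference = analytic_area(coeffs)
for size in (8, 16, 32):
    approx = rasterized_area(coeffs, size)
    print(size, abs(approx - reference) / reference)
```

One would expect the relative error to shrink as the grid is refined, though not necessarily monotonically. The same comparison on gradients would need an automatic-differentiation framework and the authors' smooth mask, neither of which is assumed here.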
minor comments (2)
- [§3] Notation for the Fourier coefficient vector and the winding-number threshold should be introduced once in §3 and used consistently thereafter; occasional re-definition of symbols interrupts readability.
- [Figure 4] Figure captions for the physical patch photographs should state the exact number of Fourier coefficients used and the camera model, distance, and weather conditions for each row.
Simulated Author's Rebuttal
We are grateful to the referee for the insightful comments that will help improve the clarity and rigor of our work. We address the major comments point by point below, indicating where revisions will be made to the manuscript.
- Referee: [§3.2] §3.2 (Differentiable mask generation): the manuscript asserts that the winding-number evaluation on the discrete pixel lattice yields a sufficiently smooth and accurate gradient signal for high-frequency Fourier modes. No error analysis, gradient-norm plots, or comparison against a non-rasterized reference is provided; if discretization artifacts dominate for complex boundaries, the claimed representational advantage collapses.
  Authors: We thank the referee for highlighting this important aspect of our differentiable pipeline. While the winding number theorem provides an analytical mapping that is differentiable in the continuous domain, we recognize that the discrete pixel lattice implementation may introduce approximation errors, particularly for high-frequency modes. In the revised version, we will add an error analysis section, including gradient-norm plots that compare the gradients obtained from our discrete implementation against a higher-resolution reference rasterization. This will quantify the smoothness and accuracy of the gradient signal and confirm that discretization artifacts do not dominate within the range of Fourier coefficients employed in our experiments.
  Revision: yes
- Referee: [§5] §5 (Physical evaluation): the 88 % success rate at >25 m is presented without quantitative comparison to the strongest prior shape-based infrared attacks or ablations on the number of Fourier coefficients. Without these controls it is impossible to attribute the reported robustness to the Fourier representation rather than to other experimental choices.
  Authors: We agree that additional controls are necessary to substantiate the advantages of the Fourier parameterization. In the revision, we will include an ablation study varying the number of Fourier coefficients (e.g., 4, 8, 12, 16) and report the corresponding attack success rates in both digital and physical settings. For comparisons to prior shape-based attacks, we note that most existing infrared attack methods focus on texture or material properties rather than pure shape parameterization; the closest shape-based works are primarily in the visible domain. We will add a discussion comparing our results to re-implemented or reported baselines from related literature where feasible, and explicitly discuss any limitations in direct comparability due to differences in detector models and experimental conditions. This will better isolate the contribution of the Fourier representation.
  Revision: partial
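The proposed ablation over coefficient counts has a purely geometric counterpart that can be checked without any detector: how much boundary detail a given coefficient budget can carry. The sketch below truncates the discrete Fourier descriptor of a square outline to the 4, 8, 12, and 16 lowest-frequency coefficients and reports the mean squared reconstruction error. It does not measure attack success; the target shape, the harmonic ordering, and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def square_boundary(n_samples=64):
    """Counterclockwise samples along the square with corners at +/-1 +/- 1j."""
    corners = [1 - 1j, 1 + 1j, -1 + 1j, -1 - 1j]
    per_side = n_samples // 4
    points = []
    for idx in range(n_samples):
        side, offset = divmod(idx, per_side)
        start, end = corners[side], corners[(side + 1) % 4]
        points.append(start + (offset / per_side) * (end - start))
    return points


def dft_coefficient(points, k):
    """c_k = (1/N) * sum_n z_n * exp(-2*pi*i*n*k/N)."""
    n = len(points)
    return sum(z * cmath.exp(-2j * math.pi * idx * k / n) for idx, z in enumerate(points)) / n


def lowest_harmonics(count):
    """Order harmonics as 0, 1, -1, 2, -2, ... and keep the first `count`."""
    ks = [0]
    m = 1
    while len(ks) < count:
        ks.extend([m, -m])
        m += 1
    return ks[:count]


def truncation_mse(points, count):
    """Mean squared error after rebuilding the outline from `count` coefficients."""
    n = len(points)
    coeffs = {k: dft_coefficient(points, k) for k in lowest_harmonics(count)}
    err = 0.0
    for idx, z in enumerate(points):
        approx = sum(c * cmath.exp(2j * math.pi * idx * k / n) for k, c in coeffs.items())
        err += abs(z - approx) ** 2
    return err / n


square = square_boundary()
errors = [truncation_mse(square, count) for count in (4, 8, 12, 16)]
for count, err in zip((4, 8, 12, 16), errors):
    print(count, err)
```

Because each larger budget contains the smaller one, the reconstruction error cannot increase with the count. Whether attack success follows the same trend is exactly what the promised ablation would have to show.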
Circularity Check
No circularity: Fourier-to-mask mapping uses external winding-number theorem and independent detector loss
The derivation introduces a compact Fourier coefficient vector that is mapped to a binary mask via the standard winding number theorem (an external mathematical fact, not defined by the paper). End-to-end gradient optimization then minimizes a detection loss measured on real infrared detectors. Attack success rate is evaluated empirically on held-out physical and digital test cases rather than being algebraically forced by the coefficient values themselves. No self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations appear in the provided abstract or claimed chain. The method is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] The winding number theorem supplies an analytic, differentiable mapping from a compact set of Fourier coefficients to a binary pixel mask.