pith. sign in

arxiv: 2605.17772 · v1 · pith:YBW3YWT2new · submitted 2026-05-18 · 💻 cs.CV

Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

Pith reviewed 2026-05-20 12:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords physical adversarial attacksmulti-model optimizationcross-task generalizationblack-box transferabilityobject detectionsemantic segmentationgradient alignmentJMOF
0
0 comments X

The pith

A joint optimization framework generates physical attacks that transfer to unseen models and across vision tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes JMOF to reduce overfitting in physical adversarial attacks that target only one model or one objective. It selects the best surrogate ensemble through quantitative similarity analysis, applies a dual-level mechanism that suppresses both final predictions and intermediate features, and uses orthogonal gradient alignment to convert conflicting gradients into cooperative directions. If the approach holds, physical patterns printed on objects could fool multiple black-box detectors at once and simultaneously disrupt object detection together with semantic segmentation or depth estimation. Readers should care because it shows how far current vision AI systems can be fooled in real-world physical settings with a single crafted texture.

Core claim

By selecting an optimal surrogate model ensemble via quantitative similarity analysis, jointly optimizing multiple attack objectives with a dual-level mechanism that suppresses prediction outputs and flattens feature distributions, and applying an Orthogonal Gradient Alignment strategy to resolve cross-model gradient conflicts, the JMOF framework produces physical adversarial attacks with improved black-box transferability and the ability to deceive models across different vision tasks such as object detection, semantic segmentation, and monocular depth estimation.

What carries the argument

The Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) together with the Orthogonal Gradient Alignment (OGA) strategy that turns repulsive gradients from different models into synergistic optimization directions.

Load-bearing premise

Quantitative similarity analysis can reliably choose a surrogate ensemble whose gradients align without losing attack strength, and the dual-level mechanism plus OGA produce genuine generalization rather than overfitting to the chosen ensemble.

What would settle it

Generating the attacks with JMOF and then testing them on a fresh collection of models and tasks excluded from the similarity-based ensemble selection, and finding transfer success rates no higher than simple single-model baselines.

Figures

Figures reproduced from arXiv: 2605.17772 by Hongyuan Wang, Qianhao Ning, Yinxi Lu, Yunzhao Zang, Zhiqiang Yan, Zijian Wang, Ziyang Liu.

Figure 1
Figure 1. Figure 1: The general framework of existing ensemble attack methods, [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The workflow of JMOF, illustrating the optimization progression from [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the physical process of optical imaging and the rendering pipeline of OPDR. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of the loss function design for attacking object [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Schematic illustration of existing gradient fusion strategies [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Principles and implementation schemes of the VTG and STD [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Experimental results in the CARLA simulator, brighter cells indicate stronger attack effects. (a) The central region illustrates the [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visualization of representative attack results. In each group, the top row displays the normal target, while the bottom row displays [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Method comparison after isolating the influence of differen [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
Figure 14
Figure 14. Figure 14: Experimental setup for the physical adversarial attacks. [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Visualization of representative results from the physical [PITH_FULL_IMAGE:figures/full_fig_p012_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Ablation study results for surrogate model selection within [PITH_FULL_IMAGE:figures/full_fig_p013_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Visualization of representative attack results across object [PITH_FULL_IMAGE:figures/full_fig_p015_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Visualization of representative attack results across object [PITH_FULL_IMAGE:figures/full_fig_p016_18.png] view at source ↗
read the original abstract

Physical adversarial attacks often overfit single surrogate models and optimization objectives. While ensemble attacks can mitigate this, existing methods struggle with severe gradient conflicts within restricted physical texture spaces, significantly degrading cross-model transferability. To bridge this gap, this paper proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) that leverages quantitative similarity analysis to select the optimal surrogate model ensemble. Within JMOF, a dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions, balancing attack efficiency with deep generalization. Additionally, an Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts, transforming mutually repulsive gradients into synergistic optimization directions. Extensive simulated and real-world experiments demonstrate that JMOF outperforms state-of-the-art baselines against diverse black-box detectors. Crucially, JMOF exhibits substantial cross-vision-task generalization, generating attacks capable of simultaneously deceiving object detection and semantic segmentation or monocular depth estimation models. This research advances the generalization limits of physical adversarial attacks, providing a robust framework for evaluating visual AI vulnerabilities in real-world deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) for physical adversarial attacks. It selects surrogate ensembles via quantitative similarity analysis, applies a dual-level mechanism to suppress prediction outputs and flatten intermediate features, and uses Orthogonal Gradient Alignment (OGA) to convert conflicting gradients into synergistic directions. The central claims are that JMOF outperforms state-of-the-art baselines on diverse black-box detectors and exhibits substantial cross-vision-task generalization, simultaneously attacking object detection together with semantic segmentation or monocular depth estimation in simulated and real-world settings.

Significance. If the empirical claims are substantiated with quantitative metrics and ablations, the work would meaningfully advance physical-attack generalization by addressing gradient conflicts in constrained texture spaces and demonstrating cross-task transfer. The combination of similarity-based ensemble selection and OGA could provide a practical template for multi-objective robustness evaluation, though its impact depends on whether the reported gains exceed what ensemble methods already achieve.

major comments (2)
  1. [Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.
  2. [Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.
minor comments (2)
  1. [Methods] Provide explicit mathematical definitions and pseudocode for the dual-level suppression/flattening loss and the OGA projection step, including how the orthogonality constraint is enforced within the physical texture parameterization.
  2. [Methods] Clarify the precise similarity metric used for ensemble selection and report its correlation with observed transfer success across the tested model families.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for major revision. We address each major comment below and have updated the manuscript to strengthen the presentation of results and evidence.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version, we have incorporated key numerical results, including average attack success rate improvements (approximately 12-18% over state-of-the-art baselines in black-box settings) and cross-task success rates exceeding 65% for simultaneous attacks on detection and segmentation. These figures are drawn directly from the experimental tables and ablations, which isolate the contributions of the dual-level mechanism and OGA beyond ensemble selection alone. revision: yes

  2. Referee: [Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.

    Authors: We thank the referee for this point. The original experiments already report results on black-box models outside the surrogate set, including cross-task transfer to segmentation and depth estimation models with distinct architectures (see Tables 3-5 and real-world evaluations). To provide more direct evidence, we have added an ablation analysis correlating similarity scores with transferability while using models from differing tasks to control for overlap. This supports the effectiveness of the selection and OGA. We acknowledge that complete isolation of all confounding factors would benefit from further studies, but the current results substantiate the framework's generalization claims. revision: partial

Circularity Check

0 steps flagged

No circularity: JMOF framework and mechanisms are independently defined and empirically validated

full rationale

The paper introduces JMOF as a new joint optimization framework that incorporates quantitative similarity analysis for surrogate ensemble selection, a dual-level mechanism (suppressing prediction outputs and flattening intermediate features), and an Orthogonal Gradient Alignment (OGA) strategy. These components are explicitly presented as novel contributions in the abstract and framework description. Performance claims, including outperformance on black-box detectors and cross-task generalization to segmentation and depth estimation, are supported by simulated and real-world experiments rather than any derivation that reduces outputs to fitted inputs or self-referential definitions. No equations, predictions, or load-bearing steps are shown to collapse by construction to the method's own parameters or prior self-citations. The work is self-contained as an empirical proposal evaluated against external baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; therefore free parameters, axioms, and invented entities cannot be enumerated from the full text. The framework description implies hyperparameters for objective balancing and ensemble selection, but none are specified.

pith-pipeline@v0.9.0 · 5733 in / 1174 out tokens · 29547 ms · 2026-05-20T12:53:45.470385+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 4 internal anchors

  1. [1]

    FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,

    S.Ren,K.He,R.Girshick,andJ.Sun,“FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,” IEEE Trans. Pattern Anal. Mach. Intell.,vol.39,no.6,pp.1137–1149, 2016

  2. [2]

    Object detection in 20 years: A survey,

    Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,”Proc. IEEE, vol. 111, no. 3, pp. 257–276, 2023

  3. [3]

    A survey of visual transformers,

    Y. Liu, Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and Z. He, “A survey of visual transformers,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 6, pp. 7478– 7498, 2023

  4. [4]

    Physical Adversarial Attack Meets Computer Vision: A Decade Survey,

    H. Wei, H. Tang, X. Jia, Z. Wang, H. Yu, Z. Li, S. Satoh, L. Van Gool, and Z. Wang, “Physical Adversarial Attack Meets Computer Vision: A Decade Survey,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 46, pp. 9797–9817, 2024

  5. [5]

    Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,

    Z. Guo, Y. Qian, Y. Li, W. Li, C. T. Lei, S. Zhao, L. Fang, O. Arandjelović, and C. P. Lau, “Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,”arXiv:2508.01845, 2025

  6. [6]

    Explaining and harnessing adversarial examples,

    I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” inProc. Int. Conf. Learn. Represent., 2015

  7. [7]

    Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,

    M. Hull, H. Wang, M. Lau, A. Helbling, M. Phute, C. Zhang, Z. Kira, W. Lunardi, M. Andreoni, W. Lee et al. , “Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,”arXiv:2411.09749, 2024

  8. [8]

    Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,

    Z. Liu, Z. Yan, Q. Ning, Y. Lu, Z. Wang, and H. Wang, “Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,”Pattern Recognit., vol. 172, p. 112621, 2026

  9. [9]

    FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,

    Z. Liu, H. Wang, Q. Ning, Z. Wang, Y. Lu, Y. Zang, and Z.Yan,“FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–19, 2025

  10. [10]

    Meshadv: Adversarial meshes for visual recognition,

    C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial meshes for visual recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2019, pp. 6898–6907

  11. [11]

    Synthesiz- ing Robust Adversarial Examples,

    A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesiz- ing Robust Adversarial Examples,” inProc. Int. Conf. Mach. Learn., J. Dy and A. Krause, Eds., vol. 80, 2018

  12. [12]

    A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,

    Y. Wang, L. Wu, Y. Cao, J. Jin, Z. Zhang, E. Wang, C. Ma, and Y. Zhao, “A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,”IEEE Trans. Intell. Transp. Syst., vol. 26, no. 7, pp. 10373–10385, 2025

  13. [13]

    Improving the adversarial transferability with relational graphs ensemble adversarial attack,

    J. Pi, C. Luo, F. Xia, N. Jiang, H. Wu, and Z. Wu, “Improving the adversarial transferability with relational graphs ensemble adversarial attack,”Front. Neurosci., vol. 16, p. 1094795, 2023

  14. [14]

    FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,

    D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and X. Chen, “FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” inProc. AAAI Conf. Artif. Intell. , vol. 36, 2022, pp. 2414–2422

  15. [15]

    Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,

    J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8565–8574

  16. [16]

    Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,

    A. Dimitriu, T. V. Michaletzky, and V. Remeli, “Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,”Appl. Sci., vol. 14, no. 23, p. 11423, 2024

  17. [17]

    An adap- tive model ensemble adversarial attack for boosting adversarial transferability,

    B. Chen, J. Yin, S. Chen, B. Chen, and X. Liu, “An adap- tive model ensemble adversarial attack for boosting adversarial transferability,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2023, pp. 4489–4498

  18. [18]

    AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,

    F. Li, K. Li, Q. Wang, B. Han, and J. Zhou, “AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,” in Proc. Int. Conf. Learn. Represent., 2026

  19. [19]

    Adversarial examples in the physical world

    A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial exam- ples in the physical world,”CoRR, vol. abs/1607.02533, 2016

  20. [20]

    Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,

    S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog- nit. Workshop, 2019, pp. 0–0

  21. [21]

    Universal physical camouflage attacks on object detectors,

    L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and N. Liu, “Universal physical camouflage attacks on object detectors,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 720–729

  22. [22]

    Feature-aware transfer- able adversarial attacks against image classification,

    S. Cheng, P. Li, K. Han, and H. Xu, “Feature-aware transfer- able adversarial attacks against image classification,”Appl. Soft Comput., vol. 161, p. 111729, 2024

  23. [23]

    A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,

    Y. Li, B. Xie, S. Guo, Y. Yang, and B. Xiao, “A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,”ACM Comput. Surv., vol. 56, no. 6, pp. 1– 37, 2024

  24. [24]

    Feature importance-aware transferable adversarial attacks,

    Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.,2021,pp.7639–7648

  25. [25]

    Enhancing adversarial example transferability with an intermediate level attack,

    Q. Huang, I. Katsman, H. He, Z. Gu, S. Belongie, and S.- N. Lim, “Enhancing adversarial example transferability with an intermediate level attack,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 4733–4742

  26. [26]

    Transferable adversarial attacks on vision transformers with token gradient regularization,

    J. Zhang, Y. Huang, W. Wu, and M. R. Lyu, “Transferable adversarial attacks on vision transformers with token gradient regularization,” inProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recognit., 2023, pp. 16415–16424

  27. [27]

    Similarity of neural network representations revisited,

    S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representations revisited,” inProc. Int. Conf. Mach. Learn., 2019, pp. 3519–3529

  28. [28]

    Do vision transformers see like convolutional neural networks?

    M.Raghu,T.Unterthiner,S.Kornblith,C.Zhang,andA.Doso- vitskiy, “Do vision transformers see like convolutional neural networks?”Adv. Neural Inf. Process. Syst. , vol. 34, pp. 12116– 12128, 2021

  29. [29]

    Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,

    Y. Zhang, Z. Gong, W. Liu, H. Wen, P. Wan, J. Qi, X. Hu, and P. Zhong, “Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,”IEEE Trans. Geosci. Remote Sens. , vol. 62, pp. 1–14, 2024

  30. [30]

    Adversarial examples are not bugs, they are fea- tures,

    A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are fea- tures,”Adv. Neural Inf. Process. Syst. , vol. 32, 2019

  31. [31]

    On the robust- ness of vision transformers to adversarial examples,

    K. Mahmood, R. Mahmood, and M. Van Dijk, “On the robust- ness of vision transformers to adversarial examples,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 7838–7847

  32. [32]

    Conflict-Aware Adversarial Training,

    Z. Xue, H. Wang, Y. Qin, and R. Pedarsani, “Conflict-Aware Adversarial Training,”arXiv:2410.16579, 2024

  33. [33]

    Ensemble diversity facilitates adversarial transferability,

    B. Tang, Z. Wang, Y. Bin, Q. Dou, Y. Yang, and H. T. Shen, “Ensemble diversity facilitates adversarial transferability,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2024, pp. 24377–24386

  34. [34]

    Boosting Adversarial Transferability via Ensemble Non- Attention,

    Y. Zou, Q. Liu, J. Wu, Y. Peng, G. Chen, H. Zhou, and G. Ye, “Boosting Adversarial Transferability via Ensemble Non- Attention,” inProc. AAAI Conf. Artif. Intell. , vol. 40, no. 16, 2026, pp. 14104–14112. 18

  35. [35]

    CARLA: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” inProc. 1st Annu. Conf. Robot Learn., 2017, pp. 1–16

  36. [36]

    Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,

    Z. Yan, H. Wang, X. Liu, Q. Ning, and Y. Lu, “Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,”Remote Sens., vol. 14, no. 12, 2022

  37. [37]

    Imaging simulation of the AMCW ToF camera based on path tracking,

    Z. Yan, H. Wang, Z. Wang, X. Liu, and Q. Ning, “Imaging simulation of the AMCW ToF camera based on path tracking,” Appl. Opt., vol. 61, no. 18, pp. 5474–5482, 2022

  38. [38]

    Accelerating 3d deep learning with pytorch3d,

    J. Johnson, N. Ravi, J. Reizenstein, D. Novotny, S. Tulsiani, C. Lassner, and S. Branson, “Accelerating 3d deep learning with pytorch3d,” inProc. SIGGRAPH Asia 2020 Courses , 2020

  39. [39]

    Delving into Transferable Adversarial Examples and Black-box Attacks

    Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarialexamplesandblack-boxattacks,” arXiv:1611.02770, 2016

  40. [40]

    Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,

    Y. Ren, Z. Zhao, C. Lin, B. Yang, L. Zhou, Z. Liu, and C. Shen, “Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,”in Proc. AAAI Conf. Artif. Intell., vol. 39, no. 7, 2025, pp. 6731–6739

  41. [41]

    Nonlinear total variation based noise removal algorithms,

    L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,”Phys. D, vol. 60, no. 1-4, pp. 259–268, 1992

  42. [42]

    ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,

    N. Suryanto, Y. Kim, H. T. Larasati, H. Kang, T.-T.-H. Le, Y. Hong, H. Yang, S.-Y. Oh, and H. Kim, “ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,” inProc. IEEE/CVF Int. Conf. Com- put. Vis. (ICCV), 2023, pp. 4305–4314

  43. [43]

    Gradientsurgeryformulti-tasklearning,

    T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C.Finn,“Gradientsurgeryformulti-tasklearning,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 5824–5836, 2020

  44. [44]

    Improving transferability of adversarial examples with input diversity,

    C. Xie, Z. Zhang, Y. Zhou, S. H. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2730–2739

  45. [45]

    YOLOv3: An Incremental Improvement

    J. Redmon, “Yolov3: An incremental improvement,” arXiv:1804.02767, 2018

  46. [46]

    A convnet for the 2020s,

    Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11976–11986

  47. [47]

    Swin transformer: Hierarchical vision transformer usingshiftedwindows,

    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer usingshiftedwindows,”in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022

  48. [48]

    Exploring Plain Vision Transformer Backbones for Object Detection,

    Y. Li, H. Mao, R. Girshick, and K. He, “Exploring Plain Vision Transformer Backbones for Object Detection,” inProc. Eur. Conf. Comput. Vis., 2022, pp. 280–296

  49. [49]

    Mask r-cnn,

    K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2961–2969

  50. [50]

    Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,

    W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 568–578

  51. [51]

    Pvt v2: Improved baselines with pyramid vision transformer,

    W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. u. Lu, P. Luo, and L. Shao, “Pvt v2: Improved baselines with pyramid vision transformer,”Comput. Vis. Media, vol. 8, no. 3, pp. 415– 424, 2022

  52. [52]

    Focal loss for dense object detection,

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988

  53. [53]

    Intriguing properties of neural networks

    C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,”arXiv:1312.6199, 2013

  54. [54]

    Dta: Physical camouflage attacks using differentiable transformation network,

    N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le, H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) , 2022, pp. 15305–15314

  55. [55]

    Generate transferable adversarial physical camouflages via triplet attention suppression,

    J. Wang, X. Liu, Z. Yin, Y. Wang, J. Guo, H. Qin, Q. Wu, and A. Liu, “Generate transferable adversarial physical camouflages via triplet attention suppression,”Int. J. Comput. Vis., vol. 132, no. 11, pp. 5084–5100, 2024

  56. [56]

    Rtmdet: An empirical study of designing real-time object detectors.arXiv preprint arXiv:2212.07784,

    C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, and K. Chen, “Rtmdet: An empirical study of de- signing real-time object detectors,”arXiv:2212.07784, 2022