Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

Hongyuan Wang; Qianhao Ning; Yinxi Lu; Yunzhao Zang; Zhiqiang Yan; Zijian Wang; Ziyang Liu

arxiv: 2605.17772 · v1 · pith:YBW3YWT2new · submitted 2026-05-18 · 💻 cs.CV

Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

Ziyang Liu , Hongyuan Wang , Zijian Wang , Yinxi Lu , Yunzhao Zang , Zhiqiang Yan , Qianhao Ning This is my paper

Pith reviewed 2026-05-20 12:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords physical adversarial attacksmulti-model optimizationcross-task generalizationblack-box transferabilityobject detectionsemantic segmentationgradient alignmentJMOF

0 comments

The pith

A joint optimization framework generates physical attacks that transfer to unseen models and across vision tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes JMOF to reduce overfitting in physical adversarial attacks that target only one model or one objective. It selects the best surrogate ensemble through quantitative similarity analysis, applies a dual-level mechanism that suppresses both final predictions and intermediate features, and uses orthogonal gradient alignment to convert conflicting gradients into cooperative directions. If the approach holds, physical patterns printed on objects could fool multiple black-box detectors at once and simultaneously disrupt object detection together with semantic segmentation or depth estimation. Readers should care because it shows how far current vision AI systems can be fooled in real-world physical settings with a single crafted texture.

Core claim

By selecting an optimal surrogate model ensemble via quantitative similarity analysis, jointly optimizing multiple attack objectives with a dual-level mechanism that suppresses prediction outputs and flattens feature distributions, and applying an Orthogonal Gradient Alignment strategy to resolve cross-model gradient conflicts, the JMOF framework produces physical adversarial attacks with improved black-box transferability and the ability to deceive models across different vision tasks such as object detection, semantic segmentation, and monocular depth estimation.

What carries the argument

The Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) together with the Orthogonal Gradient Alignment (OGA) strategy that turns repulsive gradients from different models into synergistic optimization directions.

Load-bearing premise

Quantitative similarity analysis can reliably choose a surrogate ensemble whose gradients align without losing attack strength, and the dual-level mechanism plus OGA produce genuine generalization rather than overfitting to the chosen ensemble.

What would settle it

Generating the attacks with JMOF and then testing them on a fresh collection of models and tasks excluded from the similarity-based ensemble selection, and finding transfer success rates no higher than simple single-model baselines.

Figures

Figures reproduced from arXiv: 2605.17772 by Hongyuan Wang, Qianhao Ning, Yinxi Lu, Yunzhao Zang, Zhiqiang Yan, Zijian Wang, Ziyang Liu.

**Figure 2.** Figure 2: The workflow of JMOF, illustrating the optimization progression from [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of the physical process of optical imaging and the rendering pipeline of OPDR. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 5.** Figure 5: Visualization of the loss function design for attacking object [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Schematic illustration of existing gradient fusion strategies [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Principles and implementation schemes of the VTG and STD [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: Experimental results in the CARLA simulator, brighter cells indicate stronger attack effects. (a) The central region illustrates the [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗

**Figure 9.** Figure 9: Visualization of representative attack results. In each group, the top row displays the normal target, while the bottom row displays [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗

**Figure 11.** Figure 11: Method comparison after isolating the influence of differen [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗

**Figure 14.** Figure 14: Experimental setup for the physical adversarial attacks. [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗

**Figure 15.** Figure 15: Visualization of representative results from the physical [PITH_FULL_IMAGE:figures/full_fig_p012_15.png] view at source ↗

**Figure 16.** Figure 16: Ablation study results for surrogate model selection within [PITH_FULL_IMAGE:figures/full_fig_p013_16.png] view at source ↗

**Figure 17.** Figure 17: Visualization of representative attack results across object [PITH_FULL_IMAGE:figures/full_fig_p015_17.png] view at source ↗

**Figure 18.** Figure 18: Visualization of representative attack results across object [PITH_FULL_IMAGE:figures/full_fig_p016_18.png] view at source ↗

read the original abstract

Physical adversarial attacks often overfit single surrogate models and optimization objectives. While ensemble attacks can mitigate this, existing methods struggle with severe gradient conflicts within restricted physical texture spaces, significantly degrading cross-model transferability. To bridge this gap, this paper proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) that leverages quantitative similarity analysis to select the optimal surrogate model ensemble. Within JMOF, a dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions, balancing attack efficiency with deep generalization. Additionally, an Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts, transforming mutually repulsive gradients into synergistic optimization directions. Extensive simulated and real-world experiments demonstrate that JMOF outperforms state-of-the-art baselines against diverse black-box detectors. Crucially, JMOF exhibits substantial cross-vision-task generalization, generating attacks capable of simultaneously deceiving object detection and semantic segmentation or monocular depth estimation models. This research advances the generalization limits of physical adversarial attacks, providing a robust framework for evaluating visual AI vulnerabilities in real-world deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper puts forward a JMOF framework with similarity-based ensemble selection, dual-level suppression, and orthogonal gradient alignment to improve physical attack transfer, but the cross-task generalization claims rest on assumptions that need tighter experimental checks.

read the letter

Hey, the main thing to know is that this paper introduces JMOF, a joint multi-objective and multi-model setup for physical adversarial attacks. It picks surrogates through quantitative similarity analysis, applies dual-level suppression on both predictions and features, and uses Orthogonal Gradient Alignment to handle gradient conflicts in the limited physical texture space. That combination is the concrete new piece, and it directly targets the overfitting and conflict problems that plague single-model and standard ensemble attacks. The paper does a solid job spelling out why those issues matter in real-world settings and why aligning gradients orthogonally could turn conflicts into cooperative directions rather than just averaging them away. If the experiments hold up, the OGA step could be a practical addition for anyone building robustness tests on detectors or other vision models. The soft spots sit mostly in the validation of the bigger claims. The abstract states clear wins on black-box detectors plus substantial cross-task generalization to segmentation and depth estimation, yet the details on numbers, ablations, or how conflicts were actually measured are thin in the summary. The stress-test note flags the risk that similarity selection might just capture task overlap instead of true transfer, and that the dual-level plus OGA results could overfit the chosen ensemble rather than generalize. If the full paper shows diverse test models, controls for that overlap, and real-world results that separate those effects, the generalization part strengthens; otherwise it stays the weaker link. This work is aimed at researchers who evaluate physical robustness in computer vision systems, such as those working on surveillance or robotics. Readers already thinking about ensemble attacks or multi-task transfer would find the specific strategies worth examining. It deserves peer review because the framework is motivated, the technical moves are defined enough to critique and build on, and the topic has clear practical stakes even if the evidence needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) for physical adversarial attacks. It selects surrogate ensembles via quantitative similarity analysis, applies a dual-level mechanism to suppress prediction outputs and flatten intermediate features, and uses Orthogonal Gradient Alignment (OGA) to convert conflicting gradients into synergistic directions. The central claims are that JMOF outperforms state-of-the-art baselines on diverse black-box detectors and exhibits substantial cross-vision-task generalization, simultaneously attacking object detection together with semantic segmentation or monocular depth estimation in simulated and real-world settings.

Significance. If the empirical claims are substantiated with quantitative metrics and ablations, the work would meaningfully advance physical-attack generalization by addressing gradient conflicts in constrained texture spaces and demonstrating cross-task transfer. The combination of similarity-based ensemble selection and OGA could provide a practical template for multi-objective robustness evaluation, though its impact depends on whether the reported gains exceed what ensemble methods already achieve.

major comments (2)

[Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.
[Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.

minor comments (2)

[Methods] Provide explicit mathematical definitions and pseudocode for the dual-level suppression/flattening loss and the OGA projection step, including how the orthogonality constraint is enforced within the physical texture parameterization.
[Methods] Clarify the precise similarity metric used for ensemble selection and report its correlation with observed transfer success across the tested model families.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for major revision. We address each major comment below and have updated the manuscript to strengthen the presentation of results and evidence.

read point-by-point responses

Referee: [Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.

Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version, we have incorporated key numerical results, including average attack success rate improvements (approximately 12-18% over state-of-the-art baselines in black-box settings) and cross-task success rates exceeding 65% for simultaneous attacks on detection and segmentation. These figures are drawn directly from the experimental tables and ablations, which isolate the contributions of the dual-level mechanism and OGA beyond ensemble selection alone. revision: yes
Referee: [Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.

Authors: We thank the referee for this point. The original experiments already report results on black-box models outside the surrogate set, including cross-task transfer to segmentation and depth estimation models with distinct architectures (see Tables 3-5 and real-world evaluations). To provide more direct evidence, we have added an ablation analysis correlating similarity scores with transferability while using models from differing tasks to control for overlap. This supports the effectiveness of the selection and OGA. We acknowledge that complete isolation of all confounding factors would benefit from further studies, but the current results substantiate the framework's generalization claims. revision: partial

Circularity Check

0 steps flagged

No circularity: JMOF framework and mechanisms are independently defined and empirically validated

full rationale

The paper introduces JMOF as a new joint optimization framework that incorporates quantitative similarity analysis for surrogate ensemble selection, a dual-level mechanism (suppressing prediction outputs and flattening intermediate features), and an Orthogonal Gradient Alignment (OGA) strategy. These components are explicitly presented as novel contributions in the abstract and framework description. Performance claims, including outperformance on black-box detectors and cross-task generalization to segmentation and depth estimation, are supported by simulated and real-world experiments rather than any derivation that reduces outputs to fitted inputs or self-referential definitions. No equations, predictions, or load-bearing steps are shown to collapse by construction to the method's own parameters or prior self-citations. The work is self-contained as an empirical proposal evaluated against external baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; therefore free parameters, axioms, and invented entities cannot be enumerated from the full text. The framework description implies hyperparameters for objective balancing and ensemble selection, but none are specified.

pith-pipeline@v0.9.0 · 5733 in / 1174 out tokens · 29547 ms · 2026-05-20T12:53:45.470385+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

quantitative similarity analysis to select the optimal surrogate model ensemble... Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 4 internal anchors

[1]

FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,

S.Ren,K.He,R.Girshick,andJ.Sun,“FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,” IEEE Trans. Pattern Anal. Mach. Intell.,vol.39,no.6,pp.1137–1149, 2016

work page 2016
[2]

Object detection in 20 years: A survey,

Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,”Proc. IEEE, vol. 111, no. 3, pp. 257–276, 2023

work page 2023
[3]

A survey of visual transformers,

Y. Liu, Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and Z. He, “A survey of visual transformers,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 6, pp. 7478– 7498, 2023

work page 2023
[4]

Physical Adversarial Attack Meets Computer Vision: A Decade Survey,

H. Wei, H. Tang, X. Jia, Z. Wang, H. Yu, Z. Li, S. Satoh, L. Van Gool, and Z. Wang, “Physical Adversarial Attack Meets Computer Vision: A Decade Survey,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 46, pp. 9797–9817, 2024

work page 2024
[5]

Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,

Z. Guo, Y. Qian, Y. Li, W. Li, C. T. Lei, S. Zhao, L. Fang, O. Arandjelović, and C. P. Lau, “Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,”arXiv:2508.01845, 2025

work page arXiv 2025
[6]

Explaining and harnessing adversarial examples,

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” inProc. Int. Conf. Learn. Represent., 2015

work page 2015
[7]

Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,

M. Hull, H. Wang, M. Lau, A. Helbling, M. Phute, C. Zhang, Z. Kira, W. Lunardi, M. Andreoni, W. Lee et al. , “Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,”arXiv:2411.09749, 2024

work page arXiv 2024
[8]

Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,

Z. Liu, Z. Yan, Q. Ning, Y. Lu, Z. Wang, and H. Wang, “Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,”Pattern Recognit., vol. 172, p. 112621, 2026

work page 2026
[9]

FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,

Z. Liu, H. Wang, Q. Ning, Z. Wang, Y. Lu, Y. Zang, and Z.Yan,“FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–19, 2025

work page 2025
[10]

Meshadv: Adversarial meshes for visual recognition,

C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial meshes for visual recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2019, pp. 6898–6907

work page 2019
[11]

Synthesiz- ing Robust Adversarial Examples,

A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesiz- ing Robust Adversarial Examples,” inProc. Int. Conf. Mach. Learn., J. Dy and A. Krause, Eds., vol. 80, 2018

work page 2018
[12]

A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,

Y. Wang, L. Wu, Y. Cao, J. Jin, Z. Zhang, E. Wang, C. Ma, and Y. Zhao, “A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,”IEEE Trans. Intell. Transp. Syst., vol. 26, no. 7, pp. 10373–10385, 2025

work page 2025
[13]

Improving the adversarial transferability with relational graphs ensemble adversarial attack,

J. Pi, C. Luo, F. Xia, N. Jiang, H. Wu, and Z. Wu, “Improving the adversarial transferability with relational graphs ensemble adversarial attack,”Front. Neurosci., vol. 16, p. 1094795, 2023

work page 2023
[14]

FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,

D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and X. Chen, “FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” inProc. AAAI Conf. Artif. Intell. , vol. 36, 2022, pp. 2414–2422

work page 2022
[15]

Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,

J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8565–8574

work page 2021
[16]

Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,

A. Dimitriu, T. V. Michaletzky, and V. Remeli, “Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,”Appl. Sci., vol. 14, no. 23, p. 11423, 2024

work page 2024
[17]

An adap- tive model ensemble adversarial attack for boosting adversarial transferability,

B. Chen, J. Yin, S. Chen, B. Chen, and X. Liu, “An adap- tive model ensemble adversarial attack for boosting adversarial transferability,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2023, pp. 4489–4498

work page 2023
[18]

AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,

F. Li, K. Li, Q. Wang, B. Han, and J. Zhou, “AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,” in Proc. Int. Conf. Learn. Represent., 2026

work page 2026
[19]

Adversarial examples in the physical world

A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial exam- ples in the physical world,”CoRR, vol. abs/1607.02533, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[20]

Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,

S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog- nit. Workshop, 2019, pp. 0–0

work page 2019
[21]

Universal physical camouflage attacks on object detectors,

L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and N. Liu, “Universal physical camouflage attacks on object detectors,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 720–729

work page 2020
[22]

Feature-aware transfer- able adversarial attacks against image classification,

S. Cheng, P. Li, K. Han, and H. Xu, “Feature-aware transfer- able adversarial attacks against image classification,”Appl. Soft Comput., vol. 161, p. 111729, 2024

work page 2024
[23]

A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,

Y. Li, B. Xie, S. Guo, Y. Yang, and B. Xiao, “A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,”ACM Comput. Surv., vol. 56, no. 6, pp. 1– 37, 2024

work page 2024
[24]

Feature importance-aware transferable adversarial attacks,

Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.,2021,pp.7639–7648

work page 2021
[25]

Enhancing adversarial example transferability with an intermediate level attack,

Q. Huang, I. Katsman, H. He, Z. Gu, S. Belongie, and S.- N. Lim, “Enhancing adversarial example transferability with an intermediate level attack,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 4733–4742

work page 2019
[26]

Transferable adversarial attacks on vision transformers with token gradient regularization,

J. Zhang, Y. Huang, W. Wu, and M. R. Lyu, “Transferable adversarial attacks on vision transformers with token gradient regularization,” inProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recognit., 2023, pp. 16415–16424

work page 2023
[27]

Similarity of neural network representations revisited,

S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representations revisited,” inProc. Int. Conf. Mach. Learn., 2019, pp. 3519–3529

work page 2019
[28]

Do vision transformers see like convolutional neural networks?

M.Raghu,T.Unterthiner,S.Kornblith,C.Zhang,andA.Doso- vitskiy, “Do vision transformers see like convolutional neural networks?”Adv. Neural Inf. Process. Syst. , vol. 34, pp. 12116– 12128, 2021

work page 2021
[29]

Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,

Y. Zhang, Z. Gong, W. Liu, H. Wen, P. Wan, J. Qi, X. Hu, and P. Zhong, “Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,”IEEE Trans. Geosci. Remote Sens. , vol. 62, pp. 1–14, 2024

work page 2024
[30]

Adversarial examples are not bugs, they are fea- tures,

A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are fea- tures,”Adv. Neural Inf. Process. Syst. , vol. 32, 2019

work page 2019
[31]

On the robust- ness of vision transformers to adversarial examples,

K. Mahmood, R. Mahmood, and M. Van Dijk, “On the robust- ness of vision transformers to adversarial examples,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 7838–7847

work page 2021
[32]

Conflict-Aware Adversarial Training,

Z. Xue, H. Wang, Y. Qin, and R. Pedarsani, “Conflict-Aware Adversarial Training,”arXiv:2410.16579, 2024

work page arXiv 2024
[33]

Ensemble diversity facilitates adversarial transferability,

B. Tang, Z. Wang, Y. Bin, Q. Dou, Y. Yang, and H. T. Shen, “Ensemble diversity facilitates adversarial transferability,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2024, pp. 24377–24386

work page 2024
[34]

Boosting Adversarial Transferability via Ensemble Non- Attention,

Y. Zou, Q. Liu, J. Wu, Y. Peng, G. Chen, H. Zhou, and G. Ye, “Boosting Adversarial Transferability via Ensemble Non- Attention,” inProc. AAAI Conf. Artif. Intell. , vol. 40, no. 16, 2026, pp. 14104–14112. 18

work page 2026
[35]

CARLA: An open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” inProc. 1st Annu. Conf. Robot Learn., 2017, pp. 1–16

work page 2017
[36]

Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,

Z. Yan, H. Wang, X. Liu, Q. Ning, and Y. Lu, “Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,”Remote Sens., vol. 14, no. 12, 2022

work page 2022
[37]

Imaging simulation of the AMCW ToF camera based on path tracking,

Z. Yan, H. Wang, Z. Wang, X. Liu, and Q. Ning, “Imaging simulation of the AMCW ToF camera based on path tracking,” Appl. Opt., vol. 61, no. 18, pp. 5474–5482, 2022

work page 2022
[38]

Accelerating 3d deep learning with pytorch3d,

J. Johnson, N. Ravi, J. Reizenstein, D. Novotny, S. Tulsiani, C. Lassner, and S. Branson, “Accelerating 3d deep learning with pytorch3d,” inProc. SIGGRAPH Asia 2020 Courses , 2020

work page 2020
[39]

Delving into Transferable Adversarial Examples and Black-box Attacks

Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarialexamplesandblack-boxattacks,” arXiv:1611.02770, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[40]

Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,

Y. Ren, Z. Zhao, C. Lin, B. Yang, L. Zhou, Z. Liu, and C. Shen, “Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,”in Proc. AAAI Conf. Artif. Intell., vol. 39, no. 7, 2025, pp. 6731–6739

work page 2025
[41]

Nonlinear total variation based noise removal algorithms,

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,”Phys. D, vol. 60, no. 1-4, pp. 259–268, 1992

work page 1992
[42]

ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,

N. Suryanto, Y. Kim, H. T. Larasati, H. Kang, T.-T.-H. Le, Y. Hong, H. Yang, S.-Y. Oh, and H. Kim, “ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,” inProc. IEEE/CVF Int. Conf. Com- put. Vis. (ICCV), 2023, pp. 4305–4314

work page 2023
[43]

Gradientsurgeryformulti-tasklearning,

T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C.Finn,“Gradientsurgeryformulti-tasklearning,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 5824–5836, 2020

work page 2020
[44]

Improving transferability of adversarial examples with input diversity,

C. Xie, Z. Zhang, Y. Zhou, S. H. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2730–2739

work page 2019
[45]

YOLOv3: An Incremental Improvement

J. Redmon, “Yolov3: An incremental improvement,” arXiv:1804.02767, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[46]

A convnet for the 2020s,

Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11976–11986

work page 2022
[47]

Swin transformer: Hierarchical vision transformer usingshiftedwindows,

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer usingshiftedwindows,”in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022

work page 2021
[48]

Exploring Plain Vision Transformer Backbones for Object Detection,

Y. Li, H. Mao, R. Girshick, and K. He, “Exploring Plain Vision Transformer Backbones for Object Detection,” inProc. Eur. Conf. Comput. Vis., 2022, pp. 280–296

work page 2022
[49]

Mask r-cnn,

K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2961–2969

work page 2017
[50]

Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,

W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 568–578

work page 2021
[51]

Pvt v2: Improved baselines with pyramid vision transformer,

W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. u. Lu, P. Luo, and L. Shao, “Pvt v2: Improved baselines with pyramid vision transformer,”Comput. Vis. Media, vol. 8, no. 3, pp. 415– 424, 2022

work page 2022
[52]

Focal loss for dense object detection,

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988

work page 2017
[53]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,”arXiv:1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[54]

Dta: Physical camouflage attacks using differentiable transformation network,

N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le, H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) , 2022, pp. 15305–15314

work page 2022
[55]

Generate transferable adversarial physical camouflages via triplet attention suppression,

J. Wang, X. Liu, Z. Yin, Y. Wang, J. Guo, H. Qin, Q. Wu, and A. Liu, “Generate transferable adversarial physical camouflages via triplet attention suppression,”Int. J. Comput. Vis., vol. 132, no. 11, pp. 5084–5100, 2024

work page 2024
[56]

Rtmdet: An empirical study of designing real-time object detectors.arXiv preprint arXiv:2212.07784,

C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, and K. Chen, “Rtmdet: An empirical study of de- signing real-time object detectors,”arXiv:2212.07784, 2022

work page arXiv 2022

[1] [1]

FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,

S.Ren,K.He,R.Girshick,andJ.Sun,“FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,” IEEE Trans. Pattern Anal. Mach. Intell.,vol.39,no.6,pp.1137–1149, 2016

work page 2016

[2] [2]

Object detection in 20 years: A survey,

Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,”Proc. IEEE, vol. 111, no. 3, pp. 257–276, 2023

work page 2023

[3] [3]

A survey of visual transformers,

Y. Liu, Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and Z. He, “A survey of visual transformers,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 6, pp. 7478– 7498, 2023

work page 2023

[4] [4]

Physical Adversarial Attack Meets Computer Vision: A Decade Survey,

H. Wei, H. Tang, X. Jia, Z. Wang, H. Yu, Z. Li, S. Satoh, L. Van Gool, and Z. Wang, “Physical Adversarial Attack Meets Computer Vision: A Decade Survey,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 46, pp. 9797–9817, 2024

work page 2024

[5] [5]

Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,

Z. Guo, Y. Qian, Y. Li, W. Li, C. T. Lei, S. Zhao, L. Fang, O. Arandjelović, and C. P. Lau, “Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,”arXiv:2508.01845, 2025

work page arXiv 2025

[6] [6]

Explaining and harnessing adversarial examples,

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” inProc. Int. Conf. Learn. Represent., 2015

work page 2015

[7] [7]

Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,

M. Hull, H. Wang, M. Lau, A. Helbling, M. Phute, C. Zhang, Z. Kira, W. Lunardi, M. Andreoni, W. Lee et al. , “Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,”arXiv:2411.09749, 2024

work page arXiv 2024

[8] [8]

Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,

Z. Liu, Z. Yan, Q. Ning, Y. Lu, Z. Wang, and H. Wang, “Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,”Pattern Recognit., vol. 172, p. 112621, 2026

work page 2026

[9] [9]

FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,

Z. Liu, H. Wang, Q. Ning, Z. Wang, Y. Lu, Y. Zang, and Z.Yan,“FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–19, 2025

work page 2025

[10] [10]

Meshadv: Adversarial meshes for visual recognition,

C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial meshes for visual recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2019, pp. 6898–6907

work page 2019

[11] [11]

Synthesiz- ing Robust Adversarial Examples,

A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesiz- ing Robust Adversarial Examples,” inProc. Int. Conf. Mach. Learn., J. Dy and A. Krause, Eds., vol. 80, 2018

work page 2018

[12] [12]

A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,

Y. Wang, L. Wu, Y. Cao, J. Jin, Z. Zhang, E. Wang, C. Ma, and Y. Zhao, “A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,”IEEE Trans. Intell. Transp. Syst., vol. 26, no. 7, pp. 10373–10385, 2025

work page 2025

[13] [13]

Improving the adversarial transferability with relational graphs ensemble adversarial attack,

J. Pi, C. Luo, F. Xia, N. Jiang, H. Wu, and Z. Wu, “Improving the adversarial transferability with relational graphs ensemble adversarial attack,”Front. Neurosci., vol. 16, p. 1094795, 2023

work page 2023

[14] [14]

FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,

D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and X. Chen, “FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” inProc. AAAI Conf. Artif. Intell. , vol. 36, 2022, pp. 2414–2422

work page 2022

[15] [15]

Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,

J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8565–8574

work page 2021

[16] [16]

Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,

A. Dimitriu, T. V. Michaletzky, and V. Remeli, “Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,”Appl. Sci., vol. 14, no. 23, p. 11423, 2024

work page 2024

[17] [17]

An adap- tive model ensemble adversarial attack for boosting adversarial transferability,

B. Chen, J. Yin, S. Chen, B. Chen, and X. Liu, “An adap- tive model ensemble adversarial attack for boosting adversarial transferability,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2023, pp. 4489–4498

work page 2023

[18] [18]

AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,

F. Li, K. Li, Q. Wang, B. Han, and J. Zhou, “AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,” in Proc. Int. Conf. Learn. Represent., 2026

work page 2026

[19] [19]

Adversarial examples in the physical world

A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial exam- ples in the physical world,”CoRR, vol. abs/1607.02533, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[20] [20]

Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,

S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog- nit. Workshop, 2019, pp. 0–0

work page 2019

[21] [21]

Universal physical camouflage attacks on object detectors,

L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and N. Liu, “Universal physical camouflage attacks on object detectors,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 720–729

work page 2020

[22] [22]

Feature-aware transfer- able adversarial attacks against image classification,

S. Cheng, P. Li, K. Han, and H. Xu, “Feature-aware transfer- able adversarial attacks against image classification,”Appl. Soft Comput., vol. 161, p. 111729, 2024

work page 2024

[23] [23]

A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,

Y. Li, B. Xie, S. Guo, Y. Yang, and B. Xiao, “A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,”ACM Comput. Surv., vol. 56, no. 6, pp. 1– 37, 2024

work page 2024

[24] [24]

Feature importance-aware transferable adversarial attacks,

Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.,2021,pp.7639–7648

work page 2021

[25] [25]

Enhancing adversarial example transferability with an intermediate level attack,

Q. Huang, I. Katsman, H. He, Z. Gu, S. Belongie, and S.- N. Lim, “Enhancing adversarial example transferability with an intermediate level attack,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 4733–4742

work page 2019

[26] [26]

Transferable adversarial attacks on vision transformers with token gradient regularization,

J. Zhang, Y. Huang, W. Wu, and M. R. Lyu, “Transferable adversarial attacks on vision transformers with token gradient regularization,” inProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recognit., 2023, pp. 16415–16424

work page 2023

[27] [27]

Similarity of neural network representations revisited,

S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representations revisited,” inProc. Int. Conf. Mach. Learn., 2019, pp. 3519–3529

work page 2019

[28] [28]

Do vision transformers see like convolutional neural networks?

M.Raghu,T.Unterthiner,S.Kornblith,C.Zhang,andA.Doso- vitskiy, “Do vision transformers see like convolutional neural networks?”Adv. Neural Inf. Process. Syst. , vol. 34, pp. 12116– 12128, 2021

work page 2021

[29] [29]

Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,

Y. Zhang, Z. Gong, W. Liu, H. Wen, P. Wan, J. Qi, X. Hu, and P. Zhong, “Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,”IEEE Trans. Geosci. Remote Sens. , vol. 62, pp. 1–14, 2024

work page 2024

[30] [30]

Adversarial examples are not bugs, they are fea- tures,

A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are fea- tures,”Adv. Neural Inf. Process. Syst. , vol. 32, 2019

work page 2019

[31] [31]

On the robust- ness of vision transformers to adversarial examples,

K. Mahmood, R. Mahmood, and M. Van Dijk, “On the robust- ness of vision transformers to adversarial examples,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 7838–7847

work page 2021

[32] [32]

Conflict-Aware Adversarial Training,

Z. Xue, H. Wang, Y. Qin, and R. Pedarsani, “Conflict-Aware Adversarial Training,”arXiv:2410.16579, 2024

work page arXiv 2024

[33] [33]

Ensemble diversity facilitates adversarial transferability,

B. Tang, Z. Wang, Y. Bin, Q. Dou, Y. Yang, and H. T. Shen, “Ensemble diversity facilitates adversarial transferability,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2024, pp. 24377–24386

work page 2024

[34] [34]

Boosting Adversarial Transferability via Ensemble Non- Attention,

Y. Zou, Q. Liu, J. Wu, Y. Peng, G. Chen, H. Zhou, and G. Ye, “Boosting Adversarial Transferability via Ensemble Non- Attention,” inProc. AAAI Conf. Artif. Intell. , vol. 40, no. 16, 2026, pp. 14104–14112. 18

work page 2026

[35] [35]

CARLA: An open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” inProc. 1st Annu. Conf. Robot Learn., 2017, pp. 1–16

work page 2017

[36] [36]

Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,

Z. Yan, H. Wang, X. Liu, Q. Ning, and Y. Lu, “Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,”Remote Sens., vol. 14, no. 12, 2022

work page 2022

[37] [37]

Imaging simulation of the AMCW ToF camera based on path tracking,

Z. Yan, H. Wang, Z. Wang, X. Liu, and Q. Ning, “Imaging simulation of the AMCW ToF camera based on path tracking,” Appl. Opt., vol. 61, no. 18, pp. 5474–5482, 2022

work page 2022

[38] [38]

Accelerating 3d deep learning with pytorch3d,

J. Johnson, N. Ravi, J. Reizenstein, D. Novotny, S. Tulsiani, C. Lassner, and S. Branson, “Accelerating 3d deep learning with pytorch3d,” inProc. SIGGRAPH Asia 2020 Courses , 2020

work page 2020

[39] [39]

Delving into Transferable Adversarial Examples and Black-box Attacks

Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarialexamplesandblack-boxattacks,” arXiv:1611.02770, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[40] [40]

Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,

Y. Ren, Z. Zhao, C. Lin, B. Yang, L. Zhou, Z. Liu, and C. Shen, “Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,”in Proc. AAAI Conf. Artif. Intell., vol. 39, no. 7, 2025, pp. 6731–6739

work page 2025

[41] [41]

Nonlinear total variation based noise removal algorithms,

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,”Phys. D, vol. 60, no. 1-4, pp. 259–268, 1992

work page 1992

[42] [42]

ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,

N. Suryanto, Y. Kim, H. T. Larasati, H. Kang, T.-T.-H. Le, Y. Hong, H. Yang, S.-Y. Oh, and H. Kim, “ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,” inProc. IEEE/CVF Int. Conf. Com- put. Vis. (ICCV), 2023, pp. 4305–4314

work page 2023

[43] [43]

Gradientsurgeryformulti-tasklearning,

T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C.Finn,“Gradientsurgeryformulti-tasklearning,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 5824–5836, 2020

work page 2020

[44] [44]

Improving transferability of adversarial examples with input diversity,

C. Xie, Z. Zhang, Y. Zhou, S. H. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2730–2739

work page 2019

[45] [45]

YOLOv3: An Incremental Improvement

J. Redmon, “Yolov3: An incremental improvement,” arXiv:1804.02767, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[46] [46]

A convnet for the 2020s,

Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11976–11986

work page 2022

[47] [47]

Swin transformer: Hierarchical vision transformer usingshiftedwindows,

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer usingshiftedwindows,”in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022

work page 2021

[48] [48]

Exploring Plain Vision Transformer Backbones for Object Detection,

Y. Li, H. Mao, R. Girshick, and K. He, “Exploring Plain Vision Transformer Backbones for Object Detection,” inProc. Eur. Conf. Comput. Vis., 2022, pp. 280–296

work page 2022

[49] [49]

Mask r-cnn,

K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2961–2969

work page 2017

[50] [50]

Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,

W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 568–578

work page 2021

[51] [51]

Pvt v2: Improved baselines with pyramid vision transformer,

W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. u. Lu, P. Luo, and L. Shao, “Pvt v2: Improved baselines with pyramid vision transformer,”Comput. Vis. Media, vol. 8, no. 3, pp. 415– 424, 2022

work page 2022

[52] [52]

Focal loss for dense object detection,

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988

work page 2017

[53] [53]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,”arXiv:1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[54] [54]

Dta: Physical camouflage attacks using differentiable transformation network,

N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le, H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) , 2022, pp. 15305–15314

work page 2022

[55] [55]

Generate transferable adversarial physical camouflages via triplet attention suppression,

J. Wang, X. Liu, Z. Yin, Y. Wang, J. Guo, H. Qin, Q. Wu, and A. Liu, “Generate transferable adversarial physical camouflages via triplet attention suppression,”Int. J. Comput. Vis., vol. 132, no. 11, pp. 5084–5100, 2024

work page 2024

[56] [56]

Rtmdet: An empirical study of designing real-time object detectors.arXiv preprint arXiv:2212.07784,

C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, and K. Chen, “Rtmdet: An empirical study of de- signing real-time object detectors,”arXiv:2212.07784, 2022

work page arXiv 2022