Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework
Pith reviewed 2026-05-20 12:53 UTC · model grok-4.3
The pith
A joint optimization framework generates physical attacks that transfer to unseen models and across vision tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By selecting an optimal surrogate model ensemble via quantitative similarity analysis, jointly optimizing multiple attack objectives with a dual-level mechanism that suppresses prediction outputs and flattens feature distributions, and applying an Orthogonal Gradient Alignment strategy to resolve cross-model gradient conflicts, the JMOF framework produces physical adversarial attacks with improved black-box transferability and the ability to deceive models across different vision tasks such as object detection, semantic segmentation, and monocular depth estimation.
What carries the argument
The Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) together with the Orthogonal Gradient Alignment (OGA) strategy that turns repulsive gradients from different models into synergistic optimization directions.
Load-bearing premise
Quantitative similarity analysis can reliably choose a surrogate ensemble whose gradients align without losing attack strength, and the dual-level mechanism plus OGA produce genuine generalization rather than overfitting to the chosen ensemble.
What would settle it
Generating the attacks with JMOF and then testing them on a fresh collection of models and tasks excluded from the similarity-based ensemble selection, and finding transfer success rates no higher than simple single-model baselines.
Figures
read the original abstract
Physical adversarial attacks often overfit single surrogate models and optimization objectives. While ensemble attacks can mitigate this, existing methods struggle with severe gradient conflicts within restricted physical texture spaces, significantly degrading cross-model transferability. To bridge this gap, this paper proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) that leverages quantitative similarity analysis to select the optimal surrogate model ensemble. Within JMOF, a dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions, balancing attack efficiency with deep generalization. Additionally, an Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts, transforming mutually repulsive gradients into synergistic optimization directions. Extensive simulated and real-world experiments demonstrate that JMOF outperforms state-of-the-art baselines against diverse black-box detectors. Crucially, JMOF exhibits substantial cross-vision-task generalization, generating attacks capable of simultaneously deceiving object detection and semantic segmentation or monocular depth estimation models. This research advances the generalization limits of physical adversarial attacks, providing a robust framework for evaluating visual AI vulnerabilities in real-world deployments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) for physical adversarial attacks. It selects surrogate ensembles via quantitative similarity analysis, applies a dual-level mechanism to suppress prediction outputs and flatten intermediate features, and uses Orthogonal Gradient Alignment (OGA) to convert conflicting gradients into synergistic directions. The central claims are that JMOF outperforms state-of-the-art baselines on diverse black-box detectors and exhibits substantial cross-vision-task generalization, simultaneously attacking object detection together with semantic segmentation or monocular depth estimation in simulated and real-world settings.
Significance. If the empirical claims are substantiated with quantitative metrics and ablations, the work would meaningfully advance physical-attack generalization by addressing gradient conflicts in constrained texture spaces and demonstrating cross-task transfer. The combination of similarity-based ensemble selection and OGA could provide a practical template for multi-objective robustness evaluation, though its impact depends on whether the reported gains exceed what ensemble methods already achieve.
major comments (2)
- [Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.
- [Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.
minor comments (2)
- [Methods] Provide explicit mathematical definitions and pseudocode for the dual-level suppression/flattening loss and the OGA projection step, including how the orthogonality constraint is enforced within the physical texture parameterization.
- [Methods] Clarify the precise similarity metric used for ensemble selection and report its correlation with observed transfer success across the tested model families.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and recommendation for major revision. We address each major comment below and have updated the manuscript to strengthen the presentation of results and evidence.
read point-by-point responses
-
Referee: [Abstract] Abstract: the assertion of 'superior performance' and 'substantial cross-vision-task generalization' is presented without any numerical results, ablation tables, or statistical comparisons. This absence prevents verification of whether the dual-level mechanism and OGA produce genuine transfer or merely exploit correlations within the similarity-selected ensemble.
Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version, we have incorporated key numerical results, including average attack success rate improvements (approximately 12-18% over state-of-the-art baselines in black-box settings) and cross-task success rates exceeding 65% for simultaneous attacks on detection and segmentation. These figures are drawn directly from the experimental tables and ablations, which isolate the contributions of the dual-level mechanism and OGA beyond ensemble selection alone. revision: yes
-
Referee: [Framework and Experiments] Framework and Experiments sections: the claim that quantitative similarity analysis selects an ensemble whose gradients remain effective after OGA alignment rests on an untested assumption. No direct evidence is supplied that the similarity metric correlates with transferability rather than task overlap, nor that the reported cross-task success (detection + segmentation or depth) survives when the target models lie outside the surrogate set.
Authors: We thank the referee for this point. The original experiments already report results on black-box models outside the surrogate set, including cross-task transfer to segmentation and depth estimation models with distinct architectures (see Tables 3-5 and real-world evaluations). To provide more direct evidence, we have added an ablation analysis correlating similarity scores with transferability while using models from differing tasks to control for overlap. This supports the effectiveness of the selection and OGA. We acknowledge that complete isolation of all confounding factors would benefit from further studies, but the current results substantiate the framework's generalization claims. revision: partial
Circularity Check
No circularity: JMOF framework and mechanisms are independently defined and empirically validated
full rationale
The paper introduces JMOF as a new joint optimization framework that incorporates quantitative similarity analysis for surrogate ensemble selection, a dual-level mechanism (suppressing prediction outputs and flattening intermediate features), and an Orthogonal Gradient Alignment (OGA) strategy. These components are explicitly presented as novel contributions in the abstract and framework description. Performance claims, including outperformance on black-box detectors and cross-task generalization to segmentation and depth estimation, are supported by simulated and real-world experiments rather than any derivation that reduces outputs to fitted inputs or self-referential definitions. No equations, predictions, or load-bearing steps are shown to collapse by construction to the method's own parameters or prior self-citations. The work is self-contained as an empirical proposal evaluated against external baselines.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
quantitative similarity analysis to select the optimal surrogate model ensemble... Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,
S.Ren,K.He,R.Girshick,andJ.Sun,“FasterR-CNN:Towards real-timeobjectdetectionwithregionproposalnetworks,” IEEE Trans. Pattern Anal. Mach. Intell.,vol.39,no.6,pp.1137–1149, 2016
work page 2016
-
[2]
Object detection in 20 years: A survey,
Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,”Proc. IEEE, vol. 111, no. 3, pp. 257–276, 2023
work page 2023
-
[3]
A survey of visual transformers,
Y. Liu, Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and Z. He, “A survey of visual transformers,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 6, pp. 7478– 7498, 2023
work page 2023
-
[4]
Physical Adversarial Attack Meets Computer Vision: A Decade Survey,
H. Wei, H. Tang, X. Jia, Z. Wang, H. Yu, Z. Li, S. Satoh, L. Van Gool, and Z. Wang, “Physical Adversarial Attack Meets Computer Vision: A Decade Survey,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 46, pp. 9797–9817, 2024
work page 2024
-
[5]
Z. Guo, Y. Qian, Y. Li, W. Li, C. T. Lei, S. Zhao, L. Fang, O. Arandjelović, and C. P. Lau, “Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems,”arXiv:2508.01845, 2025
-
[6]
Explaining and harnessing adversarial examples,
I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” inProc. Int. Conf. Learn. Represent., 2015
work page 2015
-
[7]
Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,
M. Hull, H. Wang, M. Lau, A. Helbling, M. Phute, C. Zhang, Z. Kira, W. Lunardi, M. Andreoni, W. Lee et al. , “Render- Bender: A Survey on Adversarial Attacks Using Differentiable Rendering,”arXiv:2411.09749, 2024
-
[8]
Z. Liu, Z. Yan, Q. Ning, Y. Lu, Z. Wang, and H. Wang, “Natu- ralistic physical adversarial camouflage for object detection via differentiable rendering and style learning,”Pattern Recognit., vol. 172, p. 112621, 2026
work page 2026
-
[9]
Z. Liu, H. Wang, Q. Ning, Z. Wang, Y. Lu, Y. Zang, and Z.Yan,“FullyCoveredAdversarialCamouflageAgainstRemote Sensing Detection via Physics-Driven Rendering and Pyramid Training,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–19, 2025
work page 2025
-
[10]
Meshadv: Adversarial meshes for visual recognition,
C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial meshes for visual recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2019, pp. 6898–6907
work page 2019
-
[11]
Synthesiz- ing Robust Adversarial Examples,
A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesiz- ing Robust Adversarial Examples,” inProc. Int. Conf. Mach. Learn., J. Dy and A. Krause, Eds., vol. 80, 2018
work page 2018
-
[12]
A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,
Y. Wang, L. Wu, Y. Cao, J. Jin, Z. Zhang, E. Wang, C. Ma, and Y. Zhao, “A Highly Transferable Camouflage Attack Against Object Detectors in the Physical World,”IEEE Trans. Intell. Transp. Syst., vol. 26, no. 7, pp. 10373–10385, 2025
work page 2025
-
[13]
Improving the adversarial transferability with relational graphs ensemble adversarial attack,
J. Pi, C. Luo, F. Xia, N. Jiang, H. Wu, and Z. Wu, “Improving the adversarial transferability with relational graphs ensemble adversarial attack,”Front. Neurosci., vol. 16, p. 1094795, 2023
work page 2023
-
[14]
FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,
D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and X. Chen, “FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” inProc. AAAI Conf. Artif. Intell. , vol. 36, 2022, pp. 2414–2422
work page 2022
-
[15]
Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,
J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual At- tention Suppression Attack: Generate Adversarial Camouflage in Physical World,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8565–8574
work page 2021
-
[16]
A. Dimitriu, T. V. Michaletzky, and V. Remeli, “Improving transferabilityofphysicaladversarialattacksonobjectdetectors through multi-model optimization,”Appl. Sci., vol. 14, no. 23, p. 11423, 2024
work page 2024
-
[17]
An adap- tive model ensemble adversarial attack for boosting adversarial transferability,
B. Chen, J. Yin, S. Chen, B. Chen, and X. Liu, “An adap- tive model ensemble adversarial attack for boosting adversarial transferability,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2023, pp. 4489–4498
work page 2023
-
[18]
AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,
F. Li, K. Li, Q. Wang, B. Han, and J. Zhou, “AEGIS: Ad- versarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models,” in Proc. Int. Conf. Learn. Represent., 2026
work page 2026
-
[19]
Adversarial examples in the physical world
A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial exam- ples in the physical world,”CoRR, vol. abs/1607.02533, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[20]
Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,
S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillancecameras:adversarialpatchestoattackpersondetec- tion,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog- nit. Workshop, 2019, pp. 0–0
work page 2019
-
[21]
Universal physical camouflage attacks on object detectors,
L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and N. Liu, “Universal physical camouflage attacks on object detectors,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 720–729
work page 2020
-
[22]
Feature-aware transfer- able adversarial attacks against image classification,
S. Cheng, P. Li, K. Han, and H. Xu, “Feature-aware transfer- able adversarial attacks against image classification,”Appl. Soft Comput., vol. 161, p. 111729, 2024
work page 2024
-
[23]
A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,
Y. Li, B. Xie, S. Guo, Y. Yang, and B. Xiao, “A survey of robustness and safety of 2d and 3d deep learning models against adversarial attacks,”ACM Comput. Surv., vol. 56, no. 6, pp. 1– 37, 2024
work page 2024
-
[24]
Feature importance-aware transferable adversarial attacks,
Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.,2021,pp.7639–7648
work page 2021
-
[25]
Enhancing adversarial example transferability with an intermediate level attack,
Q. Huang, I. Katsman, H. He, Z. Gu, S. Belongie, and S.- N. Lim, “Enhancing adversarial example transferability with an intermediate level attack,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 4733–4742
work page 2019
-
[26]
Transferable adversarial attacks on vision transformers with token gradient regularization,
J. Zhang, Y. Huang, W. Wu, and M. R. Lyu, “Transferable adversarial attacks on vision transformers with token gradient regularization,” inProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recognit., 2023, pp. 16415–16424
work page 2023
-
[27]
Similarity of neural network representations revisited,
S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representations revisited,” inProc. Int. Conf. Mach. Learn., 2019, pp. 3519–3529
work page 2019
-
[28]
Do vision transformers see like convolutional neural networks?
M.Raghu,T.Unterthiner,S.Kornblith,C.Zhang,andA.Doso- vitskiy, “Do vision transformers see like convolutional neural networks?”Adv. Neural Inf. Process. Syst. , vol. 34, pp. 12116– 12128, 2021
work page 2021
-
[29]
Y. Zhang, Z. Gong, W. Liu, H. Wen, P. Wan, J. Qi, X. Hu, and P. Zhong, “Empowering physical attacks with jacobian matrix regularization against vit-based detectors in uav remote sensing images,”IEEE Trans. Geosci. Remote Sens. , vol. 62, pp. 1–14, 2024
work page 2024
-
[30]
Adversarial examples are not bugs, they are fea- tures,
A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are fea- tures,”Adv. Neural Inf. Process. Syst. , vol. 32, 2019
work page 2019
-
[31]
On the robust- ness of vision transformers to adversarial examples,
K. Mahmood, R. Mahmood, and M. Van Dijk, “On the robust- ness of vision transformers to adversarial examples,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 7838–7847
work page 2021
-
[32]
Conflict-Aware Adversarial Training,
Z. Xue, H. Wang, Y. Qin, and R. Pedarsani, “Conflict-Aware Adversarial Training,”arXiv:2410.16579, 2024
-
[33]
Ensemble diversity facilitates adversarial transferability,
B. Tang, Z. Wang, Y. Bin, Q. Dou, Y. Yang, and H. T. Shen, “Ensemble diversity facilitates adversarial transferability,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. , 2024, pp. 24377–24386
work page 2024
-
[34]
Boosting Adversarial Transferability via Ensemble Non- Attention,
Y. Zou, Q. Liu, J. Wu, Y. Peng, G. Chen, H. Zhou, and G. Ye, “Boosting Adversarial Transferability via Ensemble Non- Attention,” inProc. AAAI Conf. Artif. Intell. , vol. 40, no. 16, 2026, pp. 14104–14112. 18
work page 2026
-
[35]
CARLA: An open urban driving simulator,
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” inProc. 1st Annu. Conf. Robot Learn., 2017, pp. 1–16
work page 2017
-
[36]
Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,
Z. Yan, H. Wang, X. Liu, Q. Ning, and Y. Lu, “Physics-Based TOF Imaging Simulation for Space Targets Based on Improved Path Tracing,”Remote Sens., vol. 14, no. 12, 2022
work page 2022
-
[37]
Imaging simulation of the AMCW ToF camera based on path tracking,
Z. Yan, H. Wang, Z. Wang, X. Liu, and Q. Ning, “Imaging simulation of the AMCW ToF camera based on path tracking,” Appl. Opt., vol. 61, no. 18, pp. 5474–5482, 2022
work page 2022
-
[38]
Accelerating 3d deep learning with pytorch3d,
J. Johnson, N. Ravi, J. Reizenstein, D. Novotny, S. Tulsiani, C. Lassner, and S. Branson, “Accelerating 3d deep learning with pytorch3d,” inProc. SIGGRAPH Asia 2020 Courses , 2020
work page 2020
-
[39]
Delving into Transferable Adversarial Examples and Black-box Attacks
Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarialexamplesandblack-boxattacks,” arXiv:1611.02770, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[40]
Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,
Y. Ren, Z. Zhao, C. Lin, B. Yang, L. Zhou, Z. Liu, and C. Shen, “Improving integrated gradient-based transferable adversarial examplesbyrefiningtheintegrationpath,”in Proc. AAAI Conf. Artif. Intell., vol. 39, no. 7, 2025, pp. 6731–6739
work page 2025
-
[41]
Nonlinear total variation based noise removal algorithms,
L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,”Phys. D, vol. 60, no. 1-4, pp. 259–268, 1992
work page 1992
-
[42]
ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,
N. Suryanto, Y. Kim, H. T. Larasati, H. Kang, T.-T.-H. Le, Y. Hong, H. Yang, S.-Y. Oh, and H. Kim, “ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion,” inProc. IEEE/CVF Int. Conf. Com- put. Vis. (ICCV), 2023, pp. 4305–4314
work page 2023
-
[43]
Gradientsurgeryformulti-tasklearning,
T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C.Finn,“Gradientsurgeryformulti-tasklearning,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 5824–5836, 2020
work page 2020
-
[44]
Improving transferability of adversarial examples with input diversity,
C. Xie, Z. Zhang, Y. Zhou, S. H. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2730–2739
work page 2019
-
[45]
YOLOv3: An Incremental Improvement
J. Redmon, “Yolov3: An incremental improvement,” arXiv:1804.02767, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[46]
Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11976–11986
work page 2022
-
[47]
Swin transformer: Hierarchical vision transformer usingshiftedwindows,
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer usingshiftedwindows,”in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022
work page 2021
-
[48]
Exploring Plain Vision Transformer Backbones for Object Detection,
Y. Li, H. Mao, R. Girshick, and K. He, “Exploring Plain Vision Transformer Backbones for Object Detection,” inProc. Eur. Conf. Comput. Vis., 2022, pp. 280–296
work page 2022
-
[49]
K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2961–2969
work page 2017
-
[50]
Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,
W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” inProc. IEEE/CVF Int. Conf. Comput. Vis. , 2021, pp. 568–578
work page 2021
-
[51]
Pvt v2: Improved baselines with pyramid vision transformer,
W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. u. Lu, P. Luo, and L. Shao, “Pvt v2: Improved baselines with pyramid vision transformer,”Comput. Vis. Media, vol. 8, no. 3, pp. 415– 424, 2022
work page 2022
-
[52]
Focal loss for dense object detection,
T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988
work page 2017
-
[53]
Intriguing properties of neural networks
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,”arXiv:1312.6199, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[54]
Dta: Physical camouflage attacks using differentiable transformation network,
N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le, H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) , 2022, pp. 15305–15314
work page 2022
-
[55]
Generate transferable adversarial physical camouflages via triplet attention suppression,
J. Wang, X. Liu, Z. Yin, Y. Wang, J. Guo, H. Qin, Q. Wu, and A. Liu, “Generate transferable adversarial physical camouflages via triplet attention suppression,”Int. J. Comput. Vis., vol. 132, no. 11, pp. 5084–5100, 2024
work page 2024
-
[56]
Rtmdet: An empirical study of designing real-time object detectors.arXiv preprint arXiv:2212.07784,
C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, and K. Chen, “Rtmdet: An empirical study of de- signing real-time object detectors,”arXiv:2212.07784, 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.