Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
Pith reviewed 2026-05-08 17:08 UTC · model grok-4.3
The pith
RGSE refines text embeddings at test time through reward-guided perturbations to correct semantic misalignment in open-vocabulary object detection without any training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging.
What carries the argument
Reward-guided semantic evolution that perturbs text embeddings, scores variants by cosine similarity to high-confidence visual proposals, and fuses them by reward-weighted averaging.
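The three steps named above (perturb, score, fuse) can be sketched in a few lines. This is a hedged reconstruction from the abstract's description, not the authors' code; the candidate count, noise scale `sigma`, and softmax temperature `tau` are assumed hyperparameters the paper may set differently.

```python
import numpy as np

def rgse_refine(text_emb, proposals, n_candidates=32, sigma=0.05, tau=0.1, seed=0):
    """Refine one category's text embedding against high-confidence proposals.

    Sketch of the perturb / score / fuse loop; text_emb is a unit-norm
    (d,) vector, proposals is an (m, d) array of unit-norm visual embeddings.
    """
    rng = np.random.default_rng(seed)
    # 1) Perturb: sample unit-norm candidate variants around the embedding.
    cands = text_emb + sigma * rng.standard_normal((n_candidates, text_emb.size))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    # 2) Score: reward = mean cosine similarity to the visual proposals.
    rewards = (cands @ proposals.T).mean(axis=1)
    # 3) Fuse: softmax reward-weighted average, renormalized to unit length.
    weights = np.exp((rewards - rewards.max()) / tau)
    weights /= weights.sum()
    refined = weights @ cands
    return refined / np.linalg.norm(refined)
```

Because no gradient is taken, the whole update is a batch of dot products plus one weighted sum, which is consistent with the minimal-overhead claim.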
If this is right
- Achieves state-of-the-art detection accuracy across multiple benchmarks under test-time distribution shifts.
- Adds only minimal computational overhead relative to standard forward passes.
- Bypasses both backpropagation-based adaptation and external-memory methods used in prior work.
- Directly realigns text and vision embeddings in a fully training-free manner.
Where Pith is reading between the lines
- The same perturbation-plus-reward mechanism could be applied to other vision-language tasks such as open-vocabulary segmentation or captioning where embedding drift occurs at test time.
- Historical proposals already collected during a session might make the method especially stable for video or streaming detection.
- Refining the perturbation distribution or the number of candidates evaluated could further reduce the already-low overhead.
- The approach suggests evolutionary search in embedding space as a general lightweight substitute for gradient-based test-time adaptation.
Load-bearing premise
Cosine similarity between perturbed text embeddings and high-confidence visual proposals provides a reliable signal of better semantic alignment.
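The premise can be made concrete with a toy buffer. The class below is hypothetical (the paper does not specify a buffer API); the comments mark exactly where the assumption bites: the reward is only meaningful if the retained high-confidence boxes are genuinely correct detections.

```python
from collections import deque

import numpy as np

class ProposalBuffer:
    """Hypothetical current-plus-historical store of proposal embeddings."""

    def __init__(self, maxlen=256, conf_thresh=0.5):
        self.buf = deque(maxlen=maxlen)
        self.conf_thresh = conf_thresh

    def update(self, embeddings, scores):
        # Keep only proposals the base detector scores confidently --
        # under distribution shift, "confident" need not mean "correct".
        for emb, score in zip(embeddings, scores):
            if score >= self.conf_thresh:
                self.buf.append(np.asarray(emb) / np.linalg.norm(emb))

    def reward(self, candidate):
        # Mean cosine similarity to everything retained so far; a reliable
        # alignment signal only under the load-bearing premise above.
        if not self.buf:
            return 0.0
        hist = np.stack(self.buf)
        unit = np.asarray(candidate) / np.linalg.norm(candidate)
        return float((hist @ unit).mean())
```

If the detector's high-confidence boxes drift with the input distribution, this reward drifts with them, which is why the next section asks what evidence would settle the premise.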
What would settle it
A benchmark run in which RGSE produces lower average precision than the original unadapted model on a dataset known to contain distribution shift, especially if the reward scores fail to track actual detection improvements.
read the original abstract
Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. While recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory, none directly and efficiently align text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open source upon publication.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Reward-Guided Semantic Evolution (RGSE), a training-free test-time adaptation framework for open-vocabulary object detection with VLMs such as Grounding DINO. Text embeddings are perturbed to generate candidate variants; each variant is scored by cosine similarity to high-confidence region proposals drawn from the current input and a historical buffer; the scores serve as rewards to compute a refined embedding via weighted averaging. The authors claim this process corrects semantic misalignment, yields state-of-the-art results on multiple detection benchmarks, and incurs only minimal computational overhead without any backpropagation or parameter updates.
Significance. If the empirical claims hold and the reward signal proves robust, RGSE would constitute a lightweight, training-free alternative to existing test-time adaptation techniques that rely on optimization or external memory banks. The emphasis on direct semantic alignment via evolutionary search and the commitment to open-sourcing code are positive contributions to reproducibility in the field.
major comments (2)
- [Method (reward signal definition)] The central claim that cosine similarity to high-confidence visual proposals supplies a reliable reward signal rests on the assumption that the base detector's proposals remain sufficiently accurate under distribution shift. The manuscript provides no analysis or ablation of proposal quality (e.g., precision of high-confidence boxes before versus after adaptation) or of how the historical buffer accumulates usable signal before the reward collapses. This assumption is load-bearing for the assertion that RGSE corrects misalignment without training.
- [Experiments] The SOTA performance claims require explicit ablations isolating the contribution of reward-weighted averaging, historical buffer size, perturbation variance, and the high-confidence threshold. Without these controls, it is impossible to determine whether observed gains stem from the proposed mechanism or from other implementation choices.
minor comments (1)
- [Abstract] The abstract refers to 'multiple detection benchmarks' without naming them; the introduction or experimental section should list the exact datasets and metrics used.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We have carefully considered each point and provide detailed responses below, along with plans for revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [Method (reward signal definition)] The central claim that cosine similarity to high-confidence visual proposals supplies a reliable reward signal rests on the assumption that the base detector's proposals remain sufficiently accurate under distribution shift. The manuscript provides no analysis or ablation of proposal quality (e.g., precision of high-confidence boxes before versus after adaptation) or of how the historical buffer accumulates usable signal before the reward collapses. This assumption is load-bearing for the assertion that RGSE corrects misalignment without training.
Authors: We acknowledge the importance of validating the reward signal's reliability. While the current manuscript demonstrates performance improvements through the overall framework, we agree that explicit analysis of proposal quality would provide stronger support. In the revised manuscript, we will add a new subsection with ablations on proposal precision (e.g., comparing IoU or classification accuracy of high-confidence boxes pre- and post-adaptation) across benchmarks. We will also include plots showing the evolution of average reward scores over test sequences to illustrate that the historical buffer maintains usable signal without collapse, supporting the training-free adaptation claim. revision: yes
-
Referee: [Experiments] The SOTA performance claims require explicit ablations isolating the contribution of reward-weighted averaging, historical buffer size, perturbation variance, and the high-confidence threshold. Without these controls, it is impossible to determine whether observed gains stem from the proposed mechanism or from other implementation choices.
Authors: We agree that isolating the contributions of each component is essential for rigorous validation of the SOTA claims. The original manuscript includes some component studies, but to fully address this, we will expand the experimental section with dedicated ablations: (1) comparing reward-weighted averaging against uniform or no averaging, (2) varying historical buffer sizes and reporting performance curves, (3) sweeping perturbation variances and their impact on adaptation, and (4) ablating the high-confidence threshold with corresponding results. These additions will clarify that the gains arise from the RGSE mechanism. revision: yes
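The four promised controls amount to a hyperparameter grid. A minimal harness might look like the following sketch, where `evaluate` is a placeholder standing in for a full benchmark run and the grid values are illustrative assumptions, not taken from the paper.

```python
import itertools

# Illustrative grid over the four controls the referee lists;
# every value here is an assumption, not a number from the paper.
GRID = {
    "fusion": ["reward_weighted", "uniform", "best_candidate"],
    "buffer_size": [0, 64, 256],
    "sigma": [0.01, 0.05, 0.1],
    "conf_thresh": [0.3, 0.5, 0.7],
}

def evaluate(**config):
    """Placeholder for a full benchmark run returning mAP for one config."""
    return 0.0  # stub: replace with an actual detection benchmark

def run_ablation(grid=GRID):
    # Exhaustive sweep; a real study would fix three factors at their
    # defaults while varying the fourth to keep the run count manageable.
    keys = list(grid)
    return {
        combo: evaluate(**dict(zip(keys, combo)))
        for combo in itertools.product(*grid.values())
    }
```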
Circularity Check
No significant circularity in RGSE derivation chain
full rationale
The paper defines the reward signal explicitly as cosine similarity between perturbed text embeddings and independent high-confidence visual proposals produced by the base detector (Grounding DINO). This signal is computed from external visual data rather than being defined in terms of the target detection performance or the refined embeddings themselves. The subsequent reward-weighted averaging is a direct, non-iterative fusion step with no fitted parameters or self-referential loops. No equations, self-citations, or uniqueness theorems are invoked in the provided description to justify the core process; by construction no claimed prediction reduces to its own inputs, and the performance claims remain checkable against external benchmarks.