pith. machine review for the scientific record.

arxiv: 2604.11998 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.AI

Recognition: unknown

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:01 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords cross-domain few-shot object detection · NTIRE challenge · object detection · few-shot learning · domain generalization

The pith

The NTIRE 2026 challenge on cross-domain few-shot object detection drew 128 registered participants and 19 valid final submissions, with methods that advance detection performance across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the results of the second Cross-Domain Few-Shot Object Detection Challenge at NTIRE 2026. It reports that 128 participants registered, 31 teams participated actively, and 19 teams submitted valid final results using a range of strategies. The goal is to test object detectors on unseen target domains with very few labeled examples. Readers care because this addresses a key limitation in applying detectors to new real-world settings where data is scarce and domains differ.

Core claim

The challenge engaged significant community interest and participants developed innovative methods that push the performance frontier for detecting objects in new domains under limited annotation conditions, as shown in both open-source and closed-source tracks.

What carries the argument

The two-track evaluation setup, which tests models on distinct source and target domains with only a few annotated examples per class in the target domain.
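As a rough sketch of what "few-shot annotations per class" means operationally (the function name, annotation format, and sampling here are illustrative assumptions; the challenge ships fixed support sets rather than sampling them at run time):

```python
import random
from collections import defaultdict

def sample_k_shot(annotations, k, seed=0):
    """Pick exactly k annotated instances per class from a COCO-style
    annotation list (dicts carrying a 'category_id' key).

    Hypothetical helper for illustration only: the official challenge
    splits are fixed, so this sampler stands in for how a K-shot
    target-domain support set is structured, not how it was built.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["category_id"]].append(ann)
    support = []
    for cid, anns in sorted(by_class.items()):
        if len(anns) < k:
            raise ValueError(f"class {cid} has only {len(anns)} instances")
        support.extend(rng.sample(anns, k))
    return support
```

Under this sketch, a 1/5/10-shot track simply varies `k` while the class list and target-domain images stay fixed.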

Load-bearing premise

The challenge's datasets and scoring rules accurately reflect the difficulties of real-world cross-domain few-shot object detection without competition-specific biases.

What would settle it

Demonstrating that the top challenge methods do not outperform standard baselines on a new, independent cross-domain few-shot dataset would falsify the claim that they push the performance frontier effectively.

Figures

Figures reproduced from arXiv: 2604.11998 by Adhémar de Senneville, Bei Dou, Bhoomi Deshpande, Bin Ren, Bowen Fu, Chao Chen, Dandan Zhao, Di Yang, Dongdong Lu, Flavien Armangeon, Guangwei Huang, Guoyi Xu, Hanzhe Xia, Hao Tang, Harsh Patil, Hongchun Zhu, Hui Qiao, Jiachen Tu, Jiacong Liu, Jiajia Liu, Jiancheng Pan, Jiawei Geng, Jingru Wang, Junjie Liu, Kaiyu Li, Ke Li, Kyeongryeol Go, Lingyi Hong, Li Wang, Liwei Zhou, Longfei Liu, Mengbers, Mingxi Cheng, Mohamed Elhoseiny, Nicu Sebe, Qi Xu, Radu Timofte, Rakshita Kulkarni, Ravi Kirasur, Runze Li, Saiprasad Meesiyawar, Shu Luo, Shuming Hu, Taewoong Jang, Tao Wang, Tao Wu, Uma Mudenagudi, Wei Zhou, Wenqiang Zhang, Xiangyong Cao, Xingdong Sheng, Xingqi He, Xingyu Qiu, Xi Shen, Xuanlong Yu, Yang Yang, Yanwei Fu, Yaokun Shi, Yaoxin Jiang, Yaze Zhao, Yazhe Lyu, Yikai Qin, Yixiong Zou, Yongwei Jiang, Youyang Sha, Yuqian Fu, Zekang Fan, Zhenzhao Xing, Zhe Zhang, Zhimeng Xin, Zijian Zhuang, Zixuan Jiang, Zongwei Wu.

Figure 1: Illustration of the challenge settings, including the closed-source and open-source CD-FSOD tracks. The three newly introduced …
Figure 2: Overall framework of the CD-ViTO baseline method.
Figure 3: Overview of Domain-RAG. Domain-RAG consists of three stages: (1) domain-aware background retrieval, (2) domain-guided background generation, and (3) foreground-background composition. The whole pipeline is training-free and follows the principle of fix foreground, adapt background. (2.5. Domain-RAG Baseline Model) We take Domain-RAG [58], the current state-of-the-art (SOTA) method, as the baseline for the op…
Figure 4: Overview of their efficient tuning and inference.
Figure 5: Post Processing. (4.1.2. Training Details) They use the open-vocabulary detection model as the baseline detection model. They utilize Qwen3-VL-235B-A22B [4] as their MLLM for label generation and post-processing. Their fine-tuning experiments are conducted on 8 NVIDIA RTX 3090 GPUs, with a batch size of 8 and a base learning rate of 1e-6. During the optimization process, the text model is completely frozen, …
Figure 6: Proposed framework for CDFSOD. Branch 1 iteratively …
Figure 7: Team earth-insights: overview of the training process.
Figure 8: Overall pipeline of ZEP. …between pseudo-labels and the few-shot ground-truth boxes (1/5/10-shot). Predictions with IoU ≤ 0.3 and matching class labels are filtered out to suppress noisy false positives. They then compute mAP on the remaining predictions, which serves as a proxy for pseudo-label quality. Based on this criterion, they select the pseudo-label source with higher FSOD-mAP (Dataset1 and Dataset2 f…
Figure 9: Overall pipeline of SAIDA. (4.6.2. Module Details) Shot-agnostic domain adaptation: the initial phase focuses on adapting the model to the target domain's visual distribution without relying on the provided challenge labels. This ensures the architecture captures the underlying semantics of the new domain before category-specific fine-tuning begins. Foundation model selection: they utilize ZERO [14], a vis…
Figure 10: Overview of the proposed pipeline. Algorithm 1 (Pseudo-Label Driven Adaptation): 1: initialize model M; 2: for each iteration do: 3: generate predictions B; 4: filter pseudo-labels B̂; 5: fine-tune model; 6: end for. (3. Iterative model adaptation via fine-tuning) Given initial predictions B, high-confidence pseudo-labels are selected as B̂ = {bᵢ ∈ B | sᵢ > τ} (1). The KLETech-CEVI team applies multiple filtering st…
Figure 11: GLIP-based architecture. (4.7.2. Training Details) The KLETech-CEVI team trains the model using large-scale vision-language datasets, including Conceptual Captions, SBU Captions, Visual Genome [45], MS COCO [60], and Objects365. For optimization, the following loss functions are employed: Focal Loss [61] for handling class imbalance and improving recall; Generalized IoU (GIoU) Loss [88] for enhanced l…
Figure 13: Overall pipeline of GDino-FT. For D1 (underwater) …
Figure 15: Overall pipeline of the AIRCAS MILab team's …
Figure 14: Ablation results. Prompt optimization boosts D1 (+12 …
Figure 16: Overall pipeline of Negative Prompting for Few-Shot …
Figure 17: Overall pipeline of Multimodal Query-based Object …
Figure 18: Overall pipeline of DDR. High-level idea: as illustrated in …
Figure 19: Overall pipeline of Triple-Tower. They start from a …
Figure 20: Overall pipeline of AIPR, comprising: input images → image & text encoder; text prompt ("Please generate an underwater object detection image based on my input image, including the following input categories: holothurian, echinus, scallop, starfish, fish, corals, diver, cuttlefish, turtle, and jellyfish") → Qwen-Image 2.0 image decoder → generated images → image & text encoder → Qwen-VL model → text decoder ("Please assist me in…").
Figure 21: Overall pipeline of the data generation process for data …
Figure 22: Overview of the FusionFormer framework. The ar…
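The pseudo-label selection rule quoted with Figure 10 (Eq. 1, B̂ = {bᵢ ∈ B | sᵢ > τ}) and the IoU-based screening described with Figure 8 can be sketched together as below. Note that in the report these steps belong to different teams' pipelines; the box format, threshold values, and function names here are illustrative assumptions, not the authors' code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def select_pseudo_labels(predictions, gt_boxes, tau=0.5, min_iou=0.3):
    """Keep predictions whose confidence s_i exceeds tau (as in Eq. 1)
    AND that overlap some same-class few-shot ground-truth box with
    IoU above min_iou (mirroring the screening described for ZEP).

    predictions: iterable of (box, score, class_label)
    gt_boxes:    iterable of (box, class_label)
    Thresholds are illustrative, not the values used by the teams.
    """
    kept = []
    for box, score, cls in predictions:
        if score <= tau:
            continue  # Eq. 1: drop low-confidence pseudo-labels
        if any(c == cls and iou(box, g) > min_iou for g, c in gt_boxes):
            kept.append((box, score, cls))
    return kept
```

The surviving pseudo-labels would then feed the fine-tune step of Algorithm 1's loop (generate → filter → fine-tune), with mAP against the few-shot ground truth serving as a proxy for pseudo-label quality.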
Original abstract

Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submissions. Among them, 31 teams actively participated, and 19 teams submitted valid final results. Participants explored a wide range of strategies, introducing innovative methods that push the performance frontier under both open-source and closed-source tracks. This report presents a detailed overview of the NTIRE 2026 CD-FSOD Challenge, including a summary of the submitted approaches and an analysis of the final results across all participating teams. Challenge Codes: https://github.com/ohMargin/NTIRE2026_CDFSOD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript reports on the Second Challenge on Cross-Domain Few-Shot Object Detection (CD-FSOD) held as part of NTIRE 2026. It states that the challenge received 128 registrations, 31 active teams, and 19 valid final submissions. Participants employed a wide range of strategies, with the report claiming that innovative methods were introduced that push the performance frontier in both open-source and closed-source tracks. The paper provides a detailed overview of the challenge, a summary of submitted approaches, and an analysis of the final results, along with a link to the challenge code repository.

Significance. This challenge report documents community progress on a difficult problem in computer vision and provides a public benchmark through its summary of methods and results. The explicit provision of the GitHub repository (https://github.com/ohMargin/NTIRE2026_CDFSOD) is a clear strength that supports reproducibility of the evaluation protocol and challenge setup. If the participation counts and high-level outcome statements hold, the manuscript serves as a useful reference for identifying effective strategies in cross-domain few-shot object detection.

minor comments (1)
  1. [Abstract] The abstract refers to 'Challenge Codes' but the manuscript would benefit from a short dedicated subsection describing the repository contents (e.g., evaluation scripts, dataset splits, or baseline implementations) to improve accessibility for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept. The manuscript documents the NTIRE 2026 CD-FSOD Challenge results and methods as described.

Circularity Check

0 steps flagged

Factual competition report with no derivations or predictions

full rationale

The manuscript is a descriptive summary of the NTIRE 2026 CD-FSOD Challenge, reporting registration numbers (128), submissions (696), active teams (31), and valid final results (19) along with high-level strategy categories. No equations, proofs, fitted parameters, or predictive claims appear; all statements are factual accounts of observed competition outcomes. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the document is an empirical competition report without theoretical derivations or new postulated constructs.

pith-pipeline@v0.9.0 · 5771 in / 949 out tokens · 38002 ms · 2026-05-10T16:01:12.996460+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

132 extracted references · 18 canonical work pages · 7 internal anchors

  1. [1]

    NT-HAZE: A Benchmark Dataset for Realistic Night-time Image Dehazing

    Radu Ancuti, Codruta Ancuti, Radu Timofte, and Cosmin Ancuti. NT-HAZE: A Benchmark Dataset for Realistic Night-time Image Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  2. [2]

    NTIRE 2026 Nighttime Image Dehazing Challenge Report

    Radu Ancuti, Alexandru Brateanu, Florin Vasluianu, Raul Balmez, Ciprian Orhei, Codruta Ancuti, Radu Timofte, Cosmin Ancuti, et al. NTIRE 2026 Nighttime Image Dehazing Challenge Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  3. [4]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025. 5, 6

  4. [5]

    YOLOv4: Optimal Speed and Accuracy of Object Detection

    Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020. 8

  5. [6]

    NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods

    Jie Cai, Kangning Yang, Zhiyuan Li, Florin Vasluianu, Radu Timofte, et al. NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  6. [7]

    End-to-end object detection with transformers

    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV,

  7. [8]

    Sam 3: Segment anything with concepts, 2025

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Lil...

  8. [9]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts. arXiv preprint arXiv:2511.16719, 2025. 8

  9. [10]

    SAM 3: Segment anything with concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris Coll-Vinent, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu,...

  10. [11]

    MMDetection: Open MMLab Detection Toolbox and Benchmark

    Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open MMLab detection toolbox and...

  11. [12]

    The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview

    Zheng Chen, Kai Liu, Jingkai Wang, Xianglong Yan, Jianze Li, Ziqing Zhang, Jue Gong, Jiatong Li, Lei Sun, Xiaoyang Liu, Radu Timofte, Yulun Zhang, et al. The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) W...

  12. [13]

    Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing

    Chenping Fu, Risheng Liu, Xin Fan, Puyang Chen, Hao Fu, Wanqi Yuan, Ming Zhu, Zhongxuan Luo. Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing. 2, 3

  13. [14]

    ZERO: Industry-ready vision foundation model with multimodal prompts, 2025

    Sangbum Choi, Kyeongryeol Go, and Taewoong Jang. ZERO: Industry-ready vision foundation model with multimodal prompts, 2025. 10, 11

  14. [15]

    Low Light Image Enhancement Challenge at NTIRE 2026

    George Ciubotariu, Sharif S M A, Abdur Rehman, Fayaz Ali, Rizwan Ali Naqvi, Marcos Conde, Radu Timofte, et al. Low Light Image Enhancement Challenge at NTIRE 2026. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  15. [16]

    High FPS Video Frame Interpolation Challenge at NTIRE 2026

    George Ciubotariu, Zhuyun Zhou, Yeying Jin, Zongwei Wu, Radu Timofte, et al. High FPS Video Frame Interpolation Challenge at NTIRE 2026. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  16. [17]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019. 7

  17. [18]

    Arthropod taxonomy orders object detection dataset

    Geir Drange. Arthropod taxonomy orders object detection dataset. In https://doi.org/10.34740/kaggle/dsv/1240192,

  18. [19]

    NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

    Andrei Dumitriu, Aakash Ralhan, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, et al. NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  19. [20]

    Photography Retouching Transfer, NTIRE 2026 Challenge: Report

    Omar Elezabi, Marcos V. Conde, Zongwei Wu, Yeying Jin, Radu Timofte, et al. Photography Retouching Transfer, NTIRE 2026 Challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  20. [21]

    Few-shot object detection with vision foundation models and graph diffusion

    Chen-Bin Feng, Youyang Sha, Longfei Liu, Yongjun Yu, Chi Man Vong, Xuanlong Yu, and Xi Shen. Few-shot object detection with vision foundation models and graph diffusion. In The Fourteenth International Conference on Learning Representations, 2026. 15

  21. [22]

    FSOD-VFM: Few-shot object detection with vision foundation models and graph diffusion

    Chen-Bin Feng, Youyang Sha, Longfei Liu, Yongjun Yu, Chi Man Vong, Xuanlong Yu, and Xi Shen. FSOD-VFM: Few-shot object detection with vision foundation models and graph diffusion. In ICLR, 2026. 16

  22. [23]

    Wedetect: Fast open-vocabulary object detection as retrieval

    Shenghao Fu, Yukun Su, Fengyun Rao, Jing Lyu, Xiaohua Xie, and Wei-Shi Zheng. Wedetect: Fast open-vocabulary object detection as retrieval. arXiv preprint arXiv:2512.12309, 2025. 18

  23. [24]

    Meta-fdmixup: Cross-domain few-shot learning guided by labeled target data

    Yuqian Fu, Yanwei Fu, and Yu-Gang Jiang. Meta-fdmixup: Cross-domain few-shot learning guided by labeled target data. In ACM Multimedia, 2021. 1

  24. [25]

    Me-d2n: Multi-expert domain decompositional network for cross-domain few-shot learning

    Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, and Yu-Gang Jiang. Me-d2n: Multi-expert domain decompositional network for cross-domain few-shot learning. In ACM Multimedia, 2022

  25. [26]

    Styleadv: Meta style adversarial training for cross-domain few-shot learning

    Yuqian Fu, Yu Xie, Yanwei Fu, and Yu-Gang Jiang. Styleadv: Meta style adversarial training for cross-domain few-shot learning. In CVPR, 2023. 1

  26. [27]

    Cross-domain few-shot object detection via enhanced open-set object detector

    Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, and Xingqun Jiang. Cross-domain few-shot object detection via enhanced open-set object detector. In European Conference on Computer Vision, 2024. 1, 2, 3, 10, 19, 20

  27. [28]

    Ntire 2025 challenge on cross-domain few-shot object detection: Methods and results

    Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, et al. Ntire 2025 challenge on cross-domain few-shot object detection: Methods and results. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025. 2

  28. [29]

    NTIRE 2026 Challenge on End-to-End Financial Receipt Restoration and Reasoning from Degraded Images: Datasets, Methods and Results

    Bochen Guan, Jinlong Li, Kangning Yang, Chuang Ke, Jie Cai, Florin Vasluianu, Radu Timofte, et al. NTIRE 2026 Challenge on End-to-End Financial Receipt Restoration and Reasoning from Degraded Images: Datasets, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  29. [30]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: AI Flash Portrait (Track 3)

    Ya-nan Guan, Shaonan Zhang, Hang Guo, Yawen Wang, Xinying Fan, Jie Liang, Hui Zeng, Guanyi Qin, Lishen Qu, Tao Dai, Shu-Tao Xia, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: AI Flash Portrait (Track 3). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  30. [31]

    Remote sensing object detection in the deep learning era—a review. Remote Sensing, 16(2):327, 2024

    Shengxi Gui, Shuang Song, Rongjun Qin, and Yang Tang. Remote sensing object detection in the deep learning era—a review. Remote Sensing, 16(2):327, 2024. 1

  31. [32]

    Scale-equivalent distillation for semi-supervised object detection

    Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, and Ping Luo. Scale-equivalent distillation for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 11

  32. [33]

    A broader study of cross-domain few-shot learning

    Yunhui Guo, Noel C Codella, Leonid Karlinsky, James V Codella, John R Smith, Kate Saenko, Tajana Rosing, and Rogerio Feris. A broader study of cross-domain few-shot learning. In ECCV, 2020. 1

  33. [34]

    NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

    Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya, Artem Filippov, Georgii Bychkov, Sergey Lavrushkin, Mikhail Erofeev, Anastasia Antsiferova, Changsheng Chen, Shunquan Tan, Radu Timofte, Dmitriy Vatolin, et al. NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and...

  34. [35]

    Robust Deepfake Detection, NTIRE 2026 Challenge: Report

    Benedikt Hopf, Radu Timofte, et al. Robust Deepfake Detection, NTIRE 2026 Challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  35. [36]

    Drone-based object counting by spatially regularized regional proposal network

    Meng-Ru Hsieh, Yen-Liang Lin, and Winston H Hsu. Drone-based object counting by spatially regularized regional proposal network. In Proceedings of the IEEE International Conference on Computer Vision, 2017. 2, 3

  36. [37]

    Lora: Low-rank adaptation of large language models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. In ICLR. OpenReview.net, 2022. 10, 11

  37. [38]

    Cross-domain weakly-supervised object detection through progressive domain adaptation

    Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, and Kiyoharu Aizawa. Cross-domain weakly-supervised object detection through progressive domain adaptation. In CVPR,

  38. [39]

    Underwater species detection using channel sharpening attention

    Lihao Jiang, Yi Wang, Qi Jia, Shengwei Xu, Yu Liu, Xin Fan, Haojie Li, Risheng Liu, Xinwei Xue, and Ruili Wang. Underwater species detection using channel sharpening attention. In ACM MM, 2021. 3

  39. [40]

    Chatrex: Taming multimodal LLM for joint perception and understanding, 2024

    Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, and Lei Zhang. Chatrex: Taming multimodal LLM for joint perception and understanding, 2024. 15

  40. [41]

    MDETR - modulated detection for end-to-end multi-modal understanding

    Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, and Nicolas Carion. MDETR - modulated detection for end-to-end multi-modal understanding. In ICCV, 2021. 7

  41. [42]

    NTIRE 2026 Low-light Enhancement: Twilight Cowboy Challenge

    Aleksei Khalin, Egor Ershov, Artem Panshin, Sergey Korchagin, Georgiy Lobarev, Arseniy Terekhin, Sofiia Dorogova, Amir Shamsutdinov, Yasin Mamedov, Bakhtiyar Khalfin, Bogdan Sheludko, Emil Zilyaev, Nikola Banić, Georgy Perevozchikov, Radu Timofte, et al. NTIRE 2026 Low-light Enhancement: Twilight Cowboy Challenge. In Proceedings of the IEEE/CVF Con...

  42. [43]

    Few-shot object detection: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems, 2023

    Mona Köhler, Markus Eisenbach, and Horst-Michael Gross. Few-shot object detection: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems, 2023. 1

  43. [44]

    Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 2017

    Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 2017. 7

  44. [45]

    Visual genome

    Ranjay Krishna et al. Visual genome. In IJCV, 2017. 12

  45. [46]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. 7

  46. [47]

    The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale

    Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision, 2020. 7

  47. [48]

    Maris: Marine open-vocabulary instance segmentation with geometric enhancement and semantic alignment

    Bingyu Li, Feiyu Wang, Da Zhang, Zhiyuan Zhao, Junyu Gao, and Xuelong Li. Maris: Marine open-vocabulary instance segmentation with geometric enhancement and semantic alignment. arXiv preprint arXiv:2510.15398, 2025. 9

  48. [49]

    Elevater: A benchmark and toolkit for evaluating language-augmented visual models

    Chunyuan Li, Haotian Liu, Liunian Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, et al. Elevater: A benchmark and toolkit for evaluating language-augmented visual models. Advances in Neural Information Processing Systems, 2022. 7

  49. [50]

    Semantic-aware ship detection with vision-language integration

    Jiahao Li, Jiancheng Pan, Yuze Sun, and Xiaomeng Huang. Semantic-aware ship detection with vision-language integration. In IGARSS 2025-2025 IEEE International Geoscience and Remote Sensing Symposium, 2025. 1

  50. [51]

    The First Challenge on Mobile Real-World Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

    Jiatong Li, Zheng Chen, Kai Liu, Jingkai Wang, Zihan Zhou, Xiaoyang Liu, Libo Zhu, Radu Timofte, Yulun Zhang, et al. The First Challenge on Mobile Real-World Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  51. [52]

    Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS, 2020

    Ke Li, Gang Wan, Gong Cheng, Liqiu Meng, and Junwei Han. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS, 2020. 3

  52. [53]

    Grounded language-image pre-training

    Liunian Li et al. Grounded language-image pre-training. In CVPR, 2022. 11

  53. [54]

    Grounded language-image pre-training

    Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, et al. Grounded language-image pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 6

  54. [55]

    Cross-domain few-shot learning with task-specific adapters

    Wei-Hong Li, Xialei Liu, and Hakan Bilen. Cross-domain few-shot learning with task-specific adapters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 1

  55. [56]

    NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results

    Xin Li, Jiachao Gong, Xijun Wang, Shiyao Xiong, Bingchen Li, Suhang Yao, Chao Zhou, Zhibo Chen, Radu Timofte, et al. NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  56. [57]

    NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Xin Li, Yeying Jin, Suhang Yao, Beibei Lin, Zhaoxin Fan, Wending Yan, Xin Jin, Zongwei Wu, Bingchen Li, Peishu Shi, Yufei Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby Tan, Radu Timofte, et al. NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer ...

  57. [58]

    Domain-rag: Retrieval-guided compositional image generation for cross-domain few-shot object detection

    Yu Li, Xingyu Qiu, Yuqian Fu, Jie Chen, Tianwen Qian, Xu Zheng, Danda Pani Paudel, Yanwei Fu, Xuanjing Huang, Luc Van Gool, et al. Domain-rag: Retrieval-guided compositional image generation for cross-domain few-shot object detection. arXiv preprint arXiv:2506.05872, 2025. 1, 4

  58. [59]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014. 1, 3, 7, 20

  59. [60]

    Microsoft coco

    Tsung-Yi Lin et al. Microsoft coco. In ECCV, 2014. 12

  60. [61]

    Focal loss

    Tsung-Yi Lin et al. Focal loss. In ICCV, 2017. 12

  61. [62]

    The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

    Kai Liu, Haoyang Yue, Zeli Lin, Zheng Chen, Jingkai Wang, Jue Gong, Radu Timofte, Yulun Zhang, et al. The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  62. [63]

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023. 10

  63. [64]

    Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, 2024. 6, 8

  64. [65]

    3D Restoration and Reconstruction in Adverse Conditions: RealX3D Challenge Results

    Shuhong Liu, Ziteng Cui, Chenyu Bao, Xuangeng Chu, Lin Gu, Bin Ren, Radu Timofte, Marcos V. Conde, et al. 3D Restoration and Reconstruction in Adverse Conditions: RealX3D Challenge Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  65. [66]

    NTIRE 2026 X-AIGC Quality Assessment Challenge: Methods and Results

    Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Qiang Hu, Jiezhang Cao, Yu Zhou, Wei Sun, Farong Wen, Zitong Xu, Yingjie Zhou, Huiyu Duan, Lu Liu, Jiarui Wang, Siqi Luo, Chunyi Li, Li Xu, Zicheng Zhang, Yue Shi, Yubo Wang, Minghong Zhang, Chunchao Guo, Zhichao Hu, Mingtao Chen, Xiele Wu, Xin Ma, Zhaohe Lv, Yuanhao Xue, Jiaqi Wang, Xinxing Sha, Radu Timofte, et al. NTIRE 2026 X-AIGC Quality Assessment Challenge: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026.

  66. [67]

    Diverse instance generation via diffusion models for enhanced few-shot object detection in remote sensing images

    Yanxing Liu, Jiancheng Pan, Jianwei Yang, Tiancheng Chen, Peiling Zhou, and Bingchen Zhang. Diverse instance generation via diffusion models for enhanced few-shot object detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters, 2025. 1

  67. [68]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021. 7, 11

  68. [69]

    Cdformer: Cross-domain few-shot object detection transformer against feature confusion

    Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, and Hui-Liang Shen. Cdformer: Cross-domain few-shot object detection transformer against feature confusion. In ICME, 2025. 1, 20

  69. [70]

    NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

    Andrey Moskalenko, Alexey Bryncev, Ivan Kosmynin, Kira Shilovskaya, Mikhail Erofeev, Dmitry Vatolin, Radu Timofte, et al. NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  70. [71]

    DINOv2: Learning robust visual features without supervision

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research.

  71. [72]

    Im2text: Describing images using 1 million captioned photographs

    Vicente Ordonez, Girish Kulkarni, and Tamara Berg. Im2text: Describing images using 1 million captioned photographs. Advances in Neural Information Processing Systems, 24, 2011. 7

  72. [73]

    Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community

    Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiahao Li, Danda Pani Paudel, Luc Van Gool, and Xiaomeng Huang. Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025. 1

  73. [74]

    Enhance then search: An augmentation-search strategy with foundation models for cross-domain few-shot object detection

    Jiancheng Pan, Yanxing Liu, Xiao He, Long Peng, Jiahao Li, Yuze Sun, and Xiaomeng Huang. Enhance then search: An augmentation-search strategy with foundation models for cross-domain few-shot object detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025. 1, 8

  74. [75]

    NTIRE 2026 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results

    Hyunhee Park, Eunpil Park, Sangmin Lee, Radu Timofte, et al. NTIRE 2026 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  75. [76]

    NTIRE 2026 Challenge on Learned Smartphone ISP with Unpaired Data: Methods and Results

    Georgy Perevozchikov, Daniil Vladimirov, Radu Timofte, et al. NTIRE 2026 Challenge on Learned Smartphone ISP with Unpaired Data: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  76. [77]

    Defrcn: Decoupled faster r-cnn for few-shot object detection

    Limeng Qiao, Yuxuan Zhao, Zhiyuan Li, Xi Qiu, Jianan Wu, and Chi Zhang. Defrcn: Decoupled faster r-cnn for few-shot object detection. In ICCV, 2021. 1

  77. [78]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)

    Guanyi Qin, Jie Liang, Bingbing Zhang, Lishen Qu, Ya-nan Guan, Hui Zeng, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  78. [79]

    The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

    Xingyu Qiu, Yuqian Fu, Jiawei Geng, Bin Ren, Jiancheng Pan, Zongwei Wu, Hao Tang, Yanwei Fu, Radu Timofte, Nicu Sebe, Mohamed Elhoseiny, et al. The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  79. [80]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2)

    Lishen Qu, Yao Liu, Jie Liang, Hui Zeng, Wen Dai, Ya-nan Guan, Guanyi Qin, Shihao Zhou, Jufeng Yang, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  80. [81]

    Qwen3.5: Towards native multimodal agents

    Qwen Team. Qwen3.5: Towards native multimodal agents,
