pith. machine review for the scientific record.

arxiv: 2604.11998 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.AI

Recognition: unknown

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:01 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords cross-domain few-shot object detection · NTIRE challenge · object detection · few-shot learning · domain generalization

The pith

The NTIRE 2026 challenge on cross-domain few-shot object detection drew 128 registered participants and 19 valid final submissions, with methods that advance detection performance across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the results of the second Cross-Domain Few-Shot Object Detection Challenge at NTIRE 2026. It reports that 128 participants registered, 31 teams participated actively, and 19 teams submitted valid final results using a range of strategies. The goal is to test object detectors on unseen target domains with very few labeled examples. Readers care because this addresses a key limitation in applying detectors to new real-world settings where data is scarce and domains differ.

Core claim

The challenge engaged significant community interest and participants developed innovative methods that push the performance frontier for detecting objects in new domains under limited annotation conditions, as shown in both open-source and closed-source tracks.

What carries the argument

The two-track evaluation setup, which tests models on distinct source and target domains with only a few annotated examples per class in the target domain.
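As a rough sketch of what "few-shot annotations per class" means operationally (the function name, annotation format, and sampling here are illustrative assumptions; the challenge ships fixed support sets rather than sampling them at run time):

```python
import random
from collections import defaultdict

def sample_k_shot(annotations, k, seed=0):
    """Pick exactly k annotated instances per class from a COCO-style
    annotation list (dicts carrying a 'category_id' key).

    Hypothetical helper for illustration only: the official challenge
    splits are fixed, so this sampler stands in for how a K-shot
    target-domain support set is structured, not how it was built.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["category_id"]].append(ann)
    support = []
    for cid, anns in sorted(by_class.items()):
        if len(anns) < k:
            raise ValueError(f"class {cid} has only {len(anns)} instances")
        support.extend(rng.sample(anns, k))
    return support
```

Under this sketch, a 1/5/10-shot track simply varies `k` while the class list and target-domain images stay fixed.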

Load-bearing premise

The challenge's datasets and scoring rules accurately reflect the difficulties of real-world cross-domain few-shot object detection without competition-specific biases.

What would settle it

Demonstrating that the top challenge methods do not outperform standard baselines on a new, independent cross-domain few-shot dataset would falsify the claim that they push the performance frontier effectively.

Figures

Figures reproduced from arXiv: 2604.11998 by Adhémar de Senneville, Bei Dou, Bhoomi Deshpande, Bin Ren, Bowen Fu, Chao Chen, Dandan Zhao, Di Yang, Dongdong Lu, Flavien Armangeon, Guangwei Huang, Guoyi Xu, Hanzhe Xia, Hao Tang, Harsh Patil, Hongchun Zhu, Hui Qiao, Jiachen Tu, Jiacong Liu, Jiajia Liu, Jiancheng Pan, Jiawei Geng, Jingru Wang, Junjie Liu, Kaiyu Li, Ke Li, Kyeongryeol Go, Lingyi Hong, Li Wang, Liwei Zhou, Longfei Liu, Mengbers, Mingxi Cheng, Mohamed Elhoseiny, Nicu Sebe, Qi Xu, Radu Timofte, Rakshita Kulkarni, Ravi Kirasur, Runze Li, Saiprasad Meesiyawar, Shu Luo, Shuming Hu, Taewoong Jang, Tao Wang, Tao Wu, Uma Mudenagudi, Wei Zhou, Wenqiang Zhang, Xiangyong Cao, Xingdong Sheng, Xingqi He, Xingyu Qiu, Xi Shen, Xuanlong Yu, Yang Yang, Yanwei Fu, Yaokun Shi, Yaoxin Jiang, Yaze Zhao, Yazhe Lyu, Yikai Qin, Yixiong Zou, Yongwei Jiang, Youyang Sha, Yuqian Fu, Zekang Fan, Zhenzhao Xing, Zhe Zhang, Zhimeng Xin, Zijian Zhuang, Zixuan Jiang, Zongwei Wu.

Figure 1: Illustration of the challenge settings, including the closed-source and open-source CD-FSOD tracks. The three newly introduced …
Figure 2: Overall framework of the CD-ViTO baseline method.
Figure 3: Overview of Domain-RAG. Domain-RAG consists of three stages: (1) domain-aware background retrieval, (2) domain-guided background generation, and (3) foreground-background composition. The whole pipeline is training-free and follows the principle of fix foreground, adapt background. (2.5. Domain-RAG Baseline Model) We take Domain-RAG [58], the current state-of-the-art (SOTA) method, as the baseline for the op…
Figure 4: Overview of their efficient tuning and inference.
Figure 5: Post Processing. (4.1.2. Training Details) They use the open-vocabulary detection model as the baseline detection model. They utilize Qwen3-VL-235B-A22B [4] as their MLLM for label generation and post-processing. Their fine-tuning experiments are conducted on 8 NVIDIA RTX 3090 GPUs, with a batch size of 8 and a base learning rate of 1e-6. During the optimization process, the text model is completely frozen, …
Figure 6: Proposed framework for CDFSOD. Branch 1 iteratively …
Figure 7: Team earth-insights: overview of the training process.
Figure 8: Overall pipeline of ZEP. …between pseudo-labels and the few-shot ground-truth boxes (1/5/10-shot). Predictions with IoU ≤ 0.3 and matching class labels are filtered out to suppress noisy false positives. They then compute mAP on the remaining predictions, which serves as a proxy for pseudo-label quality. Based on this criterion, they select the pseudo-label source with higher FSOD-mAP (Dataset1 and Dataset2 f…
Figure 9: Overall pipeline of SAIDA. (4.6.2. Module Details) Shot-agnostic domain adaptation: the initial phase focuses on adapting the model to the target domain's visual distribution without relying on the provided challenge labels. This ensures the architecture captures the underlying semantics of the new domain before category-specific fine-tuning begins. Foundation model selection: they utilize ZERO [14], a vis…
Figure 10: Overview of the proposed pipeline. Algorithm 1 (Pseudo-Label Driven Adaptation): 1: initialize model M; 2: for each iteration do: 3: generate predictions B; 4: filter pseudo-labels B̂; 5: fine-tune model; 6: end for. (3. Iterative model adaptation via fine-tuning) Given initial predictions B, high-confidence pseudo-labels are selected as B̂ = {bᵢ ∈ B | sᵢ > τ} (1). The KLETech-CEVI team applies multiple filtering st…
Figure 11: GLIP-based architecture. (4.7.2. Training Details) The KLETech-CEVI team trains the model using large-scale vision-language datasets, including Conceptual Captions, SBU Captions, Visual Genome [45], MS COCO [60], and Objects365. For optimization, the following loss functions are employed: Focal Loss [61] for handling class imbalance and improving recall; Generalized IoU (GIoU) Loss [88] for enhanced l…
Figure 13: Overall pipeline of GDino-FT. For D1 (underwater) …
Figure 15: Overall pipeline of the AIRCAS MILab team's …
Figure 14: Ablation results. Prompt optimization boosts D1 (+12 …
Figure 16: Overall pipeline of Negative Prompting for Few-Shot …
Figure 17: Overall pipeline of Multimodal Query-based Object …
Figure 18: Overall pipeline of DDR. High-level idea: as illustrated in …
Figure 19: Overall pipeline of Triple-Tower. They start from a …
Figure 20: Overall pipeline of AIPR, comprising: input images → image & text encoder; text prompt ("Please generate an underwater object detection image based on my input image, including the following input categories: holothurian, echinus, scallop, starfish, fish, corals, diver, cuttlefish, turtle, and jellyfish") → Qwen-Image 2.0 image decoder → generated images → image & text encoder → Qwen-VL model → text decoder ("Please assist me in…").
Figure 21: Overall pipeline of the data generation process for data …
Figure 22: Overview of the FusionFormer framework. The ar…
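The pseudo-label selection rule quoted with Figure 10 (Eq. 1, B̂ = {bᵢ ∈ B | sᵢ > τ}) and the IoU-based screening described with Figure 8 can be sketched together as below. Note that in the report these steps belong to different teams' pipelines; the box format, threshold values, and function names here are illustrative assumptions, not the authors' code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def select_pseudo_labels(predictions, gt_boxes, tau=0.5, min_iou=0.3):
    """Keep predictions whose confidence s_i exceeds tau (as in Eq. 1)
    AND that overlap some same-class few-shot ground-truth box with
    IoU above min_iou (mirroring the screening described for ZEP).

    predictions: iterable of (box, score, class_label)
    gt_boxes:    iterable of (box, class_label)
    Thresholds are illustrative, not the values used by the teams.
    """
    kept = []
    for box, score, cls in predictions:
        if score <= tau:
            continue  # Eq. 1: drop low-confidence pseudo-labels
        if any(c == cls and iou(box, g) > min_iou for g, c in gt_boxes):
            kept.append((box, score, cls))
    return kept
```

The surviving pseudo-labels would then feed the fine-tune step of Algorithm 1's loop (generate → filter → fine-tune), with mAP against the few-shot ground truth serving as a proxy for pseudo-label quality.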
Original abstract

Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submissions. Among them, 31 teams actively participated, and 19 teams submitted valid final results. Participants explored a wide range of strategies, introducing innovative methods that push the performance frontier under both open-source and closed-source tracks. This report presents a detailed overview of the NTIRE 2026 CD-FSOD Challenge, including a summary of the submitted approaches and an analysis of the final results across all participating teams. Challenge Codes: https://github.com/ohMargin/NTIRE2026_CDFSOD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript reports on the Second Challenge on Cross-Domain Few-Shot Object Detection (CD-FSOD) held as part of NTIRE 2026. It states that the challenge received 128 registrations, 31 active teams, and 19 valid final submissions. Participants employed a wide range of strategies, with the report claiming that innovative methods were introduced that push the performance frontier in both open-source and closed-source tracks. The paper provides a detailed overview of the challenge, a summary of submitted approaches, and an analysis of the final results, along with a link to the challenge code repository.

Significance. This challenge report documents community progress on a difficult problem in computer vision and provides a public benchmark through its summary of methods and results. The explicit provision of the GitHub repository (https://github.com/ohMargin/NTIRE2026_CDFSOD) is a clear strength that supports reproducibility of the evaluation protocol and challenge setup. If the participation counts and high-level outcome statements hold, the manuscript serves as a useful reference for identifying effective strategies in cross-domain few-shot object detection.

minor comments (1)
  1. [Abstract] The abstract refers to 'Challenge Codes' but the manuscript would benefit from a short dedicated subsection describing the repository contents (e.g., evaluation scripts, dataset splits, or baseline implementations) to improve accessibility for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept. The manuscript documents the NTIRE 2026 CD-FSOD Challenge results and methods as described.

Circularity Check

0 steps flagged

Factual competition report with no derivations or predictions

full rationale

The manuscript is a descriptive summary of the NTIRE 2026 CD-FSOD Challenge, reporting registration numbers (128), submissions (696), active teams (31), and valid final results (19) along with high-level strategy categories. No equations, proofs, fitted parameters, or predictive claims appear; all statements are factual accounts of observed competition outcomes. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the document is an empirical competition report without theoretical derivations or new postulated constructs.

pith-pipeline@v0.9.0 · 5771 in / 949 out tokens · 38002 ms · 2026-05-10T16:01:12.996460+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

132 extracted references · 18 canonical work pages · 7 internal anchors

  1. [1]

    NT-HAZE: A Benchmark Dataset for Realistic Night-time Image Dehazing

    Radu Ancuti, Codruta Ancuti, Radu Timofte, and Cosmin Ancuti. NT-HAZE: A Benchmark Dataset for Realistic Night-time Image Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  2. [2]

    NTIRE 2026 Nighttime Image Dehazing Challenge Report

    Radu Ancuti, Alexandru Brateanu, Florin Vasluianu, Raul Balmez, Ciprian Orhei, Codruta Ancuti, Radu Timofte, Cosmin Ancuti, et al. NTIRE 2026 Nighttime Image Dehazing Challenge Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  3. [4]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025. 5, 6

  4. [5]

    YOLOv4: Optimal Speed and Accuracy of Object Detection

    Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020. 8

  5. [6]

    NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods

    Jie Cai, Kangning Yang, Zhiyuan Li, Florin Vasluianu, Radu Timofte, et al. NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  6. [7]

    End-to-end object detection with transformers

    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV,

  7. [8]

    Sam 3: Segment anything with concepts, 2025

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Lil...

  8. [9]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts. arXiv preprint arXiv:2511.16719, 2025. 8

  9. [10]

    SAM 3: Segment anything with concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris Coll-Vinent, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu,...

  10. [11]

    MMDetection: Open MMLab Detection Toolbox and Benchmark

    Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open MMLab detection toolbox and...

  11. [12]

    The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview

    Zheng Chen, Kai Liu, Jingkai Wang, Xianglong Yan, Jianze Li, Ziqing Zhang, Jue Gong, Jiatong Li, Lei Sun, Xiaoyang Liu, Radu Timofte, Yulun Zhang, et al. The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) W...

  12. [13]

    Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing

    Chenping Fu, Risheng Liu, Xin Fan, Puyang Chen, Hao Fu, Wanqi Yuan, Ming Zhu, Zhongxuan Luo. Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing. 2, 3

  13. [14]

    ZERO: Industry-ready vision foundation model with multimodal prompts, 2025

    Sangbum Choi, Kyeongryeol Go, and Taewoong Jang. ZERO: Industry-ready vision foundation model with multimodal prompts, 2025. 10, 11

  14. [15]

    Low Light Image Enhancement Challenge at NTIRE 2026

    George Ciubotariu, Sharif S M A, Abdur Rehman, Fayaz Ali, Rizwan Ali Naqvi, Marcos Conde, Radu Timofte, et al. Low Light Image Enhancement Challenge at NTIRE 2026. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  15. [16]

    High FPS Video Frame Interpolation Challenge at NTIRE 2026

    George Ciubotariu, Zhuyun Zhou, Yeying Jin, Zongwei Wu, Radu Timofte, et al. High FPS Video Frame Interpolation Challenge at NTIRE 2026. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  16. [17]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019. 7

  17. [18]

    Arthropod taxonomy orders object detection dataset

    Geir Drange. Arthropod taxonomy orders object detection dataset. In https://doi.org/10.34740/kaggle/dsv/1240192,

  18. [19]

    NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

    Andrei Dumitriu, Aakash Ralhan, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, et al. NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  19. [20]

    Photography Retouching Transfer, NTIRE 2026 Challenge: Report

    Omar Elezabi, Marcos V. Conde, Zongwei Wu, Yeying Jin, Radu Timofte, et al. Photography Retouching Transfer, NTIRE 2026 Challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  20. [21]

    Few-shot object detection with vision foundation models and graph diffusion

    Chen-Bin Feng, Youyang Sha, Longfei Liu, Yongjun Yu, Chi Man Vong, Xuanlong Yu, and Xi Shen. Few-shot object detection with vision foundation models and graph diffusion. In The Fourteenth International Conference on Learning Representations, 2026. 15

  21. [22]

    FSOD-VFM: Few-shot object detection with vision foundation models and graph diffusion

    Chen-Bin Feng, Youyang Sha, Longfei Liu, Yongjun Yu, Chi Man Vong, Xuanlong Yu, and Xi Shen. FSOD-VFM: Few-shot object detection with vision foundation models and graph diffusion. In ICLR, 2026. 16

  22. [23]

    Wedetect: Fast open-vocabulary object detection as retrieval

    Shenghao Fu, Yukun Su, Fengyun Rao, Jing Lyu, Xiaohua Xie, and Wei-Shi Zheng. Wedetect: Fast open-vocabulary object detection as retrieval. arXiv preprint arXiv:2512.12309, 2025. 18

  23. [24]

    Meta-fdmixup: Cross-domain few-shot learning guided by labeled target data

    Yuqian Fu, Yanwei Fu, and Yu-Gang Jiang. Meta-fdmixup: Cross-domain few-shot learning guided by labeled target data. In ACM Multimedia, 2021. 1

  24. [25]

    Me-d2n: Multi-expert domain decompositional network for cross-domain few-shot learning

    Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, and Yu-Gang Jiang. Me-d2n: Multi-expert domain decompositional network for cross-domain few-shot learning. In ACM Multimedia, 2022

  25. [26]

    Styleadv: Meta style adversarial training for cross-domain few-shot learning

    Yuqian Fu, Yu Xie, Yanwei Fu, and Yu-Gang Jiang. Styleadv: Meta style adversarial training for cross-domain few-shot learning. In CVPR, 2023. 1

  26. [27]

    Cross-domain few-shot object detection via enhanced open-set object detector

    Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, and Xingqun Jiang. Cross-domain few-shot object detection via enhanced open-set object detector. In European Conference on Computer Vision, 2024. 1, 2, 3, 10, 19, 20

  27. [28]

    Ntire 2025 challenge on cross-domain few-shot object detection: Methods and results

    Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, et al. Ntire 2025 challenge on cross-domain few-shot object detection: Methods and results. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025. 2

  28. [29]

    NTIRE 2026 Challenge on End-to-End Financial Receipt Restoration and Reasoning from Degraded Images: Datasets, Methods and Results

    Bochen Guan, Jinlong Li, Kangning Yang, Chuang Ke, Jie Cai, Florin Vasluianu, Radu Timofte, et al. NTIRE 2026 Challenge on End-to-End Financial Receipt Restoration and Reasoning from Degraded Images: Datasets, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  29. [30]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: AI Flash Portrait (Track 3)

    Ya-nan Guan, Shaonan Zhang, Hang Guo, Yawen Wang, Xinying Fan, Jie Liang, Hui Zeng, Guanyi Qin, Lishen Qu, Tao Dai, Shu-Tao Xia, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: AI Flash Portrait (Track 3). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  30. [31]

    Remote sensing object detection in the deep learning era—a review. Remote Sensing, 16(2):327, 2024

    Shengxi Gui, Shuang Song, Rongjun Qin, and Yang Tang. Remote sensing object detection in the deep learning era—a review. Remote Sensing, 16(2):327, 2024. 1

  31. [32]

    Scale-equivalent distillation for semi-supervised object detection

    Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, and Ping Luo. Scale-equivalent distillation for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 11

  32. [33]

    A broader study of cross-domain few-shot learning

    Yunhui Guo, Noel C Codella, Leonid Karlinsky, James V Codella, John R Smith, Kate Saenko, Tajana Rosing, and Rogerio Feris. A broader study of cross-domain few-shot learning. In ECCV, 2020. 1

  33. [34]

    NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

    Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya, Artem Filippov, Georgii Bychkov, Sergey Lavrushkin, Mikhail Erofeev, Anastasia Antsiferova, Changsheng Chen, Shunquan Tan, Radu Timofte, Dmitriy Vatolin, et al. NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and...

  34. [35]

    Robust Deepfake Detection, NTIRE 2026 Challenge: Report

    Benedikt Hopf, Radu Timofte, et al. Robust Deepfake Detection, NTIRE 2026 Challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  35. [36]

    Drone-based object counting by spatially regularized regional proposal network

    Meng-Ru Hsieh, Yen-Liang Lin, and Winston H Hsu. Drone-based object counting by spatially regularized regional proposal network. In Proceedings of the IEEE International Conference on Computer Vision, 2017. 2, 3

  36. [37]

    Lora: Low-rank adaptation of large language models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. In ICLR. OpenReview.net, 2022. 10, 11

  37. [38]

    Cross-domain weakly-supervised object detection through progressive domain adaptation

    Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, and Kiyoharu Aizawa. Cross-domain weakly-supervised object detection through progressive domain adaptation. In CVPR,

  38. [39]

    Underwater species detection using channel sharpening attention

    Lihao Jiang, Yi Wang, Qi Jia, Shengwei Xu, Yu Liu, Xin Fan, Haojie Li, Risheng Liu, Xinwei Xue, and Ruili Wang. Underwater species detection using channel sharpening attention. In ACM MM, 2021. 3

  39. [40]

    Chatrex: Taming multimodal LLM for joint perception and understanding, 2024

    Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, and Lei Zhang. Chatrex: Taming multimodal LLM for joint perception and understanding, 2024. 15

  40. [41]

    MDETR - modulated detection for end-to-end multi-modal understanding

    Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, and Nicolas Carion. MDETR - modulated detection for end-to-end multi-modal understanding. In ICCV, 2021. 7

  41. [42]

    NTIRE 2026 Low-light Enhancement: Twilight Cowboy Challenge

    Aleksei Khalin, Egor Ershov, Artem Panshin, Sergey Korchagin, Georgiy Lobarev, Arseniy Terekhin, Sofiia Dorogova, Amir Shamsutdinov, Yasin Mamedov, Bakhtiyar Khalfin, Bogdan Sheludko, Emil Zilyaev, Nikola Banić, Georgy Perevozchikov, Radu Timofte, et al. NTIRE 2026 Low-light Enhancement: Twilight Cowboy Challenge. In Proceedings of the IEEE/CVF Con...

  42. [43]

    Few-shot object detection: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems, 2023

    Mona Köhler, Markus Eisenbach, and Horst-Michael Gross. Few-shot object detection: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems, 2023. 1

  43. [44]

    Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 2017

    Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 2017. 7

  44. [45]

    Visual genome

    Ranjay Krishna et al. Visual genome. In IJCV, 2017. 12

  45. [46]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. 7

  46. [47]

    The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale

    Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision, 2020. 7

  47. [48]

    Maris: Marine open-vocabulary instance segmentation with geometric enhancement and semantic alignment

    Bingyu Li, Feiyu Wang, Da Zhang, Zhiyuan Zhao, Junyu Gao, and Xuelong Li. Maris: Marine open-vocabulary instance segmentation with geometric enhancement and semantic alignment. arXiv preprint arXiv:2510.15398, 2025. 9

  48. [49]

    Elevater: A benchmark and toolkit for evaluating language-augmented visual models

    Chunyuan Li, Haotian Liu, Liunian Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, et al. Elevater: A benchmark and toolkit for evaluating language-augmented visual models. Advances in Neural Information Processing Systems, 2022. 7

  49. [50]

    Semantic-aware ship detection with vision-language integration

    Jiahao Li, Jiancheng Pan, Yuze Sun, and Xiaomeng Huang. Semantic-aware ship detection with vision-language integration. In IGARSS 2025-2025 IEEE International Geoscience and Remote Sensing Symposium, 2025. 1

  50. [51]

    The First Challenge on Mobile Real-World Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

    Jiatong Li, Zheng Chen, Kai Liu, Jingkai Wang, Zihan Zhou, Xiaoyang Liu, Libo Zhu, Radu Timofte, Yulun Zhang, et al. The First Challenge on Mobile Real-World Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  51. [52]

    Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS, 2020

    Ke Li, Gang Wan, Gong Cheng, Liqiu Meng, and Junwei Han. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS, 2020. 3

  52. [53]

    Grounded language-image pre-training

    Liunian Li et al. Grounded language-image pre-training. In CVPR, 2022. 11

  53. [54]

    Grounded language-image pre-training

    Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, et al. Grounded language-image pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 6

  54. [55]

    Cross-domain few-shot learning with task-specific adapters

    Wei-Hong Li, Xialei Liu, and Hakan Bilen. Cross-domain few-shot learning with task-specific adapters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 1

  55. [56]

    NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results

    Xin Li, Jiachao Gong, Xijun Wang, Shiyao Xiong, Bingchen Li, Suhang Yao, Chao Zhou, Zhibo Chen, Radu Timofte, et al. NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  56. [57]

    NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Xin Li, Yeying Jin, Suhang Yao, Beibei Lin, Zhaoxin Fan, Wending Yan, Xin Jin, Zongwei Wu, Bingchen Li, Peishu Shi, Yufei Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby Tan, Radu Timofte, et al. NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer ...

  57. [58]

    Domain-rag: Retrieval-guided compositional image generation for cross-domain few-shot object detection

    Yu Li, Xingyu Qiu, Yuqian Fu, Jie Chen, Tianwen Qian, Xu Zheng, Danda Pani Paudel, Yanwei Fu, Xuanjing Huang, Luc Van Gool, et al. Domain-rag: Retrieval-guided compositional image generation for cross-domain few-shot object detection. arXiv preprint arXiv:2506.05872, 2025. 1, 4

  58. [59]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014. 1, 3, 7, 20

  59. [60]

    Microsoft coco

    Tsung-Yi Lin et al. Microsoft coco. In ECCV, 2014. 12

  60. [61]

    Focal loss

    Tsung-Yi Lin et al. Focal loss. In ICCV, 2017. 12

  61. [62]

    The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

    Kai Liu, Haoyang Yue, Zeli Lin, Zheng Chen, Jingkai Wang, Jue Gong, Radu Timofte, Yulun Zhang, et al. The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  62. [63]

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023. 10

  63. [64]

    Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, 2024. 6, 8

  64. [65]

    3D Restoration and Reconstruction in Adverse Conditions: RealX3D Challenge Results

    Shuhong Liu, Ziteng Cui, Chenyu Bao, Xuangeng Chu, Lin Gu, Bin Ren, Radu Timofte, Marcos V. Conde, et al. 3D Restoration and Reconstruction in Adverse Conditions: RealX3D Challenge Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  65. [66]

    NTIRE 2026 X-AIGC Quality Assessment Challenge: Methods and Results

    Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Qiang Hu, Jiezhang Cao, Yu Zhou, Wei Sun, Farong Wen, Zitong Xu, Yingjie Zhou, Huiyu Duan, Lu Liu, Jiarui Wang, Siqi Luo, Chunyi Li, Li Xu, Zicheng Zhang, Yue Shi, Yubo Wang, Minghong Zhang, Chunchao Guo, Zhichao Hu, Mingtao Chen, Xiele Wu, Xin Ma, Zhaohe Lv, Yuanhao Xue, Jiaqi Wang, Xinxing Sha, Radu Timofte, et al. NTIRE 2026 X-AIGC Quality Assessment Challenge: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026.

  66. [67]

    Diverse instance generation via diffusion models for enhanced few-shot object detection in remote sensing images

    Yanxing Liu, Jiancheng Pan, Jianwei Yang, Tiancheng Chen, Peiling Zhou, and Bingchen Zhang. Diverse instance generation via diffusion models for enhanced few-shot object detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters, 2025. 1

  67. [68]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021. 7, 11

  68. [69]

    Cdformer: Cross-domain few-shot object detection transformer against feature confusion

    Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, and Hui-Liang Shen. Cdformer: Cross-domain few-shot object detection transformer against feature confusion. In ICME, 2025. 1, 20

  69. [70]

    NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

    Andrey Moskalenko, Alexey Bryncev, Ivan Kosmynin, Kira Shilovskaya, Mikhail Erofeev, Dmitry Vatolin, Radu Timofte, et al. NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  70. [71]

    DINOv2: Learning robust visual features without supervision

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research.

  71. [72]

    Im2text: Describing images using 1 million captioned photographs

    Vicente Ordonez, Girish Kulkarni, and Tamara Berg. Im2text: Describing images using 1 million captioned photographs. Advances in Neural Information Processing Systems, 24, 2011. 7

  72. [73]

    Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community

    Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiahao Li, Danda Pani Paudel, Luc Van Gool, and Xiaomeng Huang. Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025. 1

  73. [74]

    Enhance then search: An augmentation-search strategy with foundation models for cross-domain few-shot object detection

    Jiancheng Pan, Yanxing Liu, Xiao He, Long Peng, Jiahao Li, Yuze Sun, and Xiaomeng Huang. Enhance then search: An augmentation-search strategy with foundation models for cross-domain few-shot object detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025. 1, 8

  74. [75]

    NTIRE 2026 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results

    Hyunhee Park, Eunpil Park, Sangmin Lee, Radu Timofte, et al. NTIRE 2026 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  75. [76]

    NTIRE 2026 Challenge on Learned Smartphone ISP with Unpaired Data: Methods and Results

    Georgy Perevozchikov, Daniil Vladimirov, Radu Timofte, et al. NTIRE 2026 Challenge on Learned Smartphone ISP with Unpaired Data: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  76. [77]

    Defrcn: Decoupled faster r-cnn for few-shot object detection

    Limeng Qiao, Yuxuan Zhao, Zhiyuan Li, Xi Qiu, Jianan Wu, and Chi Zhang. Defrcn: Decoupled faster r-cnn for few-shot object detection. In ICCV, 2021. 1

  77. [78]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)

    Guanyi Qin, Jie Liang, Bingbing Zhang, Lishen Qu, Ya-nan Guan, Hui Zeng, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  78. [79]

    The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

    Xingyu Qiu, Yuqian Fu, Jiawei Geng, Bin Ren, Jiancheng Pan, Zongwei Wu, Hao Tang, Yanwei Fu, Radu Timofte, Nicu Sebe, Mohamed Elhoseiny, et al. The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  79. [80]

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2)

    Lishen Qu, Yao Liu, Jie Liang, Hui Zeng, Wen Dai, Ya-nan Guan, Guanyi Qin, Shihao Zhou, Jufeng Yang, Lei Zhang, Radu Timofte, et al. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2). In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  80. [81]

    Qwen3.5: Towards native multimodal agents

    Qwen Team. Qwen3.5: Towards native multimodal agents,
