Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments

Jianjun Qian; Jian Yang; Jin Xie; Na Zhao; Yun Zhu

arxiv: 2604.07997 · v1 · submitted 2026-04-09 · 💻 cs.CV



Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords: few-shot learning · incremental learning · 3D object detection · vision-language models · indoor environments · multimodal fusion · prototype imprinting · unknown object mining


The pith

FI3Det tackles few-shot incremental 3D detection by mining unknown objects with vision-language models and fusing 2D and 3D prototypes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FI3Det, a framework that lets 3D detectors learn new object classes in dynamic indoor spaces from only a handful of labeled examples. During base training, it uses vision-language models to mine previously unseen objects and to extract 2D semantic features and class-agnostic 3D bounding boxes for them, then suppresses noise in these signals by re-weighting point- and box-level features according to their spatial location and feature consistency within each box. In the incremental stage, class prototypes are imprinted from the aligned 2D and 3D features, and a gated fusion step combines the resulting 2D and 3D classification scores to produce detections. This matters because real-world environments change and labeling every new object type is expensive, so robots and smart spaces need to adapt with little annotation. Experiments on ScanNet V2 and SUN RGB-D report consistent gains over baseline methods in both batch and sequential incremental settings.
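The noise-suppression step can be illustrated with a small sketch. The abstract says only that point- and box-level features are re-weighted by spatial location and feature consistency within each box; the specific weighting below (a Gaussian fall-off from the box center multiplied by the clipped cosine similarity to the box's mean feature) is an assumption for illustration, not FI3Det's actual formula, and `weighted_box_feature` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def weighted_box_feature(points, feats, box_center, box_size):
    """Pool the features of the points inside one box into a box-level feature.

    Hypothetical weighting: spatial weight = Gaussian of the point's distance
    from the box center (normalized by the half-extent on each axis);
    consistency weight = cosine similarity to the box's unweighted mean
    feature, clipped at zero. Returns (pooled_feature, per_point_weights).
    """
    dim = len(feats[0])
    # Unweighted mean feature: the reference for feature consistency.
    mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]

    weights = []
    for p, f in zip(points, feats):
        r2 = sum(((p[i] - box_center[i]) / (box_size[i] / 2)) ** 2 for i in range(3))
        spatial = math.exp(-r2 / 2)              # down-weights points near box edges
        consistency = max(cosine(f, mean), 0.0)  # drops points that disagree with the box
        weights.append(spatial * consistency)

    total = sum(weights)
    if total == 0:
        return mean, weights  # nothing trustworthy: fall back to plain averaging
    pooled = [sum(w * f[d] for w, f in zip(weights, feats)) / total for d in range(dim)]
    return pooled, weights
```

Points near the box boundary (often background or a neighboring object) and points whose features contradict the box average both contribute less to the pooled feature.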

Core claim

The central discovery is that a combination of VLM-guided unknown object mining, spatial and consistency-based feature weighting, and gated multimodal prototype imprinting allows effective few-shot incremental 3D object detection without requiring extensive annotations for novel classes, as demonstrated by consistent improvements on ScanNet V2 and SUN RGB-D datasets in batch and sequential settings.

What carries the argument

The gated multimodal prototype imprinting module constructs category prototypes from aligned 2D semantic and 3D geometric features and fuses their classification scores using a multimodal gating mechanism to detect novel objects.
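A minimal sketch of prototype imprinting with gated score fusion, assuming plain-Python feature vectors. Imprinting follows the low-shot recipe of Qi et al. [33] (average the L2-normalized support features, then re-normalize); the sigmoid gate with a per-class logit is a hypothetical stand-in for FI3Det's multimodal gating mechanism, whose exact form the abstract does not give. Class names and function names are illustrative.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def imprint_prototype(shot_feats):
    """Imprint a class prototype from the K support features of a novel class:
    average the L2-normalized features, then re-normalize."""
    normed = [l2_normalize(f) for f in shot_feats]
    dim = len(normed[0])
    mean = [sum(f[d] for f in normed) / len(normed) for d in range(dim)]
    return l2_normalize(mean)


def cosine_scores(feat, prototypes):
    """Cosine similarity between one query feature and every class prototype."""
    q = l2_normalize(feat)
    return {c: sum(a * b for a, b in zip(q, p)) for c, p in prototypes.items()}


def gated_fusion(scores_2d, scores_3d, gate_logits):
    """Blend per-class 2D and 3D scores with a sigmoid gate g in (0, 1):
    fused = g * s_2d + (1 - g) * s_3d. gate_logits maps class -> logit
    (hypothetical; a missing class defaults to 0, i.e. an even 50/50 blend)."""
    fused = {}
    for c, s2 in scores_2d.items():
        g = 1.0 / (1.0 + math.exp(-gate_logits.get(c, 0.0)))
        fused[c] = g * s2 + (1.0 - g) * scores_3d[c]
    return fused


# Usage: one-shot prototypes in a 2D (semantic) and a 3D (geometric) space.
protos_2d = {"sink": imprint_prototype([[1.0, 0.0]]), "toilet": imprint_prototype([[0.0, 1.0]])}
protos_3d = {"sink": imprint_prototype([[1.0, 0.0, 0.0]]), "toilet": imprint_prototype([[0.0, 1.0, 0.0]])}
s2d = cosine_scores([2.0, 0.1], protos_2d)       # 2D evidence favors "sink"
s3d = cosine_scores([0.2, 3.0, 0.0], protos_3d)  # 3D evidence favors "toilet"
print(gated_fusion(s2d, s3d, {"sink": 2.0, "toilet": 2.0}))  # gate trusts 2D more
```

A gate lets the detector lean on whichever modality is more reliable for a class; in a trained model the gate would more plausibly be predicted from the features than stored as a fixed per-class constant.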

If this is right

Detectors can add new classes with minimal additional labeling effort.
Perception of unknown objects improves already in the base training stage.
Consistent gains appear both when all new classes are added at once (batch) and when they are added over time (sequential).
The paper establishes batch and sequential evaluation settings for this task on two common 3D indoor datasets, ScanNet V2 and SUN RGB-D.
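The difference between the two evaluation settings can be sketched as a task schedule. This is a hypothetical illustration: the class split, the one-class-per-step grouping in the sequential setting, and the per-class shot sampling are assumptions, not the paper's published protocol.

```python
import random


def make_incremental_tasks(base_classes, novel_classes, setting):
    """Return the list of class groups introduced at each training step.

    Step 0 is base training. 'batch' adds every novel class in one
    incremental step; 'sequential' adds them over several steps (one class
    per step here, which is an assumption).
    """
    if setting == "batch":
        steps = [list(novel_classes)]
    elif setting == "sequential":
        steps = [[c] for c in novel_classes]
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return [list(base_classes)] + steps


def classes_to_evaluate(tasks, step):
    """After a given step, the model is tested on every class seen so far."""
    return [c for group in tasks[: step + 1] for c in group]


def sample_k_shots(instances_by_class, classes, k, seed=0):
    """Draw k annotated training instances for each class of an incremental step."""
    rng = random.Random(seed)
    return {c: rng.sample(instances_by_class[c], k) for c in classes}


# Usage with a hypothetical split of ScanNet V2 categories.
base = ["bed", "chair", "sofa", "table"]
novel = ["desk", "sink", "toilet"]
for setting in ("batch", "sequential"):
    tasks = make_incremental_tasks(base, novel, setting)
    for step in range(1, len(tasks)):
        print(setting, step, "train on", tasks[step], "evaluate on", classes_to_evaluate(tasks, step))
```

The sequential setting is the harder one: each step can erode earlier classes, and the evaluation set grows after every step.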

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reliance on external vision-language models suggests that further advances in those models would directly boost few-shot 3D detection performance.
This technique could be adapted for real-time robotic systems that encounter new objects during operation.
Extending the weighting and gating ideas to other sensor modalities like depth-only or LiDAR data might broaden its use.
If the noise reduction proves robust, the method could handle even noisier inputs from less capable vision-language models.

Load-bearing premise

Vision-language models are able to mine unknown objects reliably and generate 2D semantic features and class-agnostic 3D boxes that are not too noisy for the subsequent weighting and fusion modules to handle effectively.

What would settle it

Running the system without the VLM-guided unknown object learning module and checking if few-shot performance on novel classes falls back to levels achieved by standard incremental baselines on the ScanNet V2 or SUN RGB-D datasets.

Figures

Figures reproduced from arXiv: 2604.07997 by Jianjun Qian, Jian Yang, Jin Xie, Na Zhao, Yun Zhu.

**Figure 2.** Correlation between base and novel category objects. In …

**Figure 3.** Overview of our few-shot incremental 3D object detection model. The model consists of two parts: base training and incremental learning. In the base stage, we introduce a VLM-guided unknown object learning module that uses 2D VLMs to generate unknown objects, thereby improving the perception of unknown objects. In the incremental stage, we propose a gated multimodal prototype imprinting module that builds …

**Figure 4.** Visualization comparison of features. In (b), the …

**Figure 5.** Qualitative comparison on the ScanNet V2 …

**Figure 6.** Ablation of different components in UOM and UOW.

**Figure 8.** Statistical analysis of the number of instances for each …

**Figure 9.** Qualitative comparison on the ScanNet V2 …

**Figure 10.** Qualitative comparison on the SUN RGB-D …

Original abstract

Incremental 3D object perception is a critical step toward embodied intelligence in dynamic indoor environments. However, existing incremental 3D detection methods rely on extensive annotations of novel classes for satisfactory performance. To address this limitation, we propose FI3Det, a Few-shot Incremental 3D Detection framework that enables efficient 3D perception with only a few novel samples by leveraging vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det introduces a VLM-guided unknown object learning module in the base stage to enhance perception of unseen categories. Specifically, it employs VLMs to mine unknown objects and extract comprehensive representations, including 2D semantic features and class-agnostic 3D bounding boxes. To mitigate noise in these representations, a weighting mechanism is further designed to re-weight the contributions of point- and box-level features based on their spatial locations and feature consistency within each box. Moreover, FI3Det proposes a gated multimodal prototype imprinting module, where category prototypes are constructed from aligned 2D semantic and 3D geometric features to compute classification scores, which are then fused via a multimodal gating mechanism for novel object detection. As the first framework for few-shot incremental 3D object detection, we establish both batch and sequential evaluation settings on two datasets, ScanNet V2 and SUN RGB-D, where FI3Det achieves strong and consistent improvements over baseline methods. Code is available at https://github.com/zyrant/FI3Det.
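The abstract does not say how 2D VLM detections become class-agnostic 3D boxes. A simple way to lift them, assumed here for illustration only, is to project the scene points into the image with pinhole intrinsics, keep the points that fall inside a 2D detection, and take their axis-aligned extent. `lift_2d_box_to_3d` and its `trim` parameter are hypothetical.

```python
def lift_2d_box_to_3d(points_cam, box_2d, fx, fy, cx, cy, trim=0.0):
    """Turn one 2D detection into a class-agnostic, axis-aligned 3D box.

    points_cam: (x, y, z) points in the camera frame, z pointing forward.
    box_2d:     (u_min, v_min, u_max, v_max) in pixels.
    trim:       fraction of points dropped at each end of every axis, a crude
                guard against background points that fall in the same frustum.
    Returns (center, size), or None if no point projects inside the box.
    """
    u_min, v_min, u_max, v_max = box_2d
    inside = []
    for x, y, z in points_cam:
        if z <= 0:
            continue  # behind the camera: cannot project
        u = fx * x / z + cx  # pinhole projection
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            inside.append((x, y, z))
    if not inside:
        return None

    lo, hi = [], []
    for axis in range(3):
        vals = sorted(p[axis] for p in inside)
        # Never trim past the middle element, so lo <= hi always holds.
        k = min(int(trim * len(vals)), (len(vals) - 1) // 2)
        lo.append(vals[k])
        hi.append(vals[len(vals) - 1 - k])
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return center, size
```

The resulting box is axis-aligned in the camera frame; a fuller pipeline would move the points into the scene frame first and would likely use a segmentation mask instead of the raw 2D box, which is exactly where the noise that the weighting module targets comes from.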

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

FI3Det is the first framework for few-shot incremental 3D detection via VLM-guided mining and gated prototypes, but its reported gains rest on unverified assumptions about VLM output quality in indoor scenes.


The main point here is that FI3Det introduces the first dedicated setup for few-shot incremental 3D object detection in dynamic indoor environments. In the base stage, it uses VLMs to mine unknown objects and extract 2D semantic features and class-agnostic 3D boxes, then applies spatial and feature-consistency weighting to suppress noise; in the incremental stage, a gated multimodal prototype imprinting step fuses 2D and 3D information for novel-class detection. The paper also defines batch and sequential evaluation protocols on ScanNet V2 and SUN RGB-D, claims consistent gains over baselines, and releases code. That combination addresses a practical gap for embodied systems that cannot afford full re-annotation of new objects.

The modules themselves look like a sensible engineering response to the annotation problem, and the open code plus the new evaluation settings give others a concrete starting point. The soft spot is the heavy dependence on the VLM step. If the initial mining of unknowns produces noisy boxes or features because of the domain shift from web-scale training data to these indoor scenes, the weighting and gating have limited independent power to recover. The abstract states improvements but supplies no ablations, variance numbers, or failure cases, so it is hard to judge how much the downstream fixes actually deliver. The concern about low VLM recall or precision therefore remains a real open question rather than a minor detail.

This work is aimed at researchers in 3D vision and robotics who care about continual or few-shot perception in real scenes. Someone building embodied agents or indoor scene-understanding systems could extract usable ideas from the modules and the evaluation design. It has enough novelty and grounding to deserve serious peer review, though referees will probably ask for more empirical checks on the VLM assumptions and a clearer breakdown of where the gains come from.

Referee Report

2 major / 0 minor

Summary. The paper proposes FI3Det, the first framework for few-shot incremental 3D object detection in dynamic indoor environments. In the base stage, a VLM-guided unknown object learning module mines unknown objects and extracts 2D semantic features plus class-agnostic 3D bounding boxes. Noise in these representations is addressed by a weighting mechanism that re-weights point- and box-level features according to spatial location and feature consistency. A gated multimodal prototype imprinting module then builds category prototypes from aligned 2D/3D features and fuses classification scores via multimodal gating. New batch and sequential evaluation protocols are introduced on ScanNet V2 and SUN RGB-D, with reported strong and consistent gains over baselines; code is released.

Significance. If the results hold, the work would be significant as the first dedicated approach to few-shot incremental 3D detection, lowering annotation costs for novel classes in embodied indoor settings. Establishing batch and sequential protocols on standard datasets is a useful contribution to evaluation methodology. The code release is a clear strength that supports reproducibility and follow-on research.

major comments (2)

The central performance claims rest on the assumption that VLMs produce sufficiently accurate class-agnostic 3D boxes and low-noise 2D semantic features from few-shot indoor samples; the weighting and gated-imprinting modules are then asserted to clean residual noise. No quantitative metrics (e.g., mining precision/recall, feature noise statistics, or VLM output quality on ScanNet V2 / SUN RGB-D) are supplied to show that these downstream steps can recover from typical VLM domain-shift errors, leaving the reported gains unsubstantiated.
The abstract states that FI3Det 'achieves strong and consistent improvements over baseline methods' in both batch and sequential settings, yet supplies no numerical results, ablation tables, or error analysis. Without these, it is impossible to verify whether gains are attributable to the proposed weighting and gating components or to other factors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the significance of FI3Det as the first dedicated framework for few-shot incremental 3D object detection, along with the value of the new evaluation protocols and code release. We address each major comment below.


Referee: The central performance claims rest on the assumption that VLMs produce sufficiently accurate class-agnostic 3D boxes and low-noise 2D semantic features from few-shot indoor samples; the weighting and gated-imprinting modules are then asserted to clean residual noise. No quantitative metrics (e.g., mining precision/recall, feature noise statistics, or VLM output quality on ScanNet V2 / SUN RGB-D) are supplied to show that these downstream steps can recover from typical VLM domain-shift errors, leaving the reported gains unsubstantiated.

Authors: We agree that explicit quantitative validation of the VLM mining step would further substantiate the claims. In the revised manuscript we will add precision/recall metrics for unknown-object mining on both ScanNet V2 and SUN RGB-D, together with before/after statistics on feature consistency and noise levels. These additions will directly illustrate how the spatial- and consistency-based weighting recovers from typical VLM domain-shift errors. The existing ablation studies already isolate the contribution of the weighting and gated-imprinting modules by showing performance degradation when either component is removed.

Revision: yes.
Referee: The abstract states that FI3Det 'achieves strong and consistent improvements over baseline methods' in both batch and sequential settings, yet supplies no numerical results, ablation tables, or error analysis. Without these, it is impossible to verify whether gains are attributable to the proposed weighting and gating components or to other factors.

Authors: Abstracts are subject to strict length limits and therefore omit detailed tables and analyses. The full manuscript already contains the requested material: quantitative results for both batch and sequential protocols (Tables 1–2), component ablations (Table 3), and error analysis (Section 4.3). These tables and figures explicitly attribute the observed gains to the weighting and gating modules. To address the concern, we will revise the abstract to include a small number of concrete improvement figures (e.g., mAP deltas) while remaining within the word limit.

Revision: partial.
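The mining precision and recall promised in the first response could be computed along the following lines. This sketch assumes axis-aligned boxes stored as `(xmin, ymin, zmin, xmax, ymax, zmax)` and greedy one-to-one matching by score; the default IoU threshold of 0.25 mirrors the mAP@0.25 metric commonly reported on ScanNet V2 and SUN RGB-D, but the matching rule and function names are assumptions, not the authors' evaluation code.

```python
def box_volume(b):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def mining_precision_recall(mined, gt, thr=0.25):
    """Precision and recall of mined unknown-object boxes against ground truth.

    mined: list of (score, box); gt: list of boxes. Mined boxes are visited in
    descending score order, and each one is a true positive if its best
    still-unmatched GT box has IoU >= thr (greedy one-to-one matching).
    """
    matched = set()
    tp = 0
    for _, box in sorted(mined, key=lambda m: m[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt):
            if j not in matched:
                o = iou_3d(box, g)
                if o > best_iou:
                    best_iou, best_j = o, j
        if best_j is not None and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(mined) if mined else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall
```

Reporting both numbers separates the two failure modes the referee raises: low precision means the downstream weighting must reject spurious boxes, while low recall means some unknown objects never reach it at all.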

Circularity Check

0 steps flagged

No circularity; framework and evaluations are self-contained


The paper introduces FI3Det as a novel framework using VLM-guided unknown object mining, spatial/feature weighting, and gated multimodal prototype imprinting for few-shot incremental 3D detection. No equations, derivations, or fitted parameters are shown that reduce performance claims to quantities defined by the method's own inputs. New batch/sequential evaluation protocols on ScanNet V2 and SUN RGB-D are presented as external benchmarks without circular dependence on internal definitions or self-citations. Central claims rest on empirical gains over baselines rather than tautological reductions or load-bearing self-references.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5570 in / 1198 out tokens · 32303 ms · 2026-05-10T16:53:46.299457+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

[1] Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, and Serge Belongie. Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model. In CVPR, 2025.

[2] Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, and Jiaya Jia. VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking. In CVPR, 2023.

[3] Yiyang Chen, Tianyu Ding, Lei Wang, Jing Huo, Yang Gao, and Wenbin Li. Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration. In CVPR, 2025.

[4] Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, and Ying Shan. YOLO-World: Real-Time Open-Vocabulary Object Detection. In CVPR, 2024.

[5] Zhongyao Cheng, Fang Wu, Peisheng Qian, Ziyuan Zhao, and Xulei Yang. AIC3DOD: Advancing Indoor Class-Incremental 3D Object Detection with Point Transformer Architecture and Room Layout Constraints. In WACV, 2025.

[6] Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In CVPR, 2019.

[7] MMDetection3D Contributors. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d, 2020.

[8] Spconv Contributors. Spconv: Spatially sparse convolution library. https://github.com/traveller59/spconv, 2022.

[9] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In CVPR, 2017.

[10] Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection. In AAAI, 2021.

[11] Na Dong, Yongqiang Zhang, Mingli Ding, and Gim Hee Lee. Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning. In AAAI, 2023.

[12] Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, and Guyue Zhou. DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection. In ICCV, 2023.

[13] Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, and Hanwang Zhang. Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection. In AAAI, 2024.

[14] Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, and Yi-Hsuan Tsai. Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection. In NeurIPS, 2024.

[15] Le Hui, Linghua Tang, Yaqi Shen, Jin Xie, and Jian Yang. Learning Superpoint Graph Cut for 3D Instance Segmentation. In NeurIPS, 2022.

[16] Le Hui, Linghua Tang, Yuchao Dai, Jin Xie, and Jian Yang. Efficient LiDAR Point Cloud Oversegmentation Network. In ICCV, 2023.

[17] Haobo Jiang, Yaqi Shen, Jin Xie, Jun Li, Jianjun Qian, and Jian Yang. Sampling network guided cross-entropy method for unsupervised point cloud registration. In ICCV, 2021.

[18] Haobo Jiang, Mathieu Salzmann, Zheng Dang, Jin Xie, and Jian Yang. SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation. In NeurIPS, 2023.

[19] Yongwei Jiang, Yixiong Zou, Yuhua Li, and Ruixuan Li. Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning. In CVPR, 2025.

[20] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment Anything. In ICCV, 2023.

[21] Chuandong Liu, Chenqiang Gao, Fangcen Liu, Jiang Liu, Deyu Meng, and Xinbo Gao. SS3D: Sparsely-Supervised 3D Object Detection from Point Cloud. In CVPR, 2022.

[22] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. In ECCV, 2024.

[23] Ye Liu and Meng Yang. SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning. In CVPR, 2025.

[24] Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, and Christian Rupprecht. Continual Detection Transformer for Incremental Object Detection. In CVPR, 2023.

[25] Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, and Zihan Zhou. SpatialLM: Training Large Language Models for Structured Indoor Modeling. arXiv preprint arXiv:2506.07491, 2025.

[26] Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc Van Gool, and Dengxin Dai. Weakly Supervised 3D Object Detection from Lidar Point Cloud. In ECCV.
[27] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 3DV, 2016.
[28]

How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation

Yining Pan, Qiongjie Cui, Xulei Yang, and Na Zhao. How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation. In ICML, 2025. 2

work page 2025
[29]

Incremental Few-Shot Object Detection

Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy M Hospedales, and Tao Xiang. Incremental Few-Shot Object Detection. InCVPR, 2020. 2, 3

work page 2020
[30]

PointNet: Deep learning on Point sets for 3D Classification and Segmentation

Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on Point sets for 3D Classification and Segmentation. InCVPR, 2017. 2

work page 2017
[31]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InNeurIPS, 2017. 2

work page 2017
[32]

Deep Hough V oting for 3D Object Detection in Point Clouds

Charles R Qi, Or Litany, Kaiming He, and Leonidas J Guibas. Deep Hough V oting for 3D Object Detection in Point Clouds. InICCV, 2019. 2

work page 2019
[33]

Low-shot Learning with Imprinted Weights

Hang Qi, Matthew Brown, and David G Lowe. Low-shot Learning with Imprinted Weights. InCVPR, 2018. 4, 6, 7, 12

work page 2018
[34]

FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection

Anna Rukhovich, Anna V orontsova, and Anton Konushin. FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection. InECCV, 2022. 1, 2, 5, 6

work page 2022
[35]

TR3D: Towards Real-Time Indoor 3D Object Detection

Danila Rukhovich, Anna V orontsova, and Anton Konushin. TR3D: Towards Real-Time Indoor 3D Object Detection. In ICIP, 2023. 1, 2, 5, 6, 7

work page 2023
[36]

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, and Baining Guo. V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection. InICLR, 2024. 2

work page 2024
[37]

Lichtenberg, and Jianxiong Xiao

Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite. InCVPR, 2015. 2, 6, 7, 8, 10, 11, 12, 13

work page 2015
[38]

Prototypical Variational Autoencoder for 3D Few-shot Object Detection

Weiliang Tang, Biqi Yang, Xianzhi Li, Pheng-Ann Heng, Yunhui Liu, and Chi-Wing Fu. Prototypical Variational Autoencoder for 3D Few-shot Object Detection. InNeurIPS,

work page
[39]

3DIoUMatch: Leveraging IoU Prediction for Semi- Supervised 3D Object Detection

He Wang, Yezhen Cong, Or Litany, Yue Gao, and Leonidas J Guibas. 3DIoUMatch: Leveraging IoU Prediction for Semi- Supervised 3D Object Detection. InCVPR, 2021. 3

work page 2021
[40]

CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds

Haiyang Wang, Shaocong Dong, Shaoshuai Shi, Aoxue Li, Jianan Li, Zhenguo Li, Liwei Wang, et al. CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds. InNeurIPS, 2022. 1

work page 2022
[41]

Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection

Jiangyi Wang and Na Zhao. Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection. InCVPR, 2025. 1

work page 2025
[42]

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

Xinyi Wang, Xun Yang, Yanlong Xu, Yuchen Wu, Zhen Li, and Na Zhao. AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models. In NeurIPS, 2025. 2

work page 2025
[43]

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, and Xun Yang. AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring. InAAAI, 2025. 2

work page 2025
[44]

Syn-to- Real Unsupervised Domain Adaptation for Indoor 3D Object Detection

Yunsong Wang, Na Zhao, and Gim Hee Lee. Syn-to- Real Unsupervised Domain Adaptation for Indoor 3D Object Detection. InBMVC, 2024. 3

work page 2024
[45]

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

Zhenyu Wang, Ya-Li Li, Hengshuang Zhao, and Shengjin Wang. One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection. InNeurIPS, 2024. 2

work page 2024
[46]

Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer

Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, and Jian Yang. Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer. InECCV,

work page
[47]

WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion

Yang Wu, Yun Zhu, Kaihua Zhang, Jianjun Qian, Jin Xie, and Jian Yang. WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion. InCVPR, 2025. 2

work page 2025
[48]

CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection.arXiv preprint arXiv:2603.23276, 2026

Yuchen Wu, Kun Wang, Yining Pan, and Na Zhao. CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection.arXiv preprint arXiv:2603.23276, 2026. 1

work page arXiv 2026
[49]

NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation

Wei Xie, Haobo Jiang, Yun Zhu, Jianjun Qian, and Jin Xie. NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation. InAAAI, 2025. 2

work page 2025
[50]

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xi- ang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, and Vikas Chandra. EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything. InCVPR, 2024. 2, 3, 4, 7

work page 2024
[51]

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

Xiuwei Xu, Yifan Wang, Yu Zheng, Yongming Rao, Jie Zhou, and Jiwen Lu. Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement. In CVPR, 2022. 3

work page 2022
[52]

Mixsup: Mixed-Grained Supervision for Label-Efficient Lidar-based 3D Object Detection

Yuxue Yang, Lue Fan, and Zhaoxiang Zhang. Mixsup: Mixed-Grained Supervision for Label-Efficient Lidar-based 3D Object Detection. InICLR, 2024. 3, 7

work page 2024
[53]

Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection

Li Yin, Juan M Perez-Rua, and Kevin J Liang. Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection. InCVPR, 2022. 2, 3

work page 2022
[54]

General Geometry-aware Weakly Supervised 3D Object Detection

Guowen Zhang, Junsong Fan, Liyi Chen, Zhaoxiang Zhang, Zhen Lei, and Lei Zhang. General Geometry-aware Weakly Supervised 3D Object Detection. InECCV, 2024. 3, 7

work page 2024
[55]

Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning

Li-Jun Zhao, Zhen-Duo Chen, Yongxin Wang, Xin Luo, and Xin-Shun Xu. Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning. InCVPR, 2025. 2, 3

work page 2025
[56]

Static-Dynamic Co-teaching for Class-Incremental 3D Object Detection

Na Zhao and Gim Hee Lee. Static-Dynamic Co-teaching for Class-Incremental 3D Object Detection. InAAAI, 2022. 1, 2, 6, 10

work page 2022
[57]

SESS: Self- Ensembling Semi-Supervised 3D Object Detection

Na Zhao, Tat-Seng Chua, and Gim Hee Lee. SESS: Self- Ensembling Semi-Supervised 3D Object Detection. In CVPR, 2020. 3

work page 2020
[58]

SDCoT++: Improved Static-Dynamic Co- Teaching for Class-Incremental 3D Object Detection.IEEE Transactions on Image Processing, 2025

Na Zhao, Peisheng Qian, Fang Wu, Xun Xu, Xulei Yang, and Gim Hee Lee. SDCoT++: Improved Static-Dynamic Co- Teaching for Class-Incremental 3D Object Detection.IEEE Transactions on Image Processing, 2025. 1, 2, 6, 7, 11, 12

work page 2025
[59]

Prototypical V oteNet for Few-Shot 3D Point Cloud Object Detection

Shizhen Zhao and Xiaojuan Qi. Prototypical V oteNet for Few-Shot 3D Point Cloud Object Detection. InNeurIPS,

work page
[60]

SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

Shijia Zhao, Qiming Xia, Xusheng Guo, Pufan Zou, Maoji Zheng, Hai Wu, Chenglu Wen, and Cheng Wang. SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts. InCVPR, 2025. 3

work page 2025
[61]

Distance-IoU loss: Faster and Better Learning for Bounding Box Regression

Zhaohui Zheng, Ping Wang, Wei Liu, Jinze Li, Rongguang Ye, and Dongwei Ren. Distance-IoU loss: Faster and Better Learning for Bounding Box Regression. InAAAI, 2020. 9

work page 2020
[62]

MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation.IEEE Robotics and Automation Letters, 10 (11):11832–11839, 2025

Kangjian Zhu, Haobo Jiang, Yigong Zhang, Jianjun Qian, Jian Yang, and Jin Xie. MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation.IEEE Robotics and Automation Letters, 10 (11):11832–11839, 2025. 2

work page 2025
[63]

SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection

Yun Zhu, Le Hui, Yaqi Shen, and Jin Xie. SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection. InAAAI, 2024. 2

[64]

Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, and Jian Yang. Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection. In CVPR, 2025. 3

