Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments
Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3
The pith
Few-shot incremental 3D detection works by mining unknown objects with vision-language models and fusing 2D-3D prototypes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that a combination of VLM-guided unknown object mining, spatial and consistency-based feature weighting, and gated multimodal prototype imprinting allows effective few-shot incremental 3D object detection without requiring extensive annotations for novel classes, as demonstrated by consistent improvements on ScanNet V2 and SUN RGB-D datasets in batch and sequential settings.
What carries the argument
The gated multimodal prototype imprinting module constructs category prototypes from aligned 2D semantic and 3D geometric features and fuses their classification scores using a multimodal gating mechanism to detect novel objects.
If this is right
- Detectors can add new classes with minimal additional labeling effort.
- Unknown objects are perceived better already in the initial training phase.
- Consistent gains appear in both batch addition of all new classes at once and sequential addition over time.
- The framework defines standard evaluation protocols for this task on two common 3D indoor datasets.
Where Pith is reading between the lines
- The reliance on external vision-language models suggests that further advances in those models would directly boost few-shot 3D detection performance.
- This technique could be adapted for real-time robotic systems that encounter new objects during operation.
- Extending the weighting and gating ideas to other sensor modalities like depth-only or LiDAR data might broaden its use.
- If the noise reduction proves robust, the method could handle even noisier inputs from less capable vision-language models.
Load-bearing premise
Vision-language models are able to mine unknown objects reliably and generate 2D semantic features and class-agnostic 3D boxes that are not too noisy for the subsequent weighting and fusion modules to handle effectively.
What would settle it
Running the system without the VLM-guided unknown object learning module and checking if few-shot performance on novel classes falls back to levels achieved by standard incremental baselines on the ScanNet V2 or SUN RGB-D datasets.
Figures
read the original abstract
Incremental 3D object perception is a critical step toward embodied intelligence in dynamic indoor environments. However, existing incremental 3D detection methods rely on extensive annotations of novel classes for satisfactory performance. To address this limitation, we propose FI3Det, a Few-shot Incremental 3D Detection framework that enables efficient 3D perception with only a few novel samples by leveraging vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det introduces a VLM-guided unknown object learning module in the base stage to enhance perception of unseen categories. Specifically, it employs VLMs to mine unknown objects and extract comprehensive representations, including 2D semantic features and class-agnostic 3D bounding boxes. To mitigate noise in these representations, a weighting mechanism is further designed to re-weight the contributions of point- and box-level features based on their spatial locations and feature consistency within each box. Moreover, FI3Det proposes a gated multimodal prototype imprinting module, where category prototypes are constructed from aligned 2D semantic and 3D geometric features to compute classification scores, which are then fused via a multimodal gating mechanism for novel object detection. As the first framework for few-shot incremental 3D object detection, we establish both batch and sequential evaluation settings on two datasets, ScanNet V2 and SUN RGB-D, where FI3Det achieves strong and consistent improvements over baseline methods. Code is available at https://github.com/zyrant/FI3Det.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FI3Det, the first framework for few-shot incremental 3D object detection in dynamic indoor environments. It uses a VLM-guided unknown object learning module in the base stage to mine unknown objects and extract 2D semantic features plus class-agnostic 3D bounding boxes from few novel samples. Noise in these representations is addressed by a weighting mechanism that re-weights point- and box-level features according to spatial location and feature consistency. A gated multimodal prototype imprinting module then builds category prototypes from aligned 2D/3D features and fuses classification scores via multimodal gating. New batch and sequential evaluation protocols are introduced on ScanNet V2 and SUN RGB-D, with reported strong and consistent gains over baselines; code is released.
Significance. If the results hold, the work would be significant as the first dedicated approach to few-shot incremental 3D detection, lowering annotation costs for novel classes in embodied indoor settings. Establishing batch and sequential protocols on standard datasets is a useful contribution to evaluation methodology. The code release is a clear strength that supports reproducibility and follow-on research.
major comments (2)
- The central performance claims rest on the assumption that VLMs produce sufficiently accurate class-agnostic 3D boxes and low-noise 2D semantic features from few-shot indoor samples; the weighting and gated-imprinting modules are then asserted to clean residual noise. No quantitative metrics (e.g., mining precision/recall, feature noise statistics, or VLM output quality on ScanNet V2 / SUN RGB-D) are supplied to show that these downstream steps can recover from typical VLM domain-shift errors, leaving the reported gains unsubstantiated.
- The abstract states that FI3Det 'achieves strong and consistent improvements over baseline methods' in both batch and sequential settings, yet supplies no numerical results, ablation tables, or error analysis. Without these, it is impossible to verify whether gains are attributable to the proposed weighting and gating components or to other factors.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the significance of FI3Det as the first dedicated framework for few-shot incremental 3D object detection, along with the value of the new evaluation protocols and code release. We address each major comment below.
read point-by-point responses
-
Referee: The central performance claims rest on the assumption that VLMs produce sufficiently accurate class-agnostic 3D boxes and low-noise 2D semantic features from few-shot indoor samples; the weighting and gated-imprinting modules are then asserted to clean residual noise. No quantitative metrics (e.g., mining precision/recall, feature noise statistics, or VLM output quality on ScanNet V2 / SUN RGB-D) are supplied to show that these downstream steps can recover from typical VLM domain-shift errors, leaving the reported gains unsubstantiated.
Authors: We agree that explicit quantitative validation of the VLM mining step would further substantiate the claims. In the revised manuscript we will add precision/recall metrics for unknown-object mining on both ScanNet V2 and SUN RGB-D, together with before/after statistics on feature consistency and noise levels. These additions will directly illustrate how the spatial- and consistency-based weighting recovers from typical VLM domain-shift errors. The existing ablation studies already isolate the contribution of the weighting and gated-imprinting modules by showing performance degradation when either component is removed. revision: yes
-
Referee: The abstract states that FI3Det 'achieves strong and consistent improvements over baseline methods' in both batch and sequential settings, yet supplies no numerical results, ablation tables, or error analysis. Without these, it is impossible to verify whether gains are attributable to the proposed weighting and gating components or to other factors.
Authors: Abstracts are subject to strict length limits and therefore omit detailed tables and analyses. The full manuscript already contains the requested material: quantitative results for both batch and sequential protocols (Tables 1–2), component ablations (Table 3), and error analysis (Section 4.3). These tables and figures explicitly attribute the observed gains to the weighting and gating modules. To address the concern, we will revise the abstract to include a small number of concrete improvement figures (e.g., mAP deltas) while remaining within the word limit. revision: partial
Circularity Check
No circularity; framework and evaluations are self-contained
full rationale
The paper introduces FI3Det as a novel framework using VLM-guided unknown object mining, spatial/feature weighting, and gated multimodal prototype imprinting for few-shot incremental 3D detection. No equations, derivations, or fitted parameters are shown that reduce performance claims to quantities defined by the method's own inputs. New batch/sequential evaluation protocols on ScanNet V2 and SUN RGB-D are presented as external benchmarks without circular dependence on internal definitions or self-citations. Central claims rest on empirical gains over baselines rather than tautological reductions or load-bearing self-references.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Generalized Few- shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, and Serge Belongie. Generalized Few- shot 3D Point Cloud Segmentation with Vision-Language Model. InCVPR, 2025. 3
work page 2025
-
[2]
V oxelNeXt: Fully Sparse V oxelNet for 3D Object Detection and Tracking
Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, and Jiaya Jia. V oxelNeXt: Fully Sparse V oxelNet for 3D Object Detection and Tracking. InCVPR, 2023. 1, 2
work page 2023
-
[3]
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
Yiyang Chen, Tianyu Ding, Lei Wang, Jing Huo, Yang Gao, and Wenbin Li. Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration. InCVPR, 2025. 2
work page 2025
-
[4]
YOLO-World: Real-Time Open-V ocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, and Ying Shan. YOLO-World: Real-Time Open-V ocabulary Object Detection. InCVPR, 2024. 3, 10, 12
work page 2024
-
[5]
Zhongyao Cheng, Fang Wu, Peisheng Qian, Ziyuan Zhao, and Xulei Yang. AIC3DOD: Advancing Indoor Class- Incremental 3D Object Detection with Point Transformer Architecture and Room Layout Constraints. InWACV, 2025. 1, 2, 6, 7, 12
work page 2025
-
[6]
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. InCVPR, 2019. 2
work page 2019
-
[7]
MMDetection3D Contributors. MMDetection3D: Open- MMLab next-generation platform for general 3D object detection.https://github.com/open- mmlab/ mmdetection3d, 2020. 6
work page 2020
-
[8]
Spconv: Spatially sparse convolution library.https : / / github
Spconv Contributors. Spconv: Spatially sparse convolution library.https : / / github . com / traveller59 / spconv, 2022. 2
work page 2022
-
[9]
Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In CVPR, 2017. 2, 6, 7, 8, 10, 11, 12
work page 2017
-
[10]
V oxel R-CNN: Towards High Performance V oxel-based 3D Object Detection
Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. V oxel R-CNN: Towards High Performance V oxel-based 3D Object Detection. In AAAI, 2021. 1
work page 2021
-
[11]
Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning
Na Dong, Yongqiang Zhang, Mingli Ding, and Gim Hee Lee. Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning. InAAAI, 2023. 2, 3, 6, 7, 12
work page 2023
-
[12]
DQS3D: Densely-matched Quantization- aware Semi-supervised 3D Detection
Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, and Guyue Zhou. DQS3D: Densely-matched Quantization- aware Semi-supervised 3D Detection. InICCV, 2023. 3
work page 2023
-
[13]
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, and Hanwang Zhang. Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection. InAAAI, 2024. 1
work page 2024
-
[14]
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, and Yi-Hsuan Tsai. Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection. InNeurIPS, 2024. 3
work page 2024
-
[15]
Learning Superpoint Graph Cut for 3D Instance Segmentation
Le Hui, Linghua Tang, Yaqi Shen, Jin Xie, and Jian Yang. Learning Superpoint Graph Cut for 3D Instance Segmentation. InNeurIPS, 2022. 2
work page 2022
-
[16]
Efficient LiDAR Point Cloud Oversegmentation Network
Le Hui, Linghua Tang, Yuchao Dai, Jin Xie, and Jian Yang. Efficient LiDAR Point Cloud Oversegmentation Network. In ICCV, 2023
work page 2023
-
[17]
Sampling network guided cross-entropy method for unsupervised point cloud registration
Haobo Jiang, Yaqi Shen, Jin Xie, Jun Li, Jianjun Qian, and Jian Yang. Sampling network guided cross-entropy method for unsupervised point cloud registration. InICCV, 2021
work page 2021
-
[18]
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation
Haobo Jiang, Mathieu Salzmann, Zheng Dang, Jin Xie, and Jian Yang. SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation. In NeurIPS, 2023. 2
work page 2023
-
[19]
Revisiting Pool-based Prompt Learning for Few-shot Class- incremental Learning
Yongwei Jiang, Yixiong Zou, Yuhua Li, and Ruixuan Li. Revisiting Pool-based Prompt Learning for Few-shot Class- incremental Learning. InCVPR, 2025. 2
work page 2025
-
[20]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment Anything. InICCV, 2023. 2, 3
work page 2023
-
[21]
SS3D: Sparsely-Supervised 3D Object Detection from Point Cloud
Chuandong Liu, Chenqiang Gao, Fangcen Liu, Jiang Liu, Deyu Meng, and Xinbo Gao. SS3D: Sparsely-Supervised 3D Object Detection from Point Cloud. InCVPR, 2022. 3
work page 2022
-
[22]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. In ECCV, 2024. 2, 3, 4, 6, 10, 12
work page 2024
-
[23]
SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning
Ye Liu and Meng Yang. SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning. InCVPR, 2025. 2, 3
work page 2025
-
[24]
Continual Detection Transformer for Incremen- tal Object Detection
Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, and Christian Rupprecht. Continual Detection Transformer for Incremen- tal Object Detection. InCVPR, 2023. 2
work page 2023
-
[25]
Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, and Zihan Zhou. SpatialLM: Training Large Language Models for Structured Indoor Modeling.arXiv preprint arXiv:2506.07491, 2025. 7
-
[26]
Weakly Supervised 3D Object Detection from Lidar Point Cloud
Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc Van Gool, and Dengxin Dai. Weakly Supervised 3D Object Detection from Lidar Point Cloud. InECCV,
-
[27]
V-net: Fully convolutional neural networks for columetric medical image segmentation
Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for columetric medical image segmentation. In3DV, 2016. 9
work page 2016
-
[28]
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Yining Pan, Qiongjie Cui, Xulei Yang, and Na Zhao. How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation. In ICML, 2025. 2
work page 2025
-
[29]
Incremental Few-Shot Object Detection
Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy M Hospedales, and Tao Xiang. Incremental Few-Shot Object Detection. InCVPR, 2020. 2, 3
work page 2020
-
[30]
PointNet: Deep learning on Point sets for 3D Classification and Segmentation
Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on Point sets for 3D Classification and Segmentation. InCVPR, 2017. 2
work page 2017
-
[31]
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InNeurIPS, 2017. 2
work page 2017
-
[32]
Deep Hough V oting for 3D Object Detection in Point Clouds
Charles R Qi, Or Litany, Kaiming He, and Leonidas J Guibas. Deep Hough V oting for 3D Object Detection in Point Clouds. InICCV, 2019. 2
work page 2019
-
[33]
Low-shot Learning with Imprinted Weights
Hang Qi, Matthew Brown, and David G Lowe. Low-shot Learning with Imprinted Weights. InCVPR, 2018. 4, 6, 7, 12
work page 2018
-
[34]
FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection
Anna Rukhovich, Anna V orontsova, and Anton Konushin. FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection. InECCV, 2022. 1, 2, 5, 6
work page 2022
-
[35]
TR3D: Towards Real-Time Indoor 3D Object Detection
Danila Rukhovich, Anna V orontsova, and Anton Konushin. TR3D: Towards Real-Time Indoor 3D Object Detection. In ICIP, 2023. 1, 2, 5, 6, 7
work page 2023
-
[36]
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection
Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, and Baining Guo. V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection. InICLR, 2024. 2
work page 2024
-
[37]
Lichtenberg, and Jianxiong Xiao
Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite. InCVPR, 2015. 2, 6, 7, 8, 10, 11, 12, 13
work page 2015
-
[38]
Prototypical Variational Autoencoder for 3D Few-shot Object Detection
Weiliang Tang, Biqi Yang, Xianzhi Li, Pheng-Ann Heng, Yunhui Liu, and Chi-Wing Fu. Prototypical Variational Autoencoder for 3D Few-shot Object Detection. InNeurIPS,
-
[39]
3DIoUMatch: Leveraging IoU Prediction for Semi- Supervised 3D Object Detection
He Wang, Yezhen Cong, Or Litany, Yue Gao, and Leonidas J Guibas. 3DIoUMatch: Leveraging IoU Prediction for Semi- Supervised 3D Object Detection. InCVPR, 2021. 3
work page 2021
-
[40]
CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds
Haiyang Wang, Shaocong Dong, Shaoshuai Shi, Aoxue Li, Jianan Li, Zhenguo Li, Liwei Wang, et al. CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds. InNeurIPS, 2022. 1
work page 2022
-
[41]
Jiangyi Wang and Na Zhao. Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection. InCVPR, 2025. 1
work page 2025
-
[42]
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
Xinyi Wang, Xun Yang, Yanlong Xu, Yuchen Wu, Zhen Li, and Na Zhao. AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models. In NeurIPS, 2025. 2
work page 2025
-
[43]
Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, and Xun Yang. AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring. InAAAI, 2025. 2
work page 2025
-
[44]
Syn-to- Real Unsupervised Domain Adaptation for Indoor 3D Object Detection
Yunsong Wang, Na Zhao, and Gim Hee Lee. Syn-to- Real Unsupervised Domain Adaptation for Indoor 3D Object Detection. InBMVC, 2024. 3
work page 2024
-
[45]
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
Zhenyu Wang, Ya-Li Li, Hengshuang Zhao, and Shengjin Wang. One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection. InNeurIPS, 2024. 2
work page 2024
-
[46]
Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer
Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, and Jian Yang. Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer. InECCV,
-
[47]
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Yang Wu, Yun Zhu, Kaihua Zhang, Jianjun Qian, Jin Xie, and Jian Yang. WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion. InCVPR, 2025. 2
work page 2025
-
[48]
Yuchen Wu, Kun Wang, Yining Pan, and Na Zhao. CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection.arXiv preprint arXiv:2603.23276, 2026. 1
-
[49]
NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation
Wei Xie, Haobo Jiang, Yun Zhu, Jianjun Qian, and Jin Xie. NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation. InAAAI, 2025. 2
work page 2025
-
[50]
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xi- ang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, and Vikas Chandra. EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything. InCVPR, 2024. 2, 3, 4, 7
work page 2024
-
[51]
Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement
Xiuwei Xu, Yifan Wang, Yu Zheng, Yongming Rao, Jie Zhou, and Jiwen Lu. Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement. In CVPR, 2022. 3
work page 2022
-
[52]
Mixsup: Mixed-Grained Supervision for Label-Efficient Lidar-based 3D Object Detection
Yuxue Yang, Lue Fan, and Zhaoxiang Zhang. Mixsup: Mixed-Grained Supervision for Label-Efficient Lidar-based 3D Object Detection. InICLR, 2024. 3, 7
work page 2024
-
[53]
Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection
Li Yin, Juan M Perez-Rua, and Kevin J Liang. Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection. InCVPR, 2022. 2, 3
work page 2022
-
[54]
General Geometry-aware Weakly Supervised 3D Object Detection
Guowen Zhang, Junsong Fan, Liyi Chen, Zhaoxiang Zhang, Zhen Lei, and Lei Zhang. General Geometry-aware Weakly Supervised 3D Object Detection. InECCV, 2024. 3, 7
work page 2024
-
[55]
Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning
Li-Jun Zhao, Zhen-Duo Chen, Yongxin Wang, Xin Luo, and Xin-Shun Xu. Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning. InCVPR, 2025. 2, 3
work page 2025
-
[56]
Static-Dynamic Co-teaching for Class-Incremental 3D Object Detection
Na Zhao and Gim Hee Lee. Static-Dynamic Co-teaching for Class-Incremental 3D Object Detection. InAAAI, 2022. 1, 2, 6, 10
work page 2022
-
[57]
SESS: Self- Ensembling Semi-Supervised 3D Object Detection
Na Zhao, Tat-Seng Chua, and Gim Hee Lee. SESS: Self- Ensembling Semi-Supervised 3D Object Detection. In CVPR, 2020. 3
work page 2020
-
[58]
Na Zhao, Peisheng Qian, Fang Wu, Xun Xu, Xulei Yang, and Gim Hee Lee. SDCoT++: Improved Static-Dynamic Co- Teaching for Class-Incremental 3D Object Detection.IEEE Transactions on Image Processing, 2025. 1, 2, 6, 7, 11, 12
work page 2025
-
[59]
Prototypical V oteNet for Few-Shot 3D Point Cloud Object Detection
Shizhen Zhao and Xiaojuan Qi. Prototypical V oteNet for Few-Shot 3D Point Cloud Object Detection. InNeurIPS,
-
[60]
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Shijia Zhao, Qiming Xia, Xusheng Guo, Pufan Zou, Maoji Zheng, Hai Wu, Chenglu Wen, and Cheng Wang. SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts. InCVPR, 2025. 3
work page 2025
-
[61]
Distance-IoU loss: Faster and Better Learning for Bounding Box Regression
Zhaohui Zheng, Ping Wang, Wei Liu, Jinze Li, Rongguang Ye, and Dongwei Ren. Distance-IoU loss: Faster and Better Learning for Bounding Box Regression. InAAAI, 2020. 9
work page 2020
-
[62]
Kangjian Zhu, Haobo Jiang, Yigong Zhang, Jianjun Qian, Jian Yang, and Jin Xie. MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation.IEEE Robotics and Automation Letters, 10 (11):11832–11839, 2025. 2
work page 2025
-
[63]
SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection
Yun Zhu, Le Hui, Yaqi Shen, and Jin Xie. SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection. InAAAI, 2024. 2
work page 2024
-
[64]
Learning Class Prototypes for Unified Sparse- Supervised 3D Object Detection
Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, and Jian Yang. Learning Class Prototypes for Unified Sparse- Supervised 3D Object Detection. InCVPR, 2025. 3
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.