Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection

Taehun Kong; Tae-Kyun Kim

arXiv: 2509.23880 · v3 · submitted 2025-09-28 · 💻 cs.CV


Pith reviewed 2026-05-18 11:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords semi-supervised 3D object detection · pseudo-label selection · teacher-student framework · adaptive thresholding · quality assessment network · KITTI · Waymo


The pith

Two networks trained on pseudo-label alignment with ground truth enable automatic adaptive selection of high-quality labels for semi-supervised 3D object detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace manual or partially informed confidence thresholds with a learnable module that picks reliable pseudo-labels from a teacher model. It adds a quality-assessment network that fuses available scores and a separate network that sets thresholds according to context such as object distance, class, and training state. Both networks are trained directly on how closely the chosen pseudo-labels match ground-truth boxes, and a soft-supervision step lets the student model down-weight noisier labels. A sympathetic reader would expect this to let semi-supervised 3D detectors exploit more unlabeled scenes without losing precision or missing rare contexts.

Core claim

The central claim is that a learnable pseudo-labeling module can be supervised by the geometric alignment of pseudo-labels to ground-truth boxes. The module pairs a quality-assessment network that performs score fusion with a threshold network that produces context-adaptive decisions. Combined with soft supervision that prioritizes cleaner labels, it selects high-precision pseudo-labels while preserving wider contextual coverage and higher recall than fixed-threshold or earlier dynamic methods.

What carries the argument

A learnable pseudo-labeling module containing a quality-assessment network and a context-adaptive threshold network, both supervised by the alignment between pseudo-labels and ground-truth bounding boxes.
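
As a rough illustration of how such a module might select pseudo-labels, the sketch below fuses several teacher scores into one quality value and compares it with a threshold that depends on class and distance. Everything in it is an assumption made for illustration: the names `TeacherBox`, `fuse_scores`, `context_threshold`, and `select_pseudo_labels`, the fixed fusion weights, and the threshold table are hypothetical stand-ins for the paper's learned networks, whose actual inputs and architectures are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class TeacherBox:
    """One teacher prediction with the scores and context used for selection."""
    cls_name: str       # predicted class, e.g. "Car"
    distance_m: float   # distance from the sensor in meters
    cls_conf: float     # classification confidence in [0, 1]
    objectness: float   # objectness score in [0, 1]


# Hypothetical fixed weights; a learned quality network would replace them.
FUSION_WEIGHTS = {"cls_conf": 0.6, "objectness": 0.4}

# Hypothetical threshold table keyed by (class, distance bin). A learned
# threshold network would produce these values and could also condition
# on the training state.
THRESHOLDS = {
    ("Car", "near"): 0.7,
    ("Car", "far"): 0.5,
    ("Pedestrian", "near"): 0.5,
    ("Pedestrian", "far"): 0.4,
}


def fuse_scores(box):
    """Combine the available scores into a single quality estimate."""
    return (FUSION_WEIGHTS["cls_conf"] * box.cls_conf
            + FUSION_WEIGHTS["objectness"] * box.objectness)


def context_threshold(box, near_limit_m=30.0):
    """Look up the threshold for this box's class and distance bin."""
    bin_name = "near" if box.distance_m < near_limit_m else "far"
    return THRESHOLDS[(box.cls_name, bin_name)]


def select_pseudo_labels(boxes):
    """Keep boxes whose fused quality reaches their context threshold."""
    return [b for b in boxes if fuse_scores(b) >= context_threshold(b)]


boxes = [
    TeacherBox("Car", 12.0, 0.9, 0.8),         # quality 0.86 vs 0.7: kept
    TeacherBox("Car", 55.0, 0.6, 0.5),         # quality 0.56 vs 0.5: kept
    TeacherBox("Pedestrian", 10.0, 0.4, 0.5),  # quality 0.44 vs 0.5: dropped
]
selected = select_pseudo_labels(boxes)
```

In this toy example a single fixed cutoff of 0.7 would have discarded the distant car; the per-context table is what lets it through.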

If this is right

The method produces higher overall performance than prior SS3DOD approaches on the KITTI and Waymo benchmarks.
Selected pseudo-labels achieve high precision together with broader coverage of contexts and improved recall rates.
Soft supervision lets the student network focus training on cleaner labels even when some pseudo-label noise remains.
Context-aware thresholds replace the need for hand-tuned confidence cutoffs.
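
One way to picture the soft-supervision point in the list above is a per-label weight on the student's loss. The sketch below is a generic illustration, not the paper's formulation, which this page does not give. The function name `soft_weighted_loss`, the use of the quality score itself as the weight, and the sample numbers are all assumptions.

```python
def soft_weighted_loss(per_label_losses, quality_scores):
    """Average per-label losses, weighting each by its quality score.

    Assumption: weights are the raw quality scores in [0, 1], normalized by
    their sum so the result stays on the scale of a single loss value.
    """
    if len(per_label_losses) != len(quality_scores):
        raise ValueError("need one quality score per loss term")
    total_weight = sum(quality_scores)
    if total_weight == 0:
        return 0.0  # no trusted labels, so no pseudo-label loss
    weighted = sum(l * q for l, q in zip(per_label_losses, quality_scores))
    return weighted / total_weight


# Hypothetical values: the third pseudo-label is noisy (large loss, low quality).
losses = [0.2, 0.4, 3.0]
qualities = [1.0, 0.5, 0.0]

soft = soft_weighted_loss(losses, qualities)  # (0.2 + 0.2 + 0.0) / 1.5
hard = sum(losses) / len(losses)              # unweighted mean for comparison
```

With these numbers the noisy label dominates the unweighted mean but contributes nothing to the weighted one, which is the behavior the soft-supervision claim relies on.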

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same alignment-based supervision could be tested on other pseudo-labeling pipelines outside 3D detection.
Removing manual threshold search may simplify scaling to new unlabeled datasets or sensor setups.
The two-network design might be combined with different teacher-student architectures to check for further gains.

Load-bearing premise

Alignment of pseudo-labels with ground-truth bounding boxes supplies reliable and sufficient supervision for the quality-assessment and threshold networks without introducing harmful bias.

What would settle it

On a held-out validation set with full ground truth, measure whether the adaptive module's selected labels show measurably higher precision at the same or higher recall than a fixed-threshold baseline; if no consistent gain appears across distance and class bins, the adaptive selection claim is refuted.
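
The check described above can be sketched as a per-bin precision and recall computation. The record layout, the helper name `precision_recall_by_bin`, the IoU cutoff of 0.5, and the toy records are assumptions for illustration; a real evaluation would use the benchmark's own matching protocol.

```python
from collections import defaultdict


def precision_recall_by_bin(candidates, gt_counts, iou_cutoff=0.5):
    """Compute (precision, recall) of selected pseudo-labels for each bin.

    candidates: dicts with keys "bin", "selected", "gt_id", "iou", where
        "gt_id" is the matched ground-truth box (or None) and "iou" is the
        overlap with it.
    gt_counts: number of ground-truth boxes in each bin.
    """
    n_selected = defaultdict(int)
    n_correct = defaultdict(int)
    matched_gt = defaultdict(set)
    for c in candidates:
        if not c["selected"]:
            continue
        n_selected[c["bin"]] += 1
        if c["gt_id"] is not None and c["iou"] >= iou_cutoff:
            # Every selected box above the cutoff counts toward precision;
            # recall counts each ground-truth box at most once.
            n_correct[c["bin"]] += 1
            matched_gt[c["bin"]].add(c["gt_id"])
    results = {}
    for bin_key, n_gt in gt_counts.items():
        sel = n_selected[bin_key]
        precision = n_correct[bin_key] / sel if sel else 0.0
        recall = len(matched_gt[bin_key]) / n_gt if n_gt else 0.0
        results[bin_key] = (precision, recall)
    return results


# Toy records: bins are (class, distance range) pairs.
gt_counts = {("Car", "near"): 2, ("Car", "far"): 2}
candidates = [
    {"bin": ("Car", "near"), "selected": True, "gt_id": 0, "iou": 0.8},
    {"bin": ("Car", "near"), "selected": True, "gt_id": None, "iou": 0.0},
    {"bin": ("Car", "far"), "selected": True, "gt_id": 2, "iou": 0.6},
    {"bin": ("Car", "far"), "selected": False, "gt_id": 3, "iou": 0.7},
]
results = precision_recall_by_bin(candidates, gt_counts)
```

Running the function once on the adaptive module's selections and once on a fixed-threshold baseline's selections gives the per-bin comparison the test calls for.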

Figures

Figures reproduced from arXiv: 2509.23880 by Taehun Kong, Tae-Kyun Kim.

**Figure 2.** (a), (b), and (c) show that classification confidence and objectness have different distributions depending on the context. (b) …

**Figure 3.** Overview of the proposed framework, consisting of two main components: the Pseudo-label Selection Module (PSM), which …

**Figure 4.** The correlation between GT-IoU and each score for …

**Figure 5.** CTE thresholds by classes and distances.

**Figure 7.** Quantitative comparisons of pseudo-label qualities on …

**Figure 8.** Qualitative comparisons of pseudo-labels on KITTI.

Original abstract

Semi-supervised 3D object detection (SS3DOD) aims to reduce costly 3D annotations utilizing unlabeled data. Recent studies adopt pseudo-label-based teacher-student frameworks and demonstrate impressive performance. The main challenge of these frameworks is in selecting high-quality pseudo-labels from the teacher's predictions. Most previous methods, however, select pseudo-labels by comparing confidence scores over thresholds manually set. The latest works tackle the challenge either by dynamic thresholding or refining the quality of pseudo-labels. Such methods still overlook contextual information e.g. object distances, classes, and learning states, and inadequately assess the pseudo-label quality using partial information available from the networks. In this work, we propose a novel SS3DOD framework featuring a learnable pseudo-labeling module designed to automatically and adaptively select high-quality pseudo-labels. Our approach introduces two networks at the teacher output level. These networks reliably assess the quality of pseudo-labels by the score fusion and determine context-adaptive thresholds, which are supervised by the alignment of pseudo-labels over GT bounding boxes. Additionally, we introduce a soft supervision strategy that can learn robustly under pseudo-label noises. This helps the student network prioritize cleaner labels over noisy ones in semi-supervised learning. Extensive experiments on the KITTI and Waymo datasets demonstrate the effectiveness of our method. The proposed method selects high-precision pseudo-labels while maintaining a wider coverage of contexts and a higher recall rate, significantly improving relevant SS3DOD methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

The paper adds learnable score fusion and context-adaptive thresholds for pseudo-label selection in SS3DOD, supervised by GT alignment plus soft student supervision.


This paper's main contribution is a pair of networks added at the teacher stage that learn to fuse pseudo-label scores and set thresholds that adapt to context such as distance, class, and training state. The supervision signal comes from matching pseudo-labels to ground-truth boxes on the labeled portion, and a soft supervision scheme lets the student focus on cleaner labels during training. Experiments are reported on KITTI and Waymo, with claims of higher precision, broader context coverage, and better recall than prior threshold-based or dynamic-threshold methods in semi-supervised 3D detection.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a semi-supervised 3D object detection (SS3DOD) framework that augments teacher-student pseudo-labeling with a learnable module containing a quality-assessment network (via score fusion) and a threshold network that produces context-adaptive thresholds. Both networks are supervised by alignment between teacher pseudo-labels and ground-truth boxes on labeled data; a soft-supervision strategy is added to let the student down-weight noisy labels. Experiments on KITTI and Waymo are reported to yield higher-precision pseudo-labels, broader contextual coverage, and improved recall over prior SS3DOD baselines.

Significance. If the empirical gains are reproducible and the learned selector generalizes, the work would meaningfully advance pseudo-label selection in 3D detection by replacing hand-tuned or non-contextual thresholds with data-driven, context-aware components. The soft-supervision mechanism is a constructive addition for noise robustness.

major comments (2)

[Abstract / §3 (method)] The claim that the two networks learn a generalizable quality function rests on supervision obtained solely from GT alignment on labeled data. Because this supervision is unavailable on the unlabeled distribution, it is unclear whether the resulting thresholds and quality scores avoid systematic under- or over-selection when object distances, classes, or learning states differ from the labeled subset. This question directly underpins the stated improvements in recall and contextual coverage.
[§4 (experiments)] The abstract describes the method as 'significantly improving relevant SS3DOD methods' yet provides no numerical deltas, no ablation tables isolating the contribution of the quality-assessment network from that of the threshold network, and no direct comparison against recent dynamic-thresholding baselines. Without these, the central performance claim cannot be verified.

minor comments (2)

Clarify the precise input features and output heads of the two networks and how their predictions are fused at inference time on unlabeled frames.
Specify the exact formulation of the soft-supervision loss (e.g., weighting scheme or temperature) and whether it is applied only to the student or also back to the teacher.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions we will make.


Referee: [Abstract / §3 (method)] The claim that the two networks learn a generalizable quality function rests on supervision obtained solely from GT alignment on labeled data. Because this supervision is unavailable on the unlabeled distribution, it is unclear whether the resulting thresholds and quality scores avoid systematic under- or over-selection when object distances, classes, or learning states differ from the labeled subset. This question directly underpins the stated improvements in recall and contextual coverage.

Authors: The quality-assessment and threshold networks are supervised exclusively on labeled data, where GT alignments are available. These networks learn to map contextual features (object distances, classes, and model learning state) to quality scores and adaptive thresholds. The same feature extraction is used for unlabeled data, so the learned mapping is applied directly. Experiments on KITTI and Waymo, which contain diverse distances and classes, show higher recall and broader context coverage than baselines, providing empirical evidence that systematic under- or over-selection is mitigated. In revision we will add a paragraph in §3 explicitly discussing the generalization assumption and its empirical support.

revision: partial

Referee: [§4 (experiments)] The abstract describes the method as 'significantly improving relevant SS3DOD methods' yet provides no numerical deltas, no ablation tables isolating the contribution of the quality-assessment network from that of the threshold network, and no direct comparison against recent dynamic-thresholding baselines. Without these, the central performance claim cannot be verified.

Authors: The current abstract statement is qualitative. Section 4 already reports mAP and recall gains over multiple SS3DOD baselines on both KITTI and Waymo. To address the request directly, we will (1) insert concrete numerical deltas into the abstract, (2) add ablation tables that isolate the quality-assessment network from the threshold network, and (3) include comparisons against recent dynamic-thresholding methods. These changes will be incorporated in the revised manuscript.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's central mechanism trains quality-assessment and threshold networks via explicit supervision from alignment between teacher pseudo-labels and ground-truth boxes on labeled data, then applies the learned context-adaptive thresholds to unlabeled pseudo-labels. This external GT-derived signal is independent of the model's own predictions on the target unlabeled distribution and does not reduce any claimed prediction or result to a fitted input or self-definition by construction. No equations or steps in the abstract or description equate the output selection to the input supervision through renaming or tautology. The soft-supervision strategy for the student is likewise a standard noise-robust loss rather than a circular re-use of the same fitted values. The overall framework therefore remains self-contained with respect to external benchmarks on KITTI and Waymo.
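
To make the supervision signal concrete, the sketch below builds IoU targets from pseudo-label and ground-truth pairs on labeled data and scores a quality predictor against them with a mean squared error, following the general idea of regressing a quality score onto GT IoU. It is an illustration under stated assumptions, not the authors' implementation: the boxes are axis-aligned (real 3D detection boxes also carry a heading angle), and the names `axis_aligned_iou_3d` and `quality_mse` and all numbers are hypothetical.

```python
def axis_aligned_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2).

    Simplification: the heading angle of real 3D detection boxes is ignored.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo

    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    return inter / (volume(box_a) + volume(box_b) - inter)


def quality_mse(predicted_quality, pseudo_boxes, gt_boxes):
    """Mean squared error between predicted quality and the GT IoU target."""
    targets = [axis_aligned_iou_3d(p, g) for p, g in zip(pseudo_boxes, gt_boxes)]
    errors = [(q - t) ** 2 for q, t in zip(predicted_quality, targets)]
    return sum(errors) / len(errors)


# Hypothetical pseudo-label / ground-truth pairs from the labeled split.
pseudo = [(0, 0, 0, 2, 2, 2), (0, 0, 0, 2, 2, 2)]
gt = [(0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2)]

targets = [axis_aligned_iou_3d(p, g) for p, g in zip(pseudo, gt)]
loss = quality_mse([1.0, 0.5], pseudo, gt)
```

Because the targets come only from ground-truth boxes, the predictor's own output never enters its training target, which is the independence the circularity check describes.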

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard semi-supervised assumptions plus the new claim that two auxiliary networks can reliably learn quality and thresholds from GT alignment.

axioms (1)

domain assumption: Pseudo-label quality can be reliably assessed and thresholded by learned score fusion and context features.
This is the core modeling choice introduced in the abstract.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: PQE encodes teacher scores … to $Q(x^s_i)$ predicting GT-IoU; trained by $L_{\mathrm{PQE}} = \mathrm{MSE}(Q(x^s_i), \mathrm{IoU}(b_i, b^{GT}_i))$

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage: CTE learns $T(c_i, d_i \mid \theta_t)$ via $L_{\mathrm{thr}}$ on false-positive/negative cases w.r.t. $\tau_{\mathrm{iou}}$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

[1]

Semi-supervised object detection with object- wise contrastive learning and regression uncertainty.arXiv preprint arXiv:2212.02747, 2022

Honggyu Choi, Zhixiang Chen, Xuepeng Shi, and Tae- Kyun Kim. Semi-supervised object detection with object- wise contrastive learning and regression uncertainty.arXiv preprint arXiv:2212.02747, 2022. 3

work page arXiv 2022
[2]

V oxel r-cnn: Towards high performance voxel-based 3d object detection

Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. V oxel r-cnn: Towards high performance voxel-based 3d object detection. InPro- ceedings of the AAAI conference on artificial intelligence, pages 1201–1209, 2021. 2, 3, 6, 1

work page 2021
[3]

Are we ready for autonomous driving? the kitti vision benchmark suite

Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In2012 IEEE conference on computer vision and pat- tern recognition, pages 3354–3361. IEEE, 2012. 6

work page 2012
[4]

Cross-modality knowl- edge distillation network for monocular 3d object detection

Yu Hong, Hang Dai, and Yong Ding. Cross-modality knowl- edge distillation network for monocular 3d object detection. InEuropean Conference on Computer Vision, pages 87–104. Springer, 2022. 2

work page 2022
[5]

Label propagation for deep semi-supervised learning

Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Label propagation for deep semi-supervised learning. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 5070–5079, 2019. 3

work page 2019
[6]

Semi- supervised 3d object detection with channel augmentation using transformation equivariance

Minju Kang, Taehun Kong, and Tae-Kyun Kim. Semi- supervised 3d object detection with channel augmentation using transformation equivariance. In2024 IEEE Interna- tional Conference on Image Processing (ICIP), pages 638–

work page
[7]

Pointpillars: Fast encoders for object detection from point clouds

Alex H Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12697–12705, 2019. 2, 3

work page 2019
[8]

Pseudo-label: The simple and effi- cient semi-supervised learning method for deep neural net- works

Dong-Hyun Lee et al. Pseudo-label: The simple and effi- cient semi-supervised learning method for deep neural net- works. InWorkshop on challenges in representation learn- ing, ICML, page 896. Atlanta, 2013. 3

work page 2013
[9]

Dds3d: Dense pseudo-labels with dynamic threshold for semi-supervised 3d object detection

Jingyu Li, Zhe Liu, Jinghua Hou, and Dingkang Liang. Dds3d: Dense pseudo-labels with dynamic threshold for semi-supervised 3d object detection. In2023 IEEE Inter- national Conference on Robotics and Automation (ICRA), pages 9245–9252. IEEE, 2023. 1, 2, 3, 7

work page 2023
[10]

Semireward: A general reward model for semi-supervised learning.arXiv preprint arXiv:2310.03013, 2023

Siyuan Li, Weiyang Jin, Zedong Wang, Fang Wu, Zicheng Liu, Cheng Tan, and Stan Z Li. Semireward: A general reward model for semi-supervised learning.arXiv preprint arXiv:2310.03013, 2023. 3

work page arXiv 2023
[11]

Lidar r-cnn: An efficient and universal 3d object detector

Zhichao Li, Feng Wang, and Naiyan Wang. Lidar r-cnn: An efficient and universal 3d object detector. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7546–7555, 2021. 3

work page 2021
[12]

Hierarchical supervision and shuffle data augmentation for 3d semi-supervised ob- ject detection

Chuandong Liu, Chenqiang Gao, Fangcen Liu, Pengcheng Li, Deyu Meng, and Xinbo Gao. Hierarchical supervision and shuffle data augmentation for 3d semi-supervised ob- ject detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23819– 23828, 2023. 1, 2, 3, 4, 5, 6, 7, 8

work page 2023
[13]

Pyramid r-cnn: Towards bet- ter performance and adaptability for 3d object detection

Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, and Chunjing Xu. Pyramid r-cnn: Towards bet- ter performance and adaptability for 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2723–2732, 2021. 3

work page 2021
[14]

V oxel transformer for 3d object detection

Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Ji- ashi Feng, Xiaodan Liang, Hang Xu, and Chunjing Xu. V oxel transformer for 3d object detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 3164–3173, 2021. 2

work page 2021
[15]

Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning.IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018. 3

work page 1979
[16]

Reliable student: Addressing noise in semi-supervised 3d object detection

Farzad Nozarian, Shashank Agarwal, Farzaneh Rezaeia- naran, Danish Shahzad, Atanas Poibrenski, Christian M¨uller, and Philipp Slusallek. Reliable student: Addressing noise in semi-supervised 3d object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4981–4990, 2023. 1, 3, 7

work page 2023
[17]

3d object detection with pointformer

Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, and Gao Huang. 3d object detection with pointformer. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7463–7472, 2021. 2

work page 2021
[18]

Clocs: Camera- lidar object candidates fusion for 3d object detection

Su Pang, Daniel Morris, and Hayder Radha. Clocs: Camera- lidar object candidates fusion for 3d object detection. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10386–10393. IEEE, 2020. 4

work page 2020
[19]

Fast-clocs: Fast camera-lidar object candidates fusion for 3d object detection

Su Pang, Daniel Morris, and Hayder Radha. Fast-clocs: Fast camera-lidar object candidates fusion for 3d object detection. InProceedings of the IEEE/CVF Winter Conference on Ap- plications of Computer Vision, pages 187–196, 2022. 4

work page 2022
[20]

Detmatch: Two teachers are bet- ter than one for joint 2d and 3d semi-supervised object de- tection

Jinhyung Park, Chenfeng Xu, Yiyang Zhou, Masayoshi Tomizuka, and Wei Zhan. Detmatch: Two teachers are bet- ter than one for joint 2d and 3d semi-supervised object de- tection. InEuropean Conference on Computer Vision, pages 370–389. Springer, 2022. 1, 3, 6, 7

work page 2022
[21]

Pointnet: Deep learning on point sets for 3d classification and segmentation

Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660,

work page
[22]

Pointnet++: Deep hierarchical feature learning on point sets in a metric space.Advances in neural information processing systems, 30, 2017

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space.Advances in neural information processing systems, 30, 2017. 2

work page 2017
[23]

Deep hough voting for 3d object detection in point clouds

Charles R Qi, Or Litany, Kaiming He, and Leonidas J Guibas. Deep hough voting for 3d object detection in point clouds. Inproceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 9277–9286, 2019. 2

work page 2019
[24]

Temporal ensembling for semi- supervised learning

Laine Samuli and Aila Timo. Temporal ensembling for semi- supervised learning. InInternational Conference on Learn- ing Representations (ICLR), page 6, 2017. 3

work page 2017
[25]

Improving 3d object detection with channel-wise transformer

Hualian Sheng, Sijia Cai, Yuan Liu, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, and Min-Jian Zhao. Improving 3d object detection with channel-wise transformer. InProceed- ings of the IEEE/CVF international conference on computer vision, pages 2743–2752, 2021. 3

work page 2021
[26]

Pointr- cnn: 3d object proposal generation and detection from point cloud

Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. Pointr- cnn: 3d object proposal generation and detection from point cloud. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 770–779, 2019. 2

work page 2019
[27]

Pv-rcnn: Point- voxel feature set abstraction for 3d object detection

Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. Pv-rcnn: Point- voxel feature set abstraction for 3d object detection. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10529–10538, 2020. 3, 6, 1

work page 2020
[28]

Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation net- work.IEEE transactions on pattern analysis and machine intelligence, 43(8):2647–2664, 2020. 2

work page 2020
[29]

Pv- rcnn++: Point-voxel feature set abstraction with local vector representation for 3d object detection.International Journal of Computer Vision, 131(2):531–551, 2023

Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, and Hongsheng Li. Pv- rcnn++: Point-voxel feature set abstraction with local vector representation for 3d object detection.International Journal of Computer Vision, 131(2):531–551, 2023. 3

work page 2023
[30]

Point-gnn: Graph neural net- work for 3d object detection in a point cloud

Weijing Shi and Raj Rajkumar. Point-gnn: Graph neural net- work for 3d object detection in a point cloud. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1711–1719, 2020. 2

work page 2020
[31]

Distance- normalized unified representation for monocular 3d object detection

Xuepeng Shi, Zhixiang Chen, and Tae-Kyun Kim. Distance- normalized unified representation for monocular 3d object detection. InEuropean Conference on Computer Vision, pages 91–107. Springer, 2020. 2

work page 2020
[32]

Geometry-based dis- tance decomposition for monocular 3d object detection

Xuepeng Shi, Qi Ye, Xiaozhi Chen, Chuangrong Chen, Zhixiang Chen, and Tae-Kyun Kim. Geometry-based dis- tance decomposition for monocular 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15172–15181, 2021

work page 2021
[33]

Multivari- ate probabilistic monocular 3d object detection

Xuepeng Shi, Zhixiang Chen, and Tae-Kyun Kim. Multivari- ate probabilistic monocular 3d object detection. InProceed- ings of the IEEE/CVF winter conference on applications of computer vision, pages 4281–4290, 2023. 2

work page 2023
[34]

Fixmatch: Simplifying semi-supervised learning with consistency and confidence

Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596– 608, 2020. 3

work page 2020
[35]

A Simple Semi-Supervised Learning Framework for Object Detection , publisher =

Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, and Tomas Pfister. A simple semi-supervised learning framework for object detection.arXiv preprint arXiv:2005.04757, 2020. 3

work page arXiv 2005
[36]

Scalability in perception for autonomous driving: Waymo open dataset

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020. 6

work page 2020
[37]

Fourier features let networks learn high frequency functions in low dimen- sional domains.Advances in neural information processing systems, 33:7537–7547, 2020

Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ra- mamoorthi, Jonathan Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimen- sional domains.Advances in neural information processing systems, 33:7537–7547, 2020. 6

work page 2020
[38]

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017. 3

work page 2017
[39]

Not ev- ery side is equal: Localization uncertainty estimation for semi-supervised 3d object detection

Chuxin Wang, Wenfei Yang, and Tianzhu Zhang. Not ev- ery side is equal: Localization uncertainty estimation for semi-supervised 3d object detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3814–3824, 2023. 3

work page 2023
[40]

3dioumatch: Leveraging iou prediction for semi- supervised 3d object detection

He Wang, Yezhen Cong, Or Litany, Yue Gao, and Leonidas J Guibas. 3dioumatch: Leveraging iou prediction for semi- supervised 3d object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14615–14624, 2021. 1, 2, 3, 4, 7

work page 2021
[41]

A-teacher: Asymmetric network for 3d semi-supervised ob- ject detection

Hanshi Wang, Zhipeng Zhang, Jin Gao, and Weiming Hu. A-teacher: Asymmetric network for 3d semi-supervised ob- ject detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14978– 14987, 2024. 1, 3, 6, 7

work page 2024
[42]

Consistent-teacher: Towards reducing incon- sistent pseudo-targets in semi-supervised object detection

Xinjiang Wang, Xingyi Yang, Shilong Zhang, Yijiang Li, Litong Feng, Shijie Fang, Chengqi Lyu, Kai Chen, and Wayne Zhang. Consistent-teacher: Towards reducing incon- sistent pseudo-targets in semi-supervised object detection. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 3240–3249, 2023. 3

work page 2023
[43]

Pillar-based object detection for autonomous driving

Yue Wang, Alireza Fathi, Abhijit Kundu, David A Ross, Caroline Pantofaru, Tom Funkhouser, and Justin Solomon. Pillar-based object detection for autonomous driving. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16, pages 18–34. Springer, 2020. 2, 3

work page 2020
[44]

Semi-supervised 3d object detection with patchteacher and pillarmix

Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Bin- bin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, and Wanli Ouyang. Semi-supervised 3d object detection with patchteacher and pillarmix. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 6153–6161, 2024. 1, 3, 6, 7

work page 2024
[45]

Unsupervised data augmentation for consistency training.Advances in neural information processing systems, 33:6256–6268, 2020

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. Unsupervised data augmentation for consistency training.Advances in neural information processing systems, 33:6256–6268, 2020. 3

work page 2020
[46]

Self-training with noisy student improves imagenet classification

Qizhe Xie, Minh-Thang Luong, Eduard Hovy, and Quoc V Le. Self-training with noisy student improves imagenet classification. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10687– 10698, 2020. 3

work page 2020
[47]

End-to- end semi-supervised object detection with soft teacher

Mengde Xu, Zheng Zhang, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, and Zicheng Liu. End-to- end semi-supervised object detection with soft teacher. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3060–3069, 2021. 3

work page 2021
[48]

Monocd: Monocular 3d object detection with complementary depths

Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, and Yihua Tan. Monocd: Monocular 3d object detection with complementary depths. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10248–10257, 2024. 2

work page 2024
[49] Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018.

[50] Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, and Jiaya Jia. Std: Sparse-to-dense 3d object detector for point cloud. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1951–1960, 2019.

[51] Zetong Yang, Yanan Sun, Shu Liu, and Jiaya Jia. 3dssd: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11040–11048, 2020.

[52] Junbo Yin, Jin Fang, Dingfu Zhou, Liangjun Zhang, Cheng-Zhong Xu, Jianbing Shen, and Wenguan Wang. Semi-supervised 3d object detection with proficient teachers. In European Conference on Computer Vision, pages 727–743. Springer, 2022.

[53] Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11784–11793, 2021.

[54] Jinglin Zhan, Tiejun Liu, Rengang Li, Zhaoxiang Zhang, and Yuntao Chen. Csot: Cross-scan object transfer for semi-supervised lidar object detection. In European Conference on Computer Vision, 2024.

[55] Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, and Takahiro Shinozaki. Flexmatch: Boosting semi-supervised learning with curriculum pseudo labeling. Advances in Neural Information Processing Systems, 34:18408–18419, 2021.

[56] Zehan Zhang, Yang Ji, Wei Cui, Yulong Wang, Hao Li, Xian Zhao, Duo Li, Sanli Tang, Ming Yang, Wenming Tan, et al. Atf-3d: Semi-supervised 3d object detection with adaptive thresholds filtering based on confidence and distance. IEEE Robotics and Automation Letters, 7(4):10573–10580, 2022.

[57] Na Zhao, Tat-Seng Chua, and Gim Hee Lee. Sess: Self-ensembling semi-supervised 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11079–11087, 2020.

[58] Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, and Hao Li. Instant-teaching: An end-to-end semi-supervised object detection framework. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4081–4090, 2021.

[59] Yin Zhou and Oncel Tuzel. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4490–4499, 2018.

Supplementary Material

[Figure residue; recoverable labels: Pedestrian, Cyclist, Car; CTE threshold; PQE score]

and Voxel-RCNN [2] also incorporate a GT-IoU estimation module, similar to PQE. The key difference is that PQE predicts pseudo-label quality more reliably by aggregating diverse information through score fusion, including semantic scores and geometric consistency between original and augmented scenes. Fig. 4 in the main p...
