Unsupervised Multi-agent and Single-agent Perception from Cooperative Views

Baolu Li; Delin Ren; Haochen Yang; Hongkai Yu; Jiacheng Guo; Lei Li; Minghai Qin; Tianyun Zhang

arxiv: 2604.05354 · v1 · submitted 2026-04-07 · 💻 cs.CV

Haochen Yang, Baolu Li, Lei Li, Delin Ren, Jiacheng Guo, Minghai Qin, Tianyun Zhang, Hongkai Yu

Pith reviewed 2026-05-10 20:13 UTC · model grok-4.3

keywords: unsupervised 3D detection · multi-agent perception · LiDAR point clouds · cooperative views · pseudo labels · cross-view learning · autonomous vehicles · object classification

The pith

Unsupervised sharing of LiDAR data among agents improves 3D object detection for both the group and each individual agent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that multi-agent cooperation without labels can solve two perception problems at once. Sharing sensor data creates denser point clouds that aid unsupervised classification of object candidates, while the combined cooperative view supplies guidance that trains a reliable detector from any single agent's partial view. This matters for robot fleets and automated vehicles because it removes the need for expensive human annotations when scaling perception training across many units. The authors build a framework that purifies proposals from the fused clouds, generates stable pseudo labels through easy-to-hard progression, and enforces consensus learning so the cooperative view supervises the single view. Experiments on two public datasets confirm higher detection accuracy than prior unsupervised baselines in both multi-agent and single-agent tasks.

Core claim

By discovering that improved point cloud density from multi-agent sharing benefits unsupervised object classification and that the cooperative view can serve as reliable unsupervised guidance for single-view 3D detection, the authors introduce the UMS framework. It combines a learning-based Proposal Purifying Filter applied after density cooperation, a Progressive Proposal Stabilizing module that yields pseudo labels via curriculum learning, and Cross-View Consensus Learning that transfers supervision from the multi-agent view to single-agent detection models.
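The density insight can be illustrated with a toy example. The sketch below is not the authors' code: it assumes a simple 2D rigid transform (yaw plus translation) between agents, uses made-up points and a made-up relative pose, and the helper names `to_ego_frame` and `count_in_box` are hypothetical. It only shows that fusing a second agent's points into the ego frame raises the number of points falling on the same object.

```python
import math


def to_ego_frame(points, yaw, tx, ty):
    """Rotate (x, y) by yaw radians, then translate by (tx, ty); z is unchanged."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty, z) for x, y, z in points]


def count_in_box(points, center, size):
    """Count points inside an axis-aligned box (center and full size per axis)."""
    return sum(
        all(abs(p[i] - center[i]) <= size[i] / 2 for i in range(3))
        for p in points
    )


# Toy data (invented for illustration): both agents observe the same car.
ego_points = [(10.0, 0.5, 0.5), (10.5, -0.5, 0.8)]
other_points = [(10.5, -0.2, 0.4), (9.7, 0.4, 0.9), (9.0, 0.0, 0.6)]

# Assumed relative pose: the other agent is 20 m ahead and faces the ego agent.
fused = ego_points + to_ego_frame(other_points, yaw=math.pi, tx=20.0, ty=0.0)

box_center, box_size = (10.0, 0.0, 0.75), (4.5, 2.0, 1.5)
ego_count = count_in_box(ego_points, box_center, box_size)
fused_count = count_in_box(fused, box_center, box_size)
print(ego_count, fused_count)
```

In this toy scene the ego agent alone places 2 points inside the box, while the fused cloud places 5. A classifier operating on the fused cloud therefore sees a denser object, which is the effect the paper's first insight relies on.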

What carries the argument

Cross-View Consensus Learning, which uses the multi-agent cooperative view to provide unsupervised guidance that trains the 3D object detection model operating on a single agent's view.

If this is right

Multi-agent point cloud sharing increases density and thereby improves unsupervised classification accuracy for object proposals.
The cooperative view supplies usable guidance that raises single-agent detection performance above prior unsupervised methods.
A single framework can handle both multi-agent and single-agent perception tasks without requiring human annotations.
Progressive curriculum learning from easy to hard cases stabilizes the creation of pseudo labels during training.
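The easy-to-hard idea in the last point can be sketched as a confidence threshold that relaxes over training rounds. This is an illustration under assumptions, not the paper's Progressive Proposal Stabilizing module (which, per the Figure 3 caption, also keeps a memory bank and fuses historical and current pseudo labels). The linear schedule, the `start` and `end` values, and the function names are all hypothetical.

```python
def confidence_threshold(round_idx, num_rounds, start=0.9, end=0.5):
    """Hypothetical linear schedule: strict threshold first, relaxed later."""
    if num_rounds <= 1:
        return end
    frac = round_idx / (num_rounds - 1)
    return start + (end - start) * frac


def select_pseudo_labels(proposals, threshold):
    """Keep proposals whose score reaches the current threshold."""
    return [p for p in proposals if p["score"] >= threshold]


# Invented proposals with detector confidence scores.
proposals = [
    {"id": "a", "score": 0.95},
    {"id": "b", "score": 0.72},
    {"id": "c", "score": 0.55},
    {"id": "d", "score": 0.30},
]

schedule = []
for r in range(3):
    t = confidence_threshold(r, 3)
    schedule.append([p["id"] for p in select_pseudo_labels(proposals, t)])

print(schedule)
```

Early rounds train only on the most confident ("easy") proposals; later rounds admit harder ones once the detector has improved. With the toy scores above, the selected sets grow from `a` to `a, b` to `a, b, c`.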

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Fleets of vehicles could train perception models continuously during normal operation by exchanging data over communication links.
The same density and consensus principles might apply to camera or radar data if alignment can be maintained across agents.
Performance gains may shrink in scenes with limited view overlap, pointing to a possible need for selective data fusion.

Load-bearing premise

The assumption that point cloud data shared from multiple agents remains sufficiently aligned and noise-free to produce reliable unsupervised signals for both proposal classification and cross-view guidance.

What would settle it

Disable the Cross-View Consensus Learning component and measure whether single-agent 3D detection accuracy on the OPV2V dataset falls to the level of existing unsupervised single-agent baselines.

Figures

Figures reproduced from arXiv: 2604.05354 by Baolu Li, Delin Ren, Haochen Yang, Hongkai Yu, Jiacheng Guo, Lei Li, Minghai Qin, Tianyun Zhang.

**Figure 1.** Illustration of Benefits from Cooperative Views. (a) Point Cloud Density Benefit, (b) Cross-View Consensus Benefit. We use Vehicle-to-Vehicle (V2V) cooperative perception [26] in Connected Autonomous Vehicles (CAV) as an example here.

**Figure 2.** UMS Pipeline. The system jointly trains two 3D object detectors for multi-agent and single-agent perception. Candidate Proposals are first generated from two initialized weak detectors and then refined by (i) Proposal Purifying Filter (PPF), (ii) Progressive Proposal Stabilizing (PPS), and (iii) Cross-View Consensus Learning (CCL), enabling robust pseudo supervision without human annotations.

**Figure 3.** Illustration of the proposed modules. (a) Proposal Purifying Filter (PPF) learns an instance-level filter/classifier to remove unreliable proposals. (b) Progressive Proposal Stabilizing (PPS) maintains a memory bank and adaptively fuses historical and current pseudo labels for stability.

**Figure 4.** Confidence–TP/FP statistics with improved point cloud density under multi-agent setting. Based on the initialized weak detector Dm on the V2V4Real [27] training set, the distribution gap of True Positives (TP) and False Positives (FP) across confidence makes self-supervised classification possible.

**Figure 5.** Qualitative Comparison of Pseudo Labels on V2V4Real Training Set. Green boxes: ground truths, Orange boxes: pseudo labels, Red arrows: false positives, Blue arrows: false negatives. Multi-agent dense fused point clouds are shown here.

**Figure 6.** Effect of PPS on Pseudo-Label Quality. Pseudo-label AP@0.3 / AP@0.5 curves across refinement iterations on OPV2V [26] under the multi-agent setting.

Original abstract

The LiDAR-based multi-agent and single-agent perception has shown promising performance in environmental understanding for robots and automated vehicles. However, there is no existing method that simultaneously solves both multi-agent and single-agent perception in an unsupervised way. By sharing sensor data between multiple agents via communication, this paper discovers two key insights: 1) Improved point cloud density after the data sharing from cooperative views could benefit unsupervised object classification, 2) Cooperative view of multiple agents can be used as unsupervised guidance for the 3D object detection in the single view. Based on these two discovered insights, we propose an Unsupervised Multi-agent and Single-agent (UMS) perception framework that leverages multi-agent cooperation without human annotations to simultaneously solve multi-agent and single-agent perception. UMS combines a learning-based Proposal Purifying Filter to better classify the candidate proposals after multi-agent point cloud density cooperation, followed by a Progressive Proposal Stabilizing module to yield reliable pseudo labels by the easy-to-hard curriculum learning. Furthermore, we design a Cross-View Consensus Learning to use multi-agent cooperative view to guide detection in single-agent view. Experimental results on two public datasets V2V4Real and OPV2V show that our UMS method achieved significantly higher 3D detection performance than the state-of-the-art methods on both multi-agent and single-agent perception tasks in an unsupervised setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces the first unsupervised framework that jointly tackles multi-agent and single-agent LiDAR 3D detection via cooperative views, but the reported gains rest on unablated assumptions about density and consensus.

The core contribution is a new unsupervised setup that claims to solve both multi-agent fusion and single-agent detection at once. It rests on two observations: denser fused point clouds help unsupervised proposal classification, and the cooperative view can supply reliable guidance for single-view detection. From those, the authors build a Proposal Purifying Filter, a Progressive Proposal Stabilizing curriculum, and Cross-View Consensus Learning, then test on V2V4Real and OPV2V. The numbers beat prior unsupervised baselines on both tasks, which is the main empirical result.

That combination of ideas is new; no earlier work is described as handling the joint unsupervised case. The framework is straightforward to follow and targets a practical pain point in robotics and automated driving where labels are expensive.

The soft spots are the missing controls. The results compare the full system against baselines but do not isolate what the purifying filter or consensus module add once the curriculum and backbone are fixed. Real registration errors between agents, which are common in V2V data, are not measured for their effect on density or pseudo-label quality. Without those checks it is difficult to attribute the lift to the stated insights rather than other design choices.

The paper is aimed at researchers working on cooperative perception who are willing to explore unsupervised routes. A reader focused on multi-agent 3D detection would find the ideas worth examining even if the validation needs tightening. I would send it for peer review because the problem matters and the approach is fresh, though the authors should expect requests for ablations and robustness tests on registration noise.

Referee Report

2 major / 1 minor

Summary. The paper proposes an Unsupervised Multi-agent and Single-agent (UMS) perception framework for LiDAR-based 3D object detection. It identifies two insights from cooperative multi-agent data sharing—improved point cloud density aiding unsupervised classification and cooperative views providing reliable guidance for single-view detection—and builds a system with a learning-based Proposal Purifying Filter, Progressive Proposal Stabilizing curriculum for pseudo-labels, and Cross-View Consensus Learning. Experiments on V2V4Real and OPV2V datasets claim significantly higher 3D detection mAP than prior unsupervised methods for both multi-agent and single-agent tasks.

Significance. If the reported gains hold and can be attributed to the claimed mechanisms rather than other factors, the work would be significant as the first unsupervised method addressing both multi-agent and single-agent perception simultaneously. It targets a practical gap in cooperative autonomous driving scenarios where annotations are costly, and the curriculum plus consensus approach is a plausible way to generate reliable pseudo-labels from density and cross-view signals.

major comments (2)

[Experiments] Experiments section: full-system results on V2V4Real and OPV2V are reported against unsupervised baselines, but no ablation studies isolate the contribution of the Proposal Purifying Filter or Cross-View Consensus Learning after controlling for the Progressive Proposal Stabilizing curriculum and any shared backbone architecture. Without these controls, the performance delta cannot be confidently attributed to the two key insights.
[Method] Method and Experiments: registration errors between agents are inevitable in real V2V data yet are neither quantified nor analyzed for their effect on fused point-cloud density or on the reliability of pseudo-label guidance in Cross-View Consensus Learning; this leaves open whether the claimed benefits survive realistic alignment noise.

minor comments (1)

[Abstract] Abstract and Experiments: the specific detection metrics (e.g., mAP at which IoU threshold) and exact baseline implementations are not stated, making direct comparison difficult.
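To make the point about IoU thresholds concrete, the sketch below computes intersection-over-union for two boxes and checks it against the 0.3 and 0.5 thresholds that appear in the Figure 6 caption. It is a simplification under stated assumptions: the boxes are axis-aligned and in bird's-eye view, whereas 3D detection benchmarks typically use rotated boxes, and the function name `bev_iou` and the example coordinates are hypothetical.

```python
def bev_iou(box_a, box_b):
    """IoU of two axis-aligned BEV boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Invented example: a 4 m x 2 m ground-truth box and a prediction shifted by 2 m.
gt = (0.0, 0.0, 4.0, 2.0)
pred = (2.0, 0.0, 6.0, 2.0)

iou = bev_iou(gt, pred)  # intersection 4, union 12
hit_at_03 = iou >= 0.3
hit_at_05 = iou >= 0.5
print(round(iou, 3), hit_at_03, hit_at_05)
```

The same prediction counts as a true positive at IoU 0.3 but as a miss at IoU 0.5, which is why a reported mAP is hard to compare across papers unless the threshold is stated.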

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the contributions and limitations of our work. We address each major comment below and will revise the manuscript accordingly to strengthen the experimental validation and robustness analysis.

Referee: [Experiments] Experiments section: full-system results on V2V4Real and OPV2V are reported against unsupervised baselines, but no ablation studies isolate the contribution of the Proposal Purifying Filter or Cross-View Consensus Learning after controlling for the Progressive Proposal Stabilizing curriculum and any shared backbone architecture. Without these controls, the performance delta cannot be confidently attributed to the two key insights.

Authors: We agree that the current experiments report full-system results without isolating the individual contributions of the Proposal Purifying Filter and Cross-View Consensus Learning while holding the Progressive Proposal Stabilizing curriculum and backbone fixed. This makes it harder to attribute gains specifically to the two insights. In the revised manuscript, we will add controlled ablation studies that incrementally enable each component on top of the curriculum and shared architecture, reporting the corresponding mAP changes on both datasets. revision: yes
Referee: [Method] Method and Experiments: registration errors between agents are inevitable in real V2V data yet are neither quantified nor analyzed for their effect on fused point-cloud density or on the reliability of pseudo-label guidance in Cross-View Consensus Learning; this leaves open whether the claimed benefits survive realistic alignment noise.

Authors: We acknowledge that registration errors are a practical issue not explicitly quantified in the current version. The V2V4Real and OPV2V datasets supply pre-aligned cooperative point clouds, and our method assumes these alignments as is standard in the cooperative perception literature. To address the concern, the revised manuscript will include a sensitivity analysis that injects controlled registration noise into the fused views and measures the resulting degradation in point-cloud density and in the quality of cross-view pseudo-label guidance. revision: yes
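A sensitivity analysis of this kind could start from a noise-injection helper like the one sketched below. This is an assumption-laden illustration, not the authors' protocol: it models the inter-agent pose as a 2D rigid transform, draws zero-mean Gaussian noise on yaw and translation, and reports how far the transformed points move on average. The function names, noise levels, and sample points are all hypothetical.

```python
import math
import random


def perturb_pose(pose, yaw_std_deg, trans_std_m, rng):
    """Add zero-mean Gaussian noise to a (yaw, tx, ty) pose."""
    yaw, tx, ty = pose
    return (
        yaw + math.radians(rng.gauss(0.0, yaw_std_deg)),
        tx + rng.gauss(0.0, trans_std_m),
        ty + rng.gauss(0.0, trans_std_m),
    )


def transform(points, pose):
    """Apply a 2D rigid transform (rotate by yaw, then translate)."""
    yaw, tx, ty = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def mean_displacement(points, clean_pose, noisy_pose):
    """Average distance between cleanly and noisily registered points."""
    clean = transform(points, clean_pose)
    noisy = transform(points, noisy_pose)
    return sum(math.dist(a, b) for a, b in zip(clean, noisy)) / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(10.0, 0.0), (20.0, 5.0), (40.0, -3.0)]  # invented sample points
clean_pose = (0.0, 5.0, 0.0)

errors = {}
for std in (0.0, 0.2, 0.5):  # hypothetical noise levels (degrees and metres)
    noisy_pose = perturb_pose(clean_pose, yaw_std_deg=std, trans_std_m=std, rng=rng)
    errors[std] = mean_displacement(points, clean_pose, noisy_pose)

print(errors)
```

In a fuller study the perturbed pose would be applied before point-cloud fusion, and the quantities of interest (points per object, pseudo-label precision, final detection accuracy) would be recorded at each noise level. Because yaw error grows with range, distant objects are likely to be affected most.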

Circularity Check

0 steps flagged

No circularity: empirical method built from stated insights without self-referential reduction

The paper states two insights from multi-agent data sharing (density benefits for classification; cooperative views for guidance), then designs modules (Proposal Purifying Filter, Progressive Proposal Stabilizing, Cross-View Consensus Learning) on that basis. No equations or claims reduce a prediction or result to a fitted parameter or self-defined quantity by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are present in the provided text. Performance is reported via empirical mAP comparisons on V2V4Real and OPV2V against baselines, not forced outputs. This is a standard non-circular empirical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests primarily on two domain assumptions about the benefits of cooperative views for unsupervised tasks, with no free parameters or invented entities described in the abstract.

axioms (2)

domain assumption: Improved point cloud density after the data sharing from cooperative views could benefit unsupervised object classification. (Listed as key insight 1 in the abstract.)
domain assumption: Cooperative view of multiple agents can be used as unsupervised guidance for the 3D object detection in the single view. (Listed as key insight 2 in the abstract.)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Proposal Purifying Filter... Progressive Proposal Stabilizing... Cross-View Consensus Learning... on V2V4Real and OPV2V

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: Improved point cloud density after multi-agent data sharing benefits unsupervised object classification

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1] Waleed Albattah, Shabana Habib, Mohammed F Alsharekh, Muhammad Islam, Saleh Albahli, and Deshinta Arrova Dewi. An overview of the current challenges, trends, and protocols in the field of vehicular communication. Electronics, 11(21):3581, 2022.
[2] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In International Conference on Machine Learning, pages 41–48, 2009.
[3] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231.
[4] Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, and Horst Possegger. Vision-language guidance for lidar-based unsupervised 3d object detection. arXiv preprint arXiv:2408.03790, 2024.
[5] Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, and Siheng Chen. Where2comm: Communication-efficient collaborative perception via spatial confidence maps. Advances in Neural Information Processing Systems, 35:4874–4886.
[6] Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, and Yanfeng Wang. Collaboration helps camera overtake lidar in 3d detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9243–9252, 2023.
[7] Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, and Siheng Chen. Communication-efficient collaborative perception via information filling with codebook. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15481–15490, 2024.
[8] Tao Huang, Jianan Liu, Xi Zhou, Dinh C Nguyen, Mostafa Rahimi Azghadi, Yuxuan Xia, Qing-Long Han, and Sumei Sun. Vehicle-to-everything cooperative perception for autonomous driving. Proceedings of the IEEE, 2025.
[9] Alex H Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12697–12705, 2019.
[10] Zixing Lei, Shunli Ren, Yue Hu, Wenjun Zhang, and Siheng Chen. Latency-aware collaborative perception. In European Conference on Computer Vision, pages 316–332. Springer.
[11] Ted Lentsch, Holger Caesar, and Dariu Gavrila. Union: Unsupervised 3d object detection using object appearance-based pseudo-classes. Advances in Neural Information Processing Systems, 37:22028–22046, 2024.
[12] Baolu Li, Jinlong Li, Xinyu Liu, Runsheng Xu, Zhengzhong Tu, Jiacheng Guo, Qin Zou, Xiaopeng Li, and Hongkai Yu. V2x-dgw: Domain generalization for multi-agent perception under adverse weather conditions. In IEEE International Conference on Robotics and Automation, pages 974–980. IEEE, 2025.
[13] Jinlong Li, Runsheng Xu, Xinyu Liu, Baolu Li, Qin Zou, Jiaqi Ma, and Hongkai Yu. S2r-vit for multi-agent cooperative perception: Bridging the gap from simulation to reality. In IEEE International Conference on Robotics and Automation, pages 16374–16380, 2024.
[14] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning distilled collaboration graph for multi-agent perception. Advances in Neural Information Processing Systems, 34:29541–29552, 2021.
[15] Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Liang-Chieh Chen, and Henrik Kretzschmar. Waymo open dataset: Panoramic video panoptic segmentation. In European Conference on Computer Vision, pages 53–72. Springer, 2022.
[16] Mahyar Najibi, Jingwei Ji, Yin Zhou, Charles R Qi, Xinchen Yan, Scott Ettinger, and Dragomir Anguelov. Motion inspired unsupervised perception and prediction in autonomous driving. In European Conference on Computer Vision, pages 424–443. Springer, 2022.
[17] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017.
[18] Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, and Xizhou Zhu. Unsupervised object detection with lidar clues. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5962–5972, 2021.
[19] Tsun-Hsuan Wang, Sivabalan Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, and Raquel Urtasun. V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In European Conference on Computer Vision, pages 605–621. Springer, 2020.
[20] Yuqi Wang, Yuntao Chen, and Zhao-Xiang Zhang. 4d unsupervised object discovery. Advances in Neural Information Processing Systems, 35:35563–35575, 2022.
[21] Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, and Ya Zhang. Asynchrony-robust collaborative perception via bird's eye view flow. Advances in Neural Information Processing Systems, 36:28462–28477, 2023.
[22] Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, and Cheng Wang. Commonsense prototype for outdoor unsupervised 3d object detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14968–14977, 2024.
[23] Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, and Chenglu Wen. Learning to detect objects from multi-agent lidar scans without manual labels. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1418–1428, 2025.
[24] Hao Xiang, Runsheng Xu, Xin Xia, Zhaoliang Zheng, Bolei Zhou, and Jiaqi Ma. V2xp-asg: Generating adversarial scenes for vehicle-to-everything perception. arXiv preprint arXiv:2209.13679, 2022.
[25] Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, et al. V2x-real: a large-scale dataset for vehicle-to-everything cooperative perception. In European Conference on Computer Vision, pages 455–470. Springer, 2024.
[26] Runsheng Xu, Hao Xiang, Xin Xia, Xu Han, Jinlong Li, and Jiaqi Ma. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In IEEE International Conference on Robotics and Automation, pages 2583–2589. IEEE, 2022.
[27] Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, et al. V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13712–13722, 2023.
[28] Yurong You, Katie Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, and Kilian Q Weinberger. Learning to detect mobile objects from lidar scans without labels. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1130–1140, 2022.
[29] Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, and Raquel Urtasun. Towards unsupervised object detection from lidar point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9317–9328, 2023.

[1] [1]

An overview of the current challenges, trends, and protocols in the field of vehicular communication.Electron- ics, 11(21):3581, 2022

Waleed Albattah, Shabana Habib, Mohammed F Alsharekh, Muhammad Islam, Saleh Albahli, and Deshinta Arrova Dewi. An overview of the current challenges, trends, and protocols in the field of vehicular communication.Electron- ics, 11(21):3581, 2022. 2

work page 2022

[2] [2]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Ja- son Weston. Curriculum learning. InInternational Confer- ence on Machine Learning, pages 41–48, 2009. 2, 4

work page 2009

[3] [3]

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, Hans-Peter Kriegel, J ¨org Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Inkdd, pages 226–231,

work page

[4] [4]

Vision-language guidance for lidar-based unsupervised 3d object detection.arXiv preprint arXiv:2408.03790, 2024

Christian Fruhwirth-Reisinger, Wei Lin, Du ˇsan Mali´c, Horst Bischof, and Horst Possegger. Vision-language guidance for lidar-based unsupervised 3d object detection.arXiv preprint arXiv:2408.03790, 2024. 2

work page arXiv 2024

[5] [5]

Where2comm: Communication-efficient collab- orative perception via spatial confidence maps.Advances in Neural Information Processing Systems, 35:4874–4886,

Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, and Si- heng Chen. Where2comm: Communication-efficient collab- orative perception via spatial confidence maps.Advances in Neural Information Processing Systems, 35:4874–4886,

work page

[6] [6]

Collaboration helps camera overtake li- dar in 3d detection

Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, and Yanfeng Wang. Collaboration helps camera overtake li- dar in 3d detection. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9243–9252, 2023. 2

work page 2023

[7] [7]

Communication-efficient collaborative percep- tion via information filling with codebook

Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, and Si- heng Chen. Communication-efficient collaborative percep- tion via information filling with codebook. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15481–15490, 2024. 2

work page 2024

[8] [8]

Vehicle-to-everything cooperative perception for autonomous driving.Proceedings of the IEEE, 2025

Tao Huang, Jianan Liu, Xi Zhou, Dinh C Nguyen, Mostafa Rahimi Azghadi, Yuxuan Xia, Qing-Long Han, and Sumei Sun. Vehicle-to-everything cooperative perception for autonomous driving.Proceedings of the IEEE, 2025. 2

work page 2025

[9] [9]

Pointpillars: Fast encoders for object detection from point clouds

Alex H Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 12697–12705, 2019. 6

work page 2019

[10] [10]

Latency-aware collaborative perception

Zixing Lei, Shunli Ren, Yue Hu, Wenjun Zhang, and Siheng Chen. Latency-aware collaborative perception. InEuropean Conference on Computer Vision, pages 316–332. Springer,

work page

[11] [11]

Union: Unsupervised 3d object detection using object appearance- based pseudo-classes.Advances in Neural Information Pro- cessing Systems, 37:22028–22046, 2024

Ted Lentsch, Holger Caesar, and Dariu Gavrila. Union: Unsupervised 3d object detection using object appearance- based pseudo-classes.Advances in Neural Information Pro- cessing Systems, 37:22028–22046, 2024. 2

work page 2024

[12] Baolu Li, Jinlong Li, Xinyu Liu, Runsheng Xu, Zhengzhong Tu, Jiacheng Guo, Qin Zou, Xiaopeng Li, and Hongkai Yu. V2x-dgw: Domain generalization for multi-agent perception under adverse weather conditions. In IEEE International Conference on Robotics and Automation, pages 974–980. IEEE, 2025. 2

[13] Jinlong Li, Runsheng Xu, Xinyu Liu, Baolu Li, Qin Zou, Jiaqi Ma, and Hongkai Yu. S2r-vit for multi-agent cooperative perception: Bridging the gap from simulation to reality. In IEEE International Conference on Robotics and Automation, pages 16374–16380, 2024.

[14] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning distilled collaboration graph for multi-agent perception. Advances in Neural Information Processing Systems, 34:29541–29552, 2021. 2

[15] Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Liang-Chieh Chen, and Henrik Kretzschmar. Waymo open dataset: Panoramic video panoptic segmentation. In European Conference on Computer Vision, pages 53–72. Springer, 2022. 8

[16] Mahyar Najibi, Jingwei Ji, Yin Zhou, Charles R Qi, Xinchen Yan, Scott Ettinger, and Dragomir Anguelov. Motion inspired unsupervised perception and prediction in autonomous driving. In European Conference on Computer Vision, pages 424–443. Springer, 2022. 2

[17] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017. 3

[18] Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, and Xizhou Zhu. Unsupervised object detection with lidar clues. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5962–5972, 2021. 2

[19] Tsun-Hsuan Wang, Sivabalan Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, and Raquel Urtasun. V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In European Conference on Computer Vision, pages 605–621. Springer, 2020. 2

[20] Yuqi Wang, Yuntao Chen, and Zhao-Xiang Zhang. 4d unsupervised object discovery. Advances in Neural Information Processing Systems, 35:35563–35575, 2022. 2

[21] Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, and Ya Zhang. Asynchrony-robust collaborative perception via bird’s eye view flow. Advances in Neural Information Processing Systems, 36:28462–28477, 2023. 2

[22] Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, and Cheng Wang. Commonsense prototype for outdoor unsupervised 3d object detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14968–14977, 2024. 2, 3, 6, 7, 8

[23] Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, and Chenglu Wen. Learning to detect objects from multi-agent lidar scans without manual labels. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1418–1428, 2025. 1, 2, 6, 7, 8

[24] Hao Xiang, Runsheng Xu, Xin Xia, Zhaoliang Zheng, Bolei Zhou, and Jiaqi Ma. V2xp-asg: Generating adversarial scenes for vehicle-to-everything perception. arXiv preprint arXiv:2209.13679, 2022. 2

[25] Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, et al. V2x-real: a large-scale dataset for vehicle-to-everything cooperative perception. In European Conference on Computer Vision, pages 455–470. Springer, 2024. 8

[26] Runsheng Xu, Hao Xiang, Xin Xia, Xu Han, Jinlong Li, and Jiaqi Ma. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In IEEE International Conference on Robotics and Automation, pages 2583–2589. IEEE, 2022. 1, 2, 5, 6, 7, 8

[27] Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, et al. V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13712–13722, 2023. 2, 4, 5, 6, 7, 8

[28] Yurong You, Katie Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, and Kilian Q Weinberger. Learning to detect mobile objects from lidar scans without labels. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1130–1140, 2022. 2

[29] Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, and Raquel Urtasun. Towards unsupervised object detection from lidar point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9317–9328, 2023. 2, 3, 6, 7, 8