CooperScene: Multi-Modal Cooperative Autonomy Benchmark with C-V2X Communication Characterization

Amit Roy-Chowdhury; Bo Wu; Guoyuan Wu; Hang Qiu; Janice Nguyen; Justin Yue; Matthew J. Barth; Ruoshen Mo; Yanyu Zhang

arxiv: 2606.31219 · v1 · pith:B2CAYW4Gnew · submitted 2026-06-30 · 💻 cs.CV


Bo Wu, Ruoshen Mo, Justin Yue, Yanyu Zhang, Janice Nguyen, Guoyuan Wu, Amit Roy-Chowdhury, Matthew J. Barth, Hang Qiu


Pith reviewed 2026-07-01 06:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords: cooperative perception, C-V2X communication, autonomous vehicles, multi-agent systems, benchmark dataset, multi-modal sensors, 3D object annotation


The pith

CooperScene introduces a benchmark dataset for cooperative autonomy that records real C-V2X communication from commercial radios across three vehicles and one roadside unit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CooperScene to fill gaps in existing cooperative autonomy datasets by including actual communication constraints. It collects data from three connected autonomous vehicles and one infrastructure unit, each carrying multi-modal sensors and off-the-shelf C-V2X radios, across intersections, highway ramps, and parking lots. All frames carry globally consistent 3D labels at 10 Hz for a total of 344K objects, supported by tight synchronization, centimeter-level localization, and cross-modality calibration. A sympathetic reader would care because the dataset lets researchers measure how cooperative perception, prediction, and planning scale when bandwidth is limited and dynamic rather than idealized.

Core claim

CooperScene is a high-fidelity cooperative autonomy dataset with real-world C-V2X communication characterization. The dataset is organized into diverse scenes involving three CAVs and one RSU, all equipped with multi-modal sensors and commercial C-V2X radios. Scenes are annotated with globally consistent 3D labels at 10 Hz, totaling 344K objects across 59K frames, underpinned by tight sensor- and agent-synchronization, centimeter-level localization and spatial alignment, precise cross-modality calibration, and 3GPP-standard-compliant C-V2X communication. CooperScene establishes a rigorous benchmark for evaluating multi-agent scaling and actual performance in real-world deployable settings.

What carries the argument

The CooperScene dataset, which records synchronized multi-modal sensor streams and C-V2X communication traces from three CAVs plus one RSU across varied real scenes.

If this is right

Algorithms can be tested for robustness when communication bandwidth varies and is limited rather than assumed perfect.
Evaluation can now include scaling to multiple agents and infrastructure units instead of pairs.
Methods handling heterogeneous sensor modalities across agents can be compared on standardized real traces.
Development of cooperative systems can target metrics that reflect deployable conditions with commercial radios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The benchmark could be extended by adding explicit tasks for prediction and planning to measure end-to-end cooperative performance.
Direct comparison of the recorded C-V2X traces against simulated channel models would highlight where current models fail to capture real bandwidth dynamics.
Widespread use might push industry groups to adopt similar multi-agent, multi-modality test protocols for certification of cooperative driving features.

Load-bearing premise

The collected scenes, sensor setups, and C-V2X traces with commercial radios are representative of the real-world deployment complexities that existing datasets overlook.

What would settle it

If cooperative algorithms evaluated on CooperScene produce performance numbers that diverge from results obtained in live uncontrolled road tests using comparable hardware and traffic densities, the benchmark's representativeness would be challenged.

Figures

Figures reproduced from arXiv: 2606.31219 by Amit Roy-Chowdhury, Bo Wu, Guoyuan Wu, Hang Qiu, Janice Nguyen, Justin Yue, Matthew J. Barth, Ruoshen Mo, Yanyu Zhang.

**Figure 1.** CooperScene sample data visualization. Left: overlay of point clouds from all agents (three connected autonomous vehicles (CAVs) in red, green, and blue; infrastructure in purple) with global 3D object labels (yellow). Middle: individual agent sensor views in their local frames. Right: real-time C-V2X throughput measured between all agent pairs.
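The overlay in Figure 1 implies that each agent's points, captured in its local sensor frame, are mapped into one shared global frame. A minimal sketch of that mapping, assuming each agent has a 4x4 local-to-global pose (for example from its GNSS/INS localization); the helper names, the yaw-only pose, and the agent ids are hypothetical, not the dataset's API or file format:

```python
import math


def yaw_pose(x, y, z, yaw):
    """Hypothetical helper: 4x4 local-to-global transform from a position and yaw (radians).

    Real poses would also carry roll and pitch; yaw-only keeps the example short.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def to_global(points, pose):
    """Apply a 4x4 homogeneous transform to a list of (x, y, z) points."""
    return [
        tuple(pose[r][0] * px + pose[r][1] * py + pose[r][2] * pz + pose[r][3] for r in range(3))
        for px, py, pz in points
    ]


def overlay(agent_clouds, agent_poses):
    """Merge per-agent clouds into one global cloud, tagging each point with its agent id."""
    merged = []
    for agent_id, cloud in agent_clouds.items():
        merged.extend((agent_id, p) for p in to_global(cloud, agent_poses[agent_id]))
    return merged


# Usage: CAV A at the origin, the RSU 10 m away and facing the opposite direction.
poses = {"cav_a": yaw_pose(0.0, 0.0, 0.0, 0.0), "rsu": yaw_pose(10.0, 0.0, 0.0, math.pi)}
merged = overlay({"cav_a": [(1.0, 0.0, 0.0)], "rsu": [(1.0, 0.0, 0.0)]}, poses)
```

A real pipeline would also chain in each LiDAR's sensor-to-vehicle extrinsic and use full roll/pitch/yaw poses before merging.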

**Figure 2.** Accuracy and bandwidth of open-source cooperative perception methods benchmarked on the OPV2V [62] dataset: a research gap toward real-world deployment. Despite their success on benchmarks, many models fall short in real-world deployment, partly due to a lack of datasets that reflect realistic V2X communication dynamics.

**Figure 3.** CMS overview. CMS integrates LiDAR, camera, and GNSS through a power-over-Ethernet (PoE) switch, which forwards the data to a central ROS node running on a laptop. The laptop and sensors are synchronized to GNSS time, and the intrinsic and extrinsic parameters of all sensors are calibrated. A TX/RX module communicates with the other CMS platforms on the vehicles and the infrastructure.
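With every agent's clock disciplined to GNSS time, frames from different agents can be associated by timestamp alone. The sketch below pairs each reference frame with the nearest frame of another agent within a tolerance; the 50 ms tolerance (half of a 10 Hz frame period) and the greedy nearest-neighbor rule are illustrative assumptions, not the paper's documented procedure:

```python
import bisect


def match_frames(ref_stamps, other_stamps, tol=0.05):
    """Pair each reference timestamp with the nearest timestamp from another agent.

    Both lists are sorted, in seconds, on a shared GNSS-disciplined time base.
    Returns (ref_index, other_index) pairs; references with no partner within
    tol seconds are dropped. tol = 0.05 s is an illustrative value.
    """
    pairs = []
    for i, t in enumerate(ref_stamps):
        j = bisect.bisect_left(other_stamps, t)
        # The nearest neighbor sits just before or at the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(other_stamps)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(other_stamps[k] - t))
        if abs(other_stamps[best] - t) <= tol:
            pairs.append((i, best))
    return pairs


# CAV A at nominal 10 Hz; CAV B's frames arrive with small offsets and one gap.
pairs = match_frames([0.0, 0.1, 0.2, 0.3], [0.012, 0.108, 0.29])
```

Because the matching is greedy, two reference frames could map to the same partner frame; a stricter pipeline would enforce one-to-one pairing.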

**Figure 4.**

**Figure 5.** C-V2X communication throughput variation across different frames. The pentagram denotes the peak throughput.
From the main text: …due to vehicle mobility and channel contention among multiple agents. As a result, no single agent can occupy the channel continuously, and the effective throughput fluctuates significantly over time. These observations highlight the need for careful spectrum-sharing design and motivate the use of more …
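Per-frame throughput like that in Figure 5 can be derived from packet arrival logs by binning received bytes into fixed windows. A minimal sketch, assuming a log of (arrival time in seconds, payload size in bytes) records with windows starting at time 0 and one 100 ms window per 10 Hz frame; the record format and window choice are assumptions:

```python
from collections import defaultdict


def per_frame_throughput(packets, frame_period=0.1):
    """Return Mbit/s per window from (arrival_time_s, size_bytes) records.

    frame_period=0.1 assumes one window per 10 Hz frame (an illustrative choice).
    Windows with no arrivals report 0.
    """
    bytes_per_frame = defaultdict(int)
    for t, size in packets:
        bytes_per_frame[int(t // frame_period)] += size
    if not bytes_per_frame:
        return []
    last = max(bytes_per_frame)
    # Bits received in each window divided by the window length, scaled to Mbit/s.
    return [bytes_per_frame.get(k, 0) * 8 / frame_period / 1e6 for k in range(last + 1)]


def peak(throughput):
    """Return (window_index, Mbit/s) of the highest-throughput window."""
    idx = max(range(len(throughput)), key=throughput.__getitem__)
    return idx, throughput[idx]


log = [(0.01, 12500), (0.05, 12500), (0.15, 62500), (0.23, 25000), (0.45, 1250)]
tp = per_frame_throughput(log)
best_frame, best_mbps = peak(tp)
```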

**Figure 6.** Latency and packet loss rate under C-V2X.
From the main text: While bandwidth is the dominant bottleneck, other communication characteristics also affect end-to-end performance. To isolate their impact, we remove the bottleneck with an oracle-style experiment: each CAV transmits only the down-sampled LiDAR points bounded by ground-truth boxes, which brings the data volume below the C-V2X capacity. As shown in Table 6,…
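The oracle experiment can be sketched as three steps: keep the LiDAR points inside any ground-truth box, down-sample them, and compute the bit rate the remaining points would need. The box parameterization (center, size, yaw), the every-k-th-point down-sampling, and the 12 bytes per point (three float32 coordinates) are assumptions for illustration; the paper's exact down-sampling and encoding are not stated here.

```python
import math


def in_box(p, box):
    """True if p = (x, y, z) lies inside box = (cx, cy, cz, length, width, height, yaw)."""
    cx, cy, cz, length, width, height, yaw = box
    dx, dy = p[0] - cx, p[1] - cy
    # Rotate the offset by -yaw so it is expressed in the box's own axes.
    lx = math.cos(yaw) * dx + math.sin(yaw) * dy
    ly = -math.sin(yaw) * dx + math.cos(yaw) * dy
    return abs(lx) <= length / 2 and abs(ly) <= width / 2 and abs(p[2] - cz) <= height / 2


def oracle_payload(points, boxes, keep_every=2):
    """Keep only points inside any ground-truth box, then keep every k-th of those."""
    inside = [p for p in points if any(in_box(p, b) for b in boxes)]
    return inside[::keep_every]


def bitrate_mbps(points_per_frame, bytes_per_point=12, rate_hz=10):
    """Bit rate needed to stream this many points per frame, in Mbit/s."""
    return points_per_frame * bytes_per_point * 8 * rate_hz / 1e6


box = (0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0)  # a 4 m x 2 m x 2 m box at the origin
cloud = [(1.0, 0.5, 0.0), (3.0, 0.0, 0.0), (-1.5, -0.9, 0.5), (0.0, 0.0, 2.0), (0.2, 0.1, -0.9)]
payload = oracle_payload(cloud, [box])
```

Under these assumptions, 10,000 points per frame at 10 Hz need 9.6 Mbit/s before any compression, which is why restricting the payload to in-box points shrinks it so sharply.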

**Figure 7.** Visualization of synchronized sequences in CooperScene.
Recovered plot labels: (a) RMSE PDF (x-axis: average RMSE; curves: No ICP, Pairwise ICP, Spatial-temporal ICP); (b) RMSE comparison (average RMSE for Dair-V2X, V2X-Seq, V2V4Real, TUMTraf, and CooperScene).

**Figure 8.** Spatial alignment error compared between different methods (a) and against other datasets (b).
From the main text: Spatial Alignment. We quantify spatial alignment accuracy using the root mean square error (RMSE) of overlapping areas between synchronized LiDAR frames for each agent pair. Specifically, we run ICP once more on the already aligned frames and compute the RMSE of the Euclidean distances between the matched p…
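The last step of this metric, matching points between two already-aligned frames and taking the RMSE of the matched distances, can be sketched as follows. This is a simplification, not the paper's procedure: it uses plain nearest-neighbor matching with a hypothetical 1 m overlap cutoff instead of a full ICP pass, and brute-force search in place of a spatial index.

```python
import math


def nn_rmse(source, target, max_dist=1.0):
    """RMSE of distances from each source point to its nearest target point.

    Matches farther than max_dist (a hypothetical overlap cutoff) are ignored.
    Brute-force O(N * M) search for clarity; real LiDAR frames would need a k-d tree.
    Returns NaN when no point pair falls within the cutoff.
    """
    sq = []
    for s in source:
        d = min(math.dist(s, t) for t in target)
        if d <= max_dist:
            sq.append(d * d)
    if not sq:
        return math.nan
    return math.sqrt(sum(sq) / len(sq))


frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.3)]
rmse = nn_rmse(frame_a, frame_b)  # the point at x = 10 m has no partner and is skipped
```

The cutoff is what restricts the metric to overlapping areas; without it, points seen by only one agent would inflate the error.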

**Figure 9.** LiDAR-to-camera overlays using the calibrated extrinsics.
From the main text: …achieving an RMSE of 0.2 m for most frames. Figure 8b compares the RMSE of CooperScene with other real-world datasets. CooperScene achieves the lowest average RMSE among all datasets except TumTraf. TumTraf aligns a single vehicle, whereas CooperScene must jointly align multiple agents. Furthermore, TumTraf applies ICP every 10 frames, while CooperScene …
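Overlays like those in Figure 9 come from projecting LiDAR points through the LiDAR-to-camera extrinsic and the camera intrinsics. A minimal pinhole-model sketch, assuming the camera frame has z pointing forward and ignoring lens distortion; the axis convention and intrinsic values in the usage lines are assumptions, not CooperScene's calibration:

```python
def project(point_lidar, R, t, fx, fy, cx, cy):
    """Project a LiDAR point to pixel (u, v) with a distortion-free pinhole model.

    R (3x3 nested lists) and t (length-3) map LiDAR coordinates into the camera
    frame (z forward); fx, fy, cx, cy are the intrinsics. Returns None for points
    behind the camera.
    """
    x, y, z = (sum(R[i][j] * point_lidar[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


# Assumed axes: LiDAR x forward, y left, z up; camera x right, y down, z forward.
R = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
t = [0.0, 0.0, 0.0]
center = project((10.0, 0.0, 0.0), R, t, 1000.0, 1000.0, 960.0, 540.0)
```

With these values a point 10 m straight ahead lands on the principal point (960, 540); a miscalibrated R or t shows up as a visible shift of the projected points against image edges.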

**Figure 10.** Reprojection qualitative analysis (calibration).

**Figure 11.** Synchronization structure of CooperScene.
From the main text: …PTP (ptp4l v1.6 [3]) to synchronize all sensors except the IMU. The Xsens IMU synchronizes directly via PPS signals and UTC time from GNSS. All other sensors are disciplined to the MK6, which acts as the PTP grandmaster. The synchronization structure of CooperScene is illustrated in …
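PTP, which ptp4l implements, estimates a slave clock's offset from the grandmaster using four timestamps from a Sync / Delay_Req exchange. The textbook calculation from IEEE 1588 [14] is sketched below; it assumes the path delay is symmetric, and ptp4l layers filtering and a clock servo on top, which this sketch omits:

```python
def ptp_offset_delay(t1, t2, t3, t4):
    """Offset and mean path delay from one PTP Sync / Delay_Req exchange.

    t1: master sends Sync (master clock)     t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)  t4: master receives Delay_Req (master clock)
    Assumes the path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
    return offset, delay


# Timestamps in nanoseconds for a slave running 200 ns ahead over a 50 ns link.
offset_ns, delay_ns = ptp_offset_delay(1000, 1250, 2000, 1850)
```

Writing the true offset as o and delay as d gives t2 - t1 = d + o and t4 - t3 = d - o, so the half-difference recovers o and the half-sum recovers d; any path asymmetry leaks directly into the offset estimate.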

**Figure 12.** Visualization of the LiDAR trigger output signal displayed on a 100 MHz oscilloscope.
From the main text: C-V2X measurement setup. We evaluate C-V2X network performance using Iperf2 [2]. Each agent runs one Iperf client for packet transmission and three Iperf servers to receive traffic from its peer agents simultaneously. Clients are configured for UDP transmission at 5 Mbps, with servers reporting metrics at 1…
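This setup implies one directed link per ordered pair of agents, and UDP loss can be inferred from gaps in datagram sequence numbers. A minimal sketch, assuming four agents with the hypothetical identifiers cav_a, cav_b, cav_c, and rsu, and sequence numbers starting at 0; it mirrors the idea behind iperf's UDP loss report but is not iperf's code:

```python
import math
from itertools import permutations

AGENTS = ["cav_a", "cav_b", "cav_c", "rsu"]  # hypothetical identifiers


def directed_links(agents):
    """Every ordered (sender, receiver) pair: one client per agent, one server per peer."""
    return list(permutations(agents, 2))


def udp_loss_rate(received_seq):
    """Fraction of datagrams lost, inferred from gaps in received sequence numbers.

    Assumes numbering starts at 0 and increments by 1 per datagram. Duplicates count
    once, and losses after the last received datagram are invisible to this method.
    """
    if not received_seq:
        return math.nan  # nothing received: loss cannot be inferred from sequence numbers
    expected = max(received_seq) + 1
    return 1 - len(set(received_seq)) / expected


links = directed_links(AGENTS)
loss = udp_loss_rate([0, 1, 2, 4, 5, 5, 7])  # datagrams 3 and 6 missing, 5 duplicated
```

With four agents each running three servers, the setup covers 4 x 3 = 12 directed links, which is why per-pair throughput (as in Figure 1, right) is reported for every ordered pair rather than once per agent.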

**Figure 13.** SUSTechPoints annotation interface for human refinement. The fused LiDAR point cloud from the three CAVs and the roadside setup is loaded together with the global bounding boxes obtained from the auto-labeling stage. The SUSTechPoints tool can visualize LiDAR sweeps and 3D bounding boxes and review object trajectories across time. The interface provides multi-view panels for fine-grained adjustment of object orienta…

**Figure 14.** Cross-frame trajectory adjustment in SUSTechPOINTS. SUSTechPOINTS provides a multi-frame visualization interface that lets annotators refine a single object's 3D bounding box consistently across time. Annotators can adjust the bounding box in any frame and propagate corrections using built-in interpolation tools, which automatically generate smooth, temporally coherent trajectories.
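Interpolating a box between two annotated keyframes is simple for position and size but needs care for yaw, which wraps at ±π. The sketch below shows one way to do it, assuming boxes of the form (x, y, z, length, width, height, yaw) and linear motion between keyframes; it is illustrative and not SUSTechPOINTS' own implementation:

```python
import math


def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate a box (x, y, z, length, width, height, yaw) between keyframes.

    Yaw follows the shortest angular path, so 170 deg -> -170 deg passes through
    180 deg instead of sweeping back through 0 deg.
    """
    alpha = (frame - frame_a) / (frame_b - frame_a)
    head = [a + alpha * (b - a) for a, b in zip(box_a[:6], box_b[:6])]
    diff = box_b[6] - box_a[6]
    dyaw = math.atan2(math.sin(diff), math.cos(diff))  # signed difference in [-pi, pi]
    yaw = box_a[6] + alpha * dyaw
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))  # wrap the result back to [-pi, pi]
    return tuple(head) + (yaw,)


key_a = (0.0, 0.0, 0.0, 4.0, 2.0, 1.5, math.radians(170))
key_b = (10.0, 0.0, 0.0, 4.0, 2.0, 1.5, math.radians(-170))
mid = interpolate_box(key_a, key_b, 0, 10, 5)
early = interpolate_box(key_a, key_b, 0, 10, 2)
```

Naively averaging the two yaw values would give 0 deg at the midpoint and flip the box around; the shortest-path difference avoids that.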

**Figure 15.** LiDAR overlay visualization across multiple traffic scenarios involving three connected automated vehicles (CAVs) and one roadside infrastructure unit. Infrastructure LiDAR points are shown in purple, while CAV A, CAV B, and CAV C are shown in red, green, and blue, respectively. Panels (a) to (f): Scenarios 1 to 6.

Original abstract

Cellular vehicle-to-everything (C-V2X) enables cooperative perception, prediction, and planning beyond the field of view of individual agents. However, existing datasets often overlook the complexities of real-world deployment, such as limited communication bandwidth and its dynamics, heterogeneous sensing modalities, and scalability beyond a single cooperative partner. In this paper, we introduce CooperScene, a high-fidelity cooperative autonomy dataset with real-world C-V2X communication characterization. The dataset is organized into diverse scenes, including intersections, highway ramps, and parking lots. These scenes involve three connected and autonomous vehicles (CAVs) and one infrastructure roadside unit (RSU), all equipped with multi-modal sensors and commercial off-the-shelf C-V2X communication radios. All scenes are annotated with globally consistent 3D labels at 10 Hz, totaling 344K objects across 59K frames, underpinned by tight sensor- and agent-synchronization, centimeter-level localization and spatial alignment, precise cross-modality calibration, and 3GPP-standard-compliant C-V2X communication. CooperScene establishes a rigorous benchmark for evaluating multi-agent scaling and actual performance in real-world deployable settings. Project website for data and benchmark: https://cisl.ucr.edu/CooperScene

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CooperScene adds real C-V2X traces from a fixed three-CAV plus RSU setup, which is useful for comms-aware perception work but does not support direct scaling measurements.


CooperScene collects real C-V2X traces using commercial radios on three CAVs and one RSU, paired with multi-modal sensors and 10 Hz globally consistent 3D labels across intersections, ramps, and parking lots. The dataset totals 59K frames and 344K objects, with reported tight synchronization and 3GPP-compliant communication.

The useful part is the actual bandwidth dynamics and heterogeneous modality data rather than simulated traces. That addresses a real gap for people testing cooperative methods under deployment-like constraints. The collection effort itself looks substantive.

The soft spot is the scaling claim. All scenes use the identical three-plus-one configuration, so the traces do not let you measure performance changes with more agents or different densities. Any scaling analysis would need external simulation layered on top, which weakens the positioning as a rigorous benchmark for multi-agent scaling.

The paper is for the cooperative perception and V2X community that wants real comms data to validate algorithms. It shows clear thinking in the data release and no internal contradictions. I would bring the data details to a reading group. I would not cite it in my own work in the next year unless the full paper adds strong validation measurements. It deserves peer review because the real-world collection is concrete enough to warrant referee feedback on documentation and scope.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CooperScene, a high-fidelity multi-modal cooperative autonomy dataset featuring data from three connected autonomous vehicles (CAVs) and one roadside unit (RSU) equipped with various sensors and commercial C-V2X radios. The dataset covers diverse scenes such as intersections, highway ramps, and parking lots, with globally consistent 3D annotations at 10 Hz, totaling 344K objects across 59K frames. It emphasizes tight synchronization, centimeter-level localization, precise calibration, and 3GPP-compliant C-V2X communication, positioning itself as a benchmark for evaluating multi-agent scaling and real-world performance in cooperative autonomy.

Significance. If the data collection and characterization claims are validated, CooperScene could significantly advance research in cooperative perception and planning by providing real-world C-V2X communication traces and multi-modal data under realistic constraints, which are often missing in existing datasets. This would enable more accurate evaluation of algorithms in deployable settings.

major comments (2)

[Abstract] Abstract: The assertion that the dataset 'establishes a rigorous benchmark for evaluating multi-agent scaling' is undermined by the fixed configuration of exactly three CAVs and one RSU in all scenes, with no reported variation in agent count or density. This prevents direct empirical assessment of scaling trends from the collected data.
[Abstract] Abstract: The abstract asserts high-fidelity properties including centimeter-level localization, precise cross-modality calibration, and 3GPP-standard-compliant C-V2X communication, but provides no validation measurements, error analysis, or comparison tables to support these claims.

minor comments (1)

[Abstract] Abstract: The total number of frames and objects is given, but it would be helpful to include breakdowns by scene type for better context on diversity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the two major comments point by point below and will revise the manuscript accordingly.


Referee: [Abstract] Abstract: The assertion that the dataset 'establishes a rigorous benchmark for evaluating multi-agent scaling' is undermined by the fixed configuration of exactly three CAVs and one RSU in all scenes, with no reported variation in agent count or density. This prevents direct empirical assessment of scaling trends from the collected data.

Authors: We agree that the fixed configuration of three CAVs and one RSU across all scenes does not permit direct empirical assessment of scaling trends with varying agent counts or densities from the collected data. The phrasing in the abstract regarding 'multi-agent scaling' is therefore not fully supported by the dataset design. We will revise the abstract to remove this specific claim and instead state that CooperScene establishes a benchmark for evaluating cooperative autonomy in multi-agent settings with real-world C-V2X constraints. revision: yes
Referee: [Abstract] Abstract: The abstract asserts high-fidelity properties including centimeter-level localization, precise cross-modality calibration, and 3GPP-standard-compliant C-V2X communication, but provides no validation measurements, error analysis, or comparison tables to support these claims.

Authors: The abstract summarizes properties achieved during data collection, with supporting characterization and compliance details provided in the main manuscript sections on sensor setup, localization, calibration, and C-V2X communication. However, the abstract itself does not include explicit validation metrics or references. We will revise the abstract to qualify these claims by adding a brief reference to the validation and characterization results presented in the body of the paper. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset contribution with no derivations or self-referential claims

full rationale

The paper presents a data-collection effort (scenes, sensors, C-V2X traces, annotations) rather than any derivation chain, equations, fitted parameters, or predictions. The abstract's benchmark claim is a statement about the dataset's intended use, not a result derived from prior steps within the paper. No self-citations, ansatzes, or reductions to inputs appear in the provided text. This is the expected non-finding for a benchmark paper whose central contribution is empirical data rather than a closed-form result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that its chosen scenes and commercial hardware capture the overlooked complexities of real deployment; no free parameters or invented entities are introduced because the contribution is empirical data collection rather than a theoretical model.

axioms (1)

Domain assumption: The recorded scenes, sensor configurations, and C-V2X traces are representative of real-world deployment complexities such as bandwidth dynamics and multi-agent scaling.
Abstract states that existing datasets overlook these complexities and presents CooperScene as addressing them.

pith-pipeline@v0.9.1-grok · 5778 in / 1289 out tokens · 35763 ms · 2026-07-01T06:05:19.282606+00:00 · methodology


Reference graph

Works this paper leans on


[1] Cohda Wireless MK6. https://www.cohdawireless.com/solutions/mk6/
[2] Iperf 2. https://iperf.fr/iperf-doc.php
[3] Linux ptp4l. https://linuxptp.nwtime.org
[4] Lucid Triton Gig-e Camera. https://thinklucid.com/product/triton-5-mp-imx490/
[5] Mikrotik poe css610-8p-2s+in. https://mikrotik.com/product/css610_8p_2s_in
[6] OptiTrack Motion Capture System. https://optitrack.com/
[7] Ouster. https://ouster.com
[8] Tallysman tw8889. https://www.calian.com/advanced-technologies/gnss_product/tw8889-dual-band-gnss-antenna
[9] Tesla Robotaxi. https://www.tesla.com/robotaxi
[10] Thunderbolt 4 10g ethernet adapter. https://www.owc.com/solutions/thunderbolt-4-10g-ethernet-adapter
[11] Vicon Motion Capture System. https://www.vicon.com/
[12] Waymo. https://waymo.com
[13] XSense MTi-680 RTK GNSS. https://www.movella.com/sensor-modules/xsens-mti-680-rtk-gnss-ins
[14] IEEE standard for a precision clock synchronization protocol for networked measurement and control systems. IEEE Std 1588-2019 (Revision of IEEE Std 1588-2008) pp. 1–499 (2020). https://doi.org/10.1109/IEEESTD.2020.9120376
[15] 3GPP: Physical layer procedures. Technical Specification (TS) 36.213 (2021), version 14.17.0
[16] 3GPP, E..: Digital cellular telecommunications system (phase 2+) (gsm); universal mobile telecommunications system (umts); lte; 5g; release description; release 14 (3gpp tr 21.914 version 14.0.0), etsi tr 121 914 v14.0.0. Tech. Rep. TR 121 914 V14.0.0, ETSI (Jun 2018), https://www.etsi.org/deliver/etsi_tr/121900_121999/121914/14.00.00_60/tr_121914v140000...
[17] Accessed: 2025-11-13
[18] Administration, N.H.T.S., et al.: Department of transportation (dot), "federal motor vehicle safety standards; v2v communications", notice of proposed rulemaking (nprm). Tech. rep. (2016)
[19] Automation, C.D.: SAE J3216: Taxonomy and definitions for terms related to cooperative driving automation for on-road motor vehicles. SAE International (2020)
[20] Besl, P., McKay, N.D.: A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 239–256 (1992). https://doi.org/10.1109/34.121791
[21] Brettle, F., et al.: Google/draco: a library for compressing and decompressing 3d geometric meshes and point clouds. https://github.com/google/draco (2018), accessed: [Insert date of access, e.g., 2025-11-13]
[22] C-V2X Technical Committee: Sae j3161: LTE vehicle-to-everything (LTE-V2X) deployment profiles and radio parameters for single radio channel multi-service coexistence. Tech. rep., SAE International, 400 Commonwealth Drive, Warrendale, PA, United States (2022)
[23] Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. arXiv:1903.11027 (2019)
[24] Chen, Q., Ma, X., Tang, S., Guo, J., Yang, Q., Fu, S.: F-cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3d point clouds. In: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. p. 88–100. SEC ’19, Association for Computing Machinery, New York, NY, USA (2019)
[25] Chen, Q., Tang, S., Yang, Q., Fu, S.: Cooper: Cooperative perception for connected autonomous vehicles based on 3d point clouds. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). pp. 514–524 (2019). https://doi.org/10.1109/ICDCS.2019.00058
[26] Contributors, M.: MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d (2020)
[27] Cui, J., Qiu, H., Chen, D., Stone, P., Zhu, Y.: Coopernaut: End-to-end driving with cooperative perception for networked vehicles. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p. 17231–17241. IEEE (Jun 2022). https://doi.org/10.1109/cvpr52688.2022.01674, http://dx.doi.org/10.1109/CVPR52688.2022.01674
[28] Cui, J., Tang, C., Holtz, J., Nguyen, J., Allievi, A.G., Qiu, H., Stone, P.: Coopreflect: Towards natural language communication for cooperative autonomous driving via multi-agent learning. In: Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems. AAMAS ’26 (2026), https://arxiv.org/abs/2505.18334, oral Presentation
[29] Elson, J., Girod, L., Estrin, D.: Fine-Grained network time synchronization using reference broadcasts. In: 5th Symposium on Operating Systems Design and Implementation (OSDI 02). USENIX Association, Boston, MA (Dec 2002)
[30] Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. International Journal of Robotics Research (IJRR) (2013)
[31] Hao, T., Zhou, R., Xing, G., Mutka, M.: Wizsync: Exploiting wi-fi infrastructure for clock synchronization in wireless sensor networks. In: 2011 IEEE 32nd Real-Time Systems Symposium. pp. 149–158 (2011)
[32] Hu, Y., Fang, S., Lei, Z., Zhong, Y., Chen, S.: Where2comm: Communication-efficient collaborative perception via spatial confidence maps. Advances in neural information processing systems (2022)
[33] Hu, Y., Peng, J., Liu, S., Ge, J., Liu, S., Chen, S.: Communication-efficient collaborative perception via information filling with codebook. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15481–15490 (2024)
[34] Iyer, G., Ram, R.K., Murthy, J.K., Krishna, K.M.: Calibnet: Geometrically supervised extrinsic calibration using 3d spatial transformer networks. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE (Oct 2018). https://doi.org/10.1109/iros.2018.8593693, http://dx.doi.org/10.1109/IROS.2018.8593693
[35] Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: Fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12697–12705 (2019)
[36] Li, E., Wang, S., Li, C., Li, D., Wu, X., Hao, Q.: Sustech points: A portable 3d point cloud interactive annotation platform system. In: 2020 IEEE Intelligent Vehicles Symposium (IV). pp. 1108–1115 (2020). https://doi.org/10.1109/IV47402.2020.9304562
[37] Li, J., Xu, R., Liu, X., Ma, J., Chi, Z., Ma, J., Yu, H.: Learning for vehicle-to-vehicle cooperative perception under lossy communication. IEEE Transactions on Intelligent Vehicles 8(4), 2650–2660 (Apr 2023). https://doi.org/10.1109/tiv.2023.3260040, http://dx.doi.org/10.1109/TIV.2023.3260040
[38] Li, Y., Ren, S., Wu, P., Chen, S., Feng, C., Zhang, W.: Learning distilled collaboration graph for multi-agent perception. Advances in Neural Information Processing Systems 34, 29541–29552 (2021)
[39] Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D.L., Han, S.: Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). pp. 2774–2781 (2023). https://doi.org/10.1109/ICRA48891.2023.10160968
[40] Maróti, M., Kusy, B., Simon, G., Lédeczi, A.: The flooding time synchronization protocol. p. 39–49. SenSys ’04, Association for Computing Machinery, New York, NY, USA (2004)
2004
[41]

In: Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems

Mo, R., Wu, B., Tan, Z., Qiu, H.: See-v2x: C-v2x direct communication dataset: An application-centric approach. In: Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems. p. 305–311. Association for Computing Machinery, New York, NY, USA (2025), https://doi.org/10.1145/3715014. 3722077

work page doi:10.1145/3715014 2025
[42]

In: The Fourteenth In- ternational Conference on Learning Representations

Mukhopadhyay, S., Roy-Chowdhury, A., Qiu, H.: Coopertrim: Adaptive data selection for uncertainty-aware cooperative perception. In: The Fourteenth In- ternational Conference on Learning Representations. ICLR ’26 (2026),https: //openreview.net/forum?id=8NgKNuHRiH

2026
[43]

National Academics (2020)

National Academies of Sciences, Engineering, and Medicine and others: Business models to facilitate deployment of connected vehicle infrastructure to support automated vehicle operations. National Academics (2020)

2020
[44]

In: Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services

Qiu, H., Ahmad, F., Bai, F., Gruteser, M., Govindan, R.: Avr: Augmented vehicular reality. In: Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. p. 81–95. MobiSys ’18, New York, NY, USA (2018)

2018
[45]

In: Proceedings of the 20th Annual International Conference on Mobile Systems, Applications, and Services

Qiu, H., Huang, P., Asavisanu, N., Liu, X., Psounis, K., Govindan, R.: Autocast: Scalable infrastructure-less cooperative perception for distributed collaborative driving. In: Proceedings of the 20th Annual International Conference on Mobile Systems, Applications, and Services. MobiSys ’22 (December 2022) CooperScene19

2022
[46]

In: 2018 21st Inter- national Conference on Intelligent Transportation Systems (ITSC)

Rawashdeh, Z.Y., Wang, Z.: Collaborative automated driving: A machine learning- based method to enhance the accuracy of shared information. In: 2018 21st Inter- national Conference on Intelligent Transportation Systems (ITSC). pp. 3961–3966 (2018).https://doi.org/10.1109/ITSC.2018.8569832

work page doi:10.1109/itsc.2018.8569832 2018
[47]

In: Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications, and Services

Ren, H., Zhang, W., Shi, S., Zhang, X., Zhang, L., Zhang, Y.: Unisense: Spatial- uncertainty-aware collaborative sensing for autonomous driving. In: Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications, and Services. MobiSys ’25 (2025)

2025
[48]

Sekaran, K.C., Geisler, M., Rößle, D., Mohan, A., Cremers, D., Utschick, W., Botsch, M., Huber, W., Schön, T.: Urbaning-v2x: A large-scale multi-vehicle, multi- infrastructure dataset across multiple intersections for cooperative perception (2025), https://arxiv.org/abs/2510.23478

work page arXiv 2025
[49]

In: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems

Strobl, K.H., Hirzinger, G.: Optimal hand-eye calibration. In: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 4647–4653 (2006). https://doi.org/10.1109/IROS.2006.282250

[50] Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., Vasudevan, V., Han, W., Ngiam, J., Zhao, H., Timofeev, A., Ettinger, S., Krivokon, M., Gao, A., Joshi, A., Zhang, Y., Shlens, J., Chen, Z., Anguelov, D.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2020)

[51] Tsai, R.: A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses. IEEE Journal on Robotics and Automation 3(4), 323–344 (1987). https://doi.org/10.1109/JRA.1987.1087109

[52] USDOT: Preparing for the future of transportation: Automated vehicles 3.0. https://www.transportation.gov/av/3 (2018)

[53] Wang, T.H., Manivasagam, S., Liang, M., Bin, Y., Zeng, W., Tu, J., Urtasun, R.: V2VNet: Vehicle-to-vehicle communication for joint perception and prediction. In: ECCV (2020)

[54] Wang, Z., Wang, Y., Wu, Z., Ma, H., Li, Z., Qiu, H., Li, J.: Cmp: Cooperative motion prediction with multi-agent communication. IEEE Robotics and Automation Letters (2025)

[55] Waymo: Fleet response: Lending a helpful hand to Waymo’s autonomously driven vehicles. https://waymo.com/blog/2024/05/fleet-response (2024)

[56] Weiss, M.: Telecom requirements for time and frequency synchronization. In: National Institute of Standards and Technology (NIST), USA, [Online]: www.gps.gov/cgsic/meetings/2012/weiss1.pdf (2012)

[57] Weng, X., Wang, J., Held, D., Kitani, K.: 3d multi-object tracking: A baseline and new evaluation metrics. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 10359–10366 (2020). https://doi.org/10.1109/IROS45743.2020.9341164

[58] Wu, B., Li, J., Mo, R., Yue, J., Bharadia, D., Qiu, H.: Demo Abstract: Cooperative Multi-modal Sensing, p. 712–713. Association for Computing Machinery, New York, NY, USA (2025), https://doi.org/10.1145/3715014.3724372

[59] Xu, J., Zhang, Y., Cai, Z., Huang, D.: Cosdh: Communication-efficient collaborative perception via supply-demand awareness and intermediate-late hybridization (2025), https://arxiv.org/abs/2503.03430

[60] Xu, R., Tu, Z., Xiang, H., Shao, W., Zhou, B., Ma, J.: Cobevt: Cooperative bird’s eye view semantic segmentation with sparse transformers. In: Conference on Robot Learning (CoRL) (2022)

[61] Xu, R., Xia, X., Li, J., Li, H., Zhang, S., Tu, Z., Meng, Z., Xiang, H., Dong, X., Song, R., Yu, H., Zhou, B., Ma, J.: V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In: The IEEE/CVF Computer Vision and Pattern Recognition Conference (2023)

[62] Xu, R., Xiang, H., Tu, Z., Xia, X., Yang, M.H., Ma, J.: V2x-vit: Vehicle-to-everything cooperative perception with vision transformer. In: Proceedings of the European Conference on Computer Vision (2022)

[63] Xu, R., Xiang, H., Xia, X., Han, X., Li, J., Ma, J.: Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2583–2589. IEEE (2022)

[64] Yan, Y., Mao, Y., Li, B.: Second: Sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)

[65] Yu, H., Luo, Y., Shu, M., Huo, Y., Yang, Z., Shi, Y., Guo, Z., Li, H., Hu, X., Yuan, J., Nie, Z.: Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21361–21370 (2022)

[66] Yu, H., Yang, W., Ruan, H., Yang, Z., Tang, Y., Gao, X., Hao, X., Shi, Y., Pan, Y., Sun, N., Song, J., Yuan, J., Luo, P., Nie, Z.: V2x-seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

[67] Yu, H., Yang, W., Zhong, J., Yang, Z., Fan, S., Luo, P., Nie, Z.: End-to-end autonomous driving through v2x cooperation. In: The 39th Annual AAAI Conference on Artificial Intelligence (2025)

[68] Yuan, W., Li, J., Yue, J., Shah, D., Karydis, K., Qiu, H.: Bevcalib: Lidar-camera calibration via geometry-guided bird’s-eye view representations. In: 9th Annual Conference on Robot Learning. CoRL ’25 (2025), https://arxiv.org/abs/2506.02587

[69] Zhang, J., Yang, K., Wang, Y., Wang, H., Sun, P., Song, L.: Ermvp: Communication-efficient and collaboration-robust multi-vehicle perception in challenging environments. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 12575–12584 (2024). https://doi.org/10.1109/CVPR52733.2024.01195

[70] Zhang, Q., Pless, R.: Extrinsic calibration of a camera and laser range finder (improves camera calibration). In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566). vol. 3, pp. 2301–2306 (2004). https://doi.org/10.1109/IROS.2004.1389752

[71] Zhang, Q., Zhang, X., Zhu, R., Bai, F., Naserian, M., Mao, Z.M.: Robust real-time multi-vehicle collaboration on asynchronous sensors. In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking. pp. 1–15 (2023)

[72] Zhang, X., Zhang, A., Sun, J., Zhu, X., Guo, Y.E., Qian, F., Mao, Z.M.: Emp: edge-assisted multi-vehicle perception. In: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. p. 545–558. MobiCom ’21, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3447993.3483242, https://doi.org...

[73] Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(11), 1330–1334 (2000). https://doi.org/10.1109/34.888718

[74] Zhu, R., Zhu, X., Zhang, A., Zhang, X., Sun, J., Qian, F., Qiu, H., Mao, Z.M., Lee, M.: Boosting collaborative vehicular perception on the edge with vehicle-to-vehicle communication. In: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems. p. 141–154. SenSys ’24 (2024). https://doi.org/10.1145/3666025.3699328, https://doi.org/10.11...

[75] Zimmer, W., Wardana, G.A., Sritharan, S., Zhou, X., Song, R., Knoll, A.: Tumtraf v2x cooperative perception dataset. arXiv preprint arXiv:2403.01316 (2024)

[76] Zimmer, W., Wardana, G.A., Sritharan, S., Zhou, X., Song, R., Knoll, A.C.: Tumtraf v2x cooperative perception dataset. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Appendix A Limitations and Discussions

CooperScene presents a significant step in real-world cooperative perception, though we acknowledge spe...
