CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks

Beibei Wang; Haojie Ren; Jianmin Ji; Jingjing Huang; Lu Zhang; Yanyong Zhang; Yao Li; Yuru Peng; Yuxuan Xiao; Yu Zhang; Zijian Yu

arxiv: 2404.03191 · v3 · submitted 2024-04-04 · 💻 cs.CV

Pith reviewed 2026-05-24 02:16 UTC · model grok-4.3

classification 💻 cs.CV

keywords roadside perception · multi-modal dataset · campus scenarios · autonomous driving · LiDAR · object tracking · instance segmentation · benchmark dataset

The pith

CORP is the first public benchmark dataset for multi-modal roadside perception in campus settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper notes that existing roadside perception datasets concentrate on urban arterial roads and overlook residential areas such as campuses, which exhibit distinct characteristics. To address this gap, it releases CORP, a collection of over 205k images and 102k point clouds from 18 cameras and 9 LiDAR sensors mounted on utility poles within a university campus. Annotations extend beyond bounding boxes to include unique IDs for tracking and pixel masks for instance segmentation. A sympathetic reader would care because the dataset supplies what is needed to develop and evaluate perception systems for objects and behaviors in these overlooked environments.

Core claim

The authors propose CORP as the first public benchmark dataset tailored for multi-modal roadside perception tasks under campus scenarios. Collected in a university campus, CORP consists of over 205k images plus 102k point clouds captured from 18 cameras and 9 LiDAR sensors with different configurations mounted on roadside utility poles to provide diverse viewpoints. The annotations encompass multi-dimensional information beyond 2D and 3D bounding boxes, providing extra support for 3D seamless tracking and instance segmentation with unique IDs and pixel masks for identifying targets, to enhance the understanding of objects and their behaviors distributed across the campus premises.

What carries the argument

The CORP dataset, built from synchronized multi-modal sensor streams on utility poles together with extended labels for tracking and segmentation.

If this is right

Researchers can train and benchmark multi-modal fusion methods on synchronized campus image and point cloud streams from varied viewpoints.
Algorithms for 3D object tracking can exploit the unique IDs across frames to maintain identities through campus scenes.
Instance segmentation models gain access to pixel masks that link 2D and 3D annotations for the same targets.
Perception systems for intelligent transportation can be evaluated on residential-area challenges such as pedestrian and cyclist behaviors near campus buildings.
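
As a rough illustration of how per-frame unique IDs support tracking evaluation, the sketch below counts identity switches: how often the predicted ID matched to one ground-truth target changes between consecutive observations. The `(frame_idx, gt_id, pred_id)` tuple layout and the function name are hypothetical, since CORP's annotation file format is not described on this page, and the matching of predictions to ground truth is assumed to have been done beforehand.

```python
from collections import defaultdict


def count_id_switches(matches):
    """Count identity switches over already-matched tracking output.

    `matches` is an iterable of (frame_idx, gt_id, pred_id) tuples
    (a hypothetical layout, not the CORP file format).
    """
    # Group the predicted IDs observed for each ground-truth target.
    by_gt = defaultdict(list)
    for frame_idx, gt_id, pred_id in matches:
        by_gt[gt_id].append((frame_idx, pred_id))

    switches = 0
    for observations in by_gt.values():
        # Walk each target's observations in temporal order.
        observations.sort(key=lambda item: item[0])
        for (_, prev_pred), (_, cur_pred) in zip(observations, observations[1:]):
            if cur_pred != prev_pred:
                switches += 1
    return switches


example = [
    (0, "ped_1", 7), (1, "ped_1", 7), (2, "ped_1", 9),  # ID changes once
    (0, "car_3", 2), (1, "car_3", 2),                   # ID is stable
]
print(count_id_switches(example))
```

This is a simplified count, not a full tracking metric such as MOTA; it only shows why consistent ground-truth IDs across frames are needed to score identity preservation at all.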

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The multi-view utility-pole setup could be replicated in other non-arterial environments such as parks to test whether the same annotation style generalizes.
Comparison experiments between CORP and urban datasets would quantify how much viewpoint and scene-type differences affect current model performance.
The dataset's scale and sensor diversity make it suitable for studying long-term object re-identification across repeated campus routes.

Load-bearing premise

Campus scenarios exhibit characteristics entirely distinct from those of urban arterial roads, and existing datasets do not address them.

What would settle it

Demonstration that perception models trained solely on existing urban roadside datasets reach equivalent accuracy on campus tasks without retraining or new labels.

Figures

Figures reproduced from arXiv: 2404.03191 by Beibei Wang, Haojie Ren, Jianmin Ji, Jingjing Huang, Lu Zhang, Yanyong Zhang, Yao Li, Yuru Peng, Yuxuan Xiao, Yu Zhang, Zijian Yu.

**Figure 2.** A BEV overview of the data pattern in CORP. The colored are LiDAR point clouds and the schematic yellow …

**Figure 3.** An overview of 4 types of coordinate systems involved in CORP, …

**Figure 4.** CORP image examples with (a) 3D annotations, (b) 2D boxes, and (c) segmentation masks. All are images …

**Figure 5.** Distribution of target location and orientation. (a) is a stacked overview of targets under BEV in their …

**Figure 6.** A comparison of the target density in CORP and two of its urban-road counterparts. (a) and (c) are the …

**Figure 7.** Some challenging scenarios in CORP for object detection and segmentation tasks.

**Figure 8.** A performance comparison between P3D and IPM. The …

**Figure 9.** An illustration of camera and roadside coordinate systems. The camera coordinate system is denoted as …

**Figure 10.** Overview of our P3D method. The extracted caption runs on into the paper's implementation details (.2.2): Once the intrinsic parameters, pose angles and the height of camera sensors are measured, we can lift a 2D target in an image to a 3D point in the camera coordinate system by following the closed-form Eq. (7) and Eq. (8) with no computational cost, given an image-based 2D detector employed beforehand to produce the bounding boxes of interested target…

**Figure 11.** Flatness profiles of the typical ground surfaces in the dataset and sample images for cameras in the …
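
The Figure 10 caption mentions lifting a 2D target to a 3D point in the camera coordinate system from the intrinsic parameters, pose angles, and camera height. The paper's Eq. (7) and Eq. (8) are not reproduced on this page, so the sketch below is not the authors' P3D formulation. It is a generic flat-ground back-projection for a pinhole camera, assuming zero roll, no lens distortion, a camera frame with x right, y down, z forward, and a downward pitch angle. All function and parameter names are hypothetical.

```python
import math


def lift_pixel_to_ground(u, v, fx, fy, cx, cy, pitch_rad, height_m):
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane.

    Returns the 3D point (x_c, y_c, z_c) in the camera frame.
    Assumes a pinhole camera pitched down by `pitch_rad` and mounted
    `height_m` above the ground (generic geometry, not the paper's equations).
    """
    # Ray direction through the pixel, normalised so that z = 1.
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    # Ground points satisfy: y_c * cos(pitch) + z_c * sin(pitch) = height.
    denom = dy * math.cos(pitch_rad) + math.sin(pitch_rad)
    if denom <= 1e-9:
        raise ValueError("pixel ray does not intersect the ground plane")
    scale = height_m / denom
    return (scale * dx, scale * dy, scale)


def camera_to_ground_xy(point_cam, pitch_rad):
    """Convert a camera-frame point to (lateral, forward) ground offsets."""
    x_c, y_c, z_c = point_cam
    forward = z_c * math.cos(pitch_rad) - y_c * math.sin(pitch_rad)
    return (x_c, forward)


# Principal-point pixel, camera 5 m high and pitched down by 45 degrees.
pitch = math.radians(45.0)
point = lift_pixel_to_ground(960, 540, 1000.0, 1000.0, 960.0, 540.0, pitch, 5.0)
print(point, camera_to_ground_xy(point, pitch))
```

In practice the pixel passed in would typically be a point where the target touches the ground, such as the bottom centre of a 2D bounding box. Uneven ground breaks the flat-plane assumption.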

Original abstract

Numerous roadside perception datasets have been introduced to propel advancements in autonomous driving and intelligent transportation systems research and development. However, it has been observed that the majority of their concentrates is on urban arterial roads, inadvertently overlooking residential areas such as parks and campuses that exhibit entirely distinct characteristics. In light of this gap, we propose CORP, which stands as the first public benchmark dataset tailored for multi-modal roadside perception tasks under campus scenarios. Collected in a university campus, CORP consists of over 205k images plus 102k point clouds captured from 18 cameras and 9 LiDAR sensors. These sensors with different configurations are mounted on roadside utility poles to provide diverse viewpoints within the campus region. The annotations of CORP encompass multi-dimensional information beyond 2D and 3D bounding boxes, providing extra support for 3D seamless tracking and instance segmentation with unique IDs and pixel masks for identifying targets, to enhance the understanding of objects and their behaviors distributed across the campus premises. Unlike other roadside datasets about urban traffic, CORP extends the spectrum to highlight the challenges for multi-modal perception in campuses and other residential areas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CORP releases a campus roadside dataset with multi-modal sensors and tracking/segmentation annotations, but offers no quantitative comparisons to show campus scenes differ meaningfully from existing urban roadside data.

The core contribution is the release of CORP: over 205k images and 102k point clouds from 18 cameras and 9 LiDARs on utility poles, plus annotations that include unique IDs for 3D tracking and pixel masks for instance segmentation. That setup and annotation richness is the actual new thing here, and it is documented clearly enough in the description to be usable by others working on roadside perception outside dense urban arterials. Releasing data with those extra dimensions is a practical step forward for tasks that need consistent object identities or shape details across frames. The collection from a single university campus with varied viewpoints is also straightforward to understand from the sensor placement details.

The soft spot is the missing support for the central justification. The paper asserts that campuses have entirely distinct characteristics from urban roads and that prior datasets overlook residential areas, yet it provides no numbers on object density, trajectory statistics, scene diversity, or direct side-by-side metrics against the urban roadside datasets it cites. Without those, the claim that this fills a genuine gap stays untested. There are no equations, fitted parameters, or internal contradictions to worry about; the work is a data collection effort.

This paper is mainly for researchers who need campus-specific training or test data for multi-modal perception and are willing to evaluate the distinctness claim themselves. A reader focused on intelligent transportation in non-arterial settings could extract value from the raw release. It is worth sending to peer review because new, well-scoped datasets with documented sensor setups and richer labels can still be useful even when the motivation section is light on evidence.

Referee Report

2 major / 2 minor

Summary. The paper introduces CORP, claimed as the first public multi-modal roadside perception benchmark for campus scenarios. It comprises over 205k images and 102k point clouds captured by 18 cameras and 9 LiDARs mounted on utility poles, with annotations extending beyond 2D/3D bounding boxes to include unique IDs for 3D tracking and pixel masks for instance segmentation. The central claim is that existing datasets focus on urban arterial roads while overlooking distinct characteristics of residential/campus areas, and that CORP fills this gap with its sensor diversity and multi-dimensional labels.

Significance. If the novelty and distinctness claims hold, CORP would offer a useful addition to the literature by providing data from an underrepresented environment (university campus) with rich annotations supporting tracking and segmentation tasks. The multi-view, multi-modal sensor configuration is a concrete strength for perception research.

major comments (2)

[Abstract] Abstract: The assertion that CORP 'stands as the first public benchmark dataset tailored for multi-modal roadside perception tasks under campus scenarios' and that campuses 'exhibit entirely distinct characteristics' is load-bearing for the contribution but is not supported by any quantitative comparisons (object density, trajectory statistics, scene diversity metrics, or similar) to prior roadside datasets; without these, the gap-filling claim cannot be evaluated.
[Data annotation / labeling sections] Annotation description (full text, data collection and labeling sections): No details are provided on annotation validation procedures, quality control, or metrics such as inter-annotator agreement; this directly affects the claim that the 'multi-dimensional information' and 'unique IDs and pixel masks' meaningfully enhance understanding of objects and behaviors.
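
To make the first major comment concrete, the sketch below computes three of the kinds of statistics it asks for: mean objects per annotated frame, mean track duration in frames, and Shannon entropy of the class distribution. The `(frame_idx, track_id, class_name)` record layout is hypothetical and the metric definitions are one plausible choice, not ones taken from the paper or the report.

```python
import math
from collections import Counter, defaultdict


def dataset_statistics(records):
    """Summarise (frame_idx, track_id, class_name) annotation records.

    The record layout is an assumption made for illustration only.
    """
    per_frame = Counter()             # number of objects in each frame
    track_frames = defaultdict(list)  # frames in which each track appears
    class_counts = Counter()          # number of instances per class

    for frame_idx, track_id, class_name in records:
        per_frame[frame_idx] += 1
        track_frames[track_id].append(frame_idx)
        class_counts[class_name] += 1

    if not per_frame:
        raise ValueError("no annotation records given")

    # Density is averaged over frames that contain at least one object.
    mean_density = sum(per_frame.values()) / len(per_frame)
    # Duration is the inclusive span between first and last appearance.
    durations = [max(f) - min(f) + 1 for f in track_frames.values()]
    mean_duration = sum(durations) / len(durations)
    # Shannon entropy (bits) of the per-instance class distribution.
    total = sum(class_counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in class_counts.values())

    return {
        "mean_objects_per_frame": mean_density,
        "mean_track_duration_frames": mean_duration,
        "class_entropy_bits": entropy,
    }


sample = [
    (0, 1, "pedestrian"), (0, 2, "cyclist"),
    (1, 1, "pedestrian"), (2, 1, "pedestrian"),
]
stats = dataset_statistics(sample)
print(stats)
```

Running the same function over a campus dataset and an urban roadside dataset, after mapping both to a shared class list, would give the side-by-side numbers the comment calls for.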

minor comments (2)

[Abstract] Abstract: Typo/grammar: 'the majority of their concentrates is' should be rephrased to 'the majority of their concentration is' or 'most of their focus is'.
[Abstract] Abstract: The phrasing 'Unlike other roadside datasets about urban traffic' is imprecise; consider 'Unlike other roadside datasets focused on urban traffic'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [Abstract] Abstract: The assertion that CORP 'stands as the first public benchmark dataset tailored for multi-modal roadside perception tasks under campus scenarios' and that campuses 'exhibit entirely distinct characteristics' is load-bearing for the contribution but is not supported by any quantitative comparisons (object density, trajectory statistics, scene diversity metrics, or similar) to prior roadside datasets; without these, the gap-filling claim cannot be evaluated.

Authors: We agree that the distinctness claim would be strengthened by quantitative evidence. In the revised manuscript we will insert a new comparison subsection (or table) reporting concrete metrics—object density per frame, average trajectory duration, number of unique object classes per scene, and scene entropy measures—computed on CORP versus representative prior roadside datasets focused on arterial roads. This addition will allow readers to evaluate the claimed gap directly. revision: yes
Referee: [Data annotation / labeling sections] Annotation description (full text, data collection and labeling sections): No details are provided on annotation validation procedures, quality control, or metrics such as inter-annotator agreement; this directly affects the claim that the 'multi-dimensional information' and 'unique IDs and pixel masks' meaningfully enhance understanding of objects and behaviors.

Authors: We accept that the absence of quality-control details weakens the annotation claims. The revised manuscript will add a concise subsection under Data Annotation that describes the multi-stage validation workflow (initial labeling followed by independent review by two additional annotators), the resolution protocol for disagreements, and the computed inter-annotator agreement scores (e.g., IoU for boxes and masks, ID consistency for tracking). revision: yes
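
The rebuttal proposes IoU for boxes and masks as an inter-annotator agreement measure. A minimal sketch of both, assuming axis-aligned 2D boxes given as `(x1, y1, x2, y2)` and binary masks given as equally sized nested lists of 0/1, is shown below. These input formats and function names are illustrative assumptions, not the annotation format used by CORP.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Degenerate case: both boxes have zero area.
    return inter / union if union > 0 else 0.0


def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized nested lists."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a and pixel_b:
                inter += 1
            if pixel_a or pixel_b:
                union += 1
    # Returns 0.0 when both masks are empty (a convention chosen here).
    return inter / union if union > 0 else 0.0


# Two annotators labelling the same target.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Averaging these scores over targets labelled independently by two annotators gives one simple agreement figure; 3D boxes with orientation would need a rotated-box IoU instead.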

Circularity Check

0 steps flagged

No significant circularity

The paper introduces a new multi-modal roadside perception dataset (CORP) collected on a university campus. It contains no equations, derivations, fitted parameters, predictions, or uniqueness theorems. The central claim—that CORP is the first public benchmark for campus scenarios because prior datasets focus on urban arterial roads—is presented as an empirical observation rather than a derived result. No load-bearing step reduces by construction to the paper's own inputs, self-citations, or ansatzes. The contribution is self-contained as a data-collection and annotation effort.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper whose central contribution is the collection and annotation of new empirical data rather than any fitted parameters, unproven axioms, or postulated entities.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

[1]

Tennille, Srinath K

Haowen Xu, Anne Berres, Sarah A. Tennille, Srinath K. Ravulaparthy, Chieh Wang, and Jibonananda Sanyal. Continuous emulation and multiscale visualization of traffic flow using stationary roadside sensor data. IEEE Transactions on Intelligent Transportation Systems, 23(8):10530–10541, 2022

work page 2022
[2]

Vips: Real-time perception fusion for infrastructure-assisted autonomous driving

Shuyao Shi, Jiahe Cui, Zhehao Jiang, Zhenyu Yan, Guoliang Xing, Jianwei Niu, and Zhenchao Ouyang. Vips: Real-time perception fusion for infrastructure-assisted autonomous driving. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking , MobiCom ’22, page 133–146, New York, NY , USA, 2022. Association for Computing Machinery

work page 2022
[3]

Shishir Shah

Pranav Mantini, Zhenggang Li, and K. Shishir Shah. A day on campus - an anomaly detection dataset for events in a single camera. In Hiroshi Ishikawa, Cheng-Lin Liu, Tomas Pajdla, and Jianbo Shi, editors, Computer Vision – ACCV 2020, pages 619–635, Cham, 2021. Springer International Publishing

work page 2020
[4]

Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection

Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, and Zaiqing Nie. Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 21361–21370, June 2022

work page 2022
[5]

Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task

Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, and Errui Ding. Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 21341–21350, June 2022

work page 2022
[6]

Walter Zimmer, Christian Creß, Huu Tung Nguyen, and Alois C. Knoll. A9 intersection dataset: All you need for urban 3d camera-lidar roadside perception, 2023

work page 2023
[7]

Ips300+: a challenging multi-modal data sets for intersection perception system

Huanan Wang, Xinyu Zhang, Zhiwei Li, Jun Li, Kun Wang, Zhu Lei, and Ren Haibing. Ips300+: a challenging multi-modal data sets for intersection perception system. In 2022 International Conference on Robotics and Automation (ICRA), pages 2539–2545, 2022. 11

work page 2022
[8]

A9-dataset: Multi-sensor infrastructure-based dataset for mobility research

Christian Creß, Walter Zimmer, Leah Strand, Maximilian Fortkord, Siyi Dai, Venkatnarayanan Lakshmi- narasimhan, and Alois Knoll. A9-dataset: Multi-sensor infrastructure-based dataset for mobility research. In 2022 IEEE Intelligent V ehicles Symposium (IV), pages 965–970, 2022

work page 2022
[9]

Bevheight: A robust framework for vision-based roadside 3d object detection

Lei Yang, Kaicheng Yu, Tao Tang, Jun Li, Kun Yuan, Li Wang, Xinyu Zhang, and Peng Chen. Bevheight: A robust framework for vision-based roadside 3d object detection. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), March 2023

work page 2023
[10]

Bevdepth: Acquisition of reliable depth for multi-view 3d object detection, 2022

Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection, 2022

work page 2022
[11]

Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers, 2022

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers, 2022

work page 2022
[12]

A revisit of sparse coding based anomaly detection in stacked rnn framework

Weixin Luo, Wen Liu, and Shenghua Gao. A revisit of sparse coding based anomaly detection in stacked rnn framework. In 2017 IEEE International Conference on Computer Vision (ICCV) , pages 341–349, 2017

work page 2017
[13]

A new comprehensive benchmark for semi-supervised video anomaly detection and anticipation

Congqi Cao, Yue Lu, Peng Wang, and Yanning Zhang. A new comprehensive benchmark for semi-supervised video anomaly detection and anticipation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20392–20401, June 2023

work page 2023
[14]

Developing and testing robust autonomy: The university of sydney campus data set

Wei Zhou, Julie Stephany Berrio, Charika De Alvis, Mao Shan, Stewart Worrall, James Ward, and Eduardo Nebot. Developing and testing robust autonomy: The university of sydney campus data set. IEEE Intelligent Transportation Systems Magazine, 12(4):23–40, 2020

work page 2020
[15]

Campus3d: A photogrammetry point cloud benchmark for hierarchical understanding of outdoor scene

Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, and Raymond Huang. Campus3d: A photogrammetry point cloud benchmark for hierarchical understanding of outdoor scene. MM ’20, page 238–246, New York, NY , USA, 2020. Association for Computing Machinery

work page 2020
[16]

V2x-seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting

Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, Juan Song, Jirui Yuan, Ping Luo, and Zaiqing Nie. V2x-seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogn...

work page 2023
[17]

Int2: Interactive trajectory prediction at intersections

Zhijie Yan, Pengfei Li, Zheng Fu, Shaocong Xu, Yongliang Shi, Xiaoxue Chen, Yuhang Zheng, Yang Li, Tianyu Liu, Chuxuan Li, Nairui Luo, Xu Gao, Yilun Chen, Zuoxu Wang, Yifeng Shi, Pengfei Huang, Zhengxiao Han, Jirui Yuan, Jiangtao Gong, Guyue Zhou, Hang Zhao, and Hao Zhao. Int2: Interactive trajectory prediction at intersections. In Proceedings of the IEEE...

work page 2023
[18]

You only look once: Unified, real-time object detection

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages 779–788, 2016

work page 2016
[19]

Yolo9000: Better, faster, stronger

Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525, 2017

work page 2017
[20]

Yolov3: An incremental improvement, 2018

Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement, 2018

work page 2018
[21]

Yolov4: Optimal speed and accuracy of object detection, 2020

Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection, 2020

work page 2020
[22]

YOLOv5 by Ultralytics, May 2020

Glenn Jocher. YOLOv5 by Ultralytics, May 2020

work page 2020
[23]

You only learn one representation: Unified network for multiple tasks, 2021

Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. You only learn one representation: Unified network for multiple tasks, 2021

work page 2021
[24]

Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022

Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022

work page 2022
[25]

YOLO by Ultralytics, January 2023

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. YOLO by Ultralytics, January 2023

work page 2023
[26]

Modnet: Motion and appearance based moving object detection network for autonomous driving

Mennatullah Siam, Heba Mahgoub, Mohamed Zahran, Senthil Yogamani, Martin Jagersand, and Ahmad El-Sallab. Modnet: Motion and appearance based moving object detection network for autonomous driving. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC) , pages 2859–2864, 2018

work page 2018
[27]

Monocular instance motion segmentation for autonomous driving: Kitti instancemotseg dataset and multi-task baseline

Eslam Mohamed, Mahmoud Ewaisha, Mennatullah Siam, Hazem Rashed, Senthil Yogamani, Waleed Hamdy, Mohamed El-Dakdouky, and Ahmad El-Sallab. Monocular instance motion segmentation for autonomous driving: Kitti instancemotseg dataset and multi-task baseline. In 2021 IEEE Intelligent V ehicles Symposium (IV), pages 114–121, 2021

work page 2021
[28]

Learning to segment rigid motions from two frames

Gengshan Yang and Deva Ramanan. Learning to segment rigid motions from two frames. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 1266–1275, 2021. 12

work page 2021
[29]

Discovering objects that can move

Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, and Martial Hebert. Discovering objects that can move. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 11779–11788, 2022

work page 2022
[30]

Segmenting moving objects via an object-centric layered representation

Junyu Xie, Weidi Xie, and Andrew Zisserman. Segmenting moving objects via an object-centric layered representation. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 28023–28036. Curran Associates, Inc., 2022

work page 2022
[31]

Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom

Alex H. Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12689–12697, 2019

work page 2019
[32]

V oxelnet: End-to-end learning for point cloud based 3d object detection

Yin Zhou and Oncel Tuzel. V oxelnet: End-to-end learning for point cloud based 3d object detection. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 4490–4499, 2018

work page 2018
[33]

3dssd: Point-based 3d single stage object detector

Zetong Yang, Yanan Sun, Shu Liu, and Jiaya Jia. 3dssd: Point-based 3d single stage object detector. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 11037–11045, 2020

work page 2020
[34]

V oxel r-cnn: Towards high performance voxel-based 3d object detection, 2021

Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. V oxel r-cnn: Towards high performance voxel-based 3d object detection, 2021

work page 2021
[35]

Center-based 3d object detection and tracking

Tianwei Yin, Xingyi Zhou, and Philipp Krähenbühl. Center-based 3d object detection and tracking. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 11779–11788, 2021

work page 2021
[36]

Neighbor-vote: Improving monocular 3d object detection through neighbor distance voting

Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, and Yu Zhang. Neighbor-vote: Improving monocular 3d object detection through neighbor distance voting. In Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, page 5239–5247, New York, NY , USA, 2021. Association for Computing Machinery

work page 2021
[37]

Objects are different: Flexible monocular 3d object detection

Yunpeng Zhang, Jiwen Lu, and Jie Zhou. Objects are different: Flexible monocular 3d object detection. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 3288–3297, 2021

work page 2021
[38]

Imvoxelnet: Image to voxels projection for monocular and multi-view general-purpose 3d object detection

Danila Rukhovich, Anna V orontsova, and Anton Konushin. Imvoxelnet: Image to voxels projection for monocular and multi-view general-purpose 3d object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2397–2406, 2022

work page 2022
[39]

Monoatt: Online monocular 3d object detection with adaptive token transformer

Yunsong Zhou, Hongzi Zhu, Quan Liu, Shan Chang, and Minyi Guo. Monoatt: Online monocular 3d object detection with adaptive token transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17493–17503, June 2023

work page 2023
[40]

Unimode: Unified monocular 3d object detection, 2024

Zhuoling Li, Xiaogang Xu, SerNam Lim, and Hengshuang Zhao. Unimode: Unified monocular 3d object detection, 2024

work page 2024
[41]

Monouni: A unified vehicle and infrastructure-side monocular 3d object detection network with sufficient depth clues

Jia Jinrang, Zhenjia Li, and Yifeng Shi. Monouni: A unified vehicle and infrastructure-side monocular 3d object detection network with sufficient depth clues. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems , volume 36, pages 11703–11715. Curran Associates, Inc., 2023

work page 2023
[42]

Pointpainting: Sequential fusion for 3d object detection

Sourabh V ora, Alex H Lang, Bassam Helou, and Oscar Beijbom. Pointpainting: Sequential fusion for 3d object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 4604–4612, 2020

work page 2020
[43]

Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation

Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela L Rus, and Song Han. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In2023 IEEE International Conference on Robotics and Automation (ICRA) , pages 2774–2781. IEEE, 2023

work page 2023
[44]

3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection

Jin Hyeok Yoo, Yecheol Kim, Jisong Kim, and Jun Won Choi. 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16 , pages 720–736. Springer, 2020

work page 2020
[45]

Simple online and realtime tracking

Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP) , pages 3464–3468, 2016

work page 2016
[46]

Simple online and realtime tracking with a deep association metric

Nicolai Wojke, Alex Bewley, and Dietrich Paulus. Simple online and realtime tracking with a deep association metric. In 2017 IEEE International Conference on Image Processing (ICIP) , pages 3645–3649. IEEE, 2017

work page 2017
[47]

Observation-centric sort: Rethinking sort for robust multi-object tracking

Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, and Kris Kitani. Observation-centric sort: Rethinking sort for robust multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9686–9696, June 2023. 13

work page 2023
[48]

3d multi-object tracking: A baseline and new evaluation metrics

Xinshuo Weng, Jianren Wang, David Held, and Kris Kitani. 3d multi-object tracking: A baseline and new evaluation metrics. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages 10359–10366, 2020

work page 2020
[49]

Cross-modal 3d object detection and tracking for auto-driving

Yihan Zeng, Chao Ma, Ming Zhu, Zhiming Fan, and Xiaokang Yang. Cross-modal 3d object detection and tracking for auto-driving. In Proc. Int. Conf. Intell. Robots Syst , pages 3850–3857. IEEE, 2021

work page 2021
[50]

Camera calibrator, 2022

The MathWorks Inc. Camera calibrator, 2022

work page 2022
[51]

A flexible new technique for camera calibration

Zhengyou Zhang. A flexible new technique for camera calibration. IEEE Transactions on pattern analysis and machine intelligence, 22(11):1330–1334, 2000

work page 2000
[52]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023

[53]

Sustech points: A portable 3d point cloud interactive annotation platform system

E Li, Shuaijun Wang, Chengyang Li, Dachuan Li, Xiangbin Wu, and Qi Hao. Sustech points: A portable 3d point cloud interactive annotation platform system. In 2020 IEEE Intelligent Vehicles Symposium (IV), pages 1108–1115, 2020

[54]

U2-onet: A two-level nested octave u-structure network with a multi-scale attention mechanism for moving object segmentation

Chenjie Wang, Chengyuan Li, Jun Liu, Bin Luo, Xin Su, Yajun Wang, and Yan Gao. U2-onet: A two-level nested octave u-structure network with a multi-scale attention mechanism for moving object segmentation. Remote Sensing, 13(1), 2021

[55]

Riwnet: A moving object instance segmentation network being robust in adverse weather conditions, 2021

Chenjie Wang, Chengyuan Li, Bin Luo, Wei Wang, and Jun Liu. Riwnet: A moving object instance segmentation network being robust in adverse weather conditions, 2021

[56]

Real-time vehicle distance estimation using single view geometry

Ahmed Ali, Ali Hassan, Afsheen Rafaqat Ali, Hussam Ullah Khan, Wajahat Kazmi, and Aamer Zaheer. Real-time vehicle distance estimation using single view geometry. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1100–1109, 2020

[57]

Deep learning based vehicle position and orientation estimation via inverse perspective mapping image

Youngseok Kim and Dongsuk Kum. Deep learning based vehicle position and orientation estimation via inverse perspective mapping image. In 2019 IEEE Intelligent Vehicles Symposium (IV), pages 317–323, 2019

[58]

Joint vehicle detection and distance prediction via monocular depth estimation

Chao Shen, Xiangmo Zhao, Zhanwen Liu, Tao Gao, and Jiang Xu. Joint vehicle detection and distance prediction via monocular depth estimation. IET Intelligent Transport Systems, 14(7):753–763, 2020

[59]

Towards generalization across depth for monocular 3d object detection

Andrea Simonelli, Samuel Rota Buló, Lorenzo Porzi, Elisa Ricci, and Peter Kontschieder. Towards generalization across depth for monocular 3d object detection. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision – ECCV 2020, pages 767–782, Cham, 2020. Springer International Publishing

[60]

Towards model generalization for monocular 3d object detection, 2022

Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, and Junjun Jiang. Towards model generalization for monocular 3d object detection, 2022

[61]

Stereo inverse perspective mapping: theory and applications

Massimo Bertozzi, Alberto Broggi, and Alessandra Fascioli. Stereo inverse perspective mapping: theory and applications. Image and Vision Computing, 16(8):585–590, 1998

Appendix .1 Rationale for P3D

In roadside scenarios, the camera sensor is usually positioned at a certain height Hc above a local ground plane. We define the camera and the road coordina...
