EagleVision: A Multi-Task Benchmark for Cross-Domain Perception in High-Speed Autonomous Racing
Pith reviewed 2026-05-10 16:01 UTC · model grok-4.3
The pith
Cross-domain pretraining from urban and racing datasets enhances 3D detection and trajectory prediction in high-speed autonomous racing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the EagleVision benchmark, built from newly annotated LiDAR frames from the Indy Autonomous Challenge, the A2RL competition, and simulator data, supports three findings through a dataset-centric transfer framework: urban pretraining improves 3D detection over training from scratch on racing data; intermediate pretraining on real racing data yields the strongest transfer to new racing environments; and, for trajectory prediction, Indy-trained models outperform direct in-domain training on A2RL test sequences because of wider motion-distribution coverage.
What carries the argument
The dataset-centric transfer framework that standardizes LiDAR data from urban, simulator, and real racing sources under one evaluation protocol and measures how pretraining sequences affect detection and prediction in high-dynamic conditions.
Load-bearing premise
The newly created 3D bounding box annotations for the Indy and A2RL datasets are accurate and consistent enough to support reliable cross-domain comparisons.
What would settle it
Independent re-annotation of the same frames producing bounding boxes that reverse the observed performance ordering between urban-pretrained, simulator-adapted, and racing-intermediate models on the A2RL test set.
Figures
Original abstract
High-speed autonomous racing presents extreme perception challenges, including large relative velocities and substantial domain shifts from conventional urban-driving datasets. Existing benchmarks do not adequately capture these high-dynamic conditions. We introduce EagleVision, a unified LiDAR-based multi-task benchmark for 3D detection and trajectory prediction in high-speed racing, providing newly annotated 3D bounding boxes for the Indy Autonomous Challenge dataset (14,893 frames) and the A2RL Real competition dataset (1,163 frames), together with 12,000 simulator-generated annotated frames, all standardized under a common evaluation protocol. Using a dataset-centric transfer framework, we quantify cross-domain generalization across urban, simulator, and real racing domains. Urban pretraining improves detection over scratch training (NDS 0.72 vs. 0.69), while intermediate pretraining on real racing data achieves the best transfer to A2RL (NDS 0.726), outperforming simulator-only adaptation. For trajectory prediction, Indy-trained models surpass in-domain A2RL training on A2RL test sequences (FDE 0.947 vs. 1.250), highlighting the role of motion-distribution coverage in cross-domain forecasting. EagleVision enables systematic study of perception generalization under extreme high-speed dynamics. The dataset and benchmark are publicly available at https://avlab.io/EagleVision
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EagleVision, a LiDAR-based multi-task benchmark for 3D object detection and trajectory prediction in high-speed autonomous racing. It contributes newly annotated 3D bounding boxes for the Indy Autonomous Challenge dataset (14,893 frames) and A2RL Real competition dataset (1,163 frames), plus 12,000 simulator frames, all under a common protocol. Using a dataset-centric transfer framework, it reports cross-domain results: urban pretraining yields NDS 0.72 vs. 0.69 for scratch training on detection; intermediate pretraining on real racing data achieves best transfer to A2RL (NDS 0.726); and Indy-trained models outperform in-domain A2RL training on A2RL test sequences for trajectory prediction (FDE 0.947 vs. 1.250). The datasets and benchmark are released publicly.
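The trajectory-prediction numbers quoted above (FDE 0.947 vs. 1.250) follow the standard displacement-error definitions; a minimal sketch for reference, with function names chosen for illustration rather than taken from the paper:

```python
import numpy as np

def fde(pred, gt):
    """Final Displacement Error: Euclidean distance between predicted and
    ground-truth positions at the final timestep, averaged over agents.
    pred, gt: arrays of shape (num_agents, num_timesteps, 2)."""
    return float(np.linalg.norm(pred[:, -1] - gt[:, -1], axis=-1).mean())

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance over all
    timesteps and agents."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy check: two agents, three future timesteps, constant 5 m offset.
gt = np.zeros((2, 3, 2))
pred = gt + np.array([3.0, 4.0])
print(fde(pred, gt))  # 5.0
print(ade(pred, gt))  # 5.0
```

NDS, by contrast, is a composite nuScenes-style score combining mAP with box-quality terms, so it cannot be reduced to a one-liner; the referee's request below that the protocol be stated explicitly applies especially to how that composite is computed across domains.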
Significance. If the annotations are shown to be reliable, this benchmark would be a useful addition for studying perception generalization under high-dynamic conditions that differ from urban driving. The public release of standardized datasets across urban, simulator, and real-racing domains enables reproducible follow-on work. The empirical observation that real racing pretraining outperforms simulator adaptation, and that broader motion coverage from Indy data improves forecasting on A2RL, provides concrete, falsifiable starting points for domain-adaptation research in robotics.
major comments (1)
- [Abstract / Dataset creation] Abstract and dataset description: All reported metric deltas (NDS 0.72 vs. 0.69; FDE 0.947 vs. 1.250) are computed directly on the newly supplied 3D bounding-box labels for Indy and A2RL. The manuscript states only that the boxes were “newly annotated” and “standardized under a common protocol,” with no description of the annotation pipeline, LiDAR calibration, motion-compensation procedure, inter-annotator agreement statistics, or quantitative validation (e.g., comparison to an off-the-shelf detector or error rates under high-speed sparsity). Because label noise or domain-specific bias could produce the observed transfer gains, this information is required to attribute results to domain shift rather than annotation artifacts.
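The inter-annotator agreement statistic requested here is typically a mean IoU between independently drawn boxes for the same object. As a simplified illustration (real racing annotations use oriented boxes; the axis-aligned case below is only a sketch):

```python
import numpy as np

def iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each given as a
    (min_corner, max_corner) pair of length-3 arrays."""
    lo = np.maximum(box_a[0], box_b[0])
    hi = np.minimum(box_a[1], box_b[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))  # overlap volume
    vol_a = np.prod(box_a[1] - box_a[0])
    vol_b = np.prod(box_b[1] - box_b[0])
    return float(inter / (vol_a + vol_b - inter))

# Two annotators' boxes for the same car, shifted 1 m along x
# (hypothetical values for illustration).
a = (np.array([0.0, 0.0, 0.0]), np.array([4.0, 2.0, 1.0]))
b = (np.array([1.0, 0.0, 0.0]), np.array([5.0, 2.0, 1.0]))
print(iou_axis_aligned(a, b))  # 0.6
```

Averaging such IoU values over doubly annotated frames, stratified by ego speed and point-cloud sparsity, is the kind of quantitative validation the comment asks for.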
minor comments (2)
- [Experimental setup] The manuscript should report statistical significance (e.g., confidence intervals or p-values) for the small metric improvements and provide the exact model architectures, training hyperparameters, and data splits used in the transfer experiments.
- [Results] Figure and table captions should explicitly state the evaluation protocol (e.g., how NDS is computed across domains) and note any differences in sensor characteristics between the three data sources.
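For the significance request above, a percentile bootstrap over per-sequence metric values is one standard option; a minimal sketch with hypothetical numbers (none taken from the paper):

```python
import numpy as np

def bootstrap_ci(per_sample_metric, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a
    per-sample metric (e.g., per-sequence FDE)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(per_sample_metric)
    means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                      for _ in range(n_boot)])
    return (float(np.quantile(means, alpha / 2)),
            float(np.quantile(means, 1 - alpha / 2)))

# Hypothetical per-sequence FDE for the two training regimes; an interval
# on the paired difference that excludes zero would support the gap.
fde_indy = np.array([0.90, 1.00, 0.95, 0.92, 0.98])
fde_a2rl = np.array([1.20, 1.30, 1.25, 1.22, 1.28])
lo, hi = bootstrap_ci(fde_a2rl - fde_indy)
print(lo, hi)
```

The same recipe applies to the small NDS deltas (0.72 vs. 0.69), where per-sequence resampling would reveal whether a 0.03 gap survives sampling noise.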
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the benchmark's potential contribution and for highlighting the need for greater transparency in dataset construction. We address the major comment below and will revise the manuscript to incorporate the requested details.
Point-by-point responses
-
Referee: [Abstract / Dataset creation] Abstract and dataset description: All reported metric deltas (NDS 0.72 vs. 0.69; FDE 0.947 vs. 1.250) are computed directly on the newly supplied 3D bounding-box labels for Indy and A2RL. The manuscript states only that the boxes were “newly annotated” and “standardized under a common protocol,” with no description of the annotation pipeline, LiDAR calibration, motion-compensation procedure, inter-annotator agreement statistics, or quantitative validation (e.g., comparison to an off-the-shelf detector or error rates under high-speed sparsity). Because label noise or domain-specific bias could produce the observed transfer gains, this information is required to attribute results to domain shift rather than annotation artifacts.
Authors: We agree that the current manuscript provides insufficient detail on the annotation process, which is necessary to rule out label noise or bias as alternative explanations for the reported transfer gains. In the revised version we will add a dedicated subsection under Dataset Creation that describes: the semi-automatic annotation pipeline (initial proposals from an off-the-shelf detector followed by human refinement), vehicle-specific LiDAR calibration and extrinsic parameters, motion-compensation procedures that account for ego-velocity during high-speed sweeps, inter-annotator agreement metrics (mean IoU and label-consistency rates across multiple annotators), and quantitative validation results including precision-recall curves against held-out manual labels and error statistics stratified by speed and point-cloud sparsity. These additions will allow readers to assess label quality directly and strengthen the attribution of performance differences to domain shift. revision: yes
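The precision-recall validation the authors promise reduces to counting IoU-matched boxes against held-out manual labels; a minimal sketch with hypothetical counts (threshold and numbers are illustrative, not from the paper):

```python
def precision_recall(num_tp, num_fp, num_fn):
    """Precision and recall from matched-box counts. A proposed box counts
    as a true positive when its IoU with a held-out manual label exceeds
    a threshold (e.g., 0.7 for vehicles, as in common LiDAR benchmarks)."""
    precision = num_tp / (num_tp + num_fp)
    recall = num_tp / (num_tp + num_fn)
    return precision, recall

# Hypothetical counts for a high-speed stratum of frames.
p, r = precision_recall(num_tp=90, num_fp=10, num_fn=30)
print(p, r)  # 0.9 0.75
```

Reporting these per speed and sparsity stratum, as the rebuttal proposes, would expose whether label quality degrades exactly where the transfer gains are claimed.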
Circularity Check
No circularity in benchmark construction or transfer results
Full rationale
The paper's core contributions are the release of newly annotated datasets (Indy 14,893 frames, A2RL 1,163 frames, plus simulator data) under a common protocol and the reporting of empirical NDS/FDE numbers obtained via standard supervised training and cross-domain transfer on held-out test splits. No equations, self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation of the reported deltas (e.g., urban pretrain NDS 0.72 vs scratch 0.69). All performance figures are computed directly from the supplied labels and models; the chain is externally falsifiable on the public benchmark and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
- [2] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, "nuScenes: A multimodal dataset for autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631.
- [3] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., "Scalability in perception for autonomous driving: Waymo Open Dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454.
- [4] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., "Argoverse 2: Next generation datasets for self-driving perception and forecasting," arXiv preprint arXiv:2301.00493, 2023.
- [5] Q. Xu et al., "SPG: Unsupervised domain adaptation for 3D object detection via semantic point generation," in ICCV Workshops, 2021.
- [6] Z. Li, J. Guo, T. Cao et al., "GPA-3D: Geometry-aware prototype alignment for unsupervised domain adaptive 3D object detection," arXiv, 2023.
- [7] J. Gao, X. Yuan et al., "VectorNet: Encoding HD maps and agent dynamics from vectorized representation," in CVPR, 2020.
- [8] S. Shi, L. Jiang, D. Dai, and B. Schiele, "MTR++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying," arXiv, 2023.
- [9] Z. Guo, Z. Yagudin, S. Asfaw, A. Lykov, and D. Tsetserukou, "FADet: A multi-sensor 3D object detection network based on local featured attention," in 2025 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2025, pp. 202–208.
- [10] R. Qian, X. Lai, and X. Li, "3D object detection for autonomous driving: A survey," arXiv, 2022.
- [11] S. Alaba et al., "A survey on deep-learning-based LiDAR 3D object detection," 2022.
- [12] Z. Guo, Z. Yagudin, A. Lykov, M. Konenkov, and D. Tsetserukou, "VLM-Auto: VLM-based autonomous driving assistant with human-like behavior and understanding for complex road scenes," in 2024 2nd International Conference on Foundation and Large Language Models (FLLM), 2024, pp. 501–507.
- [13] A. Kulkarni, J. Chrosniak, E. Ducote, F. Sauerbeck, A. Saba, U. Chirimar, J. Link, M. Behl, and M. Cellina, "RACECAR: The dataset for high-speed autonomous racing," in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 11458–11463.
- [14] M. Nye, A. Raji, A. Saba, E. Erlich, R. Exley, A. Goyal, A. Matros, R. Misra, M. Sivaprakasam, M. Bertogna et al., "Betty dataset: A multi-modal dataset for full-stack autonomy," in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 2453–2460.
- [15] D. Tsai, J. S. Berrio, M. Shan et al., "MS3D++: Ensemble of experts for multi-source unsupervised domain adaptation in 3D object detection," arXiv, 2023.
- [16] M. Li et al., "BEV-DG: Cross-modal learning under bird's-eye view for domain generalization," in ICCV, 2023.
- [17] M. Wozniak, M. Hansson, and P. Jensfelt, "Sim-to-real adversarial domain adaptation for 3D object detection," in CVPR, 2024.
- [18] Z. Guo, X. Lin, Z. Yagudin, A. Lykov, Y. Wang, Y. Li, and D. Tsetserukou, "METDrive: Multimodal end-to-end autonomous driving with temporal guidance," in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 6027–6032.
- [19] SUSTechPOINTS Development Team, "SUSTechPOINTS: 3D point cloud annotation platform," https://github.com/naurril/SUSTechPOINTS/tree/dev-auto-annotate, 2023.
- [20] Indy Autonomous Challenge, "Indy Autonomous Challenge," https://www.indyautonomouschallenge.com, 2023, accessed 2024.
- [21] A2RL, "Abu Dhabi Autonomous Racing League (A2RL)," https://a2rl.io, 2023, accessed 2024.
- [22] P. Sun, H. Kretzschmar et al., "Scalability in perception for autonomous driving: Waymo Open Dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- [23] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: Fast encoders for object detection from point clouds," in CVPR, 2019.
- [24] T. Yin, X. Zhou, and P. Krahenbuhl, "CenterPoint: Center-based 3D object detection and tracking," in CVPR, 2021.