ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes
Pith reviewed 2026-05-10 04:51 UTC · model grok-4.3
The pith
Structured trajectories from planners enable better end-to-end parking policies than manual data collections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ParkingScenes supplies structured parking trajectories generated by Hybrid A* planner and MPC in CARLA, covering 16 reverse-in and 6 parallel scenarios executed 16 times each under two pedestrian conditions for 704 episodes and about 105000 frames of multimodal data. End-to-end models trained on this dataset demonstrate significant performance improvements compared with models trained on unstructured manually collected simulation data under identical conditions, showing the effectiveness of structured supervision for robust and accurate parking policy learning.
What carries the argument
Structured supervision signals consisting of trajectories generated by the Hybrid A* planner and Model Predictive Controller (MPC), which replace unstructured manual data to supply accurate and reproducible training targets for parking maneuvers.
If this is right
- End-to-end models learn more reliable control for tight-space parking maneuvers when given planner-derived trajectories as supervision.
- Reproducible benchmarks become possible for comparing parking algorithms across consistent reverse-in, parallel, and pedestrian conditions.
- Multimodal inputs including synchronized cameras, depth, states, and bird's-eye views can be fused more systematically for context-aware policies.
- The released collection framework allows scalable generation of additional structured episodes without repeated manual effort.
Where Pith is reading between the lines
- This structured-supervision approach could extend to other constrained driving tasks such as precise lane changes or dockings where expert planners are available.
- Pre-training on planner trajectories in simulation may lower the volume of real-world data needed when fine-tuning parking models for physical vehicles.
- The dataset format invites experiments that combine simulation pre-training with limited real-world fine-tuning to test transfer across sensor and environment gaps.
Load-bearing premise
That performance gains observed when training on planner-generated trajectories in CARLA simulation will transfer to improved real-world parking behavior outside the specific simulator setup.
What would settle it
Training identical end-to-end models on the same number of episodes from ParkingScenes versus unstructured manual data and finding no statistically significant difference in parking success rate or accuracy metrics within the CARLA test scenarios would falsify the central claim.
Figures
read the original abstract
Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is essential. While recent advances in end-to-end learning have shown great promise, the lack of high-quality, structured datasets tailored for parking scenarios remains a significant bottleneck.To address this gap, we present ParkingScenes, a comprehensive multimodal dataset specifically designed for end-to-end autonomous parking in simulated scenes. Built on the CARLA simulator, ParkingScenes features structured parking trajectories generated by a Hybrid A* planner and a Model Predictive Controller (MPC), providing accurate and reproducible supervision signals. The dataset includes 16 reverse-in and 6 parallel parking scenarios, each executed under two pedestrian conditions (present and absent), resulting in 704 structured episodes and approximately 105000 frames. Each scenario is repeated 16 times to ensure consistent coverage. Each frame contains synchronized data from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, enabling rich multimodal fusion and context-aware learning. To demonstrate the utility of our dataset, we compare models trained on ParkingScenes with those trained on unstructured, manually collected simulation data under identical conditions. Results show significant improvements in performance, underscoring the effectiveness of structured supervision for robust and accurate parking policy learning. By releasing both the dataset and the collection framework, ParkingScenes establishes a scalable and reproducible benchmark for advancing learning-based autonomous parking systems. The dataset and collection framework will be released at: https://github.com/haonan-ai/ParkingScenes
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ParkingScenes, a multimodal dataset for end-to-end autonomous parking built in CARLA, containing 704 structured episodes (16 reverse-in and 6 parallel parking scenarios, each with/without pedestrians, repeated 16 times) generated via Hybrid A* planner and MPC trajectories. Each frame includes synchronized RGB cameras, depth sensors, vehicle states, and BEV representations. The central claim is that models trained on this structured data achieve significant performance improvements over those trained on unstructured manually collected simulation data under identical conditions, establishing a reproducible benchmark.
Significance. If the performance gains can be rigorously attributed to the structured supervision with matched controls, the dataset would provide a valuable, scalable resource for training robust parking policies in simulation, addressing the noted bottleneck in high-quality parking data for learning-based systems.
major comments (2)
- [Abstract] Abstract: the claim of 'significant improvements in performance' is unsupported by any quantitative metrics, error bars, model architectures, statistical tests, or explicit definition of the unstructured baseline; without these, the central empirical claim cannot be evaluated.
- [Results/Experiments] Results/Experiments section: the comparison to the unstructured manual baseline does not report episode count, total frames, per-scenario repetition factor, collection interface, or quality filtering for the manual data, so it is impossible to confirm that gains arise from structured Hybrid A* + MPC trajectories rather than uncontrolled differences in data volume or noise (as required to support the 'under identical conditions' assertion).
minor comments (2)
- [Dataset Construction] Dataset section: specify the exact sensor synchronization protocol and how the 105000 frames are distributed across the 704 episodes.
- [Abstract] Abstract and conclusion: the GitHub link is provided but no details on data format, loading scripts, or licensing are given in the text.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The comments highlight important areas where the manuscript can be strengthened for clarity and rigor. We address each major comment below and will incorporate revisions to ensure the empirical claims are fully supported and transparent.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim of 'significant improvements in performance' is unsupported by any quantitative metrics, error bars, model architectures, statistical tests, or explicit definition of the unstructured baseline; without these, the central empirical claim cannot be evaluated.
Authors: We agree that the abstract, being a concise summary, does not include the supporting quantitative details. The experiments section of the manuscript describes the end-to-end models (imitation learning architectures), reports performance metrics including success rates and trajectory errors with standard deviations, and includes comparisons under matched conditions. In the revised version, we will update the abstract to briefly reference key quantitative results (e.g., relative improvement in parking success rate) and explicitly define the unstructured baseline as manually collected trajectories generated via human operator control in the identical CARLA environment and scenarios. revision: yes
-
Referee: [Results/Experiments] Results/Experiments section: the comparison to the unstructured manual baseline does not report episode count, total frames, per-scenario repetition factor, collection interface, or quality filtering for the manual data, so it is impossible to confirm that gains arise from structured Hybrid A* + MPC trajectories rather than uncontrolled differences in data volume or noise (as required to support the 'under identical conditions' assertion).
Authors: We acknowledge that the current manuscript does not provide sufficient detail on the unstructured baseline collection to allow full verification of matched conditions. The manual data was collected to match the structured dataset in scale (704 episodes across the same 22 scenarios with 16 repetitions each, yielding approximately 105000 frames), using a standard CARLA manual control interface with post-collection quality filtering to exclude invalid or colliding trajectories. To address this, the revised manuscript will include an expanded subsection in Results/Experiments that explicitly reports the episode count, total frames, repetition factor, collection interface, and quality filtering criteria for the baseline. This addition will isolate the effect of structured Hybrid A* + MPC supervision while maintaining the 'identical conditions' claim. revision: yes
Circularity Check
No circularity; purely empirical dataset construction and comparison
full rationale
The paper presents a new multimodal dataset generated via Hybrid A* + MPC trajectories in CARLA, followed by an empirical performance comparison against an unstructured manual baseline under identical conditions. No mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems appear in the provided text. The central claim rests on reported performance deltas rather than any self-referential reduction or ansatz smuggling. Self-citation load-bearing, self-definitional loops, and renaming of known results are absent. The skeptic concern about unmatched baseline volume is a potential validity or experimental-control issue, not a circularity in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption CARLA simulator produces sufficiently accurate vehicle dynamics, camera images, depth data, and BEV representations for training transferable parking policies.
- domain assumption Hybrid A* planner combined with MPC generates reproducible, high-quality parking trajectories that serve as effective supervision signals.
Reference graph
Works this paper leans on
-
[1]
Avp-slam: Semantic visual mapping and localization for autonomous vehicles in the parking lot,
T. Qin, T. Chen, Y . Chen, and Q. Su, “Avp-slam: Semantic visual mapping and localization for autonomous vehicles in the parking lot,” in2020 IEEE/RSJ International Conference on intelligent robots and systems (IROS). IEEE, 2020, pp. 5939–5945
2020
-
[2]
Robust parking path planning with error-adaptive sampling under perception uncertainty,
S. Lee, W. Lim, and M. Sunwoo, “Robust parking path planning with error-adaptive sampling under perception uncertainty,”Sensors, vol. 20, no. 12, p. 3560, 2020
2020
-
[3]
Model-based predictive control and reinforcement learning for planning vehicle-parking trajec- tories for vertical parking spaces,
J. Shi, K. Li, C. Piao, J. Gao, and L. Chen, “Model-based predictive control and reinforcement learning for planning vehicle-parking trajec- tories for vertical parking spaces,”Sensors, vol. 23, no. 16, p. 7124, 2023
2023
-
[4]
Recent advancements in end-to-end au- tonomous driving using deep learning: A survey,
P. S. Chib and P. Singh, “Recent advancements in end-to-end au- tonomous driving using deep learning: A survey,”IEEE Transactions on Intelligent V ehicles, vol. 9, no. 1, pp. 103–118, 2023
2023
-
[5]
Planning-oriented autonomous driving,
Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wanget al., “Planning-oriented autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17 853–17 862
2023
-
[6]
Virtual fluid-flow- model-based lane-keeping integrated with collision avoidance control system design for autonomous vehicles,
S. Cheng, L. Li, Y .-G. Liu, W.-B. Li, and H.-Q. Guo, “Virtual fluid-flow- model-based lane-keeping integrated with collision avoidance control system design for autonomous vehicles,”IEEE Transactions on Intelli- gent Transportation Systems, vol. 22, no. 10, pp. 6232–6241, 2020
2020
-
[7]
A comprehensive review on deep learning-based motion planning and end-to-end learning for self-driving vehicle,
M. Ganesan, S. Kandhasamy, B. Chokkalingam, and L. Mihet-Popa, “A comprehensive review on deep learning-based motion planning and end-to-end learning for self-driving vehicle,”IEEE Access, 2024
2024
-
[8]
E2e parking: Autonomous parking by the end-to-end neural network on the carla simulator,
Y . Yang, D. Chen, T. Qin, X. Mu, C. Xu, and M. Yang, “E2e parking: Autonomous parking by the end-to-end neural network on the carla simulator,” in2024 IEEE Intelligent V ehicles Symposium (IV). IEEE, 2024, pp. 2375–2382
2024
-
[9]
Carla: An open urban driving simulator,
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “Carla: An open urban driving simulator,” inConference on robot learning. PMLR, 2017, pp. 1–16
2017
-
[10]
Practical search techniques in path planning for autonomous driving,
D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Practical search techniques in path planning for autonomous driving,”ann arbor, vol. 1001, no. 48105, pp. 18–80, 2008
2008
-
[11]
Con- strained model predictive control: Stability and optimality,
D. Q. Mayne, J. B. Rawlings, C. V . Rao, and P. O. Scokaert, “Con- strained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000
2000
-
[12]
Are we ready for autonomous driving? the kitti vision benchmark suite,
A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361
2012
-
[13]
nuscenes: A multimodal dataset for autonomous driving,
H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631
2020
-
[14]
Scalability in perception for autonomous driving: Waymo open dataset,
P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caineet al., “Scalability in perception for autonomous driving: Waymo open dataset,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454
2020
-
[15]
F. Yu, W. Xian, Y . Chen, F. Liu, M. Liao, V . Madhavan, T. Darrellet al., “Bdd100k: A diverse driving video database with scalable annotation tooling,”arXiv preprint arXiv:1805.04687, vol. 2, no. 5, p. 6, 2018
-
[16]
The cityscapes dataset for semantic urban scene understanding,
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Be- nenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3213– 3223
2016
-
[17]
Vh-hfcn based parking slot and lane markings segmentation on panoramic surround view,
Y . Wu, T. Yang, J. Zhao, L. Guan, and W. Jiang, “Vh-hfcn based parking slot and lane markings segmentation on panoramic surround view,” in 2018 IEEE Intelligent V ehicles Symposium (IV). IEEE, 2018, pp. 1767– 1772
2018
-
[18]
Vision-based parking-slot detection: A dcnn-based approach and a large-scale benchmark dataset,
L. Zhang, J. Huang, X. Li, and L. Xiong, “Vision-based parking-slot detection: A dcnn-based approach and a large-scale benchmark dataset,” IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5350–5364, 2018
2018
-
[19]
Context-based parking slot detection with a realistic dataset,
H. Do and J. Y . Choi, “Context-based parking slot detection with a realistic dataset,”IEEE access, vol. 8, pp. 171 551–171 559, 2020
2020
-
[20]
Parkpredict+: Multimodal intent and motion prediction for vehicles in parking lots with cnn and transformer,
X. Shen, M. Lacayo, N. Guggilla, and F. Borrelli, “Parkpredict+: Multimodal intent and motion prediction for vehicles in parking lots with cnn and transformer,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3999– 4004
2022
- [21]
-
[22]
A Commute in Data: The comma2k19 Dataset
H. Schafer, E. Santana, A. Haden, and R. Biasini, “A commute in data: The comma2k19 dataset,”arXiv preprint arXiv:1812.05752, 2018
work page Pith review arXiv 2018
-
[23]
Sups: A simulated underground parking scenario dataset for autonomous driving,
J. Hou, Q. Chen, Y . Cheng, G. Chen, X. Xue, T. Zeng, and J. Pu, “Sups: A simulated underground parking scenario dataset for autonomous driving,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 2265–2271
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.