pith. sign in

arxiv: 2604.22835 · v1 · submitted 2026-04-20 · 💻 cs.CV · cs.AI

ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes

Pith reviewed 2026-05-10 04:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords autonomous parkingCARLA simulatorend-to-end learningstructured datasetHybrid A* plannermodel predictive controlmultimodal databird's eye view
0
0 comments X

The pith

Structured trajectories from planners enable better end-to-end parking policies than manual data collections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the lack of high-quality training data for end-to-end autonomous parking by introducing ParkingScenes, a multimodal dataset built in the CARLA simulator. It consists of 704 episodes across 22 reverse-in and parallel parking scenarios, each repeated under pedestrian-present and absent conditions, with trajectories generated by a Hybrid A* planner and Model Predictive Controller. Every frame supplies synchronized data from four RGB cameras, four depth sensors, vehicle states, and bird's-eye views. Models trained on this structured supervision achieve significant performance gains over identical models trained on unstructured, manually collected simulation data. This outcome indicates that reproducible planner-based signals can strengthen policy learning for precise maneuvers in constrained spaces.

Core claim

ParkingScenes supplies structured parking trajectories generated by Hybrid A* planner and MPC in CARLA, covering 16 reverse-in and 6 parallel scenarios executed 16 times each under two pedestrian conditions for 704 episodes and about 105000 frames of multimodal data. End-to-end models trained on this dataset demonstrate significant performance improvements compared with models trained on unstructured manually collected simulation data under identical conditions, showing the effectiveness of structured supervision for robust and accurate parking policy learning.

What carries the argument

Structured supervision signals consisting of trajectories generated by the Hybrid A* planner and Model Predictive Controller (MPC), which replace unstructured manual data to supply accurate and reproducible training targets for parking maneuvers.

If this is right

  • End-to-end models learn more reliable control for tight-space parking maneuvers when given planner-derived trajectories as supervision.
  • Reproducible benchmarks become possible for comparing parking algorithms across consistent reverse-in, parallel, and pedestrian conditions.
  • Multimodal inputs including synchronized cameras, depth, states, and bird's-eye views can be fused more systematically for context-aware policies.
  • The released collection framework allows scalable generation of additional structured episodes without repeated manual effort.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This structured-supervision approach could extend to other constrained driving tasks such as precise lane changes or dockings where expert planners are available.
  • Pre-training on planner trajectories in simulation may lower the volume of real-world data needed when fine-tuning parking models for physical vehicles.
  • The dataset format invites experiments that combine simulation pre-training with limited real-world fine-tuning to test transfer across sensor and environment gaps.

Load-bearing premise

That performance gains observed when training on planner-generated trajectories in CARLA simulation will transfer to improved real-world parking behavior outside the specific simulator setup.

What would settle it

Training identical end-to-end models on the same number of episodes from ParkingScenes versus unstructured manual data and finding no statistically significant difference in parking success rate or accuracy metrics within the CARLA test scenarios would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.22835 by Bin Tian, Haonan Chen, Jun Fu, Kaiwen Xiao.

Figure 1
Figure 1. Figure 1: Top-down view of 16 reverse-in parking slots in simulation. Parking [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 4
Figure 4. Figure 4: Multimodal views captured per frame in our dataset. Top row: [PITH_FULL_IMAGE:figures/full_fig_p003_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Overview of the ParkingScenes data collection pipeline. The ego ve [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗
read the original abstract

Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is essential. While recent advances in end-to-end learning have shown great promise, the lack of high-quality, structured datasets tailored for parking scenarios remains a significant bottleneck.To address this gap, we present ParkingScenes, a comprehensive multimodal dataset specifically designed for end-to-end autonomous parking in simulated scenes. Built on the CARLA simulator, ParkingScenes features structured parking trajectories generated by a Hybrid A* planner and a Model Predictive Controller (MPC), providing accurate and reproducible supervision signals. The dataset includes 16 reverse-in and 6 parallel parking scenarios, each executed under two pedestrian conditions (present and absent), resulting in 704 structured episodes and approximately 105000 frames. Each scenario is repeated 16 times to ensure consistent coverage. Each frame contains synchronized data from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, enabling rich multimodal fusion and context-aware learning. To demonstrate the utility of our dataset, we compare models trained on ParkingScenes with those trained on unstructured, manually collected simulation data under identical conditions. Results show significant improvements in performance, underscoring the effectiveness of structured supervision for robust and accurate parking policy learning. By releasing both the dataset and the collection framework, ParkingScenes establishes a scalable and reproducible benchmark for advancing learning-based autonomous parking systems. The dataset and collection framework will be released at: https://github.com/haonan-ai/ParkingScenes

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents ParkingScenes, a multimodal dataset for end-to-end autonomous parking built in CARLA, containing 704 structured episodes (16 reverse-in and 6 parallel parking scenarios, each with/without pedestrians, repeated 16 times) generated via Hybrid A* planner and MPC trajectories. Each frame includes synchronized RGB cameras, depth sensors, vehicle states, and BEV representations. The central claim is that models trained on this structured data achieve significant performance improvements over those trained on unstructured manually collected simulation data under identical conditions, establishing a reproducible benchmark.

Significance. If the performance gains can be rigorously attributed to the structured supervision with matched controls, the dataset would provide a valuable, scalable resource for training robust parking policies in simulation, addressing the noted bottleneck in high-quality parking data for learning-based systems.

major comments (2)
  1. [Abstract] Abstract: the claim of 'significant improvements in performance' is unsupported by any quantitative metrics, error bars, model architectures, statistical tests, or explicit definition of the unstructured baseline; without these, the central empirical claim cannot be evaluated.
  2. [Results/Experiments] Results/Experiments section: the comparison to the unstructured manual baseline does not report episode count, total frames, per-scenario repetition factor, collection interface, or quality filtering for the manual data, so it is impossible to confirm that gains arise from structured Hybrid A* + MPC trajectories rather than uncontrolled differences in data volume or noise (as required to support the 'under identical conditions' assertion).
minor comments (2)
  1. [Dataset Construction] Dataset section: specify the exact sensor synchronization protocol and how the 105000 frames are distributed across the 704 episodes.
  2. [Abstract] Abstract and conclusion: the GitHub link is provided but no details on data format, loading scripts, or licensing are given in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important areas where the manuscript can be strengthened for clarity and rigor. We address each major comment below and will incorporate revisions to ensure the empirical claims are fully supported and transparent.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'significant improvements in performance' is unsupported by any quantitative metrics, error bars, model architectures, statistical tests, or explicit definition of the unstructured baseline; without these, the central empirical claim cannot be evaluated.

    Authors: We agree that the abstract, being a concise summary, does not include the supporting quantitative details. The experiments section of the manuscript describes the end-to-end models (imitation learning architectures), reports performance metrics including success rates and trajectory errors with standard deviations, and includes comparisons under matched conditions. In the revised version, we will update the abstract to briefly reference key quantitative results (e.g., relative improvement in parking success rate) and explicitly define the unstructured baseline as manually collected trajectories generated via human operator control in the identical CARLA environment and scenarios. revision: yes

  2. Referee: [Results/Experiments] Results/Experiments section: the comparison to the unstructured manual baseline does not report episode count, total frames, per-scenario repetition factor, collection interface, or quality filtering for the manual data, so it is impossible to confirm that gains arise from structured Hybrid A* + MPC trajectories rather than uncontrolled differences in data volume or noise (as required to support the 'under identical conditions' assertion).

    Authors: We acknowledge that the current manuscript does not provide sufficient detail on the unstructured baseline collection to allow full verification of matched conditions. The manual data was collected to match the structured dataset in scale (704 episodes across the same 22 scenarios with 16 repetitions each, yielding approximately 105000 frames), using a standard CARLA manual control interface with post-collection quality filtering to exclude invalid or colliding trajectories. To address this, the revised manuscript will include an expanded subsection in Results/Experiments that explicitly reports the episode count, total frames, repetition factor, collection interface, and quality filtering criteria for the baseline. This addition will isolate the effect of structured Hybrid A* + MPC supervision while maintaining the 'identical conditions' claim. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical dataset construction and comparison

full rationale

The paper presents a new multimodal dataset generated via Hybrid A* + MPC trajectories in CARLA, followed by an empirical performance comparison against an unstructured manual baseline under identical conditions. No mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems appear in the provided text. The central claim rests on reported performance deltas rather than any self-referential reduction or ansatz smuggling. Self-citation load-bearing, self-definitional loops, and renaming of known results are absent. The skeptic concern about unmatched baseline volume is a potential validity or experimental-control issue, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The contribution depends on the assumption that CARLA provides realistic enough sensor and dynamics data for learning parking policies and that Hybrid A* plus MPC trajectories constitute high-quality supervision superior to manual collection.

axioms (2)
  • domain assumption CARLA simulator produces sufficiently accurate vehicle dynamics, camera images, depth data, and BEV representations for training transferable parking policies.
    Invoked by building the entire dataset inside CARLA and claiming utility for autonomous parking systems.
  • domain assumption Hybrid A* planner combined with MPC generates reproducible, high-quality parking trajectories that serve as effective supervision signals.
    Used to create the structured episodes that form the core of the dataset.

pith-pipeline@v0.9.0 · 5584 in / 1474 out tokens · 66630 ms · 2026-05-10T04:51:38.342599+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 3 canonical work pages

  1. [1]

    Avp-slam: Semantic visual mapping and localization for autonomous vehicles in the parking lot,

    T. Qin, T. Chen, Y . Chen, and Q. Su, “Avp-slam: Semantic visual mapping and localization for autonomous vehicles in the parking lot,” in2020 IEEE/RSJ International Conference on intelligent robots and systems (IROS). IEEE, 2020, pp. 5939–5945

  2. [2]

    Robust parking path planning with error-adaptive sampling under perception uncertainty,

    S. Lee, W. Lim, and M. Sunwoo, “Robust parking path planning with error-adaptive sampling under perception uncertainty,”Sensors, vol. 20, no. 12, p. 3560, 2020

  3. [3]

    Model-based predictive control and reinforcement learning for planning vehicle-parking trajec- tories for vertical parking spaces,

    J. Shi, K. Li, C. Piao, J. Gao, and L. Chen, “Model-based predictive control and reinforcement learning for planning vehicle-parking trajec- tories for vertical parking spaces,”Sensors, vol. 23, no. 16, p. 7124, 2023

  4. [4]

    Recent advancements in end-to-end au- tonomous driving using deep learning: A survey,

    P. S. Chib and P. Singh, “Recent advancements in end-to-end au- tonomous driving using deep learning: A survey,”IEEE Transactions on Intelligent V ehicles, vol. 9, no. 1, pp. 103–118, 2023

  5. [5]

    Planning-oriented autonomous driving,

    Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wanget al., “Planning-oriented autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17 853–17 862

  6. [6]

    Virtual fluid-flow- model-based lane-keeping integrated with collision avoidance control system design for autonomous vehicles,

    S. Cheng, L. Li, Y .-G. Liu, W.-B. Li, and H.-Q. Guo, “Virtual fluid-flow- model-based lane-keeping integrated with collision avoidance control system design for autonomous vehicles,”IEEE Transactions on Intelli- gent Transportation Systems, vol. 22, no. 10, pp. 6232–6241, 2020

  7. [7]

    A comprehensive review on deep learning-based motion planning and end-to-end learning for self-driving vehicle,

    M. Ganesan, S. Kandhasamy, B. Chokkalingam, and L. Mihet-Popa, “A comprehensive review on deep learning-based motion planning and end-to-end learning for self-driving vehicle,”IEEE Access, 2024

  8. [8]

    E2e parking: Autonomous parking by the end-to-end neural network on the carla simulator,

    Y . Yang, D. Chen, T. Qin, X. Mu, C. Xu, and M. Yang, “E2e parking: Autonomous parking by the end-to-end neural network on the carla simulator,” in2024 IEEE Intelligent V ehicles Symposium (IV). IEEE, 2024, pp. 2375–2382

  9. [9]

    Carla: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “Carla: An open urban driving simulator,” inConference on robot learning. PMLR, 2017, pp. 1–16

  10. [10]

    Practical search techniques in path planning for autonomous driving,

    D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Practical search techniques in path planning for autonomous driving,”ann arbor, vol. 1001, no. 48105, pp. 18–80, 2008

  11. [11]

    Con- strained model predictive control: Stability and optimality,

    D. Q. Mayne, J. B. Rawlings, C. V . Rao, and P. O. Scokaert, “Con- strained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000

  12. [12]

    Are we ready for autonomous driving? the kitti vision benchmark suite,

    A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361

  13. [13]

    nuscenes: A multimodal dataset for autonomous driving,

    H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631

  14. [14]

    Scalability in perception for autonomous driving: Waymo open dataset,

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caineet al., “Scalability in perception for autonomous driving: Waymo open dataset,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454

  15. [15]

    Bdd100k: A 10 diverse driving video database with scalable annotation tool- ing.arXiv preprint arXiv:1805.04687, 2(5):6, 2018

    F. Yu, W. Xian, Y . Chen, F. Liu, M. Liao, V . Madhavan, T. Darrellet al., “Bdd100k: A diverse driving video database with scalable annotation tooling,”arXiv preprint arXiv:1805.04687, vol. 2, no. 5, p. 6, 2018

  16. [16]

    The cityscapes dataset for semantic urban scene understanding,

    M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Be- nenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3213– 3223

  17. [17]

    Vh-hfcn based parking slot and lane markings segmentation on panoramic surround view,

    Y . Wu, T. Yang, J. Zhao, L. Guan, and W. Jiang, “Vh-hfcn based parking slot and lane markings segmentation on panoramic surround view,” in 2018 IEEE Intelligent V ehicles Symposium (IV). IEEE, 2018, pp. 1767– 1772

  18. [18]

    Vision-based parking-slot detection: A dcnn-based approach and a large-scale benchmark dataset,

    L. Zhang, J. Huang, X. Li, and L. Xiong, “Vision-based parking-slot detection: A dcnn-based approach and a large-scale benchmark dataset,” IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5350–5364, 2018

  19. [19]

    Context-based parking slot detection with a realistic dataset,

    H. Do and J. Y . Choi, “Context-based parking slot detection with a realistic dataset,”IEEE access, vol. 8, pp. 171 551–171 559, 2020

  20. [20]

    Parkpredict+: Multimodal intent and motion prediction for vehicles in parking lots with cnn and transformer,

    X. Shen, M. Lacayo, N. Guggilla, and F. Borrelli, “Parkpredict+: Multimodal intent and motion prediction for vehicles in parking lots with cnn and transformer,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3999– 4004

  21. [21]

    Geyer, Y

    J. Geyer, Y . Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V . H. Pham, M. M ¨uhlegg, S. Dornet al., “A2d2: Audi autonomous driving dataset,”arXiv preprint arXiv:2004.06320, 2020

  22. [22]

    A Commute in Data: The comma2k19 Dataset

    H. Schafer, E. Santana, A. Haden, and R. Biasini, “A commute in data: The comma2k19 dataset,”arXiv preprint arXiv:1812.05752, 2018

  23. [23]

    Sups: A simulated underground parking scenario dataset for autonomous driving,

    J. Hou, Q. Chen, Y . Cheng, G. Chen, X. Xue, T. Zeng, and J. Pu, “Sups: A simulated underground parking scenario dataset for autonomous driving,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 2265–2271