REAP: Reinforcement-Learning End-to-End Autonomous Parking with Gaussian Splatting Simulator for Real2Sim2Real Transfer
Pith reviewed 2026-05-12 01:25 UTC · model grok-4.3
The pith
An end-to-end reinforcement learning policy trained in a 3D Gaussian Splatting simulator transfers directly to physical vehicles and parks successfully in narrow mechanical slots.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
REAP shows that an end-to-end reinforcement learning policy trained with Soft Actor-Critic in an asymmetric framework, after distilling a rule-based planner and applying a soft predictive collision penalty inside a 3D Gaussian Splatting simulator, can be deployed directly onto physical vehicles to achieve autonomous parking in various spaces, including extremely narrow mechanical slots.
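The soft predictive collision penalty is named but not defined in this summary; a minimal sketch of one plausible form, assuming a kinematic bicycle model rolled forward under the proposed action and a set of obstacle points, is below (function names, parameters, and constants are illustrative, not taken from the paper).

```python
import numpy as np

def rollout_bicycle(x, y, yaw, v, steer, wheelbase=2.7, dt=0.1, horizon=10):
    """Roll a kinematic bicycle model forward under a constant action."""
    positions = []
    for _ in range(horizon):
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v * np.tan(steer) / wheelbase * dt
        positions.append((x, y))
    return np.array(positions)  # (horizon, 2) predicted rear-axle positions

def soft_collision_penalty(positions, obstacles, safe_dist=0.5, scale=1.0):
    """Return 0 if every predicted position keeps at least `safe_dist` from all
    obstacle points, and an increasingly negative value as that margin shrinks."""
    # Pairwise distances between predicted positions and obstacle points.
    d = np.linalg.norm(positions[:, None, :] - obstacles[None, :, :], axis=-1)
    intrusion = max(safe_dist - d.min(), 0.0)
    return -scale * intrusion

# Illustrative use as a reward-shaping term evaluated before an action is executed:
obstacles = np.array([[3.0, 0.2], [3.2, -0.1]])  # e.g. slot walls sampled as points
predicted = rollout_bicycle(x=0.0, y=0.0, yaw=0.0, v=1.0, steer=0.1)
print(soft_collision_penalty(predicted, obstacles))
```

Because a penalty of this shape activates before contact and grows as the predicted path enters the safety band, it discourages obstacle-approaching actions without relying on the sparse, delayed signal of a hard collision terminal.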
What carries the argument
The asymmetric Soft Actor-Critic reinforcement learning framework combined with 3D Gaussian Splatting for Real2Sim2Real scene reconstruction and direct policy transfer.
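The summary does not spell out what "asymmetric" means architecturally; in this pattern the actor typically consumes only sensors available on the real vehicle, while the critic also receives privileged simulator state during training. A minimal sketch of that split is below, with all module names, feature dimensions, and shapes assumed for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ImageActor(nn.Module):
    """Actor sees only deployable inputs (here: a flattened image feature)."""
    def __init__(self, feat_dim=256, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * act_dim))  # mean and log_std

    def forward(self, img_feat):
        mean, log_std = self.net(img_feat).chunk(2, dim=-1)
        std = log_std.clamp(-5, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        action = torch.tanh(dist.rsample())  # squashed SAC-style action
        return action, dist

class PrivilegedCritic(nn.Module):
    """Critic additionally receives privileged simulator state (e.g. pose, slot geometry)."""
    def __init__(self, feat_dim=256, priv_dim=16, act_dim=2):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(feat_dim + priv_dim + act_dim, 256),
                               nn.ReLU(), nn.Linear(256, 1))

    def forward(self, img_feat, priv_state, action):
        return self.q(torch.cat([img_feat, priv_state, action], dim=-1))

# The critic's privileged inputs exist only in simulation; at deployment only
# the actor runs, so that extra state never has to be estimated on-vehicle.
actor, critic = ImageActor(), PrivilegedCritic()
img_feat, priv = torch.randn(4, 256), torch.randn(4, 16)
action, _ = actor(img_feat)
q_value = critic(img_feat, priv, action)
```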
Load-bearing premise
The 3D Gaussian Splatting reconstruction and asymmetric training produce a policy whose simulated behavior matches real vehicle dynamics and sensor inputs closely enough for safe zero-shot deployment.
What would settle it
Deploying the trained REAP policy on a physical vehicle in an extremely narrow mechanical parking slot would settle it: repeated collisions or failures to complete the maneuver would refute the claim, while repeated clean completions would support it.
Original abstract
In recent years, autonomous parking has made significant advances, yet parking tasks still face challenges in extreme scenarios such as mechanical and dead-end parking slots, often resulting in failures. This is mainly due to traditional parking methods adopting a multistage approach, lacking the ability to optimize the parking problem as a whole. End-to-end methods enable joint optimization across perception and planning modules to eliminate the accumulation of errors, enhancing algorithm performance in extreme scenarios. Although several end-to-end parking methods use imitation or reinforcement learning, the former is limited by data cost and distribution coverage, while the latter suffers from inefficient exploration. To address these challenges, we propose a Reinforcement learning End-to-end Autonomous Parking method (REAP). REAP employs Soft Actor-Critic (SAC) within an asymmetric reinforcement learning framework to improve training efficiency and inference performance. To accelerate model convergence, we distill the capabilities of a rule-based planner into the end-to-end network through behavior cloning. We further introduce a soft predictive collision penalty mechanism to reduce collision rates by penalizing obstacle-approaching actions. To ensure that the trained reinforcement learning network can directly transfer to real-world scenarios, we have established a Real2Sim2Real simulator. In the Real2Sim step, we use 3D Gaussian Splatting (3DGS) to transform real-world scenes into digital scenes. In the Sim2Real step, we deploy the end-to-end model onto the vehicle to bridge the Sim2Real gap. Trained in the 3DGS simulator and deployed on physical vehicles, REAP successfully parks in various types of parking spaces, especially demonstrating the feasibility of end-to-end RL parking in extremely narrow mechanical slots.
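The abstract states that the rule-based planner is distilled into the end-to-end network through behavior cloning; a minimal sketch of how such a distillation term could be mixed into a SAC-style actor objective is below (the planner interface, the loss form, and the weighting are assumptions, not details reported in the paper).

```python
import torch
import torch.nn.functional as F

def actor_loss_with_distillation(policy_action, q_value, log_prob,
                                 planner_action, alpha=0.2, bc_weight=1.0):
    """Combine a SAC-style actor objective with a behavior-cloning term.

    policy_action  : action sampled from the current policy
    q_value        : critic estimate for that action
    log_prob       : log-probability of the sampled action (entropy term)
    planner_action : action the rule-based planner would take in the same state
    bc_weight      : distillation strength, typically annealed toward 0
                     as the RL policy starts to outperform the planner
    """
    sac_term = (alpha * log_prob - q_value).mean()        # standard SAC actor loss
    bc_term = F.mse_loss(policy_action, planner_action)   # imitate the planner
    return sac_term + bc_weight * bc_term

# Illustrative call with dummy tensors:
loss = actor_loss_with_distillation(
    policy_action=torch.randn(8, 2, requires_grad=True),
    q_value=torch.randn(8, 1),
    log_prob=torch.randn(8, 1),
    planner_action=torch.randn(8, 2),
)
loss.backward()
```

Regularizing the actor toward planner actions in this way addresses the exploration inefficiency the abstract raises, and annealing the weight keeps the planner's performance ceiling from capping the final policy.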
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes REAP, a reinforcement-learning end-to-end autonomous parking method that employs Soft Actor-Critic (SAC) in an asymmetric framework, distills a rule-based planner via behavior cloning, and adds a soft predictive collision penalty. It introduces a Real2Sim2Real pipeline in which 3D Gaussian Splatting reconstructs real scenes into a simulator for training, followed by direct deployment of the policy on physical vehicles, with the central claim being successful parking across various slot types and demonstrated feasibility in extremely narrow mechanical slots without additional real-world adaptation.
Significance. If the quantitative claims hold, the work would supply evidence that photorealistic 3DGS simulators combined with asymmetric RL and behavior cloning can enable direct sim-to-real transfer for end-to-end policies in safety-critical, kinematically tight parking tasks, potentially reducing reliance on multistage planning and real-world fine-tuning.
Major comments (2)
- Abstract: the assertion that the trained model 'successfully parks' in real-world scenarios, especially narrow mechanical slots, is unsupported by any reported success rates, collision rates, failure-mode analysis, or baseline comparisons, which are required to evaluate the load-bearing Real2Sim2Real transfer claim.
- Simulator description (Real2Sim step): the 3DGS reconstruction is presented primarily for visual fidelity; no quantitative validation (e.g., trajectory matching, actuator latency, or tire-slip reproduction) is supplied for the underlying physics engine, leaving the assumption that simulation dynamics match real-vehicle behavior within the tolerance needed for collision-free narrow-slot control untested.
Minor comments (1)
- Abstract: consider inserting key numerical results (success rates, slot widths, etc.) to make the summary self-contained.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation of our Real2Sim2Real claims and simulator validation. We address each major comment point by point below, indicating the revisions we will incorporate in the updated version.
Point-by-point responses
Referee: Abstract: the assertion that the trained model 'successfully parks' in real-world scenarios, especially narrow mechanical slots, is unsupported by any reported success rates, collision rates, failure-mode analysis, or baseline comparisons, which are required to evaluate the load-bearing Real2Sim2Real transfer claim.
Authors: We agree that the abstract's claim of successful real-world parking requires quantitative backing to properly substantiate the Real2Sim2Real transfer. The full manuscript reports qualitative demonstrations of parking in various slot types, including narrow mechanical ones, along with some experimental results in later sections, but these metrics (success rates, collision rates, failure modes) are not summarized in the abstract, nor are direct baseline comparisons provided there. In the revised manuscript, we will update the abstract to include key quantitative results from our real-world experiments, such as overall success rates across slot types and collision statistics, while noting the absence of extensive baseline comparisons as a limitation of the current evaluation. This revision will make the claims more evidence-based without overstating the results. Revision: yes.
Referee: Simulator description (Real2Sim step): the 3DGS reconstruction is presented primarily for visual fidelity; no quantitative validation (e.g., trajectory matching, actuator latency, or tire-slip reproduction) is supplied for the underlying physics engine, leaving the assumption that simulation dynamics match real-vehicle behavior within the tolerance needed for collision-free narrow-slot control untested.
Authors: We acknowledge that the manuscript emphasizes the visual fidelity of the 3DGS reconstruction for perception but provides limited explicit quantitative validation of the physics engine components. The simulator relies on standard vehicle dynamics models integrated with the 3DGS environment, and we assumed these suffice for the narrow-slot tasks based on observed transfer success. However, we recognize that metrics such as trajectory matching, actuator latency, and tire-slip reproduction would better support the sim-to-real assumptions. In the revision, we will add a dedicated subsection with any available quantitative comparisons (e.g., simulated vs. real trajectory errors where recorded) and explicitly discuss the physics modeling limitations, including potential gaps in latency and slip reproduction. If additional validation experiments are feasible, we will include them; otherwise, we will frame this as an area for future work while qualifying the current transfer claims. Revision: partial.
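On the referee's request for dynamics validation, one standard check is to replay the same control log in the simulator and on the vehicle and compare the resulting paths; a minimal sketch of the displacement-error metrics such a comparison would report is below (array shapes and the placeholder data are illustrative only).

```python
import numpy as np

def trajectory_errors(sim_xy, real_xy):
    """Average and final displacement error between two equal-length (T, 2) paths, in meters."""
    assert sim_xy.shape == real_xy.shape
    dists = np.linalg.norm(sim_xy - real_xy, axis=1)
    return dists.mean(), dists[-1]  # ADE, FDE

# Replaying the same steering/velocity log in the 3DGS simulator and on the
# vehicle, then comparing the recorded paths, gives a direct number for the
# dynamics gap that narrow-slot maneuvers must tolerate.
sim_xy = np.cumsum(np.full((50, 2), 0.10), axis=0)            # placeholder simulated path
real_xy = sim_xy + np.random.normal(0, 0.03, sim_xy.shape)    # placeholder measured path
ade, fde = trajectory_errors(sim_xy, real_xy)
print(f"ADE = {ade:.3f} m, FDE = {fde:.3f} m")
```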
Circularity Check
No significant circularity; empirical pipeline uses standard components without self-referential reduction.
Full rationale
The paper describes an end-to-end RL parking method (SAC + behavior cloning + collision penalty) trained in a 3DGS-based Real2Sim2Real simulator and deployed on hardware. No derivation chain, equation, or central claim reduces by construction to its own inputs, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. Success is asserted via experimental transfer rather than mathematical equivalence to the training setup. This is the expected non-circular outcome for a systems paper relying on established RL and reconstruction techniques.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: 3D Gaussian Splatting reconstruction from real scenes produces a simulator whose visual and geometric fidelity is sufficient for policy transfer to physical vehicles.
- Domain assumption: The asymmetric RL framework with SAC and behavior cloning will produce a policy that generalizes from simulation to real narrow mechanical parking slots.