pith. machine review for the scientific record.

arxiv: 2605.08713 · v1 · submitted 2026-05-09 · 💻 cs.RO · cs.AI

Recognition: no theorem link

REAP: Reinforcement-Learning End-to-End Autonomous Parking with Gaussian Splatting Simulator for Real2Sim2Real Transfer

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:25 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords reinforcement learning · autonomous parking · end-to-end control · gaussian splatting · sim-to-real transfer · real2sim2real · robotics

The pith

An end-to-end reinforcement learning policy trained in a 3D Gaussian Splatting simulator transfers directly to physical vehicles and parks successfully in narrow mechanical slots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces REAP to handle autonomous parking in extreme scenarios such as narrow mechanical and dead-end slots, where multistage methods often fail due to accumulated errors. End-to-end training lets the system optimize perception and planning together through reinforcement learning. REAP applies Soft Actor-Critic in an asymmetric framework, distills a rule-based planner via behavior cloning, adds a soft predictive collision penalty, and builds a Real2Sim2Real pipeline that reconstructs real scenes with 3D Gaussian Splatting for simulation training. The policy is then deployed on actual vehicles without further adaptation. Real-world tests confirm that it works across parking types and establish the feasibility of end-to-end RL parking in tight mechanical spaces.
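A minimal sketch of the training signal this paragraph describes, assuming an asymmetric actor-critic split (the critic consumes privileged simulator state that the deployed actor never sees) and a squared-error behavior-cloning term toward the rule-based planner. The network sizes, the `bc_weight`, and the interface are illustrative guesses, not the paper's values, and the SAC entropy temperature is omitted for brevity:

```python
# Hedged sketch: asymmetric SAC-style update with behavior-cloning
# distillation. All shapes and weights are illustrative, not the paper's.
import torch
import torch.nn as nn

OBS_DIM, PRIV_DIM, ACT_DIM = 256, 32, 2   # hypothetical feature sizes

actor = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                      nn.Linear(128, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + PRIV_DIM + ACT_DIM, 128),
                       nn.ReLU(), nn.Linear(128, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

def update(obs, priv, act, q_target, planner_act, bc_weight=0.1):
    # Critic regresses toward a precomputed SAC target; it may use the
    # privileged simulator state `priv` because it runs only in training.
    q = critic(torch.cat([obs, priv, act], dim=-1))
    critic_loss = ((q - q_target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor sees only deployable observations; its loss maximizes Q and
    # adds a behavior-cloning term that distills the rule-based planner.
    new_act = actor(obs)
    actor_q = critic(torch.cat([obs, priv, new_act], dim=-1))
    bc_loss = ((new_act - planner_act) ** 2).mean()
    actor_loss = -actor_q.mean() + bc_weight * bc_loss
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```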

Core claim

REAP shows that an end-to-end reinforcement learning policy trained with Soft Actor-Critic in an asymmetric framework, after distilling a rule-based planner and applying a soft predictive collision penalty inside a 3D Gaussian Splatting simulator, can be deployed directly onto physical vehicles to achieve autonomous parking in various spaces, including extremely narrow mechanical slots.

What carries the argument

The asymmetric Soft Actor-Critic reinforcement learning framework combined with 3D Gaussian Splatting for Real2Sim2Real scene reconstruction and direct policy transfer.

Load-bearing premise

The 3D Gaussian Splatting reconstruction and asymmetric training produce a policy whose simulated behavior matches real vehicle dynamics and sensor inputs closely enough for safe zero-shot deployment.

What would settle it

A decisive test: deploy the trained REAP policy on a physical vehicle in an extremely narrow mechanical parking slot over repeated trials. Consistent collision-free completions would support the transfer claim; repeated collisions or failures to complete the maneuver would refute it.
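A minimal sketch of that protocol, where `run_parking_attempt` is a hypothetical stand-in for one closed-loop trial (on the vehicle or in the simulator) returning an outcome label; it is not an interface from the paper:

```python
# Hedged sketch: tally outcomes over repeated parking attempts in the
# target slot. `run_parking_attempt` is hypothetical, not the paper's API.
from collections import Counter

def evaluate(run_parking_attempt, n_trials=30):
    # Assumed outcome labels: "success", "collision", "timeout".
    outcomes = Counter(run_parking_attempt() for _ in range(n_trials))
    return outcomes["success"] / n_trials, outcomes["collision"] / n_trials
```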

Figures

Figures reproduced from arXiv: 2605.08713 by Changze Li, Lisen Mu, Ming Yang, Qian Zhang, Qing Su, Shaoyu Chen, Tong Qin, Yijian Li, Yuelong Yu, Zhe Chen.

Figure 1: The structure of the overall workflow. The algorithm adopts the Real2Sim2Real paradigm. Real2Sim: Construct simulation scenarios using data collected via handheld devices. Closed-loop training: Train the network model in the simulator using an end-to-end reinforcement learning approach. Sim2Real: Deploy the model trained in the simulator to a real vehicle, enabling real-world validation in standard garages…
Figure 2: The comparison of different methods. (a) Rule-based planner. (b) Imitation learning end-to-end method. (c) Reinforcement learning end-to-end method. (d) Our proposed method.
Figure 3: Overview of the proposed end-to-end autonomous parking framework. The system utilizes surround-view images and user-selected target heatmaps as inputs, which are encoded into latent spatial features. These features are fused and fed into the actor network for continuous action prediction. During training, auxiliary BEV perception tasks are co-optimized to enhance feature representation, and ground-truth BE…
Figure 4: Different collision scenarios. Collision status of the vehicle and its predicted actions. Red indicates collision areas, and green indicates collision-free areas. (a) 80% of the trajectories result in collision; (b) 20% of the trajectories result in collision.
Figure 5: Description of the scenarios used in training and validation. Scenario 1 (approximately 60 parking spaces) and Scenario 2 (approximately 100 parking spaces) are both underground parking slots. Scenario 3 (approximately 60 parking spaces) and Scenario 4 (approximately 60 parking spaces) are ground parking slots. Scenario 5 is a mechanical parking space, with a width of only 2.1 m and side panels on both sid…
Figure 6: Training scene randomization and visualization of auxiliary tasks prediction results. The figure presents the global map, surround-view camera images, ground-truth occupancy map, and the composite visualization of occupancy and parking slot detection predicted by the model. The results demonstrate both the simulator’s capability of randomly adding vehicles and the model’s generalization ability in auxiliar…
Figure 7: Qualitative Sim2Real perception comparison of models trained with different simulators. The three rows in the left block show the rendered environment images and occupancy maps generated by three different simulators: 3DGS-based, Mesh-based, and CARLA-based. The right block shows the environment images and occupancy map of the corresponding real-world scene. The middle block presents the perception results…
Figure 8: Representative closed-loop real-vehicle parking results in five test scenarios. Each row corresponds to one real-world test scenario. From left to right, the four columns show the BEV scene layout, the executed parking trajectory, the real-vehicle start pose, and the final parked pose, respectively. In the BEV scene layout, the blue arrow indicates the initial vehicle pose. The five scenarios cover diverse…
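Figure 4 suggests one plausible form for the soft predictive collision penalty: grade it by the fraction of short-horizon predicted trajectories that end in collision (0.8 versus 0.2 in the two panels). A hedged sketch under that assumption; the rollout sampling and collision checks are stand-ins, not the paper's code:

```python
# Hedged sketch: a soft penalty proportional to the fraction of predicted
# rollouts that collide, as one reading of Fig. 4. Not the paper's code.
import numpy as np

def soft_predictive_collision_penalty(rollouts_collide, weight=1.0):
    """rollouts_collide: boolean array, one flag per predicted trajectory."""
    frac = np.mean(rollouts_collide)   # e.g. 0.8 in panel (a), 0.2 in (b)
    return -weight * frac              # graded, not an all-or-nothing penalty
```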
Original abstract

In recent years, autonomous parking has made significant advances, yet parking tasks still face challenges in extreme scenarios such as mechanical and dead-end parking slots, often resulting in failures. This is mainly due to traditional parking methods adopting a multistage approach, lacking the ability to optimize the parking problem as a whole. End-to-end methods enable joint optimization across perception and planning modules to eliminate the accumulation of errors, enhancing algorithm performance in extreme scenarios. Although several end-to-end parking methods use imitation or reinforcement learning, the former is limited by data cost and distribution coverage, while the latter suffers from inefficient exploration. To address these challenges, we propose a Reinforcement learning End-to-end Autonomous Parking method (REAP). REAP employs Soft Actor-Critic (SAC) within an asymmetric reinforcement learning framework to improve training efficiency and inference performance. To accelerate model convergence, we distill the capabilities of a rule-based planner into the end-to-end network through behavior cloning. We further introduce a soft predictive collision penalty mechanism to reduce collision rates by penalizing obstacle-approaching actions. To ensure that the trained reinforcement learning network can directly transfer to real-world scenarios, we have established a Real2Sim2Real simulator. In the Real2Sim step, we use 3D Gaussian Splatting (3DGS) to transform real-world scenes into digital scenes. In the Sim2Real step, we deploy the end-to-end model onto the vehicle to bridge the Sim2Real gap. Trained in the 3DGS simulator and deployed on physical vehicles, REAP successfully parks in various types of parking spaces, especially demonstrating the feasibility of end-to-end RL parking in extremely narrow mechanical slots.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes REAP, a reinforcement-learning end-to-end autonomous parking method that employs Soft Actor-Critic (SAC) in an asymmetric framework, distills a rule-based planner via behavior cloning, and adds a soft predictive collision penalty. It introduces a Real2Sim2Real pipeline in which 3D Gaussian Splatting reconstructs real scenes into a simulator for training, followed by direct deployment of the policy on physical vehicles. The central claim is successful parking across various slot types, with demonstrated feasibility in extremely narrow mechanical slots, without additional real-world adaptation.

Significance. If the quantitative claims hold, the work would supply evidence that photorealistic 3DGS simulators combined with asymmetric RL and behavior cloning can enable direct sim-to-real transfer for end-to-end policies in safety-critical, kinematically tight parking tasks, potentially reducing reliance on multistage planning and real-world fine-tuning.

major comments (2)
  1. Abstract: the assertion that the trained model 'successfully parks' in real-world scenarios, especially narrow mechanical slots, is unsupported by any reported success rates, collision rates, failure-mode analysis, or baseline comparisons, which are required to evaluate the load-bearing Real2Sim2Real transfer claim.
  2. Simulator description (Real2Sim step): the 3DGS reconstruction is presented primarily for visual fidelity; no quantitative validation (e.g., trajectory matching, actuator latency, or tire-slip reproduction) is supplied for the underlying physics engine, leaving the assumption that simulation dynamics match real-vehicle behavior within the tolerance needed for collision-free narrow-slot control untested.
minor comments (1)
  1. Abstract: consider inserting key numerical results (success rates, slot widths, etc.) to make the summary self-contained.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation of our Real2Sim2Real claims and simulator validation. We address each major comment point by point below, indicating the revisions we will incorporate in the updated version.

Point-by-point responses
  1. Referee: Abstract: the assertion that the trained model 'successfully parks' in real-world scenarios, especially narrow mechanical slots, is unsupported by any reported success rates, collision rates, failure-mode analysis, or baseline comparisons, which are required to evaluate the load-bearing Real2Sim2Real transfer claim.

    Authors: We agree that the abstract's claim of successful real-world parking requires quantitative backing to properly substantiate the Real2Sim2Real transfer. The full manuscript reports qualitative demonstrations of parking in various slot types, including narrow mechanical ones, along with some experimental results in later sections, but these metrics (success rates, collision rates, failure modes) are not summarized in the abstract, nor are direct baseline comparisons provided there. In the revised manuscript, we will update the abstract to include key quantitative results from our real-world experiments, such as overall success rates across slot types and collision statistics, while noting the absence of extensive baseline comparisons as a limitation of the current evaluation. This revision will make the claims more evidence-based without overstating the results. revision: yes

  2. Referee: Simulator description (Real2Sim step): the 3DGS reconstruction is presented primarily for visual fidelity; no quantitative validation (e.g., trajectory matching, actuator latency, or tire-slip reproduction) is supplied for the underlying physics engine, leaving the assumption that simulation dynamics match real-vehicle behavior within the tolerance needed for collision-free narrow-slot control untested.

    Authors: We acknowledge that the manuscript emphasizes the visual fidelity of the 3DGS reconstruction for perception but provides limited explicit quantitative validation of the physics engine components. The simulator relies on standard vehicle dynamics models integrated with the 3DGS environment, and we assumed these suffice for the narrow-slot tasks based on observed transfer success. However, we recognize that metrics such as trajectory matching, actuator latency, and tire-slip reproduction would better support the sim-to-real assumptions. In the revision, we will add a dedicated subsection with any available quantitative comparisons (e.g., simulated vs. real trajectory errors where recorded) and explicitly discuss the physics modeling limitations, including potential gaps in latency and slip reproduction. If additional validation experiments are feasible, we will include them; otherwise, we will frame this as an area for future work while qualifying the current transfer claims. revision: partial
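The trajectory-matching validation discussed in this exchange reduces to a simple displacement-error computation. A minimal sketch, assuming matched-timestamp (T, 2) position arrays from a simulated rollout and a real-vehicle log under the same action sequence; nothing here comes from the paper:

```python
# Hedged sketch: average and final displacement error between simulated
# and real trajectories. Array shapes are assumed, not from the paper.
import numpy as np

def trajectory_errors(sim_xy, real_xy):
    """sim_xy, real_xy: (T, 2) vehicle positions at matched timestamps."""
    d = np.linalg.norm(sim_xy - real_xy, axis=1)
    return d.mean(), d[-1]   # ADE (mean error), FDE (final error)
```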

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline uses standard components without self-referential reduction.

full rationale

The paper describes an end-to-end RL parking method (SAC + behavior cloning + collision penalty) trained in a 3DGS-based Real2Sim2Real simulator and deployed on hardware. No derivation chain, equation, or central claim reduces by construction to its own inputs, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. Success is asserted via experimental transfer rather than mathematical equivalence to the training setup. This is the expected non-circular outcome for a systems paper relying on established RL and reconstruction techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unproven fidelity of the 3DGS simulator for dynamics and the assumption that RL exploration with behavior cloning will converge to a transferable policy without real-world fine-tuning.

axioms (2)
  • domain assumption 3D Gaussian Splatting reconstruction from real scenes produces a simulator whose visual and geometric fidelity is sufficient for policy transfer to physical vehicles.
    Invoked in the Real2Sim step to justify training entirely in simulation.
  • domain assumption The asymmetric RL framework with SAC and behavior cloning will produce a policy that generalizes from simulation to real narrow mechanical parking slots.
    Central to the Sim2Real claim.

pith-pipeline@v0.9.0 · 5636 in / 1491 out tokens · 43646 ms · 2026-05-12T01:25:56.364350+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 7 internal anchors

  1. [1]

    Multi-modal fusion transformer for end-to-end autonomous driving,

    A. Prakash, K. Chitta, and A. Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7077–7087

  2. [2]

    Planning-oriented autonomous driving,

    Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang et al., “Planning-oriented autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17853–17862

  3. [3]

    VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

    S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang, “VADv2: End-to-end vectorized autonomous driving via probabilistic planning,” arXiv preprint arXiv:2402.13243, 2024

  4. [4]

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y. Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu et al., “Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation,” arXiv preprint arXiv:2406.06978, 2024

  5. [5]

    Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,

    B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y. Zhang, Q. Zhang et al., “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12037–12047

  6. [6]

    ParkingE2E: Camera-based end-to-end parking network, from images to planning,

    C. Li, Z. Ji, Z. Chen, T. Qin, and M. Yang, “ParkingE2E: Camera-based end-to-end parking network, from images to planning,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 13206–13212

  7. [7]

    A method for the shortest path search by extended dijkstra algorithm,

    M. Noto and H. Sato, “A method for the shortest path search by extended Dijkstra algorithm,” in SMC 2000 Conference Proceedings: 2000 IEEE International Conference on Systems, Man and Cybernetics. ‘Cybernetics Evolving to Systems, Humans, Organizations, and Their Complex Interactions’ (Cat. No. 0), vol. 3. IEEE, 2000, pp. 2316–2320

  8. [8]

    Path planning in unstructured environments: A real-time hybrid A* implementation for fast and deterministic path generation for the KTH Research Concept Vehicle,

    K. Kurzer, “Path planning in unstructured environments: A real-time hybrid A* implementation for fast and deterministic path generation for the KTH Research Concept Vehicle,” Master’s thesis, KTH, Integrated Transport Research Lab (ITRL), 2016

  9. [9]

    Geometric a-star algorithm: An improved a-star algorithm for agv path planning in a port environment,

    G. Tang, C. Tang, C. Claramunt, X. Hu, and P. Zhou, “Geometric a-star algorithm: An improved a-star algorithm for agv path planning in a port environment,” IEEE Access, vol. 9, pp. 59196–59210, 2021

  10. [10]

    Dynamic anti-collision a-star algorithm for multi-ship encounter situations,

    Z. He, C. Liu, X. Chu, R. R. Negenborn, and Q. Wu, “Dynamic anti-collision a-star algorithm for multi-ship encounter situations,” Applied Ocean Research, vol. 118, p. 102995, 2022

  11. [11]

    Optimal path planning using RRT* based approaches: a survey and future directions,

    I. Noreen, A. Khan, and Z. Habib, “Optimal path planning using RRT* based approaches: a survey and future directions,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 11, 2016

  12. [12]

    Path planning using lazy PRM,

    R. Bohlin and L. E. Kavraki, “Path planning using lazy PRM,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), vol. 1. IEEE, 2000, pp. 521–528

  13. [13]

    Shortest paths for the Reeds-Shepp car: a worked out example of the use of geometric techniques in nonlinear optimal control,

    H. J. Sussmann and G. Tang, “Shortest paths for the Reeds-Shepp car: a worked out example of the use of geometric techniques in nonlinear optimal control,” 1991

  14. [14]

    R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction. MIT Press, Cambridge, 1998, vol. 1, no. 1

  15. [15]

    Carla: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on Robot Learning. PMLR, 2017, pp. 1–16

  16. [16]

    Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

    V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa et al., “Isaac Gym: High performance GPU-based physics simulation for robot learning,” arXiv preprint arXiv:2108.10470, 2021

  17. [17]

    Mujoco: A physics engine for model-based control,

    E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033

  18. [18]

    Unity: A general platform for intelligent agents,

    A. Juliani, V.-P. Berges, E. Teng, A. Cohen, J. Harper, C. Elion, C. Goy, Y. Gao, H. Henry, M. Mattar et al., “Unity: A general platform for intelligent agents,” arXiv preprint arXiv:1809.02627, 2018

  19. [19]

    Discoverse: Efficient robot simulation in complex high-fidelity environments,

    Y. Jia, G. Wang, Y. Dong, J. Wu, Y. Zeng, H. Lin, Z. Wang, H. Ge, W. Gu, K. Ding et al., “Discoverse: Efficient robot simulation in complex high-fidelity environments,” arXiv preprint arXiv:2507.21981, 2025

  20. [20]

    Vr-robo: A real-to-sim-to-real framework for visual robot navigation and locomotion,

    S. Zhu, L. Mou, D. Li, B. Ye, R. Huang, and H. Zhao, “Vr-robo: A real-to-sim-to-real framework for visual robot navigation and locomotion,” IEEE Robotics and Automation Letters, 2025

  21. [21]

    Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning,

    J. Luo, C. Xu, J. Wu, and S. Levine, “Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning,” arXiv preprint arXiv:2410.21845, 2024

  22. [22]

    Continuous control with deep reinforcement learning

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015

  23. [23]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017

  24. [24]

    Soft Actor-Critic Algorithms and Applications

    T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel et al., “Soft actor-critic algorithms and applications,” arXiv preprint arXiv:1812.05905, 2018

  25. [25]

    Twin-delayed ddpg: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent,

    S. Dankwa and W. Zheng, “Twin-delayed ddpg: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent,” in Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, 2019, pp. 1–5

  26. [26]

    Multi-objective autonomous eco-driving strategy: A pathway to future green mobility,

    H. Tong, L. Chu, Z. Chen, Y. Liu, Y. Zhang, and J. Hu, “Multi-objective autonomous eco-driving strategy: A pathway to future green mobility,” Green Energy and Intelligent Transportation, p. 100279, 2025

  27. [27]

    The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns,

    T. Li, J. Ruan, and K. Zhang, “The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns,” Green Energy and Intelligent Transportation, vol. 4, no. 3, p. 100288, 2025

  28. [28]

    Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

    T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” arXiv preprint arXiv:2304.13705, 2023

  29. [29]

    Offline reinforcement learning for autonomous driving with real world driving data,

    X. Fang, Q. Zhang, Y. Gao, and D. Zhao, “Offline reinforcement learning for autonomous driving with real world driving data,” in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3417–3422

  30. [30]

    Boosting offline reinforcement learning for autonomous driving with hierarchical latent skills,

    Z. Li, F. Nie, Q. Sun, F. Da, and H. Zhao, “Boosting offline reinforcement learning for autonomous driving with hierarchical latent skills,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 18362–18369

  31. [31]

    HOPE: A reinforcement learning-based hybrid policy path planner for diverse parking scenarios,

    M. Jiang, Y. Li, S. Zhang, S. Chen, C. Wang, and M. Yang, “HOPE: A reinforcement learning-based hybrid policy path planner for diverse parking scenarios,” IEEE Transactions on Intelligent Transportation Systems, 2025

  32. [32]

    RL-OGM-Parking: Lidar ogm-based hybrid reinforcement learning planner for autonomous parking,

    Z. Wang, Z. Chen, M. Jiang, T. Qin, and M. Yang, “RL-OGM-Parking: Lidar ogm-based hybrid reinforcement learning planner for autonomous parking,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8420–8426

  33. [33]

    SEG-Parking: Towards safe, efficient, and generalizable autonomous parking via end-to-end offline reinforcement learning,

    Z. Yang, Z. Peng, and J. Ma, “SEG-Parking: Towards safe, efficient, and generalizable autonomous parking via end-to-end offline reinforcement learning,” arXiv preprint arXiv:2509.13956, 2025

  34. [34]

    3d gaussian splatting for real-time radiance field rendering

    B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023

  35. [35]

    Mars: An instance-aware, modular and realistic simulator for autonomous driving,

    Z. Wu, T. Liu, L. Luo, Z. Zhong, J. Chen, H. Xiao, C. Hou, H. Lou, Y. Chen, R. Yang et al., “Mars: An instance-aware, modular and realistic simulator for autonomous driving,” in CAAI International Conference on Artificial Intelligence. Springer, 2023, pp. 3–15

  36. [36]

    RL-GSBridge: 3d gaussian splatting based real2sim2real method for robotic manipulation learning,

    Y. Wu, L. Pan, W. Wu, G. Wang, Y. Miao, F. Xu, and H. Wang, “RL-GSBridge: 3d gaussian splatting based real2sim2real method for robotic manipulation learning,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 192–198

  37. [37]

    Unisim: A neural closed-loop sensor simulator,

    Z. Yang, Y. Chen, J. Wang, S. Manivasagam, W.-C. Ma, A. J. Yang, and R. Urtasun, “Unisim: A neural closed-loop sensor simulator,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1389–1399

  38. [38]

    NeRF: Representing scenes as neural radiance fields for view synthesis,

    B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

  39. [39]

    Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d,

    J. Philion and S. Fidler, “Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d,” in Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV. Springer, 2020, pp. 194–210

  40. [40]

    3drealcar: An in-the-wild rgb-d car dataset with 360-degree views,

    X. Du, Y. Wang, H. Sun, Z. Wu, H. Sheng, S. Wang, J. Ying, M. Lu, T. Zhu, K. Zhan et al., “3drealcar: An in-the-wild rgb-d car dataset with 360-degree views,” arXiv preprint arXiv:2406.04875, 2024

  41. [41]

    Optimal paths for a car that goes both forwards and backwards,

    J. Reeds and L. Shepp, “Optimal paths for a car that goes both forwards and backwards,” Pacific Journal of Mathematics, vol. 145, no. 2, pp. 367–393, 1990