pith. machine review for the scientific record.

arxiv: 2604.12626 · v1 · submitted 2026-04-14 · 💻 cs.RO · cs.CV

Recognition: unknown

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:20 UTC · model grok-4.3

classification 💻 cs.RO · cs.CV
keywords 3D Gaussian Splatting · embodied AI · navigation simulator · Habitat · Gaussian avatars · photorealistic rendering · human-aware navigation · cross-domain generalization

The pith

Integrating 3D Gaussian Splatting and dynamic Gaussian avatars into a navigation simulator boosts agent generalization to new domains and human interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Habitat-GS adds 3D Gaussian Splatting rendering to the Habitat simulator for more realistic visual environments and introduces drivable Gaussian avatars that act as both realistic visuals and physical obstacles. This setup lets embodied AI agents train in higher-fidelity simulations of real-world scenes containing people. Experiments indicate that training on these 3DGS scenes leads to better performance when agents are tested in different domains, and that mixed training on both 3DGS and standard scenes works best. The avatars specifically help agents learn to navigate around humans without collisions.

Core claim

Habitat-GS extends Habitat-Sim with a 3DGS renderer for real-time photorealistic rendering and a Gaussian avatar module where each avatar serves as both a photorealistic visual entity and a navigation obstacle, resulting in improved cross-domain generalization for point-goal navigation agents and effective human-aware navigation.

What carries the argument

The 3D Gaussian Splatting renderer, together with the Gaussian avatar module, which provides both visual fidelity and collision detection for dynamic humans.
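A minimal sketch of that dual role, assuming hypothetical names (GaussianAvatar, add_blocking_capsule, rasterize, try_step) rather than the actual Habitat-GS API: each avatar's splats go through the renderer, while only a pre-computed proxy capsule is injected into the NavMesh, mirroring the "visual–navigation decoupling" shown in Figure 6 below.

    # Illustrative sketch only: class and function names are hypothetical,
    # not the Habitat-GS API. It shows the dual role an avatar plays per step.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class GaussianAvatar:
        means: np.ndarray       # (N, 3) posed splat centers for this frame
        capsule_radius: float   # pre-computed collision proxy radius

        def proxy_capsule(self):
            # Approximate the splat cloud with a vertical capsule segment;
            # a real pipeline would fit this offline from the canonical pose.
            cx, cy = self.means[:, :2].mean(axis=0)
            z_lo, z_hi = self.means[:, 2].min(), self.means[:, 2].max()
            return (np.array([cx, cy, z_lo]), np.array([cx, cy, z_hi]),
                    self.capsule_radius)

    def simulation_step(scene_means, avatars, agent, navmesh, rasterize):
        # Visual path: scene splats and posed avatar splats share one
        # rasterizer call, so avatars appear photorealistic in the agent's
        # RGB-D observation.
        all_means = np.concatenate([scene_means] + [a.means for a in avatars])
        rgb = rasterize(all_means, camera=agent.camera)

        # Navigation path: only the proxy capsules touch the NavMesh;
        # rendering never sees collision, navigation never sees splats.
        for a in avatars:
            navmesh.add_blocking_capsule(*a.proxy_capsule())
        agent.position = navmesh.try_step(agent.position, agent.goal_step())
        return rgb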

Load-bearing premise

The photorealism and collision modeling from 3D Gaussian Splatting and drivable avatars are sufficient to improve real-world generalization without introducing new artifacts or biases in agent behavior.

What would settle it

A real-world deployment test in which agents trained in Habitat-GS perform no better than, or worse than, agents trained in standard mesh-based Habitat-Sim when navigating around actual moving people in varied physical environments.
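"Performance" in such a test would presumably be scored with the standard point-goal metrics of Anderson et al. [1]; a minimal sketch of success rate and SPL (Success weighted by Path Length), with an episode encoding of my own choosing:

    def success_and_spl(episodes):
        # episodes: list of (success: bool, shortest_path: float, agent_path: float).
        # SPL = mean over episodes of S_i * l_i / max(p_i, l_i), following [1].
        n = len(episodes)
        succ = sum(s for s, _, _ in episodes) / n
        spl = sum(s * l / max(p, l) for s, l, p in episodes) / n
        return succ, spl

    # Example: one success with a 20% longer path, one failure.
    print(success_and_spl([(True, 5.0, 6.0), (False, 4.0, 9.0)]))  # (0.5, ~0.417)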

Figures

Figures reproduced from arXiv: 2604.12626 by Chong Cui, Hujun Bao, Jiazhao Zhang, Jingyi Xu, Junbo Chen, Qingsong Yan, Ruizhen Hu, Sida Peng, Tao Ni, Xiaowei Zhou, Yuanhong Yu, Ziyuan Xia.

Figure 1: Habitat-GS is a navigation-centric embodied simulation platform with 3DGS and dynamic gaussian avatars. Compared to traditional mesh-based simulators (left), our 3DGS-based simulator (right) preserves high-frequency visual details and view-dependent effects, while gaussian avatars provide realistic and dynamic human presence for human-aware navigation scenarios, thus helping train more robust agents.
Figure 2: System overview of Habitat-GS. From left to right: Asset Preparation, where 3DGS scene assets and gaussian avatar assets are prepared respectively; Habitat-GS Simulation Environment, where the render engine performs 3DGS rasterization for scene gaussians and LBS deformation followed by rasterization for avatar gaussians, producing RGB-D observations. The NavMesh blocking module retrieves pre-computed proxy…
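The paper describes a lightweight CUDA LBS kernel that deforms avatar gaussians to arbitrary SMPL-X [19] poses, avoiding costly neural-network inference at runtime. A NumPy sketch of that linear-blend-skinning step for the splat centers (shapes here are assumptions; the real kernel must also rotate the gaussian covariances, omitted for brevity):

    import numpy as np

    def lbs_deform_means(means_canon, skin_weights, joint_transforms):
        # Linear blend skinning of splat centers (NumPy stand-in for the
        # CUDA kernel described in the paper).
        #   means_canon:      (N, 3) canonical-pose splat centers
        #   skin_weights:     (N, J) SMPL-X skinning weights, rows sum to 1
        #   joint_transforms: (J, 4, 4) posed joint transforms
        # Blend the J joint transforms per splat: T_i = sum_j w_ij * G_j
        blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)
        # Apply each blended transform to its homogeneous canonical center
        homo = np.concatenate([means_canon, np.ones((len(means_canon), 1))], axis=1)
        posed = np.einsum("nab,nb->na", blended, homo)
        return posed[:, :3]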
Figure 3: Visual comparison of scene rendering. Mesh-based rendering (left) vs. our 3DGS rendering (right). Our simulator is based on 3DGS, which preserves high-frequency details and supports diverse sources of rendering assets.
Figure 4: Qualitative comparison of mesh avatars and gaussian avatars.
Figure 5: VLM scene quality assessment. Gemini 3.0 Pro evaluates 240 rendered screenshots from each domain on three perceptual dimensions (rendering quality, realism, scene diversity). GS scenes consistently outperform mesh scenes, confirming their superior visual fidelity and diversity. The screenshots are divided into 48 evaluation batches, each containing 5 GS and 5 mesh images with randomized indices to blind the model to the rendering source; the VLM scores each image on a 10-point scale.
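The blinding protocol in the caption is straightforward to reproduce; a sketch under the stated numbers (240 screenshots per source, 48 batches of 5 GS + 5 mesh), with the Gemini prompt and scoring call left out since only the batching is described:

    import random

    def make_blind_batches(gs_paths, mesh_paths, n_batches=48, per_source=5, seed=0):
        # Batching per the Fig. 5 caption: each batch mixes 5 GS and 5 mesh
        # screenshots in shuffled order so the scorer cannot infer the
        # rendering source from position. The VLM call itself is omitted.
        rng = random.Random(seed)
        gs, mesh = gs_paths[:], mesh_paths[:]
        rng.shuffle(gs); rng.shuffle(mesh)
        batches = []
        for b in range(n_batches):
            items = [(p, "gs") for p in gs[b * per_source:(b + 1) * per_source]]
            items += [(p, "mesh") for p in mesh[b * per_source:(b + 1) * per_source]]
            rng.shuffle(items)  # randomized indices blind the model to source
            batches.append({"images": [p for p, _ in items],
                            "labels": [lbl for _, lbl in items]})  # kept aside
        return batches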
Figure 6: System architecture of Habitat-GS. The system adopts a "visual–navigation decoupling" design principle, separating the visual rendering modules, handled by the CUDA-based 3DGS rasterizer and LBS deformation, from the navigation module, managed by the traditional NavMesh and injected proxy capsules. This allows for photorealistic agent observations without modifying the core Habitat-Sim navigation logic.
Figure 7: Additional visualizations of 3DGS scenes and gaussian avatars.
Figure 8: Qualitative visualization of navigation episodes.
Original abstract

Training embodied AI agents depends critically on the visual fidelity of simulation environments and the ability to model dynamic humans. Current simulators rely on mesh-based rasterization with limited visual realism, and their support for dynamic human avatars, where available, is constrained to mesh representations, hindering agent generalization to human-populated real-world scenarios. We present Habitat-GS, a navigation-centric embodied AI simulator extended from Habitat-Sim that integrates 3D Gaussian Splatting scene rendering and drivable gaussian avatars while maintaining full compatibility with the Habitat ecosystem. Our system implements a 3DGS renderer for real-time photorealistic rendering and supports scalable 3DGS asset import from diverse sources. For dynamic human modeling, we introduce a gaussian avatar module that enables each avatar to simultaneously serve as a photorealistic visual entity and an effective navigation obstacle, allowing agents to learn human-aware behaviors in realistic settings. Experiments on point-goal navigation demonstrate that agents trained on 3DGS scenes achieve stronger cross-domain generalization, with mixed-domain training being the most effective strategy. Evaluations on avatar-aware navigation further confirm that gaussian avatars enable effective human-aware navigation. Finally, performance benchmarks validate the system's scalability across varying scene complexity and avatar counts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Habitat-GS, an extension of Habitat-Sim that integrates 3D Gaussian Splatting for real-time photorealistic scene rendering and drivable Gaussian avatars for dynamic human modeling in navigation tasks. It maintains full compatibility with the Habitat ecosystem, supports scalable asset import, and presents experiments showing that agents trained on 3DGS scenes achieve stronger cross-domain generalization in point-goal navigation (with mixed-domain training most effective) while Gaussian avatars enable effective human-aware navigation.

Significance. If the reported results hold, the work is significant for embodied AI because it directly addresses two key simulator limitations—visual fidelity and dynamic human modeling—using 3DGS, which has the potential to improve sim-to-real transfer for navigation agents in human-populated environments. The dual role of Gaussian avatars as both photorealistic visuals and collision obstacles is a practical engineering contribution, and the maintained Habitat compatibility lowers barriers to adoption. The stress-test concern regarding new artifacts or biases from 3DGS fidelity does not appear to be a load-bearing objection given the described implementation and positive experimental outcomes.

minor comments (2)
  1. Abstract: The summary of experimental outcomes would be strengthened by including at least one key quantitative metric (e.g., success rate or SPL improvement) and a brief mention of baselines, even if full details appear later in the paper.
  2. The manuscript would benefit from a short dedicated subsection or paragraph clarifying how Gaussian avatar collision geometry is derived from the splat representation and whether any approximation steps are involved; one way to quantify such an approximation is sketched below.
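One way to make the requested clarification measurable (a hypothetical diagnostic, not the paper's method): report the fraction of splat opacity mass that falls outside the fitted proxy capsule.

    import numpy as np

    def capsule_leakage(means, opacities, base, top, radius):
        # Fraction of splat opacity mass outside a capsule defined by the
        # segment base->top and a radius; quantifies how loosely the
        # collision proxy fits the splat cloud. All names illustrative.
        axis = top - base
        t = np.clip((means - base) @ axis / (axis @ axis), 0.0, 1.0)
        nearest = base + t[:, None] * axis            # closest segment point
        dist = np.linalg.norm(means - nearest, axis=1)
        return opacities[dist > radius].sum() / opacities.sum()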

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of Habitat-GS, recognition of its significance for embodied AI, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an engineering description of a simulator extension (Habitat-GS) that integrates 3D Gaussian Splatting rendering and drivable avatars into Habitat-Sim. All central claims rest on system implementation details and reported experimental outcomes (point-goal navigation generalization and avatar-aware navigation performance). No derivation chain, equations, first-principles predictions, or fitted parameters labeled as predictions exist in the provided text. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The work validates its claims directly through experiments against external benchmarks rather than through internal derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

As an applied systems paper, the work relies on standard assumptions from computer graphics and simulation without introducing mathematical free parameters or unstated axioms beyond those implicit in 3DGS and Habitat-Sim.

invented entities (1)
  • drivable gaussian avatars (no independent evidence)
    purpose: To model dynamic humans that function simultaneously as photorealistic visuals and effective navigation obstacles
    New module introduced to overcome limitations of mesh-based avatars in existing simulators.

pith-pipeline@v0.9.0 · 5552 in / 1090 out tokens · 45733 ms · 2026-05-10T15:20:42.186110+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

37 extracted references · 29 canonical work pages · 8 internal anchors

  [1] Anderson, P., Chang, A., Chaplot, D.S., Dosovitskiy, A., Gupta, S., Koltun, V., Kosecka, J., Malik, J., Mottaghi, R., Savva, M., Zamir, A.R.: On evaluation of embodied navigation agents (2018), https://arxiv.org/abs/1807.06757

  [2] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields (2022), https://arxiv.org/abs/2111.12077

  [3] Batra, D., Gokaslan, A., Kembhavi, A., Maksymets, O., Mottaghi, R., Savva, M., Toshev, A., Wijmans, E.: Objectnav revisited: On evaluation of embodied agents navigating to objects (2020), https://arxiv.org/abs/2006.13171

  [4] Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation (2014), https://arxiv.org/abs/1406.1078

  [5] Gan, C., Schwartz, J., Alter, S., Mrowca, D., Schrimpf, M., Traer, J., Freitas, J.D., Kubilius, J., Bhandwaldar, A., Haber, N., Sano, M., Kim, K., Wang, E., Lingelbach, M., Curtis, A., Feigelis, K., Bear, D.M., Gutfreund, D., Cox, D., Torralba, A., DiCarlo, J.J., Tenenbaum, J.B., McDermott, J.H., Yamins, D.L.K.: Threedworld: A platform for interactive multi-modal physical...

  [6] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015), https://arxiv.org/abs/1512.03385

  [7] Hu, L., Zhang, H., Zhang, Y., Zhou, B., Liu, B., Zhang, S., Nie, L.: Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians (2024), https://arxiv.org/abs/2312.02134

  [8] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering (2023), https://arxiv.org/abs/2308.04079

  [9] Kolve, E., Mottaghi, R., Han, W., VanderBilt, E., Weihs, L., Herrasti, A., Deitke, M., Ehsani, K., Gordon, D., Zhu, Y., Kembhavi, A., Gupta, A., Farhadi, A.: Ai2-thor: An interactive 3d environment for visual ai (2022), https://arxiv.org/abs/1712.05474

  [10] Lei, J., Wang, Y., Pavlakos, G., Liu, L., Daniilidis, K.: Gart: Gaussian articulated template models (2023), https://arxiv.org/abs/2311.16099

  [11] Li, C., Xia, F., Martín-Martín, R., Lingelbach, M., Srivastava, S., Shen, B., Vainio, K., Gokmen, C., Dharan, G., Jain, T., Kurenkov, A., Liu, C.K., Gweon, H., Wu, J., Fei-Fei, L., Savarese, S.: igibson 2.0: Object-centric simulation for robot learning of everyday household tasks (2021), https://arxiv.org/abs/2108.03272

  [12] Li, Z., Zheng, Z., Wang, L., Liu, Y.: Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  [13] Liu, X., Zhan, X., Tang, J., Shan, Y., Zeng, G., Lin, D., Liu, X., Liu, Z.: Humangaussian: Text-driven 3d human generation with gaussian splatting (2024), https://arxiv.org/abs/2311.17061

  [14] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34(6), 248:1–248:16 (Oct 2015)

  [15] Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis (2023), https://arxiv.org/abs/2308.09713

  [16] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis (2020), https://arxiv.org/abs/2003.08934

  [17] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics 41(4), 1–15 (Jul 2022). https://doi.org/10.1145/3528223.3530127

  [18] NVIDIA: Isaac Sim, https://github.com/isaac-sim/IsaacSim

  [19] Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A.A., Tzionas, D., Black, M.J.: Expressive body capture: 3d hands, face, and body from a single image (2019), https://arxiv.org/abs/1904.05866

  [20] Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: Dreamfusion: Text-to-3d using 2d diffusion (2022), https://arxiv.org/abs/2209.14988

  [21] Puig, X., Undersander, E., Szot, A., Cote, M.D., Yang, T.Y., Partsey, R., Desai, R., Clegg, A.W., Hlavac, M., Min, S.Y., Vondruš, V., Gervet, T., Berges, V.P., Turner, J.M., Maksymets, O., Kira, Z., Kalakrishnan, M., Malik, J., Chaplot, D.S., Jain, U., Batra, D., Rai, A., Mottaghi, R.: Habitat 3.0: A co-habitat for humans, avatars and robots (2023), https:...

  [22] Ramakrishnan, S.K., Gokaslan, A., Wijmans, E., Maksymets, O., Clegg, A., Turner, J., Undersander, E., Galuba, W., Westbury, A., Chang, A.X., Savva, M., Zhao, Y., Batra, D.: Habitat-matterport 3d dataset (hm3d): 1000 large-scale 3d environments for embodied ai (2021), https://arxiv.org/abs/2109.08238

  [23] Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., Malik, J., Parikh, D., Batra, D.: Habitat: A platform for embodied ai research (2019), https://arxiv.org/abs/1904.01201

  [24] Shen, B., Xia, F., Li, C., Martín-Martín, R., Fan, L., Wang, G., Pérez-D'Arpino, C., Buch, S., Srivastava, S., Tchapmi, L.P., Tchapmi, M.E., Vainio, K., Wong, J., Fei-Fei, L., Savarese, S.: igibson 1.0: a simulation environment for interactive tasks in large realistic scenes (2021), https://arxiv.org/abs/2012.02924

  [25] SpatialVerse Research Team, M.T.I.: Interiorgs: A 3d gaussian splatting dataset of semantically labeled indoor scenes. https://huggingface.co/datasets/spatialverse/InteriorGS (2025)

  [26] Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., Verma, S., Clarkson, A., Yan, M., Budge, B., Yan, Y., Pan, X., Yon, J., Zou, Y., Leon, K., Carter, N., Briales, J., Gillingham, T., Mueggler, E., Pesqueira, L., Savva, M., Batra, D., Strasdat, H.M., Nardi, R.D., Goesele, M., Lovegrove, S., Newcombe, R.:...

  [27] Szot, A., Clegg, A., Undersander, E., Wijmans, E., Zhao, Y., Turner, J., Maestre, N., Mukadam, M., Chaplot, D., Maksymets, O., Gokaslan, A., Vondrus, V., Dharur, S., Meier, F., Galuba, W., Chang, A., Kira, Z., Koltun, V., Malik, J., Savva, M., Batra, D.: Habitat 2.0: Training home assistants to rearrange their habitat (2022), https://arxiv.org/abs/2106.14405

  [28] Team, G., Anil, R., Borgeaud, S., et al.: Gemini: A family of highly capable multimodal models (2025), https://arxiv.org/abs/2312.11805

  [29] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world (2017), https://arxiv.org/abs/1703.06907

  [30] Wijmans, E., Kadian, A., Morcos, A., Lee, S., Essa, I., Parikh, D., Savva, M., Batra, D.: Dd-ppo: Learning near-perfect pointgoal navigators from 2.5 billion frames (2020), https://arxiv.org/abs/1911.00357

  [31] World Labs: Marble: A multimodal world model (11 2025), https://www.worldlabs.ai/blog/marble-world-model

  [32] Xiang, F., Qin, Y., Mo, K., Xia, Y., Zhu, H., Liu, F., Liu, M., Jiang, H., Yuan, Y., Wang, H., Yi, L., Chang, A.X., Guibas, L.J., Su, H.: Sapien: A simulated part-based interactive environment (2020), https://arxiv.org/abs/2003.08515

  [33] Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-splatting: Alias-free 3d gaussian splatting (2023), https://arxiv.org/abs/2311.16493

  [34] Zhang, Y., Tang, S.: The wanderings of odysseus in 3d scenes (2022), https://arxiv.org/abs/2112.09251
