GASE: Gaussian Splatting-Based Automated System for Reconstructing Embodied-Simulation Environments
Pith reviewed 2026-06-27 00:54 UTC · model grok-4.3
The pith
GASE automates the reconstruction of high-fidelity simulation environments from panoramic video for robot learning, and policies trained in those environments perform within 10 percent of policies trained on real-world data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GASE uses multi-view video streams from panoramic camera arrays for rapid environment scanning. A camera-pose-based strategy extracts objects across frames in the 2D domain, followed by high-fidelity scene inpainting. Foreground objects and the static background are reconstructed independently with Gaussian splatting and seamlessly imported into physics simulators for policy training. Experiments show it outperforms existing 3D Gaussian-based methods in segmentation accuracy by over 10% and achieves state-of-the-art inpainting quality. Real-robot deployments in manipulation and navigation tasks maintain a performance gap of less than 10% compared to policies trained on real-world data.
What carries the argument
The camera-pose-based strategy for robust object extraction across frames in the 2D domain followed by high-fidelity scene inpainting to enable independent reconstruction of foreground objects and static background.
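The summary does not spell out how the camera-pose-based extraction works. One plausible ingredient, offered only as an assumption and not as GASE's documented method, is to localize an object once in 3D and then re-find it in every other frame by projecting that point through each frame's known pose. This produces a per-frame point prompt for a promptable 2D segmenter. The sketch below shows only that projection step under a plain pinhole model. The names `Camera`, `project`, and `propagate_prompts` are hypothetical, and a real panoramic rig would need a fisheye or equirectangular camera model instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Hypothetical pinhole camera with a world-to-camera pose and intrinsics."""
    R: Mat3       # rows of the world-to-camera rotation matrix
    t: Vec3       # world-to-camera translation
    fx: float     # focal lengths in pixels
    fy: float
    cx: float     # principal point in pixels
    cy: float
    width: int    # image size in pixels
    height: int


def project(cam: Camera, p_world: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel (u, v); None if behind the camera or off-image."""
    # Transform to camera coordinates: p_cam = R @ p_world + t
    x, y, z = [
        sum(cam.R[i][k] * p_world[k] for k in range(3)) + cam.t[i] for i in range(3)
    ]
    if z <= 0.0:
        return None  # point lies behind the camera
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    if not (0.0 <= u < cam.width and 0.0 <= v < cam.height):
        return None  # point projects outside the image
    return (u, v)


def propagate_prompts(
    cams: Sequence[Camera], anchor: Vec3
) -> List[Optional[Tuple[float, float]]]:
    """One point prompt per frame (or None) for a promptable 2D segmenter."""
    return [project(cam, anchor) for cam in cams]
```

Frames where the anchor falls behind the camera or outside the image get `None`, so they can be skipped rather than prompted with a bogus location. Occlusion is not handled here: a projected point can land on an occluder, which is exactly the kind of extraction failure the referee asks to be quantified.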
If this is right
- Outperforms existing 3D Gaussian-based methods in segmentation accuracy by over 10%.
- Achieves state-of-the-art inpainting quality.
- Maintains a performance gap of less than 10% in real-robot manipulation and navigation tasks compared to real-world trained policies.
- Enables efficient import of reconstructed assets into physics simulators for embodied agent training.
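The summary does not say which metric backs "state-of-the-art inpainting quality". Inpainting is often scored with PSNR, SSIM, or LPIPS, and whether the score covers the whole image or only the filled region changes the numbers substantially. As an illustrative sketch only, the code below computes PSNR restricted to the masked region. It assumes grayscale images flattened to lists and a reference capture of the scene without the removed object. `masked_psnr` is a hypothetical helper, not the paper's evaluation code.

```python
import math
from typing import Sequence


def masked_psnr(
    pred: Sequence[float],
    ref: Sequence[float],
    mask: Sequence[bool],
    max_val: float = 1.0,
) -> float:
    """PSNR computed only over pixels where mask is True (e.g. the inpainted hole)."""
    errs = [(p - r) ** 2 for p, r, m in zip(pred, ref, mask) if m]
    if not errs:
        raise ValueError("mask selects no pixels")
    mse = sum(errs) / len(errs)
    if mse == 0.0:
        return math.inf  # identical pixels inside the mask
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Scoring only the hole avoids inflating the result with untouched background pixels. Conversely, a whole-image score can hide a poor fill, or can be dragged down by unrelated errors elsewhere in the frame.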
Where Pith is reading between the lines
- This system could support scaling up training data for robot policies without corresponding increases in real-world collection efforts.
- The reconstruction technique might be adapted for other simulation domains beyond robotics if the extraction method generalizes.
- Further validation on longer sequences or more cluttered scenes would test the robustness of the 2D extraction step.
Load-bearing premise
The camera-pose-based strategy robustly extracts objects across frames in the 2D domain to enable high-fidelity independent reconstruction of foreground objects and static background.
What would settle it
Observing a performance gap exceeding 10% between GASE-trained policies and real-world trained policies on the reported manipulation and navigation tasks.
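This falsification criterion depends on how the gap is defined, and the summary leaves that open. A 10-point absolute drop in success rate and a 10% relative drop are different thresholds. In addition, with a few dozen trials the sampling noise on each success rate can exceed the margin itself. The sketch below computes both gap definitions and a 95% Wilson score interval for a binomial success rate. The function names and the trial counts in the usage example are hypothetical, not numbers from the paper.

```python
import math
from typing import Tuple


def success_gaps(
    real_successes: int, real_trials: int, sim_successes: int, sim_trials: int
) -> Tuple[float, float]:
    """Return (absolute gap in percentage points, relative gap in percent)."""
    p_real = real_successes / real_trials
    p_sim = sim_successes / sim_trials
    if p_real == 0.0:
        raise ValueError("relative gap undefined when the real-data success rate is 0")
    absolute = 100.0 * (p_real - p_sim)
    relative = 100.0 * (p_real - p_sim) / p_real
    return absolute, relative


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half
```

For example, take hypothetical counts of 25/50 real-data successes and 21/50 GASE-trained successes. `success_gaps(25, 50, 21, 50)` gives an 8-point absolute gap but a 16% relative gap. `wilson_interval(21, 50)` spans roughly 0.29 to 0.56, a half-width of about 13 points. Reporting trial counts and intervals, as the referee requests, is what makes a "<10%" claim checkable.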
Original abstract
Training embodied agents in the real world requires skilled operators and expensive hardware. Simulation environments offer a compelling alternative by enabling large-scale, cost-effective data augmentation. Consequently, rapidly constructing high-fidelity simulation scenes with a minimal sim-to-real gap has become a critical objective in robot learning. While reconstruction-based methods provide superior visual quality, current workflows are hindered by inefficient data acquisition and subpar foreground object extraction. We thus propose GASE, a highly automated system for simulation scene construction. GASE leverages multi-view video streams from panoramic camera arrays to enable rapid environment scanning. To ensure high-quality asset generation, our pipeline introduces a camera-pose-based strategy that robustly extracts objects across frames in the 2D domain, followed by high-fidelity scene inpainting. Foreground objects and the static background are then reconstructed independently and seamlessly imported into physics simulators for policy training. Extensive experiments demonstrate that GASE outperforms existing 3D Gaussian-based methods in segmentation accuracy by over 10% while achieving state-of-the-art inpainting quality. Furthermore, real-robot deployments across manipulation and navigation tasks maintain a performance gap of less than 10% compared to policies trained purely on real-world data. These results confirm that GASE provides an efficient and highly effective solution for bridging the sim-to-real gap. Code will be released.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GASE, an automated pipeline for constructing high-fidelity simulation environments from multi-view panoramic video using 3D Gaussian splatting. A camera-pose-based 2D extraction step followed by scene inpainting separates foreground objects from static background, enabling independent reconstruction and direct import into physics simulators. The central empirical claims are >10% gains in segmentation accuracy over prior 3D Gaussian methods, state-of-the-art inpainting quality, and real-robot manipulation/navigation policies whose performance remains within 10% of policies trained exclusively on real data.
Significance. If the reported performance numbers are supported by properly documented experiments, the work would be significant for embodied AI: it directly targets the data-acquisition bottleneck in sim-to-real transfer by offering a largely automated, high-visual-fidelity reconstruction workflow. The explicit promise to release code is a positive factor for reproducibility.
major comments (2)
- [Abstract / §4] Abstract and §4 (Experiments): the headline claims of '>10% segmentation accuracy' and '<10% sim-to-real performance gap' are presented without any description of dataset sizes, number of scenes, exact metrics, baseline implementations, number of trials, or statistical tests. Because these numbers constitute the primary evidence for the central claim that the pipeline bridges the sim-to-real gap, the absence of protocol details is load-bearing.
- [§3.2] §3.2 (Object Extraction and Inpainting): the entire performance narrative rests on the assertion that the camera-pose-based 2D extraction 'robustly extracts objects across frames' followed by high-fidelity inpainting. No quantitative ablation (e.g., extraction IoU, failure rates under occlusion or blur) or removal of the inpainting stage is reported, leaving the causal link between the proposed strategy and the claimed downstream gains unverified.
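The request for exact metrics matters because "over 10% better segmentation accuracy" could mean either 10 points of IoU or a 10% relative improvement. The sketch below assumes mask IoU is the metric, which the summary does not confirm. It computes per-frame IoU, its mean, and a failure rate defined as the fraction of frames below an IoU threshold. That failure rate is one way to report the extraction failures under occlusion or blur that §3.2 is asked to quantify. The helper names and the 0.5 threshold are illustrative assumptions.

```python
from typing import Sequence


def mask_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """Intersection-over-union of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0  # both empty: treat as perfect agreement


def mean_iou(preds: Sequence[Sequence[bool]], gts: Sequence[Sequence[bool]]) -> float:
    """Average IoU over frames (or objects)."""
    scores = [mask_iou(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("no masks to score")
    return sum(scores) / len(scores)


def failure_rate(
    preds: Sequence[Sequence[bool]],
    gts: Sequence[Sequence[bool]],
    threshold: float = 0.5,  # hypothetical cut-off for a "failed" extraction
) -> float:
    """Fraction of frames whose IoU falls below `threshold`."""
    scores = [mask_iou(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("no masks to score")
    return sum(s < threshold for s in scores) / len(scores)
```

Reporting both numbers separates average quality from tail behavior. A method can raise mean IoU while still failing outright on the occluded frames that matter most for downstream assets.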
minor comments (1)
- [Abstract] The abstract states 'Code will be released' but provides neither a repository URL nor a commit hash; this should be added for a camera-ready version.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. The feedback correctly identifies areas where additional experimental documentation and ablations would strengthen the presentation of our results. We address each major comment below and will incorporate the suggested revisions.
Point-by-point responses
- Referee: [Abstract / §4] Abstract and §4 (Experiments): the headline claims of '>10% segmentation accuracy' and '<10% sim-to-real performance gap' are presented without any description of dataset sizes, number of scenes, exact metrics, baseline implementations, number of trials, or statistical tests. Because these numbers constitute the primary evidence for the central claim that the pipeline bridges the sim-to-real gap, the absence of protocol details is load-bearing.
Authors: We agree that the current presentation of headline claims in the abstract and §4 would benefit from more explicit protocol documentation. Although §4 describes the datasets, metrics, and tasks at a high level, it does not enumerate scene counts, trial numbers, or include statistical tests. In the revised manuscript we will expand §4 with a new 'Experimental Protocol' subsection that reports: number of panoramic video sequences and distinct scenes, exact metric definitions and implementations, baseline code references or re-implementations, number of policy training/evaluation trials per task, and results of appropriate statistical tests (e.g., paired t-tests with p-values). revision: yes
- Referee: [§3.2] §3.2 (Object Extraction and Inpainting): the entire performance narrative rests on the assertion that the camera-pose-based 2D extraction 'robustly extracts objects across frames' followed by high-fidelity inpainting. No quantitative ablation (e.g., extraction IoU, failure rates under occlusion or blur) or removal of the inpainting stage is reported, leaving the causal link between the proposed strategy and the claimed downstream gains unverified.
Authors: The referee is correct that §3.2 currently provides only a qualitative description of the extraction and inpainting pipeline without supporting quantitative ablations. To establish the contribution of these components, the revised manuscript will add quantitative results: per-frame extraction IoU and failure rates under controlled occlusion/blur conditions, plus an ablation that removes the inpainting stage and measures the resulting impact on both segmentation accuracy and downstream policy performance. These new experiments will be reported in an expanded §4. revision: yes
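The first response promises paired t-tests. The sketch below substitutes a different but related procedure: an exact paired sign-flip permutation test on the mean per-pair difference, for example per-task success rates of real-data versus GASE-trained policies. It needs only the standard library and makes no normality assumption. The function name and the pairing unit are assumptions for illustration.

```python
import itertools
from typing import Sequence


def paired_sign_flip_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired permutation (sign-flip) test on the mean difference.

    Under the null hypothesis each paired difference is equally likely to have
    either sign, so all 2**n sign patterns are enumerated (fine for small n).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty samples of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return extreme / 2 ** n
```

Under this test, five paired tasks that all favor the same side give a smallest achievable two-sided p-value of 2/32 = 0.0625. No result at that sample size can clear a 0.05 threshold, which is why the number of pairs and trials needs to be reported alongside the headline gap.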
Circularity Check
No circularity: empirical system paper with no derivations or fitted predictions
Full rationale
The manuscript presents an automated reconstruction pipeline (multi-view video, camera-pose 2D extraction, inpainting, independent foreground/background Gaussian reconstruction) and supports its claims solely via reported experimental metrics (segmentation accuracy, inpainting quality, real-robot policy transfer gaps). No equations, parameter-fitting steps, self-citations used as uniqueness theorems, or renamings of known results appear in the provided text. The central claims rest on external empirical benchmarks rather than any internal reduction to fitted inputs or self-referential definitions, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: 3D Gaussian splatting can produce high-fidelity reconstructions from multi-view images that are suitable for downstream simulation.
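This axiom rests on how 3D Gaussian splatting renders. In the standard formulation (Kerbl et al., 2023), each pixel color is alpha-composited front to back over depth-sorted Gaussians, C = Σ_i c_i α_i Π_{j<i} (1 − α_j). Here α_i is a learned opacity times the projected 2D Gaussian's falloff at that pixel. The sketch below illustrates only this per-pixel step in plain Python. It omits projection of 3D covariances, spherical-harmonic color, and tiling. The function names, the 0.99 clamp, and the early-stop threshold are illustrative choices, not GASE's code.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def gaussian_alpha(
    opacity: float, dx: float, dy: float, cov2d: Tuple[float, float, float]
) -> float:
    """alpha = opacity * exp(-0.5 * d^T Sigma^{-1} d) for a projected 2D Gaussian.

    cov2d = (a, b, c) encodes the symmetric covariance [[a, b], [b, c]];
    (dx, dy) is the pixel offset from the Gaussian's projected center.
    """
    a, b, c = cov2d
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    # Clamp just below 1 (an assumed convention) so one splat never fully occludes
    return min(0.99, opacity * math.exp(-0.5 * q))


def composite(splats: Sequence[Tuple[RGB, float]], t_min: float = 1e-4) -> RGB:
    """Front-to-back compositing of depth-sorted (color, alpha) pairs."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for color, alpha in splats:
        weight = alpha * transmittance
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # stop once the pixel is effectively opaque
            break
    return (out[0], out[1], out[2])
```

Because each Gaussian contributes only through this blended color, separately reconstructed Gaussian sets can in principle be rendered together by merging and depth-sorting their splats. That property is what makes an independent foreground/background reconstruction compatible with later composition.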