A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting

Chengxu Zhou; Jingcheng Sun; Lingfan Bao; Tianhu Peng; Ziyang Sun

arxiv: 2601.03200 · v2 · submitted 2026-01-06 · 💻 cs.RO


Pith reviewed 2026-05-16 16:43 UTC · model grok-4.3

classification 💻 cs.RO

keywords: digital twin · 3D Gaussian Splatting · robotic manipulation · scene reconstruction · collision geometry · sim-to-real transfer · Franka Emika Panda · pick and place


The pith

A 3D Gaussian Splatting framework builds photorealistic digital twins from sparse RGB views in minutes and converts them into accurate collision models for robotic manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a practical system that turns limited camera images into interactive digital twins suitable for robot planning and execution. It relies on 3D Gaussian Splatting to deliver fast, visually accurate scene models that serve as a single representation for both rendering and physics. Two key additions handle the translation from visual data to usable robot models: visibility-aware fusion that assigns accurate semantic labels in 3D, and a lightweight filter step that extracts collision geometry ready for a physics engine. Experiments on a real Franka Emika Panda arm performing pick-and-place tasks show that the resulting models support reliable motion planning without extensive manual adjustment. The work therefore positions 3DGS-based twins as a direct bridge from quick perception to closed-loop control in everyday settings.

Core claim

We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials.

What carries the argument

3D Gaussian Splatting used as the core unified scene representation, extended by visibility-aware semantic fusion for 3D labels and a filter-based method that extracts collision geometry for direct use in physics-based planning.
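The page does not spell out how the visibility-aware fusion works. One common way to realise the idea is sketched below. It assumes each view supplies:

- pinhole intrinsics `K`;
- a 4x4 world-to-camera pose;
- a depth map rendered from the 3DGS model;
- a per-pixel label image built from Grounded-SAM masks, with -1 for unlabelled pixels.

A point, such as a Gaussian centre, votes for a label only in views where its projected depth matches the rendered depth. This keeps occluded points from inheriting the label of the object in front of them. The function name `fuse_labels` and the tolerance `depth_tol` are hypothetical, and this is not the authors' code.

```python
import numpy as np


def fuse_labels(points, views, num_classes, depth_tol=0.02):
    """Majority-vote a semantic label for each 3D point over the views in
    which it is visible (not occluded). Returns -1 for never-visible points."""
    n = len(points)
    votes = np.zeros((n, num_classes), dtype=np.int64)
    homog = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords

    for K, T_cw, depth, label_img in views:
        cam = (T_cw @ homog.T).T[:, :3]          # world -> camera frame
        z = cam[:, 2]
        valid = z > 1e-6                          # in front of the camera
        z_safe = np.where(valid, z, 1.0)          # avoid dividing by z <= 0
        u = np.round(K[0, 0] * cam[:, 0] / z_safe + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * cam[:, 1] / z_safe + K[1, 2]).astype(int)
        h, w = depth.shape
        valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)

        idx = np.flatnonzero(valid)
        # Visibility test: the point must lie on the rendered (front) surface.
        visible = np.abs(depth[v[idx], u[idx]] - z[idx]) < depth_tol
        idx = idx[visible]
        lab = label_img[v[idx], u[idx]]
        labelled = lab >= 0                       # -1 marks unlabelled pixels
        np.add.at(votes, (idx[labelled], lab[labelled]), 1)

    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = -1
    return fused
```

Points that are never seen unoccluded keep the label -1. The depth tolerance has to be chosen relative to scene scale and to the noise in the rendered depth.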

If this is right

High-fidelity digital twins become available in minutes rather than hours, shortening the time from scene capture to executable robot plans.
Semantic labels and collision geometry derived directly from the same 3DGS model maintain consistency between vision and physics stages.
Integration with standard ROS2 and MoveIt pipelines allows the reconstructed models to drive closed-loop planning without custom middleware.
The method supports robust pick-and-place in unstructured scenes once the geometry conversion step is applied.
The overall pipeline offers a scalable route from sparse RGB perception to reliable manipulation without requiring dense sensors or manual scene modeling.
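Figure 2 notes that MoveIt plans against simplified geometry rather than against the splats themselves. One simple simplification, assumed here because the page does not specify one, is an inflated axis-aligned box per semantic label. The returned centre and sizes could populate a `shape_msgs/SolidPrimitive` of type `BOX` inside a `moveit_msgs/CollisionObject`, though the ROS message construction is omitted. `boxes_per_label` and the `padding` value are hypothetical.

```python
import numpy as np


def boxes_per_label(points, labels, padding=0.01):
    """Return {label: (centre, size)} for an inflated axis-aligned box around
    each labelled object's points. Label -1 (unlabelled) is skipped."""
    boxes = {}
    for lab in np.unique(labels):
        if lab < 0:
            continue
        pts = points[labels == lab]
        lo = pts.min(axis=0) - padding   # inflate by a safety margin
        hi = pts.max(axis=0) + padding
        boxes[int(lab)] = ((lo + hi) / 2.0, hi - lo)
    return boxes
```

Axis-aligned boxes are conservative and cheap to collision-check. They over-approximate rotated or concave objects, however; an oriented box or a convex hull would be tighter at some extra cost.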

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the geometry conversion remains stable across lighting and viewpoint changes, the same pipeline could support online twin updates during long-running robot operations.
Extending the filter-based conversion to handle deformable objects would open the approach to tasks involving soft materials or articulated items.
Because reconstruction time is low, repeated capture cycles could be used to maintain an up-to-date twin when the workspace changes gradually.
The framework could be tested on multi-robot coordination by sharing the same 3DGS model across several agents without reprocessing.

Load-bearing premise

The visibility-aware semantic fusion and filter-based geometry conversion from 3DGS produce collision geometry accurate enough for reliable real-world manipulation without post-hoc tuning or significant sim-to-real discrepancies in unstructured environments.

What would settle it

A controlled trial in which the generated digital twin yields collision models that cause the robot to fail or collide during pick-and-place tasks in a scene where manually built models succeed. Alternatively, a trial in which performance drops sharply once the environment changes slightly from the reconstruction views.

Figures

Figures reproduced from arXiv: 2601.03200 by Chengxu Zhou, Jingcheng Sun, Lingfan Bao, Tianhu Peng, Ziyang Sun.

**Figure 1.** The overall pipeline of this framework uses multi-view video input and 3DGS to reconstruct the scene geometry. Grounded-SAM provides semantic masks, which are fused with the 3D projection to form a semantically-aware digital twin. This twin enables collision-aware motion planning for real robot manipulation.

**Figure 2.** Integration and validation of the digital twin framework across simulation and reality. The Unity view (Fig. 2a) shows the high-fidelity, photorealistic digital twin built with 3DGS and integrated with the physics engine. This model generates and validates collision-aware motion plans visualized in the RViz interface (Fig. 2b), which uses simplified geometry for MoveIt planning. The validated plan is then execut…

**Figure 3.** Qualitative efficacy of the point cloud cleaning pipeline. Top: Raw 3DGS point clouds exhibiting floaters and surface fuzziness, which impede precise collision checking. Bottom: Refined geometries after applying our multi-stage filtering (heuristic filtering and DBSCAN). The process effectively removes artifacts and sharpens boundaries, yielding planning-ready digital twins for manipulation tasks.

**Figure 4.** Execution sequence of the multi-step rearrangement task in (a) the real world and (b) the digital twin. The robot grasps the blue box and places it on the cardboard box, then grasps the yellow cube and stacks it on the blue box, and finally grasps the toy hammer and places it in the target area. This demonstrates the framework's capability for complex, zero-shot manipulation with proactive planning validat…
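Figure 3 names two cleaning stages: heuristic filtering and DBSCAN. The sketch below makes two assumptions, since the page does not spell out either stage:

- the heuristic stage thresholds each Gaussian's opacity and largest scale;
- DBSCAN keeps only the largest cluster of an object's points, so detached floaters are discarded.

The thresholds, `eps`, and `min_samples` are hypothetical. The brute-force DBSCAN is quadratic in the number of points, so it only suits small per-object clouds. `sklearn.cluster.DBSCAN` or Open3D's `PointCloud.cluster_dbscan` would be the practical choices.

```python
from collections import deque

import numpy as np


def heuristic_filter(points, opacities, scales, min_opacity=0.3, max_scale=0.05):
    """Drop near-transparent or oversized Gaussians, typical sources of floaters."""
    keep = (opacities >= min_opacity) & (scales.max(axis=1) <= max_scale)
    return points[keep]


def dbscan(points, eps, min_samples):
    """Plain O(N^2) DBSCAN. Returns a cluster id per point, -1 for noise."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    neighbours = [np.flatnonzero(row <= eps * eps) for row in d2]  # includes self
    labels = np.full(len(points), -1)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(neighbours[i]) < min_samples:
            continue                      # already assigned, or not a core point
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster       # border or core point joins the cluster
                if len(neighbours[j]) >= min_samples:
                    queue.extend(neighbours[j])  # only core points expand
        cluster += 1
    return labels


def keep_largest_cluster(points, eps=0.02, min_samples=5):
    """Keep the biggest DBSCAN cluster, discarding noise and small detached blobs."""
    labels = dbscan(points, eps, min_samples)
    if (labels < 0).all():
        return points[:0]
    largest = np.bincount(labels[labels >= 0]).argmax()
    return points[labels == largest]
```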

Original abstract

Developing high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulties in converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting (3DGS) for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials. These results demonstrate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable end-to-end pipeline from sparse RGB to planning-ready digital twin via 3DGS plus semantic fusion and filter conversion, but the geometry accuracy claim rests on qualitative robot trials without metrics.


The main takeaway is that they built a practical system that turns a handful of RGB images into a photorealistic scene model with 3D Gaussian Splatting. It adds visibility-aware semantic labels, converts the result to collision meshes through filtering, and drops it straight into Unity with ROS2 and MoveIt for a Franka arm. The whole thing runs in minutes, and they show it supporting pick-and-place tasks without obvious failures in their setup. That integration is the useful part: it closes the loop from perception to executable planning faster than older reconstruction pipelines. The visibility-aware fusion and the filter step are the concrete additions that let them label objects in 3D and produce meshes that MoveIt can use directly. The paper does a decent job of laying out the flow, so the engineering choices are traceable. The robot experiments at least confirm that the system runs end-to-end on real hardware.

The soft spot is the evaluation of the geometry conversion. The claim that the resulting collision models are accurate enough for reliable manipulation is central. Yet the paper gives no surface error numbers, no Hausdorff distances, no ablation on the filter parameters, and no comparison against other mesh extraction methods. The trials succeed in what appears to be a structured scene. That leaves open how well the pipeline holds up when reconstructed surfaces deviate enough from the real ones to affect grasps or placements. Small gaps in the meshes can still break planning even if the visuals look good.

This is aimed at robotics groups working on quick digital twins for sim-to-real work. Someone already using 3DGS or building manipulation pipelines could pick up the specific fusion and conversion tricks. It deserves peer review because the pipeline is coherent and the integration is real, even though the validation needs tightening to make the accuracy claims stick.

Referee Report

2 major / 1 minor

Summary. The paper claims to present a practical framework for constructing high-quality digital twins within minutes from sparse RGB inputs using 3D Gaussian Splatting (3DGS) as the core representation. It enhances 3DGS with visibility-aware semantic fusion for 3D labelling and an efficient filter-based geometry conversion to generate collision-ready models integrated with Unity-ROS2-MoveIt. Experiments with a Franka Emika Panda robot on pick-and-place tasks are said to demonstrate that the enhanced geometric accuracy supports robust real-world manipulation.

Significance. Should the quantitative validation of the collision geometry accuracy be provided, this work would represent a significant step toward practical, high-fidelity digital twins for robotic manipulation, offering advantages in reconstruction speed and visual fidelity over traditional methods. The seamless pipeline from perception to physics-based planning addresses key bottlenecks in sim-to-real transfer.

major comments (2)

The experiments section reports successful pick-and-place trials with a Franka Emika Panda but provides no quantitative metrics such as task success rates, pose errors, Hausdorff distances for the converted geometry, or comparisons to baselines, leaving the central claim of sufficient collision-model accuracy unsupported.
The filter-based geometry conversion method (introduced to produce collision-ready models from 3DGS) is described without error metrics, ablation on filter parameters, or validation against ground-truth meshes, which is load-bearing for the claim that it yields models accurate enough for reliable MoveIt planning without post-hoc tuning.
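The referee asks for a Hausdorff distance, and the authors later promise a mean deviation. Both can be computed on point samples drawn from the converted collision geometry and from a ground-truth mesh. The brute-force sketch below assumes both surfaces have already been sampled into N x 3 arrays; the sampling step is not shown, and `surface_errors` is a hypothetical helper.

```python
import numpy as np


def nearest_distances(a, b):
    """For each point in a, Euclidean distance to its nearest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))


def surface_errors(pred, gt):
    """Symmetric Hausdorff distance and mean nearest-neighbour deviation
    between two point samples (N x 3 and M x 3)."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    return {
        "hausdorff": float(max(d_pred_to_gt.max(), d_gt_to_pred.max())),
        "mean_deviation": float(0.5 * (d_pred_to_gt.mean() + d_gt_to_pred.mean())),
    }
```

The Hausdorff value reports the single worst deviation, which is what matters for a missed collision. The mean shows typical surface error. Both are in the units of the input points.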

minor comments (1)

The abstract refers to 'unstructured environments' while the reported trials appear limited to a single structured pick-and-place setup; adding details on scene variability would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key opportunities to strengthen the quantitative validation of our claims regarding collision geometry accuracy and the filter-based conversion method. We address each major comment below and will revise the manuscript accordingly.


Referee: The experiments section reports successful pick-and-place trials with a Franka Emika Panda but provides no quantitative metrics such as task success rates, pose errors, Hausdorff distances for the converted geometry, or comparisons to baselines, leaving the central claim of sufficient collision-model accuracy unsupported.

Authors: We agree that the absence of quantitative metrics weakens support for the central claim. In the revised manuscript, we will augment the experiments section with task success rates across repeated trials, end-effector pose errors, Hausdorff distances for the converted collision geometry relative to ground-truth meshes, and direct comparisons to baseline reconstruction approaches. These additions will provide concrete evidence that the enhanced geometric accuracy enables reliable MoveIt planning. revision: yes
Referee: The filter-based geometry conversion method (introduced to produce collision-ready models from 3DGS) is described without error metrics, ablation on filter parameters, or validation against ground-truth meshes, which is load-bearing for the claim that it yields models accurate enough for reliable MoveIt planning without post-hoc tuning.

Authors: We concur that the filter-based conversion requires additional quantitative support. The revised version will incorporate error metrics (including Hausdorff distance and mean geometric deviation), ablation studies on key filter parameters, and validation against ground-truth meshes acquired via high-precision scanning. This will substantiate that the method produces planning-ready models without requiring manual post-processing. revision: yes
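For the task success rates promised above, a binomial confidence interval makes small trial counts visible. The Wilson score interval is one standard choice; the page does not say which interval, if any, the authors will use.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half
```

For example, 10 successes in 10 trials gives a 95% interval of roughly 0.72 to 1.0. A perfect run over a handful of trials therefore still leaves substantial uncertainty about reliability.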

Circularity Check

0 steps flagged

No circularity: framework is an integration of existing 3DGS with added components validated experimentally


The paper presents a system that applies 3D Gaussian Splatting for scene reconstruction. It augments the reconstruction with visibility-aware semantic fusion and a filter-based geometry conversion that produces collision meshes, then integrates the output into a Unity-ROS2-MoveIt pipeline. These steps are described as engineering additions evaluated through physical Franka robot pick-and-place trials. No equations, fitted parameters, or predictions are introduced that reduce by construction to the inputs. The claims rest on empirical demonstration rather than on self-referential logic or load-bearing self-citations. The argument is therefore not circular: its claims are checked against external evidence (the hardware trials), not against quantities defined from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from 3D Gaussian Splatting literature and robotics simulation pipelines, with the paper-specific enhancements treated as effective without detailed independent validation in the abstract.

axioms (2)

domain assumption 3D Gaussian Splatting produces photorealistic reconstructions from sparse RGB views that can be enhanced for semantic and geometric accuracy
Invoked as the foundation for the unified scene representation in the abstract.
ad hoc to paper The filter-based geometry conversion yields collision models sufficiently accurate for real-world manipulation planning
Introduced as part of the framework without quantitative justification in the abstract.

pith-pipeline@v0.9.0 · 5505 in / 1508 out tokens · 56507 ms · 2026-05-16T16:43:06.459559+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

a multi-scale geometric filtering process with statistical outlier removal and adaptive mesh decimation... alpha shapes algorithm
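The passage quoted above names statistical outlier removal as one stage of the paper's filtering. The standard form of that technique drops points whose mean distance to their `k` nearest neighbours exceeds the global mean of that statistic by more than `std_ratio` standard deviations. Open3D exposes it as `PointCloud.remove_statistical_outlier(nb_neighbors, std_ratio)`. Below is a brute-force NumPy sketch with hypothetical parameter values; it makes no claim to match the authors' settings.

```python
import numpy as np


def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    the cloud-wide mean of that statistic by more than std_ratio std devs."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    knn = np.sort(d, axis=1)[:, 1:k + 1]   # column 0 is the zero self-distance
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]
```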

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] J. Li and S. X. Yang, “Digital twins to embodied artificial intelligence: review and perspective,” Intelligence & Robotics, vol. 5, no. 1, 2025.

[2] L. Zhou, G. Wu, Y. Zuo, X. Chen, and H. Hu, “A comprehensive review of vision-based 3d reconstruction methods,” Sensors, vol. 24, no. 7, 2024.

[3] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” 2020.

[4] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,” 2021.

[5] C. Lv, W. Lin, and B. Zhao, “Voxel structure-based mesh reconstruction from a 3d point cloud,” IEEE Transactions on Multimedia, vol. 24, pp. 1815–1829, 2022.

[6] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” 2023.

[7] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment anything,” 2023.

[8] T. Ren, S. Liu, A. Zeng, J. Lin, K. Li, H. Cao, J. Chen, X. Huang, Y. Chen, F. Yan, Z. Zeng, H. Zhang, F. Li, J. Yang, H. Li, Q. Jiang, and L. Zhang, “Grounded sam: Assembling open-world models for diverse visual tasks,” 2024.

[9] X. Li, J. Li, Z. Zhang, R. Zhang, F. Jia, T. Wang, H. Fan, K.-K. Tseng, and R. Wang, “Robogsim: A real2sim2real robotic gaussian splatting simulator,” 2025.

[10] N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson, E. Mintun, J. Pan, K. V. Alwala, N. Carion, C.-Y. Wu, R. Girshick, P. Dollár, and C. Feichtenhofer, “Sam 2: Segment anything in images and videos,” 2024.

[11] D. Coleman, I. Sucan, S. Chitta, and N. Correll, “Reducing the barrier to entry of complex robotic software: a moveit! case study,” 2014.

[12] B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pp. 303–312, ACM, 1996.

[13] H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, “Voxblox: Incremental 3d euclidean signed distance fields for on-board mav planning,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1366–1373, IEEE, 2017.

[14] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Transactions on Graphics, vol. 41, pp. 1–15, July 2022.

[15] J. Cen, J. Fang, C. Yang, L. Xie, X. Zhang, W. Shen, and Q. Tian, “Segment any 3d gaussians,” 2025.

[16] T. Chen, O. Shorinwa, J. Bruno, A. Swann, J. Yu, W. Zeng, K. Nagami, P. Dames, and M. Schwager, “Splat-nav: Safe real-time robot navigation in gaussian splatting maps,” 2025.

[17] O. Shorinwa, J. Tucker, A. Smith, A. Swann, T. Chen, R. Firoozi, M. K. III, and M. Schwager, “Splat-mover: Multi-stage, open-vocabulary robotic manipulation via editable gaussian splatting,” 2024.

[18] M. Ji, R.-Z. Qiu, X. Zou, and X. Wang, “Graspsplats: Efficient manipulation with 3d feature splatting,” 2024.

[19] Z. Fan, K. Wen, W. Cong, K. Wang, J. Zhang, X. Ding, D. Xu, B. Ivanovic, M. Pavone, G. Pavlakos, Z. Wang, and Y. Wang, “Instantsplat: Sparse-view gaussian splatting in seconds,” 2025.

[20] M. Kazhdan, M. Bolitho, and H. Hoppe, “Poisson surface reconstruction,” in Proceedings of the fourth Eurographics symposium on Geometry processing, pp. 61–70, Eurographics Association, 2006.

[21] A. Guédon and V. Lepetit, “Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering,” 2023.

[22] A. Pranckevicius, “Unitygaussiansplatting.” https://github.com/aras-p/UnityGaussianSplatting, 2024.

[23] Robotec.AI, “Ros2 for unity.” https://github.com/RobotecAI/ros2-for-unity, 2024. Accessed: 2025-04-28.

[24] V. Leroy, Y. Cabon, and J. Revaud, “Grounding image matching in 3d with mast3r,” 2024.

[25] X. Zhou, Z. Lin, X. Shan, Y. Wang, D. Sun, and M.-H. Yang, “Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes,” 2023.

[26] A. Cherian, R. Corcodel, S. Jain, and D. Romeres, “Llmphy: Complex physical reasoning using large language models and world models,” 2024.
