ArtiTwinSplat: Interactable Digital Twin Reconstruction via Gaussian Splatting from RGB-D videos

Hermann Blum; Marco Hutter; Marc Pollefeys; Max Wilder-Smith; Pranjal Mishra; René Zurbrügg; Zuria Bauer

arxiv: 2606.24628 · v1 · pith:2YBM7AJWnew · submitted 2026-06-23 · 💻 cs.RO · cs.CV

Pith reviewed 2026-06-25 23:42 UTC · model grok-4.3

keywords: articulated objects, digital twins, Gaussian splatting, RGB-D videos, robotics, unsupervised discovery, 3D reconstruction

The pith

ArtiTwinSplat builds articulated photo-realistic digital twins directly from RGB-D videos with no CAD models or annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ArtiTwinSplat as a way to turn RGB-D videos of objects into interactive digital twins that robots can use right away. It relies on 3D Gaussian Splatting to keep both the look and the shape accurate while an unsupervised process figures out object parts and how they move together from the video motion. The approach skips any need for pre-made designs or human labels, producing models that support real-time viewing and manipulation. This targets the bottleneck of creating usable object models for robots working in real environments.

Core claim

ArtiTwinSplat combines 3D Gaussian Splatting with an unsupervised articulation discovery pipeline to recover part structure and joint kinematics from observed motion alone in RGB-D videos, yielding stable, queryable digital twins that support real-time rendering, viewpoint control, and interactive manipulation without CAD models, simulation assets, or manual annotations.

What carries the argument

3D Gaussian Splatting coupled with an unsupervised articulation discovery pipeline that recovers part structure and joint kinematics from observed motion.
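
As a rough illustration of what recovering joint kinematics from observed motion can involve, the sketch below fits a rigid transform to tracked 3D points on one part (the Kabsch algorithm) and reads off the rotation axis and angle. This is an assumed, simplified stand-in and not the paper's pipeline: the function names `kabsch` and `axis_angle` and the synthetic points are hypothetical, and it recovers only the axis direction and angle of a revolute joint, not the axis position.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) such that Q ≈ P @ R.T + t."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def axis_angle(R):
    """Unit rotation axis and angle (radians) of a rotation matrix."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    n = np.linalg.norm(w)
    if n < 1e-9:
        raise ValueError("rotation too small (or near 180 degrees) to identify an axis")
    return w / n, angle

# Synthetic check: points on a part rotated 0.5 rad about the z axis, then shifted.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
c, s = np.cos(0.5), np.sin(0.5)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
Q = P @ R_true.T + t_true

R_est, t_est = kabsch(P, Q)
axis_est, angle_est = axis_angle(R_est)
```

The `ValueError` branch reflects the identifiability concern raised later in the referee report: when a part barely moves, the axis is poorly constrained by the data.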

If this is right

Digital twins become constructible automatically at scale from everyday real-world video observations.
Twins remain stable and immediately usable by downstream robot planning and learning systems.
Models support real-time rendering, viewpoint control, and interactive manipulation out of the box.
The integration barrier drops for articulated object handling in embodied AI and human-robot settings.
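
To make "interactive manipulation" concrete, here is a minimal sketch of moving one part of a twin by rotating its Gaussian centers about a revolute joint. It assumes the part assignment, joint axis, and pivot are already known; the names `rodrigues` and `articulate` are hypothetical and nothing here is taken from the paper's code. A full 3DGS model would also need to rotate each Gaussian's covariance (or orientation), which is omitted.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about the direction `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate(centers, part_mask, axis, pivot, angle):
    """Rotate only the centers selected by `part_mask` about the joint axis through `pivot`."""
    centers = np.asarray(centers, dtype=float)
    pivot = np.asarray(pivot, dtype=float)
    R = rodrigues(axis, angle)
    out = centers.copy()
    out[part_mask] = (centers[part_mask] - pivot) @ R.T + pivot
    return out

# Two Gaussian centers; only the first belongs to the moving part.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
part_mask = np.array([True, False])
moved = articulate(centers, part_mask, axis=[0, 0, 1], pivot=[0, 0, 0], angle=np.pi / 2)
```

Here the first center is carried a quarter turn about the z axis while the static center is left untouched.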

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Robot learning pipelines could incorporate these twins as drop-in environment models to reduce sim-to-real gaps.
The same video-to-twin process might extend to multi-object scenes if motion separation improves.
Deployment tests in varied lighting or partial occlusion would show whether the motion-based discovery holds up.

Load-bearing premise

An unsupervised pipeline can reliably recover part structure and joint kinematics from motion in real-world RGB-D videos without extra supervision.

What would settle it

A real-world RGB-D video sequence in which the recovered parts and joints produce a digital twin that fails to match observed object motion during interactive manipulation.

Figures

Figures reproduced from arXiv: 2606.24628 by Hermann Blum, Marco Hutter, Marc Pollefeys, Max Wilder-Smith, Pranjal Mishra, René Zurbrügg, Zuria Bauer.

**Figure 1.** ArtiTwinSplat pipeline for unsupervised articulated 3D reconstruction: (Stage I) A static pre-change sequence to train a canonical 3DGS model. (Stage II) A dynamic RGB-D capture is localized to the canonical model, and 2D appearance differences generate an initial change mask that seeds reverse SAM2 video object segmentation, giving dense per-frame object masks. (Stage III) Pixel correspondences are lifted…

**Figure 2.** Qualitative results across three real-world scenes at two articulation…

**Figure 3.** From real-world capture to simulation-ready digital twin.
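
The Stage II step in the Figure 1 caption, where 2D appearance differences produce an initial change mask, can be pictured with a simple per-pixel comparison between a rendering of the canonical model and an observed frame. The sketch below is an assumption for illustration only: the function name `change_mask`, the mean absolute RGB difference, and the threshold of 0.1 are hypothetical, and the paper's actual differencing is not described in the material shown here.

```python
import numpy as np

def change_mask(rendered, observed, threshold=0.1):
    """Boolean mask of pixels whose mean absolute RGB difference exceeds `threshold`.

    Both images are H x W x 3 arrays with values in [0, 1].
    """
    rendered = np.asarray(rendered, dtype=float)
    observed = np.asarray(observed, dtype=float)
    diff = np.abs(rendered - observed).mean(axis=-1)
    return diff > threshold

# Toy example: a 2 x 2 patch of the observed frame differs from the rendering.
rendered = np.zeros((4, 4, 3))
observed = rendered.copy()
observed[1:3, 1:3] = 0.8
mask = change_mask(rendered, observed)
```

In a real capture such a mask would be noisy, which is presumably why the caption describes it only as a seed for video object segmentation rather than a final result.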

Original abstract

Deploying robots in unstructured real-world environments needs accurate, interactive models of the objects. Constructing these models at scale remains a critical bottleneck for robotic system integration. We present ArtiTwinSplat, a framework that automatically constructs articulated, photo-realistic digital twins of objects directly from RGB-D videos, requiring no CAD models, simulation assets, or manual annotations. Our method is built on 3D Gaussian Splatting, which preserves geometric fidelity and photometric realism, coupled with an unsupervised articulation discovery pipeline that recovers part structure and joint kinematics from observed motion alone. With tracking and optimization stages, our method provides stable, queryable digital twins that support real-time rendering, viewpoint control, and interactive manipulation. Unlike prior methods confined to simulation, ArtiTwinSplat operates directly on real-world observations and produces twins that are immediately usable by downstream robot planning and learning systems. This method offers a practical, scalable pathway toward digital twin construction, lowering the integration barrier for articulated object manipulation in embodied AI and human-robot collaboration contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ArtiTwinSplat combines Gaussian Splatting with unsupervised motion-based articulation recovery to build interactive digital twins from real RGB-D video, but the abstract supplies no quantitative results or comparisons so the practical claims rest on unshown evidence.

The core idea is to take an RGB-D video of a moving object, fit a 3D Gaussian Splatting representation for geometry and appearance, then recover part structure and joint parameters from the observed motion without any labels or CAD. The output is meant to be a queryable model that supports real-time rendering and robot interaction.

What stands out is the target: scalable construction of articulated twins directly from real observations rather than simulation. That matches a genuine bottleneck in robotics where people need models they can plan and learn with, not just visualize.

The main weakness is the lack of any numbers. The abstract states that the method produces stable, usable twins but gives no reconstruction error, articulation accuracy, runtime figures, or baseline comparisons. Without those, it is impossible to judge whether the unsupervised discovery step actually works reliably on noisy real data or only on clean cases.

The citation pattern and method description look standard for this area, but the central assumption, that motion alone is enough to identify joints and parts in real videos, needs concrete validation that is not visible here.

This is aimed at researchers building digital twins for manipulation or embodied AI. A reader already working on Gaussian Splatting or articulation modeling could extract the pipeline outline, but anyone looking for reproducible results will find the current write-up thin.

If the full paper contains solid experiments and ablations, it is worth sending to review; otherwise the evidence gap is too large for a serious referee process right now.

Referee Report

2 major / 0 minor

Summary. The paper presents ArtiTwinSplat, a framework that automatically constructs articulated, photo-realistic digital twins of objects directly from RGB-D videos. It combines 3D Gaussian Splatting for geometric and photometric fidelity with an unsupervised articulation discovery pipeline that recovers part structure and joint kinematics from observed motion. The method requires no CAD models, simulation assets, or manual annotations, and after tracking and optimization stages produces stable, queryable twins supporting real-time rendering, viewpoint control, interactive manipulation, and use in downstream robot planning and learning systems.

Significance. If the central claims hold with supporting evidence, the work would address a key bottleneck in robotic integration by enabling scalable, annotation-free construction of interactive digital twins from real-world RGB-D data. This could meaningfully advance embodied AI and human-robot collaboration by lowering barriers to articulated object modeling, provided the unsupervised pipeline proves reliable across diverse objects and motions.

major comments (2)

[Abstract] The abstract asserts that the method works and produces usable twins but supplies no quantitative results, comparisons, error metrics, or validation details; central claims rest on unshown evidence.
[Abstract] The unsupervised articulation discovery pipeline is claimed to reliably recover part structure and joint kinematics from observed motion alone in real-world RGB-D videos, but no details on the algorithm, motion observability assumptions, part segmentation stability, or kinematic identifiability are provided to assess this load-bearing assumption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment on the abstract below and will revise accordingly.

Referee: [Abstract] The abstract asserts that the method works and produces usable twins but supplies no quantitative results, comparisons, error metrics, or validation details; central claims rest on unshown evidence.

Authors: We agree that the abstract would benefit from including key quantitative results to support the claims. In the revised version we will add specific metrics from the experiments section, such as rendering PSNR/SSIM, part segmentation accuracy, joint parameter errors, and comparisons to baselines.
Revision: yes

Referee: [Abstract] The unsupervised articulation discovery pipeline is claimed to reliably recover part structure and joint kinematics from observed motion alone in real-world RGB-D videos, but no details on the algorithm, motion observability assumptions, part segmentation stability, or kinematic identifiability are provided to assess this load-bearing assumption.

Authors: The abstract is a concise summary; the algorithm, motion observability assumptions, part segmentation stability, and kinematic identifiability analysis are detailed in Sections 3 and 4 of the manuscript. To address the point we will insert a brief high-level statement on the pipeline approach and assumptions into the abstract.
Revision: partial
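
For readers unfamiliar with the metrics named in the first response, the sketch below shows how two of them are commonly computed: PSNR for rendering quality and the angular error between an estimated and a ground-truth joint axis. The function names and the sign-invariant treatment of the axis are assumptions for illustration; the paper's exact evaluation protocol is not given in the material reviewed here.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, max_value]."""
    ref = np.asarray(reference, dtype=float)
    ren = np.asarray(rendered, dtype=float)
    mse = np.mean((ref - ren) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

def axis_angle_error_deg(axis_est, axis_gt):
    """Angle in degrees between two joint axes, ignoring the sign of the direction."""
    a = np.asarray(axis_est, dtype=float)
    b = np.asarray(axis_gt, dtype=float)
    cos = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

# Toy values: a uniform 0.1 error in intensity, and two pairs of axes.
quality_db = psnr(np.zeros((8, 8, 3)), np.full((8, 8, 3), 0.1))
flipped_error = axis_angle_error_deg([0, 0, 1], [0, 0, -1])
orthogonal_error = axis_angle_error_deg([1, 0, 0], [0, 1, 0])
```

The axis error takes the absolute value of the dot product because a joint axis and its negation describe the same line of rotation.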

Circularity Check

0 steps flagged

No significant circularity

The supplied abstract and high-level description contain no equations, derivations, fitted parameters, or mathematical claims. The framework is presented as a combination of 3D Gaussian Splatting and an unsupervised articulation pipeline without any self-referential predictions, self-definitional steps, or load-bearing self-citations that reduce to inputs by construction. No load-bearing derivation chain exists to analyze, so the paper is self-contained against external benchmarks at the level of detail provided.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract provides no explicit free parameters, invented entities, or detailed axioms; the method implicitly rests on standard computer vision assumptions about Gaussian Splatting fidelity and motion-based structure recovery.

axioms (2)

Domain assumption: 3D Gaussian Splatting preserves geometric fidelity and photometric realism from RGB-D input. Invoked as the foundation for photo-realistic and geometrically accurate twins.
Domain assumption: Observed motion in RGB-D video is sufficient for unsupervised recovery of part structure and joint kinematics. Central premise of the articulation discovery pipeline.

pith-pipeline@v0.9.1-grok · 5728 in / 1368 out tokens · 32203 ms · 2026-06-25T23:42:13.391486+00:00 · methodology
