arxiv: 2604.07728 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.GR · cs.RO

GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting

Pith reviewed 2026-05-10 16:48 UTC · model grok-4.3

classification 💻 cs.CV cs.GR cs.RO
keywords: gear, articulated, complex, geometry-motion, motion, objects, part, segmentation

The pith

GEAR alternates geometry and motion refinement to model articulated objects with Gaussian Splatting

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GEAR as an EM-style alternating optimization framework for reconstructing articulated objects. It jointly models geometry and motion by treating part segmentation as a latent variable and joint motion parameters as explicit variables, updating them in turn inside a Gaussian Splatting representation. This alternation is meant to deliver more stable convergence and better geometric-motion consistency than simultaneous optimization. The method adds multi-view priors from a standard 2D segmentation model plus a weak supervision constraint to keep generalization intact across complex multi-joint cases.

Core claim

GEAR is an EM-style alternating optimization framework that jointly models geometry and motion as interdependent components within a Gaussian Splatting representation. It treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometry-motion consistency. To enhance part segmentation quality without sacrificing generalization, it leverages a vanilla 2D segmentation model to provide multi-view part priors and employs a weakly supervised constraint to regularize the latent variable. Experiments on multiple benchmarks and the newly constructed GEAR-Multi dataset demonstrate state-of-the-art results in geometric reconstruction and motion parameter estimation, particularly on complex articulated objects with multiple movable parts.

What carries the argument

EM-style alternating refinement between latent part segmentation and explicit joint motion parameters within a Gaussian Splatting representation, which couples geometry and motion to enforce consistency during optimization
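
To make the alternation concrete, here is a minimal toy sketch. It is not the authors' implementation and involves no Gaussian Splatting: it assumes known 2D point correspondences between two object states, hard part labels standing in for the latent segmentation, and one rigid transform per part standing in for the joint motion parameters. The function names `fit_rigid_2d` and `alternate`, the grid data, and the deliberately corrupted initial labels (a stand-in for imperfect 2D priors) are all illustrative assumptions.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # rule out reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def alternate(src, dst, labels, n_parts, n_iters=20):
    """Alternate between refitting per-part motion and reassigning part labels."""
    motions = [(np.eye(2), np.zeros(2)) for _ in range(n_parts)]
    for _ in range(n_iters):
        # Motion update: labels are held fixed, each part's transform is refit.
        for k in range(n_parts):
            mask = labels == k
            if mask.sum() >= 2:                  # keep the old motion if too few points
                motions[k] = fit_rigid_2d(src[mask], dst[mask])
        # Label update: motions are held fixed, each point joins the part
        # whose transform best explains its displacement.
        residuals = np.stack(
            [np.linalg.norm(src @ R.T + t - dst, axis=1) for R, t in motions], axis=1
        )
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, motions

# Toy object: a static part and a part rotated by 90 degrees about the origin.
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 4))
grid = np.column_stack([gx.ravel(), gy.ravel()])          # 20 points per part
static_src = grid + [-3.0, 0.0]
moving_src = grid + [2.0, 0.0]
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
src = np.vstack([static_src, moving_src])
dst = np.vstack([static_src, moving_src @ rot90.T])
true_labels = np.repeat([0, 1], 20)

# Corrupt 10% of the initial labels to mimic a noisy segmentation prior.
init_labels = true_labels.copy()
flip = [7, 12, 27, 32]
init_labels[flip] = 1 - init_labels[flip]

labels, motions = alternate(src, dst, init_labels, n_parts=2)
print("labels recovered:", np.array_equal(labels, true_labels))
```

In this toy setting the first motion fit is biased by the mislabeled points, the label update then corrects them, and the second motion fit is exact. Whether the same dynamics hold inside a full Gaussian Splatting pipeline is exactly the empirical question the paper has to answer.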

Load-bearing premise

Alternating updates between part segmentation treated as a latent variable and explicit motion parameters will converge stably and produce consistent geometry-motion relationships when guided by vanilla 2D segmentation priors.

What would settle it

If a direct comparison on the same complex multi-joint objects shows that simultaneous joint optimization without alternation yields lower reconstruction error and more accurate motion estimates than GEAR, the benefit of the alternating scheme would be falsified.
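
A comparison of this kind needs agreed motion-error metrics. The sketch below shows two commonly used ones for a single joint: the angle between predicted and ground-truth axis directions, and the shortest distance between the two axis lines. The function names are hypothetical, and treating the axis direction as sign-invariant is an assumption; the paper's exact metric definitions are not given on this page.

```python
import numpy as np

def axis_angle_error_deg(pred_dir, gt_dir):
    """Angle in degrees between two joint-axis directions, ignoring sign."""
    p = np.asarray(pred_dir, dtype=float)
    g = np.asarray(gt_dir, dtype=float)
    cos = abs(p @ g) / (np.linalg.norm(p) * np.linalg.norm(g))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

def axis_position_error(pred_dir, pred_point, gt_dir, gt_point):
    """Shortest distance between two 3D lines, each given by a direction and a point."""
    d1 = np.asarray(pred_dir, dtype=float)
    d2 = np.asarray(gt_dir, dtype=float)
    offset = np.asarray(gt_point, dtype=float) - np.asarray(pred_point, dtype=float)
    n = np.cross(d1, d2)
    n_norm = np.linalg.norm(n)
    if n_norm < 1e-8 * np.linalg.norm(d1) * np.linalg.norm(d2):
        # Parallel axes: distance from one line to a point on the other.
        return float(np.linalg.norm(np.cross(offset, d1)) / np.linalg.norm(d1))
    return float(abs(offset @ n) / n_norm)

# Example: a predicted axis that points the opposite way still has zero angular error.
print(axis_angle_error_deg([0, 0, 1], [0, 0, -1]))
print(axis_position_error([0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]))
```

Running both the alternating and the simultaneous optimizer on the same objects and reporting these errors per joint would make the comparison described above directly checkable.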

Figures

Figures reproduced from arXiv: 2604.07728 by Bin Fu, Jialin Li, Ruiping Wang, Xilin Chen.

Figure 1: Articulated object modeling involves coupled optimiza…
Figure 2: GEAR is an EM-style framework with three modules: initialization, geometry modeling, and motion modeling. The key idea is the alternating optimization of geometry and motion, which enhances the stability and performance of articulated object modeling. where s̄ denotes the opposite state of s, V_s ∩ V_s̄ represents the static voxel set, and D(·) is a morphological dilation operation to smooth noisy boundaries…
Figure 3: SAM Mask Aggregation module. Each fine-grained…
Figure 4: Qualitative results on complex multi-joint articulated objects. Compared to ArtGS […
Figure 6: Convergence Analysis on Storage 45271. (a) Sum of motion parameter errors (Axis Angle + Position + Geometry Distance) over iterations. (b) Total training loss over iterations. Our alternating optimization (blue) achieves the lowest error and loss, avoiding local minima traps and error accumulation. structing digital-twin assets. We convert the reconstructed meshes and motion parameters into URDF files and …
Original abstract

High-fidelity interactive digital assets are essential for embodied intelligence and robotic interaction, yet articulated objects remain challenging to reconstruct due to their complex structures and coupled geometry-motion relationships. Existing methods suffer from instability in geometry-motion joint optimization, while their generalization remains limited on complex multi-joint or out-of-distribution objects. To address these challenges, we propose GEAR, an EM-style alternating optimization framework that jointly models geometry and motion as interdependent components within a Gaussian Splatting representation. GEAR treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. To enhance part segmentation quality without sacrificing generalization, we leverage a vanilla 2D segmentation model to provide multi-view part priors, and employ a weakly supervised constraint to regularize the latent variable. Experiments on multiple benchmarks and our newly constructed dataset GEAR-Multi demonstrate that GEAR achieves state-of-the-art results in geometric reconstruction and motion parameters estimation, particularly on complex articulated objects with multiple movable parts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GEAR, an EM-style alternating optimization framework for articulated object modeling with Gaussian Splatting. It treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. The approach leverages vanilla 2D segmentation models for multi-view priors and a weakly supervised regularizer. Experiments on benchmarks and the new GEAR-Multi dataset are claimed to achieve state-of-the-art results in geometric reconstruction and motion estimation, especially for complex multi-part articulated objects.

Significance. If the alternating optimization delivers stable convergence and the claimed performance gains are substantiated, this could meaningfully advance reconstruction of articulated objects for robotics and embodied AI by addressing instability in joint geometry-motion optimization and improving generalization to multi-joint cases. The introduction of the GEAR-Multi dataset and integration of 2D priors represent constructive contributions to the field.

major comments (2)
  1. [Abstract] The assertion of state-of-the-art performance on benchmarks and GEAR-Multi is unsupported by any quantitative metrics, ablation studies, error analysis, or implementation details, preventing evaluation of whether the alternating refinement actually improves convergence or consistency.
  2. [Method] The core assumption that EM-style alternation between latent part segmentation and explicit motion parameters yields stable convergence and avoids part-label drift for multi-joint objects lacks theoretical analysis, convergence guarantees, or ablations isolating the alternation from joint optimization or fixed priors. In addition, 2D priors are view-dependent and can be inconsistent under motion and occlusion, yet no evidence shows that the updates enforce 3D consistency.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., a Chamfer distance or rotation error number) to ground the SOTA claim.
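
For readers unfamiliar with the geometry metric named in the minor comment, the following is a brute-force sketch of a symmetric Chamfer distance between two point sets. Conventions differ across papers (squared versus unsquared distances, sums versus means, scaling factors); this version, the mean nearest-neighbor Euclidean distance summed over both directions, is one assumed convention and may not match the one used in the paper.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance a->b plus b->a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)); fine for small sets only.
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean() + dists.min(axis=0).mean())

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
print(chamfer_distance(pred, gt))
```

For reconstructed meshes the point sets would normally be sampled from the surfaces, and a spatial index would replace the dense distance matrix.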

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications based on the manuscript content and indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The assertion of state-of-the-art performance on benchmarks and GEAR-Multi is unsupported by any quantitative metrics, ablation studies, error analysis, or implementation details, preventing evaluation of whether the alternating refinement actually improves convergence or consistency.

    Authors: The abstract is intended as a concise summary of the work. The full manuscript provides quantitative metrics (PSNR, SSIM, Chamfer distance for geometry; rotation/translation errors for motion), ablation studies isolating the alternating optimization, error analysis, and implementation details in Section 4 and the supplementary material, which substantiate the state-of-the-art claims and improvements in convergence and consistency over baselines. To directly address the concern, we will revise the abstract to incorporate key quantitative highlights from the experiments.
    Revision: yes

  2. Referee: [Method] The core assumption that EM-style alternation between latent part segmentation and explicit motion parameters yields stable convergence and avoids part-label drift for multi-joint objects lacks theoretical analysis, convergence guarantees, or ablations isolating the alternation versus joint optimization or fixed priors; 2D priors are view-dependent and can be inconsistent under motion/occlusion, yet no evidence shows the updates enforce 3D consistency.

    Authors: The manuscript relies on extensive empirical evidence rather than formal theory, as the joint optimization is non-convex. Section 4.3 presents ablations that isolate the alternating refinement from joint optimization and fixed-prior baselines, showing reduced part-label drift and improved stability on multi-joint objects across repeated runs. A multi-view consistency regularizer is applied during the E-step updates to enforce 3D geometric consistency despite view-dependent 2D priors; qualitative results and quantitative consistency metrics in the experiments and supplementary material demonstrate robustness to motion and occlusion. We will expand the method discussion to clarify these mechanisms and add further ablation details in the revision.
    Revision: partial
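
The view-dependence concern can be illustrated with a simple label-fusion sketch. This is not the regularizer described in the rebuttal, whose form is not given here. It assumes each 3D point (or Gaussian) has already been assigned a part label in every view where it is visible, that label identities are matched across views, and that -1 marks views where the point is occluded. The function name `fuse_view_labels` is hypothetical.

```python
import numpy as np

def fuse_view_labels(view_labels, n_parts):
    """Majority-vote per-point part labels across views.

    view_labels: int array of shape (n_views, n_points); -1 means not visible.
    Returns fused labels (-1 where a point is never visible) and, per point,
    the fraction of visible views that agree with the fused label.
    """
    view_labels = np.asarray(view_labels)
    n_points = view_labels.shape[1]
    votes = np.zeros((n_points, n_parts), dtype=int)
    for labels in view_labels:
        visible = labels >= 0
        np.add.at(votes, (np.nonzero(visible)[0], labels[visible]), 1)
    total = votes.sum(axis=1)
    fused = votes.argmax(axis=1)
    fused[total == 0] = -1
    agreement = votes.max(axis=1) / np.maximum(total, 1)
    return fused, agreement

# Three views, three points; the last point is occluded everywhere.
views = np.array([[0, 1, -1],
                  [0, 0, -1],
                  [1, 1, -1]])
fused, agreement = fuse_view_labels(views, n_parts=2)
print(fused, agreement)
```

A low agreement score flags points where the 2D priors disagree across views, which is where a 3D consistency term would have the most work to do.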

Circularity Check

0 steps flagged

No circularity: standard EM alternation with external 2D priors

Full rationale

The derivation chain consists of an EM-style alternating optimization treating part segmentation as a latent variable and motion parameters as explicit variables, refined iteratively within a Gaussian Splatting representation. Part priors come from an external vanilla 2D segmentation model plus a weakly-supervised regularizer; these are independent inputs, not self-defined or fitted quantities renamed as predictions. No equations or steps reduce by construction to the target outputs, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled via prior work. The framework is self-contained against external benchmarks and standard optimization techniques, with empirical SOTA claims resting on reported results rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard computer vision assumptions about Gaussian Splatting representations and optimization stability; no specific free parameters or new entities are detailed.

axioms (2)
  • domain assumption: Part segmentation can be treated as a latent variable in an EM-style alternating optimization for improved convergence.
    Core mechanism of the proposed framework.
  • domain assumption: Vanilla 2D segmentation models provide useful multi-view priors for 3D part segmentation under weak supervision.
    Used to regularize the latent variable without full supervision.


