GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting

Bin Fu; Jialin Li; Ruiping Wang; Xilin Chen

arxiv: 2604.07728 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.GR· cs.RO

GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting

Jialin Li , Bin Fu , Ruiping Wang , Xilin Chen This is my paper

Pith reviewed 2026-05-10 16:48 UTC · model grok-4.3

classification 💻 cs.CV cs.GRcs.RO

keywords geararticulatedcomplexgeometry-motionmotionobjectspartsegmentation

0 comments

The pith

GEAR alternates geometry and motion refinement to model articulated objects with Gaussian Splatting

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GEAR as an EM-style alternating optimization framework for reconstructing articulated objects. It jointly models geometry and motion by treating part segmentation as a latent variable and joint motion parameters as explicit variables, updating them in turn inside a Gaussian Splatting representation. This alternation is meant to deliver more stable convergence and better geometric-motion consistency than simultaneous optimization. The method adds multi-view priors from a standard 2D segmentation model plus a weak supervision constraint to keep generalization intact across complex multi-joint cases.

Core claim

GEAR is an EM-style alternating optimization framework that jointly models geometry and motion as interdependent components within a Gaussian Splatting representation. It treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. To enhance part segmentation quality without sacrificing generalization, it leverages a vanilla 2D segmentation model to provide multi-view part priors and employs a weakly supervised constraint to regularize the latent variable. Experiments on multiple benchmarks and the newly constructed GEAR-Multi dataset demonstrate state-of-the-art results,

What carries the argument

EM-style alternating refinement between latent part segmentation and explicit joint motion parameters within a Gaussian Splatting representation, which couples geometry and motion to enforce consistency during optimization

Load-bearing premise

Alternating updates between part segmentation treated as a latent variable and explicit motion parameters will converge stably and produce consistent geometry-motion relationships when guided by vanilla 2D segmentation priors.

What would settle it

If a direct comparison on the same complex multi-joint objects shows that simultaneous joint optimization without alternation yields lower reconstruction error and more accurate motion estimates than GEAR, the benefit of the alternating scheme would be falsified.

Figures

Figures reproduced from arXiv: 2604.07728 by Bin Fu, Jialin Li, Ruiping Wang, Xilin Chen.

**Figure 2.** Figure 2: GEAR is an EM-style framework with three modules: Initialization, geometry modeling, and motion modeling. The key idea is the alternating optimization of geometry and motion, which enhances the stability and performance of articulated object modeling. where s¯ denotes the opposite state of s, Vs ∩ Vs¯ represents the static voxel set, and D(·) is a morphological dilation operation to smooth noisy boundaries… view at source ↗

**Figure 3.** Figure 3: SAM Mask Aggregation module. Each fine-grained [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative results on complex multi-joint articulated objects. Compared to ArtGS [ [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 6.** Figure 6: Convergence Analysis on Storage 45271. (a) Sum of motion parameter errors (Axis Angle + Position + Geometry Distance) over iterations. (b) Total training loss over iterations. Our alternating optimization (blue) achieves the lowest error and loss, avoiding local minima traps and error accumulation. structing digital-twin assets. We convert the reconstructed meshes and motion parameters into URDF files and … view at source ↗

read the original abstract

High-fidelity interactive digital assets are essential for embodied intelligence and robotic interaction, yet articulated objects remain challenging to reconstruct due to their complex structures and coupled geometry-motion relationships. Existing methods suffer from instability in geometry-motion joint optimization, while their generalization remains limited on complex multi-joint or out-of-distribution objects. To address these challenges, we propose GEAR, an EM-style alternating optimization framework that jointly models geometry and motion as interdependent components within a Gaussian Splatting representation. GEAR treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. To enhance part segmentation quality without sacrificing generalization, we leverage a vanilla 2D segmentation model to provide multi-view part priors, and employ a weakly supervised constraint to regularize the latent variable. Experiments on multiple benchmarks and our newly constructed dataset GEAR-Multi demonstrate that GEAR achieves state-of-the-art results in geometric reconstruction and motion parameters estimation, particularly on complex articulated objects with multiple movable parts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GEAR's EM-style alternation between latent part segmentation and explicit motion parameters is a direct attempt to stabilize Gaussian Splatting for articulated objects, but the abstract gives no numbers or ablations to show it works.

read the letter

The paper's main move is to split the joint optimization problem by treating part segmentation as a latent variable that gets refined in alternation with explicit joint motion parameters inside the Gaussian Splatting representation. This EM-style loop plus the use of off-the-shelf 2D segmentation for multi-view priors and a weak regularizer is the concrete addition over prior joint-optimization baselines for articulated objects. The new GEAR-Multi dataset for multi-joint cases is also a straightforward practical step if the results hold up there. These pieces address a known pain point in the subfield without overclaiming broader impact. The soft spot is the stability claim. The description relies on 2D priors that are inherently view-dependent and can shift under motion or occlusion, yet there is no mention of convergence analysis, part-label consistency checks across iterations, or ablations that isolate the alternation from plain joint optimization. Without those, the reported gains on complex objects remain hard to trust. The SOTA assertion on benchmarks and GEAR-Multi is stated but not quantified in the provided material, which leaves the central argument resting on unshown evidence. This work is for researchers already working on dynamic 3D reconstruction with Gaussians or articulated assets in robotics and graphics. It deserves a serious referee because the alternation idea is simple enough to test and the problem is real, even if the current write-up needs more experimental grounding to be convincing.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GEAR, an EM-style alternating optimization framework for articulated object modeling with Gaussian Splatting. It treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. The approach leverages vanilla 2D segmentation models for multi-view priors and a weakly supervised regularizer. Experiments on benchmarks and the new GEAR-Multi dataset are claimed to achieve state-of-the-art results in geometric reconstruction and motion estimation, especially for complex multi-part articulated objects.

Significance. If the alternating optimization delivers stable convergence and the claimed performance gains are substantiated, this could meaningfully advance reconstruction of articulated objects for robotics and embodied AI by addressing instability in joint geometry-motion optimization and improving generalization to multi-joint cases. The introduction of the GEAR-Multi dataset and integration of 2D priors represent constructive contributions to the field.

major comments (2)

[Abstract] Abstract: The assertion of state-of-the-art performance on benchmarks and GEAR-Multi is unsupported by any quantitative metrics, ablation studies, error analysis, or implementation details, preventing evaluation of whether the alternating refinement actually improves convergence or consistency.
[Method] Method: The core assumption that EM-style alternation between latent part segmentation and explicit motion parameters yields stable convergence and avoids part-label drift for multi-joint objects lacks theoretical analysis, convergence guarantees, or ablations isolating the alternation versus joint optimization or fixed priors; 2D priors are view-dependent and can be inconsistent under motion/occlusion, yet no evidence shows the updates enforce 3D consistency.

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., a Chamfer distance or rotation error number) to ground the SOTA claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications based on the manuscript content and indicating planned revisions where appropriate.

read point-by-point responses

Referee: [Abstract] Abstract: The assertion of state-of-the-art performance on benchmarks and GEAR-Multi is unsupported by any quantitative metrics, ablation studies, error analysis, or implementation details, preventing evaluation of whether the alternating refinement actually improves convergence or consistency.

Authors: The abstract is intended as a concise summary of the work. The full manuscript provides quantitative metrics (PSNR, SSIM, Chamfer distance for geometry; rotation/translation errors for motion), ablation studies isolating the alternating optimization, error analysis, and implementation details in Section 4 and the supplementary material, which substantiate the state-of-the-art claims and improvements in convergence and consistency over baselines. To directly address the concern, we will revise the abstract to incorporate key quantitative highlights from the experiments. revision: yes
Referee: [Method] Method: The core assumption that EM-style alternation between latent part segmentation and explicit motion parameters yields stable convergence and avoids part-label drift for multi-joint objects lacks theoretical analysis, convergence guarantees, or ablations isolating the alternation versus joint optimization or fixed priors; 2D priors are view-dependent and can be inconsistent under motion/occlusion, yet no evidence shows the updates enforce 3D consistency.

Authors: The manuscript relies on extensive empirical evidence rather than formal theory, as the joint optimization is non-convex. Section 4.3 presents ablations that isolate the alternating refinement from joint optimization and fixed-prior baselines, showing reduced part-label drift and improved stability on multi-joint objects across repeated runs. A multi-view consistency regularizer is applied during the E-step updates to enforce 3D geometric consistency despite view-dependent 2D priors; qualitative results and quantitative consistency metrics in the experiments and supplementary material demonstrate robustness to motion and occlusion. We will expand the method discussion to clarify these mechanisms and add further ablation details in the revision. revision: partial

Circularity Check

0 steps flagged

No circularity: standard EM alternation with external 2D priors

full rationale

The derivation chain consists of an EM-style alternating optimization treating part segmentation as a latent variable and motion parameters as explicit variables, refined iteratively within a Gaussian Splatting representation. Part priors come from an external vanilla 2D segmentation model plus a weakly-supervised regularizer; these are independent inputs, not self-defined or fitted quantities renamed as predictions. No equations or steps reduce by construction to the target outputs, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled via prior work. The framework is self-contained against external benchmarks and standard optimization techniques, with empirical SOTA claims resting on reported results rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard computer vision assumptions about Gaussian Splatting representations and optimization stability; no specific free parameters or new entities are detailed.

axioms (2)

domain assumption Part segmentation can be treated as a latent variable in an EM-style alternating optimization for improved convergence
Core mechanism of the proposed framework
domain assumption Vanilla 2D segmentation models provide useful multi-view priors for 3D part segmentation under weak supervision
Used to regularize the latent variable without full supervision

pith-pipeline@v0.9.0 · 5487 in / 1355 out tokens · 58345 ms · 2026-05-10T16:48:29.113453+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 1 internal anchor

[1]

Tran- scending dimensions using generative ai: Real-time 3d model generation in augmented reality

Majid Behravan, Maryam Haghani, and Denis Graˇcanin. Tran- scending dimensions using generative ai: Real-time 3d model generation in augmented reality. InInternational Confer- ence on Human-Computer Interaction, pages 13–32. Springer,

work page
[2]

Urdformer: A pipeline for constructing articulated simula- tion environments from real-world images.arXiv preprint arXiv:2405.11656, 2024

Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, and Abhishek Gupta. Urdformer: A pipeline for constructing articulated simulation environments from real-world images. arXiv preprint arXiv:2405.11656, 2024. 1, 2

work page arXiv 2024
[3]

Articulate your nerf: Unsupervised articulated object modeling via con- ditional view synthesis

Jianning Deng, Kartic Subr, and Hakan Bilen. Articulate your nerf: Unsupervised articulated object modeling via con- ditional view synthesis. InAdvances in Neural Information Processing Systems (NeurIPS), pages 119717–119741, 2024. 2

work page 2024
[4]

A survey of embodied ai: From simulators to research tasks.IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022

Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied ai: From simulators to research tasks.IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022. 1

work page 2022
[5]

Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, and Jeffrey Ichnowski

Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia- Wei Liu, Mike Zheng Shou, Shuran Song, and Jeffrey Ich- nowski. Md-splatting: Learning metric deformation from 4d gaussians in highly deformable scenes.arXiv preprint arXiv:2312.00583, 2023. 3

work page arXiv 2023
[6]

Gs-lts: 3d gaussian splatting-based adaptive modeling for long-term service robots.arXiv preprint arXiv:2503.17733,

Bin Fu, Jialin Li, Bin Zhang, Ruiping Wang, and Xilin Chen. Gs-lts: 3d gaussian splatting-based adaptive modeling for long-term service robots.arXiv preprint arXiv:2503.17733,

work page arXiv
[7]

Partrm: Modeling part-level dynamics with large cross-state recon- struction model

Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, and Hao Zhao. Partrm: Modeling part-level dynamics with large cross-state recon- struction model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7004–7014, 2025. 2

work page 2025
[8]

SAGE: Bridging semantic and actionable parts for generalizable manipulation of articulated objects

Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, and Leonidas Guibas. SAGE: Bridging semantic and actionable parts for generalizable manipulation of articulated objects. InICLR Workshop on Large Language Model (LLM) Agents, 2024. 2

work page 2024
[9]

Articulatedgs: Self-supervised digital twin mod- eling of articulated objects using 3d gaussian splatting

Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, and Ruizhen Hu. Articulatedgs: Self-supervised digital twin mod- eling of articulated objects using 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 27144–27153,

work page
[10]

Carto: Category and joint agnostic reconstruction of articulated objects

Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhi- nav Valada, and Thomas Kollar. Carto: Category and joint agnostic reconstruction of articulated objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21201–21210, 2023. 2

work page 2023
[11]

Ditto in the house: Building articulation models of indoor scenes through interactive perception

Cheng-Chun Hsu, Zhenyu Jiang, and Yuke Zhu. Ditto in the house: Building articulation models of indoor scenes through interactive perception. InProceedings of the IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 3933–3939, 2023. 1, 2

work page 2023
[12]

2d gaussian splatting for geometrically accu- rate radiance fields

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accu- rate radiance fields. InACM SIGGRAPH Conference Papers, pages 1–11, 2024. 2, 3

work page 2024
[13]

Ditto: Build- ing digital twins of articulated objects from interaction

Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Build- ing digital twins of articulated objects from interaction. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5616–5626, 2022. 1, 2, 6

work page 2022
[14]

Gs-planner: A gaussian-splatting- based planning framework for active high-fidelity reconstruc- tion

Rui Jin, Yuman Gao, Yingjian Wang, Yuze Wu, Haojian Lu, Chao Xu, and Fei Gao. Gs-planner: A gaussian-splatting- based planning framework for active high-fidelity reconstruc- tion. InProceedings of the IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS), pages 11202– 11209, 2024. 2

work page 2024
[15]

Detection based part- level articulated object reconstruction from single rgbd im- age

Yuki Kawana and Tatsuya Harada. Detection based part- level articulated object reconstruction from single rgbd im- age. InAdvances in Neural Information Processing Systems (NeurIPS), pages 18444–18473, 2023. 2

work page 2023
[16]

Unsu- pervised pose-aware part decomposition for 3d articulated objects.arXiv preprint arXiv:2110.04411, 2021

Yuki Kawana, Yusuke Mukuta, and Tatsuya Harada. Unsu- pervised pose-aware part decomposition for 3d articulated objects.arXiv preprint arXiv:2110.04411, 2021. 2

work page arXiv 2021
[17]

Splatam: Splat track & map 3d gaussians for dense rgb-d slam

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat track & map 3d gaussians for dense rgb-d slam. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), pages 21357–21366, 2024. 2

work page 2024
[18]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42 (4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42 (4):1–14, 2023. 2

work page 2023
[19]

Berg, Wan-Yen Lo, et al

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 4015–4026, 2023. 2

work page 2023
[20]

Dynmf: Neural motion factorization for real-time dynamic view syn- thesis with 3d gaussian splatting

Agelos Kratimenos, Jiahui Lei, and Kostas Daniilidis. Dynmf: Neural motion factorization for real-time dynamic view syn- thesis with 3d gaussian splatting. InEuropean Conference on Computer Vision (ECCV), pages 252–269, 2024. 3

work page 2024
[21]

Articulate-anything: Auto- matic modeling of articulated objects via a vision-language foundation model

Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Di- nesh Jayaraman, and Eric Eaton. Articulate-anything: Auto- matic modeling of articulated objects via a vision-language foundation model. InInternational Conference on Learning Representations (ICLR), 2025. 2

work page 2025
[22]

Locate n’rotate: Two-stage open- able part detection with geometric foundation model priors

Siqi Li, Xiaoxue Chen, Haoyu Cheng, Guyue Zhou, Hao Zhao, and Guanzhong Tian. Locate n’rotate: Two-stage open- able part detection with geometric foundation model priors. 9 InProceedings of the Asian Conference on Computer Vision (ACCV), pages 716–732, 2024. 2

work page 2024
[23]

Guibas, A

Xiaolong Li, He Wang, Li Yi, Leonidas J. Guibas, A. Lynn Abbott, and Shuran Song. Category-level articulated object pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3706–3715, 2020. 2

work page 2020
[24]

Robogsim: A real2sim2real robotic gaussian splatting simulator.arXiv preprint arXiv:2411.11839, 2024

Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tian- cai Wang, Haoqiang Fan, Kuo-Kun Tseng, and Ruiping Wang. Robogsim: A real2sim2real robotic gaussian splatting simu- lator.arXiv preprint arXiv:2411.11839, 2024. 2

work page arXiv 2024
[25]

Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis

Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen- Phuoc, Douglas Lanman, James Tompkin, and Lei Xiao. Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision (WACV), pages 2642–2652, 2025. 3

work page 2025
[26]

Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vi- tor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, and Matthew R. Walter. Splart: Articula- tion estimation and part-level reconstruction with 3d gaussian splatting.arXiv preprint arXiv:2506.03594, 2025. 2

work page arXiv 2025
[27]

Gaussian- flow: 4d reconstruction with dynamic 3d gaussian particle

Youtian Lin, Zuozhuo Dai, Siyu Zhu, and Yao Yao. Gaussian- flow: 4d reconstruction with dynamic 3d gaussian particle. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21136–21145,

work page
[28]

Paris: Part-level reconstruction and motion analysis for articulated objects

Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 352–363, 2023. 1, 2, 6

work page 2023
[29]

Building rearticulable models for arbitrary 3d objects from 4d point clouds

Shaowei Liu, Saurabh Gupta, and Shenlong Wang. Building rearticulable models for arbitrary 3d objects from 4d point clouds. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21138–21147, 2023. 2

work page 2023
[30]

Aligning cyber space with physical world: A comprehensive survey on embod- ied ai.IEEE/ASME Transactions on Mechatronics, 30(6): 7253–7274, 2025

Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guan- bin Li, Wen Gao, and Liang Lin. Aligning cyber space with physical world: A comprehensive survey on embod- ied ai.IEEE/ASME Transactions on Mechatronics, 30(6): 7253–7274, 2025. 1

work page 2025
[31]

Building interactable replicas of complex articulated objects via gaussian splatting

Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, and Siyuan Huang. Building interactable replicas of complex articulated objects via gaussian splatting. InInternational Conference on Learning Representations (ICLR), 2025. 1, 2, 3, 5, 6, 7, 13, 14

work page 2025
[32]

Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. InProceedings of the International Conference on 3D Vision (3DV), pages 800–809, 2024. 3

work page 2024
[33]

Sim2real 2: Actively building explicit physics model for precise articulated object manipulation

Liqian Ma, Jiaojiao Meng, Shuntao Liu, Weihang Chen, Jing Xu, and Rui Chen. Sim2real 2: Actively building explicit physics model for precise articulated object manipulation. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 11698–11704, 2023. 1, 2

work page 2023
[34]

Todd K. Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996. 2

work page 1996
[35]

A-sdf: Learning disentangled signed distance functions for articulated shape representation

Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, and Xiaolong Wang. A-sdf: Learning disentangled signed distance functions for articulated shape representation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13001–13011,

work page
[36]

Structure from action: Learning interactions for articulated object 3d structure discovery

Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, and Shu- ran Song. Structure from action: Learning interactions for articulated object 3d structure discovery.arXiv preprint arXiv:2207.08997, 2022. 2

work page arXiv 2022
[37]

Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction

Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5589–5599, 2021. 2

work page 2021
[38]

3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5020–5030, 2024. 2

work page 2024
[39]

Swings: sliding windows for dynamic 3d gaus- sian splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P ´erez- Pellitero. Swings: sliding windows for dynamic 3d gaus- sian splatting. InEuropean Conference on Computer Vision (ECCV), pages 37–54, 2024. 3

work page 2024
[40]

Gaussianart: Unified modeling of geometry and motion for articulated objects.arXiv preprint arXiv:2508.14891, 2025

Licheng Shen, Saining Zhang, Honghan Li, Peilin Yang, Zi- hao Huang, Zongzheng Zhang, and Hao Zhao. Gaussianart: Unified modeling of geometry and motion for articulated objects.arXiv preprint arXiv:2508.14891, 2025. 1, 2, 3, 5

work page arXiv 2025
[41]

3dgstream: On-the-fly training of 3d gaus- sians for efficient streaming of photo-realistic free-viewpoint videos

Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, and Wei Xing. 3dgstream: On-the-fly training of 3d gaus- sians for efficient streaming of photo-realistic free-viewpoint videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20675–20685, 2024. 3

work page 2024
[42]

Arti-pg: A toolbox for procedu- rally synthesizing large-scale and diverse articulated objects with rich annotations

Jianhua Sun, Yuxuan Li, Jiude Wei, Longfei Xu, Nange Wang, Yining Zhang, and Cewu Lu. Arti-pg: A toolbox for procedu- rally synthesizing large-scale and diverse articulated objects with rich annotations. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 6396–6405, 2025. 1

work page 2025
[43]

Maiya, Vatsal Agarwal, and Abhinav Shrivas- tava

Archana Swaminathan, Anubhav Gupta, Kamal Gupta, Shishira R. Maiya, Vatsal Agarwal, and Abhinav Shrivas- tava. Leia: Latent view-invariant embeddings for implicit 3d articulation. InEuropean Conference on Computer Vision (ECCV), pages 210–227, 2024. 2

work page 2024
[44]

Neural geometric level of detail: Real-time rendering with implicit 3d shapes

Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Mor- gan McGuire, and Sanja Fidler. Neural geometric level of detail: Real-time rendering with implicit 3d shapes. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11358–11367, 2021. 2

work page 2021
[45]

Cla-nerf: Category-level articulated neural radiance 10 field

Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, and Min Sun. Cla-nerf: Category-level articulated neural radiance 10 field. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 8454–8460, 2022. 2

work page 2022
[46]

Ryan, Chris- tian M

Mythreye Venkatesan, Harini Mohan, Justin R. Ryan, Chris- tian M. Sch¨urch, Garry P. Nolan, David H. Frakes, and Ah- met F. Coskun. Virtual and augmented reality for biomedical applications.Cell reports medicine, 2(7), 2021. 1

work page 2021
[47]

Neus: Learning neural im- plicit surfaces by volume rendering for multi-view reconstruc- tion

Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural im- plicit surfaces by volume rendering for multi-view reconstruc- tion. InAdvances in Neural Information Processing Systems (NeurIPS), pages 27171–27183, 2021. 2

work page 2021
[48]

Shape2motion: Joint analysis of motion parts and attributes from 3d shapes

Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qin- ping Zhao, and Kai Xu. Shape2motion: Joint analysis of motion parts and attributes from 3d shapes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8876–8884, 2019. 2

work page 2019
[49]

Self-supervised neural articulated shape and appearance models

Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollh ¨ofer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, and Mira Slavcheva. Self-supervised neural articulated shape and appearance models. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15816–15826, 2022. 2

work page 2022
[50]

Guibas, and Stan Birchfield

Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas J. Guibas, and Stan Birchfield. Neural implicit representation for building digital twins of unknown articulated objects. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 3141–3150, 2024. 1, 2, 6

work page 2024
[51]

Reartgs: Reconstruct- ing and generating articulated objects via 3d gaussian splat- ting with geometric and motion constraints.arXiv preprint arXiv:2503.06677, 2025

Di Wu, Liu Liu, Linli Zhou, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu. Reartgs: Reconstruct- ing and generating articulated objects via 3d gaussian splat- ting with geometric and motion constraints.arXiv preprint arXiv:2503.06677, 2025. 2

work page arXiv 2025
[52]

4d gaussian splatting for real-time dynamic scene rendering

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320,

work page
[53]

Hgs- mapping: Online dense mapping using hybrid gaussian rep- resentation in urban scenes.IEEE Robotics and Automation Letters, 2024

Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Muer Tie, Shanshuai Yuan, Jieru Zhao, Zhongxue Gan, and Wenchao Ding. Hgs- mapping: Online dense mapping using hybrid gaussian rep- resentation in urban scenes.IEEE Robotics and Automation Letters, 2024. 2

work page 2024
[54]

Predict- optimize-distill: A self-improving cycle for 4d object under- standing.arXiv preprint arXiv:2504.17441, 2025

Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, and Angjoo Kanazawa. Predict- optimize-distill: A self-improving cycle for 4d object under- standing.arXiv preprint arXiv:2504.17441, 2025. 2

work page arXiv 2025
[55]

Drawer: Digital reconstruction and articulation with environment realism

Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Ray- mond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, and Wei-Chiu Ma. Drawer: Digital reconstruction and articulation with environment realism. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 21771–21782,

work page
[56]

Sapien: A simulated part-based interactive environment

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11097–11107, 2020. 1, 6

work page 2020
[57]

Physgaussian: Physics- integrated 3d gaussians for generative dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. Physgaussian: Physics- integrated 3d gaussians for generative dynamics. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4389–4398, 2024. 2

work page 2024
[58]

Rpm-net: recurrent prediction of motion and parts from point cloud

Zihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver Van Kaick, Hao Zhang, and Hui Huang. Rpm-net: recurrent prediction of motion and parts from point cloud. ACM Transactions on Graphics (TOG), 38(6):1–15, 2019. 2

work page 2019
[59]

Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20331–20341, 2024. 3

work page 2024
[60]

Deep part induction from articu- lated object pairs.ACM Transactions on Graphics (TOG), 37 (6):1–15, 2018

Li Yi, Haibin Huang, Difan Liu, Evangelos Kalogerakis, Hao Su, and Leonidas Guibas. Deep part induction from articu- lated object pairs.ACM Transactions on Graphics (TOG), 37 (6):1–15, 2018. 2

work page 2018
[61]

Iaao: Interactive affordance learning for articulated objects in 3d environments

Can Zhang and Gim Hee Lee. Iaao: Interactive affordance learning for articulated objects in 3d environments. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12132–12142, 2025. 2

work page 2025
[62]

TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research

Han Zhang, Yiqing Shen, Roger D. Soberanis-Mukul, Ankita Ghosh, Hao Ding, Lalithkumar Seenivasan, Jose L. Porras, Zhekai Mao, Chenjia Li, Wenjie Xiao, et al. Twinor: Photore- alistic digital twins of dynamic operating rooms for embodied ai research.arXiv preprint arXiv:2511.07412, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025
[63]

Staged Opti- mization,

Mandi Zhao, Yijia Weng, Dominik Bauer, and Shuran Song. Real2code: Reconstruct articulated objects via code genera- tion. InInternational Conference on Learning Representa- tions (ICLR), 2025. 2 11 GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting Supplementary Material Supplementary Material Overview The...

work page 2025

[1] [1]

Tran- scending dimensions using generative ai: Real-time 3d model generation in augmented reality

Majid Behravan, Maryam Haghani, and Denis Graˇcanin. Tran- scending dimensions using generative ai: Real-time 3d model generation in augmented reality. InInternational Confer- ence on Human-Computer Interaction, pages 13–32. Springer,

work page

[2] [2]

Urdformer: A pipeline for constructing articulated simula- tion environments from real-world images.arXiv preprint arXiv:2405.11656, 2024

Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, and Abhishek Gupta. Urdformer: A pipeline for constructing articulated simulation environments from real-world images. arXiv preprint arXiv:2405.11656, 2024. 1, 2

work page arXiv 2024

[3] [3]

Articulate your nerf: Unsupervised articulated object modeling via con- ditional view synthesis

Jianning Deng, Kartic Subr, and Hakan Bilen. Articulate your nerf: Unsupervised articulated object modeling via con- ditional view synthesis. InAdvances in Neural Information Processing Systems (NeurIPS), pages 119717–119741, 2024. 2

work page 2024

[4] [4]

A survey of embodied ai: From simulators to research tasks.IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022

Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied ai: From simulators to research tasks.IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022. 1

work page 2022

[5] [5]

Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, and Jeffrey Ichnowski

Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia- Wei Liu, Mike Zheng Shou, Shuran Song, and Jeffrey Ich- nowski. Md-splatting: Learning metric deformation from 4d gaussians in highly deformable scenes.arXiv preprint arXiv:2312.00583, 2023. 3

work page arXiv 2023

[6] [6]

Gs-lts: 3d gaussian splatting-based adaptive modeling for long-term service robots.arXiv preprint arXiv:2503.17733,

Bin Fu, Jialin Li, Bin Zhang, Ruiping Wang, and Xilin Chen. Gs-lts: 3d gaussian splatting-based adaptive modeling for long-term service robots.arXiv preprint arXiv:2503.17733,

work page arXiv

[7] [7]

Partrm: Modeling part-level dynamics with large cross-state recon- struction model

Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, and Hao Zhao. Partrm: Modeling part-level dynamics with large cross-state recon- struction model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7004–7014, 2025. 2

work page 2025

[8] [8]

SAGE: Bridging semantic and actionable parts for generalizable manipulation of articulated objects

Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, and Leonidas Guibas. SAGE: Bridging semantic and actionable parts for generalizable manipulation of articulated objects. InICLR Workshop on Large Language Model (LLM) Agents, 2024. 2

work page 2024

[9] [9]

Articulatedgs: Self-supervised digital twin mod- eling of articulated objects using 3d gaussian splatting

Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, and Ruizhen Hu. Articulatedgs: Self-supervised digital twin mod- eling of articulated objects using 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 27144–27153,

work page

[10] [10]

Carto: Category and joint agnostic reconstruction of articulated objects

Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhi- nav Valada, and Thomas Kollar. Carto: Category and joint agnostic reconstruction of articulated objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21201–21210, 2023. 2

work page 2023

[11] [11]

Ditto in the house: Building articulation models of indoor scenes through interactive perception

Cheng-Chun Hsu, Zhenyu Jiang, and Yuke Zhu. Ditto in the house: Building articulation models of indoor scenes through interactive perception. InProceedings of the IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 3933–3939, 2023. 1, 2

work page 2023

[12] [12]

2d gaussian splatting for geometrically accu- rate radiance fields

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accu- rate radiance fields. InACM SIGGRAPH Conference Papers, pages 1–11, 2024. 2, 3

work page 2024

[13] [13]

Ditto: Build- ing digital twins of articulated objects from interaction

Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Build- ing digital twins of articulated objects from interaction. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5616–5626, 2022. 1, 2, 6

work page 2022

[14] [14]

Gs-planner: A gaussian-splatting- based planning framework for active high-fidelity reconstruc- tion

Rui Jin, Yuman Gao, Yingjian Wang, Yuze Wu, Haojian Lu, Chao Xu, and Fei Gao. Gs-planner: A gaussian-splatting- based planning framework for active high-fidelity reconstruc- tion. InProceedings of the IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS), pages 11202– 11209, 2024. 2

work page 2024

[15] [15]

Detection based part- level articulated object reconstruction from single rgbd im- age

Yuki Kawana and Tatsuya Harada. Detection based part- level articulated object reconstruction from single rgbd im- age. InAdvances in Neural Information Processing Systems (NeurIPS), pages 18444–18473, 2023. 2

work page 2023

[16] [16]

Unsu- pervised pose-aware part decomposition for 3d articulated objects.arXiv preprint arXiv:2110.04411, 2021

Yuki Kawana, Yusuke Mukuta, and Tatsuya Harada. Unsu- pervised pose-aware part decomposition for 3d articulated objects.arXiv preprint arXiv:2110.04411, 2021. 2

work page arXiv 2021

[17] [17]

Splatam: Splat track & map 3d gaussians for dense rgb-d slam

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat track & map 3d gaussians for dense rgb-d slam. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), pages 21357–21366, 2024. 2

work page 2024

[18] [18]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42 (4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42 (4):1–14, 2023. 2

work page 2023

[19] [19]

Berg, Wan-Yen Lo, et al

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 4015–4026, 2023. 2

work page 2023

[20] [20]

Dynmf: Neural motion factorization for real-time dynamic view syn- thesis with 3d gaussian splatting

Agelos Kratimenos, Jiahui Lei, and Kostas Daniilidis. Dynmf: Neural motion factorization for real-time dynamic view syn- thesis with 3d gaussian splatting. InEuropean Conference on Computer Vision (ECCV), pages 252–269, 2024. 3

work page 2024

[21] [21]

Articulate-anything: Auto- matic modeling of articulated objects via a vision-language foundation model

Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Di- nesh Jayaraman, and Eric Eaton. Articulate-anything: Auto- matic modeling of articulated objects via a vision-language foundation model. InInternational Conference on Learning Representations (ICLR), 2025. 2

work page 2025

[22] [22]

Locate n’rotate: Two-stage open- able part detection with geometric foundation model priors

Siqi Li, Xiaoxue Chen, Haoyu Cheng, Guyue Zhou, Hao Zhao, and Guanzhong Tian. Locate n’rotate: Two-stage open- able part detection with geometric foundation model priors. 9 InProceedings of the Asian Conference on Computer Vision (ACCV), pages 716–732, 2024. 2

work page 2024

[23] [23]

Guibas, A

Xiaolong Li, He Wang, Li Yi, Leonidas J. Guibas, A. Lynn Abbott, and Shuran Song. Category-level articulated object pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3706–3715, 2020. 2

work page 2020

[24] [24]

Robogsim: A real2sim2real robotic gaussian splatting simulator.arXiv preprint arXiv:2411.11839, 2024

Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tian- cai Wang, Haoqiang Fan, Kuo-Kun Tseng, and Ruiping Wang. Robogsim: A real2sim2real robotic gaussian splatting simu- lator.arXiv preprint arXiv:2411.11839, 2024. 2

work page arXiv 2024

[25] [25]

Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis

Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen- Phuoc, Douglas Lanman, James Tompkin, and Lei Xiao. Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision (WACV), pages 2642–2652, 2025. 3

work page 2025

[26] [26]

Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vi- tor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, and Matthew R. Walter. Splart: Articula- tion estimation and part-level reconstruction with 3d gaussian splatting.arXiv preprint arXiv:2506.03594, 2025. 2

work page arXiv 2025

[27] [27]

Gaussian- flow: 4d reconstruction with dynamic 3d gaussian particle

Youtian Lin, Zuozhuo Dai, Siyu Zhu, and Yao Yao. Gaussian- flow: 4d reconstruction with dynamic 3d gaussian particle. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21136–21145,

work page

[28] [28]

Paris: Part-level reconstruction and motion analysis for articulated objects

Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 352–363, 2023. 1, 2, 6

work page 2023

[29] [29]

Building rearticulable models for arbitrary 3d objects from 4d point clouds

Shaowei Liu, Saurabh Gupta, and Shenlong Wang. Building rearticulable models for arbitrary 3d objects from 4d point clouds. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21138–21147, 2023. 2

work page 2023

[30] [30]

Aligning cyber space with physical world: A comprehensive survey on embod- ied ai.IEEE/ASME Transactions on Mechatronics, 30(6): 7253–7274, 2025

Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guan- bin Li, Wen Gao, and Liang Lin. Aligning cyber space with physical world: A comprehensive survey on embod- ied ai.IEEE/ASME Transactions on Mechatronics, 30(6): 7253–7274, 2025. 1

work page 2025

[31] [31]

Building interactable replicas of complex articulated objects via gaussian splatting

Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, and Siyuan Huang. Building interactable replicas of complex articulated objects via gaussian splatting. InInternational Conference on Learning Representations (ICLR), 2025. 1, 2, 3, 5, 6, 7, 13, 14

work page 2025

[32] [32]

Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. InProceedings of the International Conference on 3D Vision (3DV), pages 800–809, 2024. 3

work page 2024

[33] [33]

Sim2real 2: Actively building explicit physics model for precise articulated object manipulation

Liqian Ma, Jiaojiao Meng, Shuntao Liu, Weihang Chen, Jing Xu, and Rui Chen. Sim2real 2: Actively building explicit physics model for precise articulated object manipulation. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 11698–11704, 2023. 1, 2

work page 2023

[34] [34]

Todd K. Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996. 2

work page 1996

[35] [35]

A-sdf: Learning disentangled signed distance functions for articulated shape representation

Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, and Xiaolong Wang. A-sdf: Learning disentangled signed distance functions for articulated shape representation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13001–13011,

work page

[36] [36]

Structure from action: Learning interactions for articulated object 3d structure discovery

Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, and Shu- ran Song. Structure from action: Learning interactions for articulated object 3d structure discovery.arXiv preprint arXiv:2207.08997, 2022. 2

work page arXiv 2022

[37] [37]

Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction

Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5589–5599, 2021. 2

work page 2021

[38] [38]

3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5020–5030, 2024. 2

work page 2024

[39] [39]

Swings: sliding windows for dynamic 3d gaus- sian splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P ´erez- Pellitero. Swings: sliding windows for dynamic 3d gaus- sian splatting. InEuropean Conference on Computer Vision (ECCV), pages 37–54, 2024. 3

work page 2024

[40] [40]

Gaussianart: Unified modeling of geometry and motion for articulated objects.arXiv preprint arXiv:2508.14891, 2025

Licheng Shen, Saining Zhang, Honghan Li, Peilin Yang, Zi- hao Huang, Zongzheng Zhang, and Hao Zhao. Gaussianart: Unified modeling of geometry and motion for articulated objects.arXiv preprint arXiv:2508.14891, 2025. 1, 2, 3, 5

work page arXiv 2025

[41] [41]

3dgstream: On-the-fly training of 3d gaus- sians for efficient streaming of photo-realistic free-viewpoint videos

Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, and Wei Xing. 3dgstream: On-the-fly training of 3d gaus- sians for efficient streaming of photo-realistic free-viewpoint videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20675–20685, 2024. 3

work page 2024

[42] [42]

Arti-pg: A toolbox for procedu- rally synthesizing large-scale and diverse articulated objects with rich annotations

Jianhua Sun, Yuxuan Li, Jiude Wei, Longfei Xu, Nange Wang, Yining Zhang, and Cewu Lu. Arti-pg: A toolbox for procedu- rally synthesizing large-scale and diverse articulated objects with rich annotations. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 6396–6405, 2025. 1

work page 2025

[43] [43]

Maiya, Vatsal Agarwal, and Abhinav Shrivas- tava

Archana Swaminathan, Anubhav Gupta, Kamal Gupta, Shishira R. Maiya, Vatsal Agarwal, and Abhinav Shrivas- tava. Leia: Latent view-invariant embeddings for implicit 3d articulation. InEuropean Conference on Computer Vision (ECCV), pages 210–227, 2024. 2

work page 2024

[44] [44]

Neural geometric level of detail: Real-time rendering with implicit 3d shapes

Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Mor- gan McGuire, and Sanja Fidler. Neural geometric level of detail: Real-time rendering with implicit 3d shapes. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11358–11367, 2021. 2

work page 2021

[45] [45]

Cla-nerf: Category-level articulated neural radiance 10 field

Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, and Min Sun. Cla-nerf: Category-level articulated neural radiance 10 field. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 8454–8460, 2022. 2

work page 2022

[46] [46]

Ryan, Chris- tian M

Mythreye Venkatesan, Harini Mohan, Justin R. Ryan, Chris- tian M. Sch¨urch, Garry P. Nolan, David H. Frakes, and Ah- met F. Coskun. Virtual and augmented reality for biomedical applications.Cell reports medicine, 2(7), 2021. 1

work page 2021

[47] [47]

Neus: Learning neural im- plicit surfaces by volume rendering for multi-view reconstruc- tion

Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural im- plicit surfaces by volume rendering for multi-view reconstruc- tion. InAdvances in Neural Information Processing Systems (NeurIPS), pages 27171–27183, 2021. 2

work page 2021

[48] [48]

Shape2motion: Joint analysis of motion parts and attributes from 3d shapes

Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qin- ping Zhao, and Kai Xu. Shape2motion: Joint analysis of motion parts and attributes from 3d shapes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8876–8884, 2019. 2

work page 2019

[49] [49]

Self-supervised neural articulated shape and appearance models

Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollh ¨ofer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, and Mira Slavcheva. Self-supervised neural articulated shape and appearance models. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15816–15826, 2022. 2

work page 2022

[50] [50]

Guibas, and Stan Birchfield

Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas J. Guibas, and Stan Birchfield. Neural implicit representation for building digital twins of unknown articulated objects. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 3141–3150, 2024. 1, 2, 6

work page 2024

[51] [51]

Reartgs: Reconstruct- ing and generating articulated objects via 3d gaussian splat- ting with geometric and motion constraints.arXiv preprint arXiv:2503.06677, 2025

Di Wu, Liu Liu, Linli Zhou, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu. Reartgs: Reconstruct- ing and generating articulated objects via 3d gaussian splat- ting with geometric and motion constraints.arXiv preprint arXiv:2503.06677, 2025. 2

work page arXiv 2025

[52] [52]

4d gaussian splatting for real-time dynamic scene rendering

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320,

work page

[53] [53]

Hgs- mapping: Online dense mapping using hybrid gaussian rep- resentation in urban scenes.IEEE Robotics and Automation Letters, 2024

Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Muer Tie, Shanshuai Yuan, Jieru Zhao, Zhongxue Gan, and Wenchao Ding. Hgs- mapping: Online dense mapping using hybrid gaussian rep- resentation in urban scenes.IEEE Robotics and Automation Letters, 2024. 2

work page 2024

[54] [54]

Predict- optimize-distill: A self-improving cycle for 4d object under- standing.arXiv preprint arXiv:2504.17441, 2025

Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, and Angjoo Kanazawa. Predict- optimize-distill: A self-improving cycle for 4d object under- standing.arXiv preprint arXiv:2504.17441, 2025. 2

work page arXiv 2025

[55] [55]

Drawer: Digital reconstruction and articulation with environment realism

Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Ray- mond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, and Wei-Chiu Ma. Drawer: Digital reconstruction and articulation with environment realism. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 21771–21782,

work page

[56] [56]

Sapien: A simulated part-based interactive environment

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11097–11107, 2020. 1, 6

work page 2020

[57] [57]

Physgaussian: Physics- integrated 3d gaussians for generative dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. Physgaussian: Physics- integrated 3d gaussians for generative dynamics. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4389–4398, 2024. 2

work page 2024

[58] [58]

Rpm-net: recurrent prediction of motion and parts from point cloud

Zihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver Van Kaick, Hao Zhang, and Hui Huang. Rpm-net: recurrent prediction of motion and parts from point cloud. ACM Transactions on Graphics (TOG), 38(6):1–15, 2019. 2

work page 2019

[59] [59]

Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20331–20341, 2024. 3

work page 2024

[60] [60]

Deep part induction from articu- lated object pairs.ACM Transactions on Graphics (TOG), 37 (6):1–15, 2018

Li Yi, Haibin Huang, Difan Liu, Evangelos Kalogerakis, Hao Su, and Leonidas Guibas. Deep part induction from articu- lated object pairs.ACM Transactions on Graphics (TOG), 37 (6):1–15, 2018. 2

work page 2018

[61] [61]

Iaao: Interactive affordance learning for articulated objects in 3d environments

Can Zhang and Gim Hee Lee. Iaao: Interactive affordance learning for articulated objects in 3d environments. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12132–12142, 2025. 2

work page 2025

[62] [62]

TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research

Han Zhang, Yiqing Shen, Roger D. Soberanis-Mukul, Ankita Ghosh, Hao Ding, Lalithkumar Seenivasan, Jose L. Porras, Zhekai Mao, Chenjia Li, Wenjie Xiao, et al. Twinor: Photore- alistic digital twins of dynamic operating rooms for embodied ai research.arXiv preprint arXiv:2511.07412, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025

[63] [63]

Staged Opti- mization,

Mandi Zhao, Yijia Weng, Dominik Bauer, and Shuran Song. Real2code: Reconstruct articulated objects via code genera- tion. InInternational Conference on Learning Representa- tions (ICLR), 2025. 2 11 GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting Supplementary Material Supplementary Material Overview The...

work page 2025