SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting

Di Wu; Lijun Yue; Liu Liu; Liuzhu Chen; Meng Wang; Wenxiao Chen; Xueyu Yuan; Yiming Tang

arxiv: 2511.17092 · v4 · submitted 2025-11-21 · 💻 cs.CV

SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting

Di Wu , Liu Liu , Xueyu Yuan , Wenxiao Chen , Lijun Yue , Liuzhu Chen , Yiming Tang , Meng Wang This is my paper

Pith reviewed 2026-05-17 20:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords Articulated object reconstructionPlanar Gaussian SplattingSparse-view 3D reconstructionSingle-state captureVision-language model promptingPart segmentationGaussian optimization

0 comments

The pith

Planar Gaussian Splatting reconstructs articulated objects from sparse single-state views by constraining Gaussians to planar primitives and using VLM prompting for part segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method to reconstruct 3D models of articulated objects using only a few RGB images captured from one configuration, without needing multiple poses or extensive camera setups. It replaces standard 3D Gaussians with planar versions to improve depth and normal accuracy, then optimizes them step by step while adding smoothness and diffusion-based regularization. A vision-language model is prompted visually to label parts and estimate joints in an open-vocabulary way. If successful, this lowers the data cost for creating usable 3D models of everyday movable items like furniture or tools, making reconstruction practical from casual captures.

Core claim

The central claim is that constraining Gaussian splats to planar primitives, combined with a Gaussian information field for viewpoint selection and VLM-driven part labeling, enables category-agnostic reconstruction of articulated objects from sparse single-state RGB images while achieving higher surface fidelity than prior baselines on both synthetic and real data.

What carries the argument

Planar Gaussian primitives, which replace volumetric 3D Gaussians with flat representations to enforce accurate normal and depth estimates during coarse-to-fine optimization.

If this is right

Reconstruction pipelines no longer need multi-view or multi-state captures for articulated items.
Part-level surface models become obtainable from casual single-pose smartphone photos.
Open-vocabulary segmentation extends to new object categories without retraining detectors.
Depth and normal accuracy improve enough for downstream tasks like physics simulation of moving parts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same planar constraint might transfer to non-articulated scenes where sharp edges matter.
Replacing the VLM step with a learned joint predictor could remove reliance on prompt quality.
Sparse-view selection via the information field could be tested on dynamic video sequences.

Load-bearing premise

A vision-language model given visual prompts will produce reliable open-vocabulary part labels and joint parameters directly from the optimized planar Gaussian output.

What would settle it

Run the pipeline on a real-world object such as a folding chair or robot arm where the VLM segmentation visibly mislabels a joint axis; if the resulting 3D model then shows incorrect articulation, the end-to-end claim fails.

Figures

Figures reproduced from arXiv: 2511.17092 by Di Wu, Lijun Yue, Liu Liu, Liuzhu Chen, Meng Wang, Wenxiao Chen, Xueyu Yuan, Yiming Tang.

**Figure 2.** Figure 2: The Framework of SPAGS. We use the snowflake symbol to denote frozen network weights and the flame symbol to indicate [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of joint estimation. Note that we use high [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: The qualitative results of novel view synthesis on [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: The qualitative results of articulated modeling. We set [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: The qualitative results of our real-world performance. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

Articulated objects are ubiquitous in daily environments, and their 3D reconstruction holds great significance across various fields. However, existing articulated object reconstruction methods typically require costly inputs such as multi-stage and multi-view observations. To address the limitations, we propose a category-agnostic articulated object reconstruction framework via planar Gaussian Splatting, which only uses sparse-view RGB images from a single state. Specifically, we first introduce a Gaussian information field to perceive the optimal sparse viewpoints from candidate camera poses. To ensure precise geometric fidelity, we constrain traditional 3D Gaussians into planar primitives, facilitating accurate normal and depth estimation. The planar Gaussians are then optimized in a coarse-to-fine manner, regularized by depth smoothness and few-shot diffusion priors. Furthermore, we leverage a Vision-Language Model (VLM) via visual prompting to achieve open-vocabulary part segmentation and joint parameter estimation. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach significantly outperforms existing baselines, achieving superior part-level surface reconstruction fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a single-state sparse-view pipeline for articulated reconstruction via planar Gaussians and VLM prompting, but the performance claims rest on unshown experiments and a potentially brittle VLM step.

read the letter

The paper's central contribution is a pipeline that reconstructs articulated objects from just a handful of views in one configuration, using planar-constrained Gaussians and a vision-language model to handle parts and joints. This could matter for settings where capturing multiple states or dense views is not feasible. What stands out as new is the use of planar primitives instead of full 3D Gaussians to get better normal and depth estimates, combined with a coarse-to-fine optimization schedule regularized by depth smoothness and few-shot diffusion priors. The Gaussian information field for selecting optimal viewpoints from candidates is also a practical addition. Then the VLM is prompted visually on the result to do open-vocabulary part segmentation and estimate joint parameters. This specific stack for the single-state sparse-view case does not appear in the cited prior work on Gaussian splatting or articulated reconstruction. The approach does a good job of tackling the input cost problem head-on and framing an end-to-end system that avoids heavy supervision. The soft spots are mostly around validation. The abstract states that the method significantly outperforms baselines on synthetic and real datasets for part-level surface reconstruction, but there are no numbers, tables, or detailed baseline descriptions provided. Without those, it's difficult to gauge the actual improvement or whether the gains come from the planar constraints or elsewhere. The free parameters like the optimization schedule and smoothness weight suggest some tuning is involved, which is normal but should be analyzed. A real concern is the VLM stage. If the vision-language model does not reliably produce accurate part segments and joint estimates from the Gaussian field on novel objects, then the articulated decomposition could be noisy, and the superior fidelity claim would not hold up. This is a common issue with VLMs on geometry tasks, so the paper would need strong evidence that this step works consistently. Overall, this is for people in computer vision and robotics who need lighter data requirements for 3D articulated modeling. A reader interested in Gaussian-based methods or VLM applications in reconstruction might get ideas from it. I think it deserves peer review. The idea is coherent and the problem is relevant, so referees could assess the experiments and suggest improvements on the VLM robustness.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SPAGS, a category-agnostic framework for articulated object reconstruction from sparse-view RGB images in a single state. It introduces a Gaussian information field to select optimal viewpoints, constrains 3D Gaussians to planar primitives for improved normal and depth estimation, performs coarse-to-fine optimization regularized by depth smoothness and few-shot diffusion priors, and applies a Vision-Language Model via visual prompting for open-vocabulary part segmentation and joint parameter estimation. The central claim is that this yields superior part-level surface reconstruction fidelity over existing baselines on both synthetic and real-world datasets.

Significance. If validated, the work could meaningfully advance sparse-input articulated reconstruction by combining planar Gaussian splatting with VLM-based decomposition, addressing geometric fidelity in under-constrained single-state settings. The Gaussian information field and planar primitive constraint represent concrete technical contributions that merit evaluation; the diffusion prior regularization is a positive element for handling sparsity.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: The assertion that the method 'significantly outperforms existing baselines' and achieves 'superior part-level surface reconstruction fidelity' is presented without any quantitative tables, metrics, error bars, baseline descriptions, or ablation results in the manuscript, preventing verification of the central performance claim.
[Method (VLM stage)] VLM integration stage (method description following planar Gaussian optimization): The final decomposition into articulated components depends on VLM visual prompting for open-vocabulary part segmentation and joint estimation. No accuracy metrics, failure-mode analysis, or comparison against ground-truth part labels are reported for this step on novel objects; if VLM outputs are noisy or incomplete, the reported part-level fidelity gains cannot be attributed to the planar Gaussian optimization.

minor comments (2)

[Method] The 'Gaussian information field' is introduced as a novel component but lacks an explicit equation or pseudocode defining its computation from candidate poses, making reproduction difficult.
[Experiments / Figures] Figure captions and experimental setup descriptions should explicitly list the synthetic and real datasets used, the number of views, and the exact baselines compared to allow direct assessment of the outperformance claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating the revisions we plan to incorporate.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments section: The assertion that the method 'significantly outperforms existing baselines' and achieves 'superior part-level surface reconstruction fidelity' is presented without any quantitative tables, metrics, error bars, baseline descriptions, or ablation results in the manuscript, preventing verification of the central performance claim.

Authors: We acknowledge that the abstract makes strong performance claims and that the submitted manuscript version may not have presented the supporting quantitative evidence with sufficient prominence or completeness. The experiments section describes evaluations on synthetic and real-world datasets, but we agree that explicit tables with metrics (e.g., Chamfer distance, normal error, part-level IoU), error bars from repeated runs, detailed baseline specifications, and ablation studies are necessary for verification. In the revised manuscript we will expand the Experiments section to include these elements in a clear, tabular format so that the claims of significant outperformance and superior part-level fidelity can be directly verified. revision: yes
Referee: [Method (VLM stage)] VLM integration stage (method description following planar Gaussian optimization): The final decomposition into articulated components depends on VLM visual prompting for open-vocabulary part segmentation and joint estimation. No accuracy metrics, failure-mode analysis, or comparison against ground-truth part labels are reported for this step on novel objects; if VLM outputs are noisy or incomplete, the reported part-level fidelity gains cannot be attributed to the planar Gaussian optimization.

Authors: We agree that a separate quantitative assessment of the VLM stage is important to isolate its contribution. The current manuscript describes the use of visual prompting for open-vocabulary part segmentation and joint estimation but does not report dedicated metrics or analysis for this component. In the revision we will add a dedicated evaluation subsection (or appendix) that reports segmentation accuracy (e.g., mean IoU against ground-truth part labels), joint parameter estimation errors, failure-mode analysis with representative examples of noisy or incomplete VLM outputs, and comparisons on novel objects from both synthetic and real datasets. This will clarify the reliability of the VLM outputs and allow proper attribution of the observed part-level reconstruction gains. revision: yes

Circularity Check

0 steps flagged

No circularity: independent multi-stage pipeline with external priors and experimental validation

full rationale

The described framework consists of sequential, non-reductive steps: a Gaussian information field for viewpoint perception, planar primitive constraints for geometry estimation, coarse-to-fine optimization regularized by depth smoothness and few-shot diffusion priors, followed by VLM visual prompting for part segmentation and joint estimation. No equations, definitions, or self-citations are presented that make any claimed output (such as part-level fidelity) equivalent to the inputs by construction or force a prediction from a fitted subset. The central claims rest on comparative experiments against baselines rather than tautological reductions, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entities

Abstract-only review limits visibility into exact hyperparameters and priors; the method rests on standard assumptions of Gaussian splatting plus new constraints whose independence from data fitting is not shown here.

free parameters (2)

coarse-to-fine optimization schedule
Step-wise refinement parameters chosen to balance geometry and regularization; values not stated in abstract.
depth smoothness weight
Regularization strength that trades off surface smoothness against fidelity; appears tuned rather than derived.

axioms (2)

domain assumption Planar primitives suffice to represent articulated surfaces with accurate normals and depth
Invoked when constraining 3D Gaussians to planar form for geometric fidelity.
domain assumption Few-shot diffusion priors provide useful regularization without introducing bias
Used to guide planar Gaussian optimization.

invented entities (1)

Gaussian information field no independent evidence
purpose: To select optimal sparse viewpoints from candidate poses
New module introduced to perceive best camera angles; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5497 in / 1355 out tokens · 30276 ms · 2026-05-17T20:50:46.614037+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We compress 3D Gaussians into planar Gaussians... Lscale = 1/NG Σ min(S1,S2,S3)
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

coarse-to-fine optimization... depth smoothness and few-shot diffusion priors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

[1]

Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.arXiv preprint arXiv:2406.06521, 2024

Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.arXiv preprint arXiv:2406.06521, 2024. 4, 5, 6, 7

work page arXiv 2024
[2]

Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023. 5

work page 2023
[3]

Articulatedgs: Self-supervised digital twin modeling of articulated objects using 3d gaussian splatting

Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, and Ruizhen Hu. Articulatedgs: Self-supervised digital twin modeling of articulated objects using 3d gaussian splatting. arXiv preprint arXiv:2503.08135, 2025. 2

work page arXiv 2025
[4]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 4

work page 2022
[5]

Transparentgs: Fast inverse rendering of transpar- ent objects with gaussians.ACM Transactions on Graphics (TOG), 44(4):1–17, 2025

Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Hui- wen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, and Jie Guo. Transparentgs: Fast inverse rendering of transpar- ent objects with gaussians.ACM Transactions on Graphics (TOG), 44(4):1–17, 2025. 9

work page 2025
[6]

Spar3d: Stable point-aware re- construction of 3d objects from single images.arXiv preprint arXiv:2501.04689, 2025

Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M Rehg, and Varun Jampani. Spar3d: Stable point-aware re- construction of 3d objects from single images.arXiv preprint arXiv:2501.04689, 2025. 2, 3

work page arXiv 2025
[7]

Ditto: Building digital twins of articulated objects from interaction

Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Building digital twins of articulated objects from interaction. InConference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2

work page 2022
[8]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3, 4

work page 2023
[9]

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross Girshick. Segment anything.arXiv:2304.02643, 2023. 6

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Non-rigid point cloud reg- istration with neural deformation pyramid.arXiv preprint arXiv:2205.12796, 2022

Yang Li and Tatsuya Harada. Non-rigid point cloud reg- istration with neural deformation pyramid.arXiv preprint arXiv:2205.12796, 2022. 4

work page arXiv 2022
[11]

Paris: Part-level reconstruction and motion analysis for articulated objects

Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 352–363, 2023. 1, 2

work page 2023
[12]

arXiv preprint arXiv:2410.16499 (2024)

Jiayi Liu, Denys Iliash, Angel X Chang, Manolis Savva, and Ali Mahdavi-Amiri. SINGAPO: Single image controlled generation of articulated parts in object.arXiv preprint arXiv:2410.16499, 2024. 2

work page arXiv 2024
[13]

Building interactable replicas of complex articulated objects via gaussian splatting

Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, and Siyuan Huang. Building interactable replicas of complex articulated objects via gaussian splatting. InThe Thirteenth International Conference on Learning Represen- tations, 2025. 1, 2, 6, 7

work page 2025
[14]

Dreamart: Generating interactable articulated objects from a single image.arXiv preprint arXiv:2507.05763, 2025

Ruijie Lu, Yu Liu, Jiaxiang Tang, Junfeng Ni, Yuxiang Wang, Diwen Wan, Gang Zeng, Yixin Chen, and Siyuan Huang. Dreamart: Generating interactable articulated ob- jects from a single image.arXiv preprint arXiv:2507.05763,

work page arXiv
[15]

Language segment-anything: Sam with text prompt.https://github.com/luca- medeiros/ lang-segment-anything, 2024

Luca Medeiros. Language segment-anything: Sam with text prompt.https://github.com/luca- medeiros/ lang-segment-anything, 2024. Accessed: 2025-08-

work page 2024
[16]

SDEdit: Guided image synthesis and editing with stochastic differential equa- tions

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jia- jun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equa- tions. InInternational Conference on Learning Representa- tions, 2022. 5

work page 2022
[17]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 2

work page 2020
[18]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Mak- ing a “completely blind” image quality analyzer.IEEE Sig- nal Processing Letters, 20(3):209–212, 2013. 3

work page 2013
[19]

A-sdf: Learning disentangled signed distance functions for articulated shape representation

Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, and Xiaolong Wang. A-sdf: Learning disentangled signed distance functions for articulated shape representation. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 12981–12991,

work page
[20]

Barron, Ben Mildenhall, Mehdi S

Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. Regnerf: Regularizing neural radiance fields for view syn- thesis from sparse inputs. InProc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022. 2

work page 2022
[21]

Coherentgs: Sparse novel view synthesis with coherent 9 3d gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, and Nima Khademi Kalan- tari. Coherentgs: Sparse novel view synthesis with coherent 9 3d gaussians. InEuropean Conference on Computer Vision, pages 19–37. Springer, 2024. 1, 2, 4, 5, 6, 7

work page 2024
[22]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 4

work page 2022
[23]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020. 1

work page internal anchor Pith review Pith/arXiv arXiv 2010
[24]

Sparsenerf: Distilling depth ranking for few-shot novel view synthesis

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Zi- wei Liu. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. InIEEE/CVF International Confer- ence on Computer Vision (ICCV), 2023. 2

work page 2023
[25]

Reartgs: Reconstructing and generating articulated objects via 3d gaussian splatting with geometric and motion constraints

Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu. Reartgs: Reconstructing and generating articulated objects via 3d gaussian splatting with geometric and motion constraints. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys- tems, 2025. 1, 2, 6, 7

work page 2025
[26]

Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views

Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, and Yan- ning Zhang. Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 11307–11316,

work page
[27]

Sapien: A simulated part-based interactive environment

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11097– 11107, 2020. 6

work page 2020
[28]

Supergs: Super-resolution 3d gaussian splatting via latent feature field and gradient-guided splitting.arXiv preprint arXiv:2410.02571, 1, 2024

Shiyun Xie, Zhiru Wang, Xu Wang, Yinghao Zhu, Cheng- wei Pan, and Xiwang Dong. Supergs: Super-resolution 3d gaussian splatting enhanced by variational residual fea- tures and uncertainty-augmented learning.arXiv preprint arXiv:2410.02571, 2024. 9

work page arXiv 2024
[29]

Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact- rich manipulation.arXiv preprint arXiv:2503.02881, 2025

Yuhan Xie, Yixi Cai, Yinqiang Zhang, Lei Yang, and Jia Pan. Gauss-mi: Gaussian splatting shannon mutual in- formation for active 3d reconstruction.arXiv preprint arXiv:2503.02881, 2025. 3

work page arXiv 2025
[30]

Sparsegs: Real- time 360° sparse view synthesis using gaussian splatting,

Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. Sparsegs: Real- time 360° sparse view synthesis using gaussian splatting,

work page
[31]

Gaussianob- ject: High-quality 3d object reconstruction from four views with gaussian splatting.ACM Transactions on Graphics,

Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, and Qi Tian. Gaussianob- ject: High-quality 3d object reconstruction from four views with gaussian splatting.ACM Transactions on Graphics,

work page
[32]

Depth anything: Unleashing the power of large-scale unlabeled data

Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. InCVPR, 2024. 4

work page 2024
[33]

Deepemd: Differentiable earth mover’s distance for few-shot learning.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(5):5632–5648, 2022

Chi Zhang, Yujun Cai, Guosheng Lin, and Chunhua Shen. Deepemd: Differentiable earth mover’s distance for few-shot learning.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(5):5632–5648, 2022. 6

work page 2022
[34]

Adding conditional control to text-to-image diffusion models, 2023

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023. 4

work page 2023
[35]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InCVPR, 2018. 3

work page 2018
[36]

Fsgs: Real-time few-shot view synthesis using gaussian splatting, 2023

Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting, 2023. 1, 2 10

work page 2023

[1] [1]

Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.arXiv preprint arXiv:2406.06521, 2024

Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.arXiv preprint arXiv:2406.06521, 2024. 4, 5, 6, 7

work page arXiv 2024

[2] [2]

Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023. 5

work page 2023

[3] [3]

Articulatedgs: Self-supervised digital twin modeling of articulated objects using 3d gaussian splatting

Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, and Ruizhen Hu. Articulatedgs: Self-supervised digital twin modeling of articulated objects using 3d gaussian splatting. arXiv preprint arXiv:2503.08135, 2025. 2

work page arXiv 2025

[4] [4]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 4

work page 2022

[5] [5]

Transparentgs: Fast inverse rendering of transpar- ent objects with gaussians.ACM Transactions on Graphics (TOG), 44(4):1–17, 2025

Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Hui- wen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, and Jie Guo. Transparentgs: Fast inverse rendering of transpar- ent objects with gaussians.ACM Transactions on Graphics (TOG), 44(4):1–17, 2025. 9

work page 2025

[6] [6]

Spar3d: Stable point-aware re- construction of 3d objects from single images.arXiv preprint arXiv:2501.04689, 2025

Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M Rehg, and Varun Jampani. Spar3d: Stable point-aware re- construction of 3d objects from single images.arXiv preprint arXiv:2501.04689, 2025. 2, 3

work page arXiv 2025

[7] [7]

Ditto: Building digital twins of articulated objects from interaction

Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Building digital twins of articulated objects from interaction. InConference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2

work page 2022

[8] [8]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3, 4

work page 2023

[9] [9]

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross Girshick. Segment anything.arXiv:2304.02643, 2023. 6

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Non-rigid point cloud reg- istration with neural deformation pyramid.arXiv preprint arXiv:2205.12796, 2022

Yang Li and Tatsuya Harada. Non-rigid point cloud reg- istration with neural deformation pyramid.arXiv preprint arXiv:2205.12796, 2022. 4

work page arXiv 2022

[11] [11]

Paris: Part-level reconstruction and motion analysis for articulated objects

Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 352–363, 2023. 1, 2

work page 2023

[12] [12]

arXiv preprint arXiv:2410.16499 (2024)

Jiayi Liu, Denys Iliash, Angel X Chang, Manolis Savva, and Ali Mahdavi-Amiri. SINGAPO: Single image controlled generation of articulated parts in object.arXiv preprint arXiv:2410.16499, 2024. 2

work page arXiv 2024

[13] [13]

Building interactable replicas of complex articulated objects via gaussian splatting

Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, and Siyuan Huang. Building interactable replicas of complex articulated objects via gaussian splatting. InThe Thirteenth International Conference on Learning Represen- tations, 2025. 1, 2, 6, 7

work page 2025

[14] [14]

Dreamart: Generating interactable articulated objects from a single image.arXiv preprint arXiv:2507.05763, 2025

Ruijie Lu, Yu Liu, Jiaxiang Tang, Junfeng Ni, Yuxiang Wang, Diwen Wan, Gang Zeng, Yixin Chen, and Siyuan Huang. Dreamart: Generating interactable articulated ob- jects from a single image.arXiv preprint arXiv:2507.05763,

work page arXiv

[15] [15]

Language segment-anything: Sam with text prompt.https://github.com/luca- medeiros/ lang-segment-anything, 2024

Luca Medeiros. Language segment-anything: Sam with text prompt.https://github.com/luca- medeiros/ lang-segment-anything, 2024. Accessed: 2025-08-

work page 2024

[16] [16]

SDEdit: Guided image synthesis and editing with stochastic differential equa- tions

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jia- jun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equa- tions. InInternational Conference on Learning Representa- tions, 2022. 5

work page 2022

[17] [17]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 2

work page 2020

[18] [18]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Mak- ing a “completely blind” image quality analyzer.IEEE Sig- nal Processing Letters, 20(3):209–212, 2013. 3

work page 2013

[19] [19]

A-sdf: Learning disentangled signed distance functions for articulated shape representation

Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, and Xiaolong Wang. A-sdf: Learning disentangled signed distance functions for articulated shape representation. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 12981–12991,

work page

[20] [20]

Barron, Ben Mildenhall, Mehdi S

Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. Regnerf: Regularizing neural radiance fields for view syn- thesis from sparse inputs. InProc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022. 2

work page 2022

[21] [21]

Coherentgs: Sparse novel view synthesis with coherent 9 3d gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, and Nima Khademi Kalan- tari. Coherentgs: Sparse novel view synthesis with coherent 9 3d gaussians. InEuropean Conference on Computer Vision, pages 19–37. Springer, 2024. 1, 2, 4, 5, 6, 7

work page 2024

[22] [22]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 4

work page 2022

[23] [23]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020. 1

work page internal anchor Pith review Pith/arXiv arXiv 2010

[24] [24]

Sparsenerf: Distilling depth ranking for few-shot novel view synthesis

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Zi- wei Liu. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. InIEEE/CVF International Confer- ence on Computer Vision (ICCV), 2023. 2

work page 2023

[25] [25]

Reartgs: Reconstructing and generating articulated objects via 3d gaussian splatting with geometric and motion constraints

Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu. Reartgs: Reconstructing and generating articulated objects via 3d gaussian splatting with geometric and motion constraints. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys- tems, 2025. 1, 2, 6, 7

work page 2025

[26] [26]

Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views

Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, and Yan- ning Zhang. Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 11307–11316,

work page

[27] [27]

Sapien: A simulated part-based interactive environment

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11097– 11107, 2020. 6

work page 2020

[28] [28]

Supergs: Super-resolution 3d gaussian splatting via latent feature field and gradient-guided splitting.arXiv preprint arXiv:2410.02571, 1, 2024

Shiyun Xie, Zhiru Wang, Xu Wang, Yinghao Zhu, Cheng- wei Pan, and Xiwang Dong. Supergs: Super-resolution 3d gaussian splatting enhanced by variational residual fea- tures and uncertainty-augmented learning.arXiv preprint arXiv:2410.02571, 2024. 9

work page arXiv 2024

[29] [29]

Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact- rich manipulation.arXiv preprint arXiv:2503.02881, 2025

Yuhan Xie, Yixi Cai, Yinqiang Zhang, Lei Yang, and Jia Pan. Gauss-mi: Gaussian splatting shannon mutual in- formation for active 3d reconstruction.arXiv preprint arXiv:2503.02881, 2025. 3

work page arXiv 2025

[30] [30]

Sparsegs: Real- time 360° sparse view synthesis using gaussian splatting,

Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. Sparsegs: Real- time 360° sparse view synthesis using gaussian splatting,

work page

[31] [31]

Gaussianob- ject: High-quality 3d object reconstruction from four views with gaussian splatting.ACM Transactions on Graphics,

Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, and Qi Tian. Gaussianob- ject: High-quality 3d object reconstruction from four views with gaussian splatting.ACM Transactions on Graphics,

work page

[32] [32]

Depth anything: Unleashing the power of large-scale unlabeled data

Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. InCVPR, 2024. 4

work page 2024

[33] [33]

Deepemd: Differentiable earth mover’s distance for few-shot learning.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(5):5632–5648, 2022

Chi Zhang, Yujun Cai, Guosheng Lin, and Chunhua Shen. Deepemd: Differentiable earth mover’s distance for few-shot learning.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(5):5632–5648, 2022. 6

work page 2022

[34] [34]

Adding conditional control to text-to-image diffusion models, 2023

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023. 4

work page 2023

[35] [35]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InCVPR, 2018. 3

work page 2018

[36] [36]

Fsgs: Real-time few-shot view synthesis using gaussian splatting, 2023

Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting, 2023. 1, 2 10

work page 2023