SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
Pith reviewed 2026-05-16 18:10 UTC · model grok-4.3
The pith
A skeleton-driven deformation field lets Gaussian splatting reconstruct moving objects accurately from sparse camera views over time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SV-GS simultaneously estimates a deformation model and the object's motion over time under sparse observations by optimizing a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, the model enables smooth motion interpolation while preserving learned geometric details.
What carries the argument
Skeleton-driven deformation field consisting of a time-dependent coarse joint pose estimator and a time-independent fine-grained deformation module that guides Gaussian splatting optimization.
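The coarse/fine split described above can be illustrated with a toy 2D sketch. This is not the authors' implementation: the single-joint skeleton, the linear keyframe interpolation, the choice to apply the fine offset in canonical space, and every name (`joint_angle`, `deform`, `skin_weight`) are assumptions made for illustration. Only the joint angle depends on time; the per-Gaussian fine offset is a fixed value.

```python
import math

def rotate(point, angle, pivot):
    """Rotate a 2D point about a pivot by `angle` radians."""
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * x - s * y, pivot[1] + s * x + c * y)

def joint_angle(t, key_times, key_angles):
    """Time-dependent coarse pose (assumption: linear interpolation
    between sparse keyframes, clamped outside the observed range)."""
    if t <= key_times[0]:
        return key_angles[0]
    for i in range(1, len(key_times)):
        if t <= key_times[i]:
            w = (t - key_times[i - 1]) / (key_times[i] - key_times[i - 1])
            return (1.0 - w) * key_angles[i - 1] + w * key_angles[i]
    return key_angles[-1]

def deform(center, fine_offset, skin_weight, pivot, t, key_times, key_angles):
    """Map a canonical Gaussian center to its position at time t."""
    # Time-independent fine detail (assumption: added in canonical space).
    refined = (center[0] + fine_offset[0], center[1] + fine_offset[1])
    # Time-dependent coarse motion from the single hypothetical joint.
    moved = rotate(refined, joint_angle(t, key_times, key_angles), pivot)
    # Linear blend between the static root and the moving joint.
    return (
        (1.0 - skin_weight) * refined[0] + skin_weight * moved[0],
        (1.0 - skin_weight) * refined[1] + skin_weight * moved[1],
    )
```

Because `fine_offset` never sees `t`, querying `deform` at an unobserved time changes only the interpolated joint angle, which is one way to read the claim that geometric detail is preserved during motion interpolation.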
If this is right
- Outperforms existing sparse-observation methods by up to 34% in PSNR on synthetic datasets.
- Achieves performance comparable to dense monocular video methods on real-world data while using significantly fewer frames.
- The initial static reconstruction input can be replaced by a diffusion-based generative prior for greater practicality.
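For readers unfamiliar with the metric behind the headline number, here is a minimal sketch of how PSNR and a relative gain could be computed. It assumes pixel values in [0, 1] and assumes that "up to 34%" means a relative improvement in the dB value, which the abstract does not specify. The function names and the numbers in the usage note are illustrative and not taken from the paper.

```python
import math

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def relative_gain_percent(ours_db, baseline_db):
    """Relative PSNR improvement over a baseline, in percent
    (assumption: this is how the percentage is defined)."""
    return 100.0 * (ours_db - baseline_db) / baseline_db
```

With made-up values, a method at 30 dB against a baseline at 25 dB would give `relative_gain_percent(30.0, 25.0) == 20.0`. Since dB is a logarithmic scale, a relative gain in dB is not the same as a relative reduction in error, which is one reason the percentage alone is hard to interpret.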
Where Pith is reading between the lines
- The coarse-fine separation may generalize to other dynamic representations such as neural radiance fields or meshes.
- Existing surveillance camera networks could supply the sparse inputs needed for 4D reconstruction without new dense capture hardware.
- Fully automatic skeleton extraction could remove the remaining manual initialization step.
Load-bearing premise
A rough skeleton graph and initial static reconstruction are available to guide motion estimation under otherwise ill-posed sparse observations.
What would settle it
Run the method on a synthetic dynamic sequence with ground-truth 4D data but supply an intentionally inaccurate or missing skeleton graph and measure whether PSNR falls below competing skeleton-free baselines.
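The test proposed above could be organized roughly as follows. This is a hedged sketch of the experimental loop only: `evaluate_psnr` stands in for a full SV-GS training-and-evaluation run, `baseline_psnr` for the score of a skeleton-free competitor, and Gaussian noise on joint positions is just one assumed way to make the skeleton "intentionally inaccurate". The missing-skeleton case is not covered.

```python
import random

def perturb_skeleton(joints, sigma, seed=0):
    """Add zero-mean Gaussian noise (std `sigma`) to each joint coordinate."""
    rng = random.Random(seed)  # seeded for repeatable perturbations
    return [tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in joints]

def robustness_sweep(joints, sigmas, evaluate_psnr, baseline_psnr):
    """For each noise level, record the PSNR and whether it falls below the baseline.

    `evaluate_psnr` is a hypothetical callback: it takes a list of joint
    positions and returns the PSNR obtained when the method is run with them.
    """
    results = []
    for sigma in sigmas:
        score = evaluate_psnr(perturb_skeleton(joints, sigma))
        results.append((sigma, score, score < baseline_psnr))
    return results
```

The smallest `sigma` at which the third field turns `True` would indicate how much skeleton error the method tolerates before the skeleton prior stops paying for itself.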
Original abstract
Reconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the temporal dimension, typically relying on multi-view videos captured at each time step. However, such setups are only possible in constrained environments. In real-world scenarios, observations are often sparse over time and captured sparsely from diverse viewpoints (e.g., from security cameras), making dynamic reconstruction highly ill-posed. We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations. To initialize SV-GS, we leverage a rough skeleton graph and an initial static reconstruction as inputs to guide motion estimation. (Later, we show that this input requirement can be relaxed.) Our method optimizes a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, our model enables smooth motion interpolation while preserving learned geometric details. Experiments on synthetic datasets show that our method outperforms existing approaches under sparse observations by up to 34% in PSNR, and achieves comparable performance to dense monocular video methods on real-world datasets despite using significantly fewer frames. Moreover, we demonstrate that the input initial static reconstruction can be replaced by a diffusion-based generative prior, making our method more practical for real-world scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SV-GS, a framework for 4D dynamic reconstruction from sparse multi-view observations. It initializes with a rough skeleton graph plus a static reconstruction (the latter later relaxable to a diffusion prior), then optimizes a skeleton-driven Gaussian Splatting deformation model that separates time-dependent coarse joint-pose estimation from static fine-grained deformations. The central empirical claims are an improvement of up to 34% in PSNR over prior methods on synthetic sparse-view data and performance comparable to dense monocular video baselines on real data despite using far fewer frames.
Significance. If the quantitative gains and the diffusion-prior relaxation hold under rigorous controls, the work would meaningfully advance practical 4D reconstruction outside controlled capture rigs. The separation of coarse time-dependent pose from static detail is a clean modeling choice that could generalize to other sparse dynamic settings.
major comments (2)
- [Abstract / Experiments] Abstract and Experiments section: the headline 34% PSNR gain is reported without error bars, per-scene variance, or ablation tables isolating the skeleton-graph contribution from the deformation field. Because the joint-pose stage is anchored by the skeleton input, the absence of an ablation that replaces the provided skeleton with automatic pose estimation on identical sparse inputs leaves open whether the reported numbers reflect an oracle initialization rather than a fully automatic pipeline.
- [Methods] Methods: the deformation model is defined as the sum of a time-dependent coarse joint-pose estimator and a static fine-grained module. No derivation or loss-term analysis is supplied showing that this split is necessary for stability under the stated sparsity levels; an ablation that makes the fine module also time-dependent (or removes the skeleton anchor entirely) would be required to substantiate the modeling claim.
minor comments (2)
- [Abstract] Abstract: the phrase 'significantly fewer frames' should be quantified (exact frame counts and view counts for both SV-GS and the dense monocular baselines).
- [Experiments] The diffusion-prior relaxation is mentioned only qualitatively; a short table comparing PSNR when the static initialization is replaced by the generative prior versus the provided static reconstruction would strengthen the practicality claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revisions that strengthen the empirical presentation and modeling justification without altering the core claims.
Point-by-point responses
- Referee: [Abstract / Experiments] Abstract and Experiments section: the headline 34% PSNR gain is reported without error bars, per-scene variance, or ablation tables isolating the skeleton-graph contribution from the deformation field. Because the joint-pose stage is anchored by the skeleton input, the absence of an ablation that replaces the provided skeleton with automatic pose estimation on identical sparse inputs leaves open whether the reported numbers reflect an oracle initialization rather than a fully automatic pipeline.
  Authors: We agree that error bars, per-scene variance, and targeted ablations would strengthen the results section. In the revision we will add standard error bars to all quantitative tables, report per-scene PSNR values, and include a new ablation that substitutes the provided skeleton graph with an off-the-shelf automatic pose estimator (e.g., a recent monocular 3D pose method) while keeping all other inputs and sparsity levels identical. This will clarify that the reported gains are not solely attributable to oracle skeleton initialization. The diffusion-prior relaxation already demonstrated in the manuscript applies to the static reconstruction; the new ablation will extend the same spirit to the skeleton component.
  Revision: yes
- Referee: [Methods] Methods: the deformation model is defined as the sum of a time-dependent coarse joint-pose estimator and a static fine-grained module. No derivation or loss-term analysis is supplied showing that this split is necessary for stability under the stated sparsity levels; an ablation that makes the fine module also time-dependent (or removes the skeleton anchor entirely) would be required to substantiate the modeling claim.
  Authors: We will expand the Methods section with a short derivation of the composite deformation field and an analysis of the loss terms that shows why anchoring only the coarse joint-pose stage to time-dependent parameters improves stability under the sparsity regimes considered. We will also add two new ablations: (1) making the fine-grained module time-dependent as well, and (2) removing the skeleton anchor entirely (relying solely on the diffusion prior). These experiments will be reported with the same metrics and sparsity settings used in the main tables, allowing direct comparison to the original split.
  Revision: yes
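The error bars and per-scene reporting promised in the rebuttal could be produced along these lines. This is a minimal sketch under stated assumptions: the standard error is taken across scenes using the sample standard deviation, and the scene names and PSNR values in the usage note are placeholders, not results from the paper.

```python
import math
import statistics

def summarize(per_scene_psnr):
    """Return (mean, standard error of the mean) over per-scene PSNR values.

    `per_scene_psnr` maps a scene name to its PSNR in dB.
    """
    values = list(per_scene_psnr.values())
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, float("nan")  # spread is undefined for a single scene
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

For example, `summarize({"scene_a": 20.0, "scene_b": 22.0, "scene_c": 24.0})` yields a mean of 22.0 dB with a standard error of about 1.15 dB. Reporting the per-scene dictionary alongside this summary would address both halves of the referee's request.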
Circularity Check
No significant circularity in derivation chain
The paper describes an optimization-based framework that takes a rough skeleton graph and initial static reconstruction (or diffusion prior) as explicit inputs to initialize and guide motion estimation under sparse views. Performance claims rest on empirical results from synthetic and real datasets rather than any closed-form derivation or prediction that reduces by construction to fitted parameters. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the provided text; the approach is a standard engineering pipeline validated externally and therefore self-contained.