PAOLI: Pose-free Articulated Object Learning from Sparse-view Images
Pith reviewed 2026-05-18 18:41 UTC · model grok-4.3
The pith
A method learns accurate 3D models of articulated objects from just four sparse unposed images per articulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a methodology to model articulated objects using a sparse set of images with unknown poses. Our central insight is to first solve a robust correspondence and alignment problem between unaligned reconstructions, before part motions can be analyzed. We first reconstruct each articulation independently using recent advances in sparse-view 3D reconstruction, then learn a deformation field that establishes dense correspondences across poses. A progressive disentanglement strategy further separates static from moving parts, enabling robust separation of camera and object motion. Finally, we optimize geometry, appearance, and kinematics jointly with a self-supervised loss that enforces cross-view and cross-pose consistency.
What carries the argument
A learned deformation field that aligns independent sparse-view reconstructions across articulations and supports progressive disentanglement of static and moving parts.
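A classical building block helps picture the alignment problem. Independent sparse-view reconstructions are typically defined only up to a similarity transform (scale, rotation, translation), so, assuming point correspondences are already available (for example on the static part), the transform between two reconstructions can be recovered in closed form with Umeyama's method, which appears in the reference list as [43]. This is a minimal sketch of that building block under those assumptions, not PAOLI's learned deformation field; the function name `umeyama` and the toy data are our own.

```python
import numpy as np


def umeyama(src, dst, with_scale=True):
    """Closed-form least-squares similarity transform (s, R, t) such that
    dst_i ~= s * R @ src_i + t, for (N, 3) arrays of corresponding points."""
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = float(np.sum(D * np.diag(S)) / var_src) if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Toy check: a second "reconstruction" differing by scale, rotation and offset.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
a = 0.7
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
dst = 2.5 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = umeyama(src, dst)
```

A single global transform like this cannot explain a part that moved between articulations, which is why the review treats per-point correspondence and static/moving separation as the hard part.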
If this is right
- Articulated objects can be represented accurately with as few as four views per articulation and no camera supervision.
- Independent per-pose reconstructions can be aligned without external pose information to separate static and moving components.
- Joint optimization of geometry, appearance, and kinematics succeeds when driven only by cross-view and cross-pose consistency losses.
- The resulting models remain detailed on both standard benchmarks and real-world captured objects under the weaker input conditions.
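The kinematic part of the joint optimization needs some joint parameterization. A common choice in articulated-object work, assumed here rather than taken from the paper, is a single joint that is either revolute (axis direction, a point on the axis, and an angle) or prismatic (axis direction and a displacement), applied only to points labelled as moving. The sketch below shows only the forward model; in an optimizer these parameters would be fitted alongside geometry and appearance. All function names are hypothetical.

```python
import numpy as np


def rotation_about_axis(axis_dir, angle):
    """Rodrigues' formula: 3x3 rotation by `angle` radians about an axis."""
    k = np.asarray(axis_dir, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def articulate_revolute(points, moving_mask, axis_dir, axis_point, angle):
    """Rotate only the moving part about the joint axis (the line through
    `axis_point` with direction `axis_dir`); static points are unchanged."""
    R = rotation_about_axis(axis_dir, angle)
    p = np.asarray(axis_point, dtype=float)
    out = points.copy()
    out[moving_mask] = (points[moving_mask] - p) @ R.T + p
    return out


def articulate_prismatic(points, moving_mask, axis_dir, distance):
    """Translate only the moving part along the (normalized) joint axis."""
    d = np.asarray(axis_dir, dtype=float)
    out = points.copy()
    out[moving_mask] = points[moving_mask] + distance * d / np.linalg.norm(d)
    return out


# A "door" of two points hinged on the z-axis, plus one static point.
pts = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 5.0]])
mask = np.array([True, True, False])
opened = articulate_revolute(pts, mask, [0, 0, 1], [0, 0, 0], np.pi / 2)
pulled = articulate_prismatic(pts, mask, [1, 0, 0], 0.3)
```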
Where Pith is reading between the lines
- Casual multi-view photography of moving objects could replace controlled studio capture for many 3D modeling tasks.
- The alignment-first strategy may extend to other problems involving unposed image sets, such as scene reconstruction with moving elements.
- If the deformation field proves stable on even sparser inputs, the approach could scale to video sequences with unknown camera motion.
Load-bearing premise
Independent sparse-view reconstructions of each articulation can be robustly aligned and disentangled into static and moving parts via a learned deformation field without any ground-truth poses or dense observations.
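One way to picture this premise: if two reconstructions differ by a camera (rigid) motion plus a part motion, the static part is the largest set of points explained by a single rigid transform. The sketch below uses a trimmed least-squares scheme, repeatedly refitting a rigid transform on the best-explained fraction of points and finally labelling poorly explained points as moving. It assumes known point correspondences and a dominant static part; `keep_frac`, `thresh`, and the function names are illustrative choices, and this is not the paper's learned deformation field or its progressive schedule.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst_i ~= R @ src_i + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s


def split_static_moving(src, dst, keep_frac=0.5, thresh=0.05, iters=3):
    """Refit on the best-explained fraction of points each round, then
    label points with a small final residual as static."""
    fit = np.ones(len(src), dtype=bool)        # first fit uses every point
    for _ in range(iters):
        R, t = kabsch(src[fit], dst[fit])
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        fit = resid <= np.quantile(resid, keep_frac)
    return resid < thresh, R, t


# Toy scene: 80 static points, 20 drawer points pulled out by 1.0 along x,
# and a different camera (rigid motion) for the second reconstruction.
rng = np.random.default_rng(1)
src = rng.uniform(size=(100, 3))
cs, sn = np.cos(0.3), np.sin(0.3)
R_cam = np.array([[cs, -sn, 0.0], [sn, cs, 0.0], [0.0, 0.0, 1.0]])
t_cam = np.array([0.2, -0.1, 0.4])
moved = src.copy()
moved[80:] += np.array([1.0, 0.0, 0.0])
dst = moved @ R_cam.T + t_cam
static, R_est, t_est = split_static_moving(src, dst)
```

The first fit on all points is biased by the drawer, but the best-explained half is still static, so the second fit recovers the camera motion exactly on this noise-free toy data.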
What would settle it
Apply the method to a benchmark with known ground-truth poses and part motions; if the output 3D models show reconstruction or motion errors comparable to or higher than pose-supervised baselines, the central claim would be falsified.
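Part of such a test is quantitative joint accuracy. Below is a hedged sketch of two joint metrics of the kind used in articulated-object evaluation: the angle between predicted and ground-truth axis directions, and the shortest distance between the two axis lines. The exact definitions used on the benchmark the paper evaluates on may differ (for example, in how prismatic joints are scored), and the function names are hypothetical.

```python
import numpy as np


def axis_angle_error_deg(pred_dir, gt_dir):
    """Angle in degrees between two joint axes. The sign of an axis
    direction is ambiguous, so the error lies in [0, 90]."""
    a = np.asarray(pred_dir, dtype=float)
    b = np.asarray(gt_dir, dtype=float)
    cos = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


def axis_position_error(pred_point, pred_dir, gt_point, gt_dir):
    """Shortest distance between the predicted and ground-truth axis lines."""
    p1, d1 = np.asarray(pred_point, float), np.asarray(pred_dir, float)
    p2, d2 = np.asarray(gt_point, float), np.asarray(gt_dir, float)
    n = np.cross(d1, d2)
    diff = p1 - p2
    if np.linalg.norm(n) < 1e-9:               # parallel axes
        u = d2 / np.linalg.norm(d2)
        return float(np.linalg.norm(diff - (diff @ u) * u))
    return float(abs(diff @ n) / np.linalg.norm(n))


ang_flip = axis_angle_error_deg([0, 0, 1], [0, 0, -1])
ang_perp = axis_angle_error_deg([1, 0, 0], [0, 1, 0])
pos_parallel = axis_position_error([1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1])
pos_skew = axis_position_error([0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0])
```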
Original abstract
We present a methodology to model articulated objects using a sparse set of images with unknown poses. Current methods require dense multi-view observations and ground-truth camera poses. Our approach operates with as few as four views per articulation and no camera supervision. Our central insight is to first solve a robust correspondence and alignment problem between unaligned reconstructions, before part motions can be analyzed. We first reconstruct each articulation independently using recent advances in sparse-view 3D reconstruction, then learn a deformation field that establishes dense correspondences across poses. A progressive disentanglement strategy further separates static from moving parts, enabling robust separation of camera and object motion. Finally, we optimize geometry, appearance, and kinematics jointly with a self-supervised loss that enforces cross-view and cross-pose consistency. Experiments on the standard benchmark and real-world examples demonstrate that our method produces accurate and detailed articulated object representations under significantly weaker input assumptions than existing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PAOLI, a method to model articulated objects from sparse-view images with unknown camera poses. It reconstructs each articulation independently via recent sparse-view 3D techniques, learns a deformation field to establish dense correspondences across poses, applies a progressive disentanglement strategy to separate static and moving parts, and jointly optimizes geometry, appearance, and kinematics under a self-supervised cross-view/cross-pose consistency loss. The central claim is that the pipeline produces accurate articulated representations using as few as four views per articulation and no camera supervision, outperforming prior work that requires denser observations and ground-truth poses.
Significance. If the disentanglement and alignment steps prove robust, the work would be a meaningful advance in articulated object reconstruction by relaxing the strong input assumptions of dense multi-view capture and known poses. Building directly on recent sparse-view reconstruction advances and self-supervised losses is a practical strength; successful validation would enable more accessible data collection for robotics and AR applications. The explicit separation of alignment from motion analysis is a clear conceptual contribution.
major comments (2)
- [§3] §3 (Method), progressive disentanglement paragraph and associated loss formulation: the manuscript does not specify an explicit mechanism (e.g., regularization term, initialization strategy, or architectural bias) that prevents the learned deformation field from absorbing camera motion into object motion when each 4-view reconstruction carries large depth/pose ambiguities. The self-supervised consistency loss can be satisfied by kinematically incorrect solutions that conflate the two, directly undermining the central claim that camera and object motion can be robustly separated without ground-truth poses or dense observations.
- [§5] §5 (Experiments), quantitative tables and ablation studies: no ablation is reported that isolates the contribution of the progressive disentanglement module versus a baseline that simply aligns independent reconstructions; without this, it is impossible to verify that the method overcomes the geometric degeneracy highlighted in the central insight rather than relying on favorable initialization or dataset biases.
minor comments (2)
- [§3.2] Notation for the deformation field and the static/moving partition mask should be introduced with explicit equations rather than prose descriptions to improve reproducibility.
- [Figure 3] Figure 3 (qualitative results) would benefit from side-by-side comparison with a naive alignment baseline to visually demonstrate the effect of the disentanglement step.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. The comments have helped us clarify key aspects of the method and strengthen the experimental validation. We address each major comment below and have made revisions to the manuscript as indicated.
Point-by-point responses
-
Referee: [§3] §3 (Method), progressive disentanglement paragraph and associated loss formulation: the manuscript does not specify an explicit mechanism (e.g., regularization term, initialization strategy, or architectural bias) that prevents the learned deformation field from absorbing camera motion into object motion when each 4-view reconstruction carries large depth/pose ambiguities. The self-supervised consistency loss can be satisfied by kinematically incorrect solutions that conflate the two, directly undermining the central claim that camera and object motion can be robustly separated without ground-truth poses or dense observations.
Authors: We thank the referee for highlighting this potential issue in the method description. The progressive disentanglement is intended to separate camera and object motion by first aligning the static part and then progressively identifying moving parts. The deformation field is constrained by the initial independent reconstructions, which provide a starting point less prone to absorbing global motion. Nevertheless, we acknowledge that the manuscript would benefit from a more explicit description of the mechanism. In the revised version, we have expanded the progressive disentanglement paragraph to include details on the initialization strategy and an added regularization term that limits the deformation field's ability to model large global transformations in the initial stages. This should make the robustness clearer. revision: yes
-
Referee: [§5] §5 (Experiments), quantitative tables and ablation studies: no ablation is reported that isolates the contribution of the progressive disentanglement module versus a baseline that simply aligns independent reconstructions; without this, it is impossible to verify that the method overcomes the geometric degeneracy highlighted in the central insight rather than relying on favorable initialization or dataset biases.
Authors: We agree with the referee that an ablation isolating the progressive disentanglement is necessary to fully validate the contribution. We have added this ablation to the experiments section in the revised manuscript. The new results demonstrate that removing the progressive disentanglement leads to degraded performance in motion estimation, confirming that it plays a key role in overcoming the geometric ambiguities rather than depending on dataset biases or initialization alone. revision: yes
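The rebuttal to the first major comment mentions a regularization term that limits the deformation field's ability to model large global transformations in the initial stages, but does not give its form. One plausible form, offered purely as an assumption: fit the best rigid rotation to the field's displacements, penalize its deviation from the identity together with the mean translation, and anneal the weight to zero. numpy is used for readability; a trainable version would need an autodiff framework. `warmup`, `w0`, and the function names are hypothetical.

```python
import numpy as np


def rigid_rotation(src, dst):
    """Rotation part of the least-squares rigid fit dst_i ~= R @ src_i + t."""
    H = (src - src.mean(axis=0)).T @ (dst - dst.mean(axis=0))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T


def global_motion_penalty(points, displacements, step, warmup=1000, w0=1.0):
    """Penalize the globally rigid part of a deformation field: the rotation's
    deviation from identity plus the squared mean translation. The weight
    decays linearly to zero over `warmup` steps, so large global motions are
    discouraged only early in optimization."""
    R = rigid_rotation(points, points + displacements)
    rot_term = float(np.sum((R - np.eye(3)) ** 2))
    trans_term = float(np.sum(displacements.mean(axis=0) ** 2))
    weight = w0 * max(0.0, 1.0 - step / warmup)
    return weight * (rot_term + trans_term)


rng = np.random.default_rng(2)
pts = rng.normal(size=(200, 3))
shift = np.tile([0.5, 0.0, 0.0], (200, 1))   # a purely global translation
local = np.zeros((200, 3))
local[:10] = [0.0, 0.0, 0.5]                 # only a small part moves
p_early_global = global_motion_penalty(pts, shift, step=0)
p_early_local = global_motion_penalty(pts, local, step=0)
p_late_global = global_motion_penalty(pts, shift, step=1000)
```

A global shift is penalized far more than a local part motion of the same magnitude, which is the asymmetry such a term would need to keep camera motion out of the deformation field.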
Circularity Check
No significant circularity; derivation relies on external sparse-view methods and self-supervised losses without self-referential reduction
full rationale
The paper's pipeline begins with independent per-articulation reconstructions drawn from cited external advances in sparse-view 3D reconstruction, followed by a learned deformation field for correspondences and a progressive disentanglement step optimized via cross-view/cross-pose consistency losses. None of these steps reduce a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the alignment and static/moving separation are presented as emergent from the joint optimization rather than presupposed in the inputs. The central claims therefore remain independent of the paper's own fitted values or prior self-references, qualifying as a self-contained derivation against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Recent sparse-view 3D reconstruction advances can produce usable independent per-articulation models even with unknown poses.
- domain assumption A learned deformation field can establish reliable dense correspondences across different articulations without supervision.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem.
we first reconstruct each articulation independently using recent advances in sparse-view 3D reconstruction, then learn a deformation field that establishes dense correspondences across poses. A progressive disentanglement strategy further separates static from moving parts
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (tag: unclear)
Relation between the paper passage and the cited Recognition theorem.
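$$\min_{F_{\text{deform}}} \; \mathcal{L}_{\text{CD}}(\hat{G}_t, G_t) + \mathcal{L}_{\text{photo}}\big(\mathcal{R}(\hat{G}_t), \mathcal{R}(G_t)\big)$$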
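The quoted objective combines a geometric term and a photometric term. A minimal sketch, assuming that Ĝ_t and G_t are represented by Gaussian centre positions, that L_CD is a symmetric Chamfer distance, and that L_photo is a mean L1 difference between already-rendered images; the excerpt does not specify these choices, and in practice the objective would be minimized over the deformation-field parameters with an autodiff framework. Function names and the `w_photo` weight are hypothetical (the quoted formula is unweighted, hence the default of 1.0).

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets:
    mean squared nearest-neighbour distance, summed over both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def photometric_l1(img_a, img_b):
    """Mean absolute per-pixel difference between two rendered images."""
    return float(np.abs(img_a - img_b).mean())


def alignment_objective(deformed_centers, target_centers,
                        render_deformed, render_target, w_photo=1.0):
    """L_CD(G_hat_t, G_t) + w_photo * L_photo(R(G_hat_t), R(G_t)).
    Renders are passed in as arrays; the renderer R is out of scope here."""
    return (chamfer_distance(deformed_centers, target_centers)
            + w_photo * photometric_l1(render_deformed, render_target))


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.0, 0.0, 0.1])
img = np.zeros((4, 4, 3))
loss = alignment_objective(a, b, img, img + 0.1)
```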
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M Seitz, and Richard Szeliski. Building rome in a day. Communications of the ACM, 54(10):105–112, 2011.
-
[2]
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19457–19467, 2024.
-
[3]
Yuchen Che, Ryo Furukawa, and Asako Kanezaki. Op-align: Object-level and part-level alignment for self-supervised category-level articulated object pose estimation. In European Conference on Computer Vision, pages 72–88. Springer, 2024.
-
[4]
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
-
[5]
Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. Gaussianpro: 3d gaussian splatting with progressive propagation. In Forty-first International Conference on Machine Learning, 2024.
-
[6]
Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3d gaussian splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024.
-
[7]
Jianning Deng, Kartic Subr, and Hakan Bilen. Articulate your nerf: Unsupervised articulated object modeling via conditional view synthesis. Advances in Neural Information Processing Systems, 37:119717–119741, 2024.
-
[8]
Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in Neural Information Processing Systems, 37:140138–140158, 2024.
-
[9]
Lian Fu, Ryoichi Ishikawa, Yoshihiro Sato, and Takeshi Oishi. Capt: Category-level articulation estimation from a single point cloud using transformer. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 751–757. IEEE, 2024.
-
[10]
Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, and Ruizhen Hu. Articulatedgs: Self-supervised digital twin modeling of articulated objects using 3d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 27144–27153, 2025.
-
[11]
Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, and Thomas Kollar. Carto: Category and joint agnostic reconstruction of articulated objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21201–21210, 2023.
-
[12]
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 conference papers, pages 1–11, 2024.
-
[13]
Hanxiao Jiang, Yongsen Mao, Manolis Savva, and Angel X Chang. Opd: Single-view 3d openable part detection. In European Conference on Computer Vision, pages 410–426. Springer, 2022.
-
[14]
Yuki Kawana and Tatsuya Harada. Detection based part-level articulated object reconstruction from single rgbd image. Advances in Neural Information Processing Systems, 36:18444–18473, 2023.
-
[15]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139, 2023.
-
[16]
Hanyang Kong, Xingyi Yang, and Xinchao Wang. Generative sparse-view gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26745–26755, 2025.
-
[17]
Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, and Eric Eaton. Articulate-anything: Automatic modeling of articulated objects via a vision-language foundation model. arXiv preprint arXiv:2410.13882, 2024.
-
[18]
Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21719–21728, 2024.
-
[19]
Jiahui Lei, Congyue Deng, Bokui Shen, Leonidas Guibas, and Kostas Daniilidis. Nap: Neural 3d articulation prior. arXiv preprint arXiv:2305.16315, 2023.
-
[20]
Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. In European Conference on Computer Vision, pages 71–91. Springer, 2024.
-
[21]
Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20775–20785, 2024.
-
[22]
Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 352–363, 2023.
-
[23]
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, and Manolis Savva. Cage: Controllable articulation generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17880–17889, 2024.
-
[24]
Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9298–9309, 2023.
-
[25]
Shaowei Liu, Saurabh Gupta, and Shenlong Wang. Building rearticulable models for arbitrary 3d objects from 4d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21138–21147, 2023.
-
[26]
Shaohui Liu, Yidan Gao, Tianyi Zhang, Rémi Pautrat, Johannes L Schönberger, Viktor Larsson, and Marc Pollefeys. Robust incremental structure-from-motion with hybrid features. In European Conference on Computer Vision, pages 249–269. Springer, 2024.
-
[27]
Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, and Siyuan Huang. Building interactable replicas of complex articulated objects via gaussian splatting. In The Thirteenth International Conference on Learning Representations, 2025.
-
[28]
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9970–9980, 2024.
-
[29]
Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20654–20664, 2024.
-
[30]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
-
[31]
Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, and Hao Su. PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
-
[32]
Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, and Xiaolong Wang. A-sdf: Learning disentangled signed distance functions for articulated shape representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13001–13011, 2021.
-
[33]
Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022.
-
[34]
Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, and Shuran Song. Structure from action: Learning interactions for articulated object 3d structure discovery. arXiv preprint arXiv:2207.08997, 2022.
-
[35]
Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, and David F Fouhey. Understanding 3d object articulation in internet videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1599–1609, 2022.
-
[36]
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:24...
-
[37]
Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
[38]
Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
-
[39]
Xiaohao Sun, Hanxiao Jiang, Manolis Savva, and Angel Xuan Chang. Opdmulti: Openable part detection for multiple objects. arXiv preprint arXiv:2303.14087, 2023.
-
[40]
Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single-view 3d reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10208–10217, 2024.
-
[41]
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In European Conference on Computer Vision, pages 1–18. Springer, 2024.
-
[42]
Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, and Min Sun. Cla-nerf: Category-level articulated neural radiance field. In 2022 International Conference on Robotics and Automation (ICRA), pages 8454–8460. IEEE, 2022.
-
[43]
Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis & Machine Intelligence, 13(04):376–380, 1991.
-
[44]
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025.
-
[45]
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
-
[46]
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024.
-
[47]
Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollhöfer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, and Mira Slavcheva. Self-supervised neural articulated shape and appearance models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15816–15826, 2022.
-
[48]
Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, and Stan Birchfield. Neural implicit representation for building digital twins of unknown articulated objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3141–3150, 2024.
-
[49]
Zhiwen Yan, Weng Fei Low, Yu Chen, and Gim Hee Lee. Multi-scale 3d gaussian splatting for anti-aliased rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20923–20931, 2024.
-
[50]
Heng Yang, Jingnan Shi, and Luca Carlone. Teaser: Fast and certifiable point cloud registration. IEEE Transactions on Robotics, 37(2):314–333, 2020.
-
[51]
Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21924–21935, 2025.
-
[52]
Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images. arXiv preprint arXiv:2410.24207, 2024.
-
[53]
Zehao Zhang, Anand Goel, Zhan Wang, Vladlen Koltun, Jitendra Malik, Chenxu Ma, and Leonidas Guibas. Freesplatter: Free-viewpoint 3d gaussian splatting from a single image. arXiv preprint arXiv:2401.04644, 2024.
-
[54]
Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting. In European conference on computer vision, pages 145–163. Springer, 2024.
Supplementary Material
6.1. Detailed Discussion of Related Work
Here we provide a more detailed discussion of the most closely related articulated object learning work, including PARIS [22], AYN [7], and AGS [10]. As explained in the submission, these techniques assume dense views of the object across two articulation states along with the camera information, unl...