ART: Articulated Reconstruction Transformer
Pith reviewed 2026-05-21 17:14 UTC · model grok-4.3
The pith
ART reconstructs complete 3D articulated objects from sparse multi-state RGB images by mapping inputs to learnable part slots that decode per-part geometry, texture, and articulation parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ART is a category-agnostic feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images by treating objects as assemblies of rigid parts. The transformer maps the image inputs to a set of learnable part slots and jointly decodes unified representations for each part that include its 3D geometry, texture, and explicit articulation parameters. The resulting models are physically interpretable and directly exportable for simulation.
What carries the argument
Learnable part slots inside the transformer that receive sparse image inputs and decode unified per-part representations for 3D geometry, texture, and articulation parameters.
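As a rough illustration of the slot idea, the sketch below lets a fixed set of slot vectors attend over image tokens with single-head cross-attention. It is a toy under stated assumptions, not ART's architecture: the function names, the sizes, and the choice of plain cross-attention are all hypothetical, and the learned projections and per-part decoders are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(slots, tokens):
    """Single-head cross-attention: every slot queries all image tokens.

    slots:  list of P0 vectors (hypothetical learnable part slots)
    tokens: list of N vectors (hypothetical image patch features)
    Returns (updated_slots, attention_weights).
    """
    d = len(slots[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for s in slots:
        scores = [scale * sum(a * b for a, b in zip(s, t)) for t in tokens]
        weights.append(softmax(scores))
    # Each updated slot is the attention-weighted average of the tokens.
    updated = [
        [sum(w * t[k] for w, t in zip(row, tokens)) for k in range(d)]
        for row in weights
    ]
    return updated, weights


random.seed(0)
P0, N, D = 4, 6, 8  # assumed sizes, for illustration only
slots = [[random.gauss(0, 1) for _ in range(D)] for _ in range(P0)]
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
updated_slots, attn = cross_attend(slots, tokens)
```

In a real model each updated slot would then pass through decoders for geometry, texture, and articulation; here the point is only that the number of slots, not the number of input images, fixes the number of candidate parts.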
If this is right
- Reconstructions are physically interpretable and can be exported directly for use in simulation.
- The model produces complete 3D outputs from sparse multi-state image inputs without per-object optimization.
- Performance improves over prior baselines and reaches state-of-the-art results across diverse benchmarks.
- Joint decoding supplies geometry, texture, and explicit articulation parameters for every part in a single forward pass.
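To make "exportable for simulation" concrete, here is a minimal sketch that writes hypothetical per-part predictions to URDF, a common robot description format. The paper excerpt does not say which export format ART uses, so URDF is an assumption, as are the dictionary keys (`name`, `mesh`, `motion`, `axis`, `pivot`, `limits`), the flat hierarchy with every part attached to one root link, and the placeholder effort and velocity values.

```python
import xml.etree.ElementTree as ET

# URDF joint types for the three motion types named in the paper excerpt.
URDF_JOINT_TYPE = {"static": "fixed", "prismatic": "prismatic", "revolute": "revolute"}


def _xyz(values):
    """Format a 3-vector as the space-separated string URDF expects."""
    return " ".join(f"{v:g}" for v in values)


def parts_to_urdf(name, parts):
    """Build a minimal URDF string from hypothetical per-part predictions."""
    robot = ET.Element("robot", name=name)
    ET.SubElement(robot, "link", name="base")
    for part in parts:
        link = ET.SubElement(robot, "link", name=part["name"])
        visual = ET.SubElement(link, "visual")
        # The child frame sits at the pivot, so shift the mesh back by -pivot.
        offset = [0.0 - v for v in part["pivot"]]
        ET.SubElement(visual, "origin", xyz=_xyz(offset), rpy="0 0 0")
        geometry = ET.SubElement(visual, "geometry")
        ET.SubElement(geometry, "mesh", filename=part["mesh"])

        joint = ET.SubElement(
            robot, "joint",
            name=part["name"] + "_joint",
            type=URDF_JOINT_TYPE[part["motion"]],
        )
        ET.SubElement(joint, "parent", link="base")
        ET.SubElement(joint, "child", link=part["name"])
        ET.SubElement(joint, "origin", xyz=_xyz(part["pivot"]), rpy="0 0 0")
        if part["motion"] != "static":
            lower, upper = part["limits"]
            ET.SubElement(joint, "axis", xyz=_xyz(part["axis"]))
            # effort and velocity are placeholders, not predicted values.
            ET.SubElement(joint, "limit", lower=f"{lower:g}", upper=f"{upper:g}",
                          effort="1", velocity="1")
    return ET.tostring(robot, encoding="unicode")


parts = [
    {"name": "body", "mesh": "body.obj", "motion": "static", "pivot": (0, 0, 0)},
    {"name": "door", "mesh": "door.obj", "motion": "revolute",
     "axis": (0, 0, 1), "pivot": (0.4, 0, 0), "limits": (0.0, 1.57)},
]
urdf = parts_to_urdf("cabinet", parts)
```

Because the articulation parameters are explicit, the mapping to joint elements is direct; an implicit motion representation would need an extra fitting step before export.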
Where Pith is reading between the lines
- The part-based formulation could support downstream tasks such as physics-based planning once the 3D models are obtained.
- Because the method is feed-forward, it might enable reconstruction pipelines that run at interactive speeds once trained.
- Scaling the training data further could extend the approach to objects with more complex part hierarchies.
Load-bearing premise
A large-scale diverse dataset with per-part supervision exists and is sufficient to train a category-agnostic model that generalizes to unseen articulated objects.
What would settle it
A controlled test in which the trained model produces inaccurate geometry or articulation on a collection of articulated objects drawn from categories absent from the training set would falsify the generalization claim.
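Such a test needs a held-out split and an error measure for the predicted joints. The sketch below shows two measures often used for articulation estimation, the angle between joint axes and the distance from the predicted pivot to the ground-truth axis line. The paper's actual evaluation protocol is not given in this review, so the metric choice, the function names, and the record layout (`category` key) are assumptions.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def axis_angle_error_deg(pred_axis, gt_axis):
    """Angle in degrees between two joint axes, ignoring sign.

    An axis and its negation describe the same line, hence the abs().
    """
    cos = abs(_dot(pred_axis, gt_axis)) / (_norm(pred_axis) * _norm(gt_axis))
    return math.degrees(math.acos(min(1.0, cos)))


def pivot_to_axis_distance(pred_pivot, gt_pivot, gt_axis):
    """Distance from the predicted pivot to the ground-truth axis line."""
    n = _norm(gt_axis)
    u = [x / n for x in gt_axis]
    diff = [p - g for p, g in zip(pred_pivot, gt_pivot)]
    along = _dot(diff, u)
    perp = [d - along * x for d, x in zip(diff, u)]
    return _norm(perp)


def held_out(objects, unseen_categories):
    """Keep only objects whose category was excluded from training."""
    return [o for o in objects if o["category"] in unseen_categories]
```

Measuring the pivot against the axis line rather than against a single point avoids penalizing predictions that slide along the axis, since any point on a revolute axis is an equally valid pivot.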
Original abstract
We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images. Previous methods for articulated object reconstruction either rely on slow optimization with fragile cross-state correspondences or use feed-forward models limited to specific object categories. In contrast, ART treats articulated objects as assemblies of rigid parts, formulating reconstruction as part-based prediction. Our newly designed transformer architecture maps sparse image inputs to a set of learnable part slots, from which ART jointly decodes unified representations for individual parts, including their 3D geometry, texture, and explicit articulation parameters. The resulting reconstructions are physically interpretable and readily exportable for simulation. Trained on a large-scale, diverse dataset with per-part supervision, and evaluated across diverse benchmarks, ART achieves significant improvements over existing baselines and establishes a new state of the art for articulated object reconstruction from image inputs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ART, a category-agnostic feed-forward transformer for reconstructing complete 3D articulated objects from sparse multi-state RGB images. It models objects as rigid part assemblies, using a transformer with learnable part slots to jointly predict per-part 3D geometry, texture, and explicit articulation parameters. The model is trained with per-part supervision on a large-scale diverse dataset and claims new state-of-the-art results on diverse benchmarks, with outputs that are physically interpretable and exportable for simulation.
Significance. If the benchmark gains and generalization claims hold after verification of the training data and ablations, this would be a meaningful contribution to articulated object reconstruction. It shifts from slow optimization or category-specific feed-forward models to a general, fast, part-based transformer approach with explicit articulation outputs suitable for downstream simulation and robotics tasks.
major comments (1)
- [Abstract] The central claim of category-agnostic generalization rests on training with a 'large-scale, diverse dataset with per-part supervision' (Abstract). No statistics are supplied on category count, object instances, part-count distribution, image density per object, or label provenance (synthetic vs. real). This information is load-bearing for assessing whether the training distribution covers the variability needed for true unseen-object generalization.
minor comments (1)
- [Abstract] The abstract uses the phrase 'multi-state RGB images' without an early definition of what constitutes a state (different articulation poses, camera views, or both); a brief parenthetical clarification would improve accessibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our submission. The concern about dataset transparency is well-taken and directly relevant to the strength of our category-agnostic generalization claims. We address the point below and will revise the manuscript to include the requested details.
- Referee: [Abstract] The central claim of category-agnostic generalization rests on training with a 'large-scale, diverse dataset with per-part supervision' (Abstract). No statistics are supplied on category count, object instances, part-count distribution, image density per object, or label provenance (synthetic vs. real). This information is load-bearing for assessing whether the training distribution covers the variability needed for true unseen-object generalization.
  Authors: We agree that quantitative dataset statistics are necessary to evaluate the scope of generalization and that their absence from the abstract weakens the central claim. While the full manuscript describes the data collection pipeline and per-part supervision in Section 4, we acknowledge that a concise summary of key statistics is missing from the abstract and early sections. In the revised manuscript we will add a short 'Training Data' paragraph (or table) immediately after the abstract or in Section 3 that reports: total categories covered, number of distinct object instances, part-count distribution (mean, range, and histogram summary), average number of multi-state views per object, and explicit confirmation that all labels are synthetic with ground-truth per-part annotations obtained from a physics-based simulator. This addition will allow readers to assess coverage of variability for unseen objects without altering any experimental results or claims.
  Revision: yes
Circularity Check
No circularity in derivation chain
The paper describes a feed-forward transformer architecture that maps image inputs to part slots and decodes geometry, texture, and articulation parameters. No equations, derivations, or self-referential definitions appear that reduce any claimed output to fitted inputs or prior self-citations by construction. The model is trained on an external dataset and evaluated on benchmarks, rendering the central claims empirically grounded rather than tautological.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "ART treats articulated objects as assemblies of rigid parts, formulating reconstruction as part-based prediction. Our newly designed transformer architecture maps sparse image inputs to a set of learnable part slots, from which ART jointly decodes unified representations for individual parts, including their 3D geometry, texture, and explicit articulation parameters."
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We allocate P0 learnable part slots... predict P ≤ P0 active parts... motion type C ∈ {static, prismatic, revolute}, joint axis D, pivot O, dynamics S."
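The quoted parameters are enough to pose a part at a given joint state. The sketch below is one possible reading, not the paper's code: it stores motion type C, axis D, and pivot O in a small record and moves a point by translating along the axis (prismatic) or rotating about the axis through the pivot with Rodrigues' formula (revolute). The names `Motion`, `Joint`, and `move_point` are hypothetical, and the dynamics term S is left out because the excerpt does not define it.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Motion(Enum):
    STATIC = "static"
    PRISMATIC = "prismatic"
    REVOLUTE = "revolute"


@dataclass
class Joint:
    motion: Motion  # motion type C
    axis: tuple     # joint axis direction D
    pivot: tuple    # point O on the axis


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def move_point(p, joint, q):
    """Move point p by joint state q (radians if revolute, distance if prismatic)."""
    if joint.motion is Motion.STATIC:
        return tuple(p)
    u = _unit(joint.axis)
    if joint.motion is Motion.PRISMATIC:
        return tuple(pi + q * ui for pi, ui in zip(p, u))
    # Revolute: Rodrigues' rotation of (p - pivot) about u, then shift back.
    r = tuple(pi - oi for pi, oi in zip(p, joint.pivot))
    c, s = math.cos(q), math.sin(q)
    cross = (u[1] * r[2] - u[2] * r[1],
             u[2] * r[0] - u[0] * r[2],
             u[0] * r[1] - u[1] * r[0])
    dot = sum(ui * ri for ui, ri in zip(u, r))
    rotated = tuple(ri * c + xi * s + ui * dot * (1 - c)
                    for ri, xi, ui in zip(r, cross, u))
    return tuple(ri + oi for ri, oi in zip(rotated, joint.pivot))
```

For example, a hinge along the z-axis through (1, 0, 0) carries the point (2, 0, 0) to approximately (1, 1, 0) after a quarter turn, which is the behavior a door edge would show.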
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.