
arxiv: 2512.16767 · v2 · pith:GNVTFQQ7new · submitted 2025-12-18 · 💻 cs.CV

Make-It-Poseable: Feed-forward Latent Posing Model for 3D Characters

Pith reviewed 2026-05-16 21:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D character posing · latent space transformation · feed-forward model · skinning-free · zero-shot generalization · mesh deformation · computer graphics

The pith

Make-It-Poseable poses 3D characters by transforming compact latent representations instead of meshes or skinning weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Make-It-Poseable, a feed-forward framework that reformulates 3D character posing as a skinning-free transformation in latent space. It combines a latent posing transformer, a dense pose representation, and an adaptive completion module trained with a bipartite-matched latent loss to handle topological changes. This setup targets problems in AI-generated assets that have irregular structures and fused geometry. A sympathetic reader would care because the approach claims to deliver higher pose conformance and zero-shot generalization to shapes such as quadrupeds while supporting editing tasks like part replacement.

Core claim

The central claim is that character posing can be recast as direct manipulation of compact latent representations of 3D shapes. The method integrates a latent posing transformer for shape manipulation, a dense pose representation for fine-grained control, and an adaptive completion module optimized via a bipartite-matched latent loss. This skinning-free design bypasses fixed mesh connectivity and traditional rigging constraints, enabling robust reconstruction under arbitrary topological changes.
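
As a rough illustration of what a bipartite-matched latent loss computes, the sketch below pairs predicted and target latent tokens one-to-one so that the total squared distance is minimal, then averages the matched distances. This is a generic set-matching loss, not the paper's implementation: the name `bipartite_matched_loss`, the squared-Euclidean cost, and the brute-force search over permutations are all assumptions made for readability.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Token = Sequence[float]  # one latent token, as a flat vector (assumed layout)


def sq_dist(a: Token, b: Token) -> float:
    """Squared Euclidean distance between two tokens of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bipartite_matched_loss(
    pred: List[Token], target: List[Token]
) -> Tuple[float, Tuple[int, ...]]:
    """Return (mean matched cost, assignment).

    assignment[i] is the index of the target token paired with pred[i].
    Brute force over permutations: exact, but only viable for tiny sets.
    """
    if len(pred) != len(target):
        raise ValueError("this sketch assumes equally sized token sets")
    n = len(pred)
    if n == 0:
        return 0.0, ()
    # Pairwise cost matrix between every predicted and every target token.
    cost = [[sq_dist(p, t) for t in target] for p in pred]
    # Pick the one-to-one pairing with the smallest total cost.
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    total = sum(cost[i][best[i]] for i in range(n))
    return total / n, best
```

Because the loss is computed after matching, it does not depend on the order in which tokens are emitted, which is the property a set-valued latent needs. A practical version would replace the permutation search with a polynomial-time solver such as the Hungarian algorithm.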

What carries the argument

Latent posing transformer that performs shape manipulation directly on compact latent representations, decoupled from mesh topology.

If this is right

  • The method significantly outperforms existing baselines in posing quality.
  • The skeleton-agnostic design exhibits zero-shot generalization to diverse morphologies including quadrupeds.
  • It seamlessly supports 3D authoring applications such as part replacement and refinement.
  • It robustly processes AI-generated assets that exhibit flawed structures and fused geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Animation pipelines could reduce reliance on manual rigging steps for procedurally generated models.
  • Real-time posing tools in games or VR might incorporate this latent approach for faster iteration on varied character shapes.
  • Extending the latent space with temporal information could support synthesis of animated sequences from static posed inputs.

Load-bearing premise

Compact latent representations preserve enough geometric detail to reconstruct fine features and handle arbitrary topological changes without mesh-specific priors or artifacts.

What would settle it

Apply the model to AI-generated characters with fused geometry or fine details such as hair and measure whether posed outputs show visible artifacts or loss of detail relative to a high-resolution reference mesh.
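
One way to make this test quantitative is a symmetric Chamfer distance between points sampled from the posed output and from the reference mesh. The sketch below is a brute-force version under stated assumptions: both surfaces have already been sampled into point lists, the squared-distance variant is used, and the name `chamfer_distance` is illustrative. Whether the paper reports this exact variant is not stated here.

```python
from typing import List, Sequence

Point = Sequence[float]  # e.g. (x, y, z), sampled from a mesh surface


def _nearest_sq(p: Point, cloud: List[Point]) -> float:
    """Squared distance from p to its nearest neighbour in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance (squared variant) between two point sets.

    Sums the mean nearest-neighbour squared distance in both directions.
    O(len(a) * len(b)); fine for a sketch, too slow for dense samples.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(_nearest_sq(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_sq(q, a) for q in b) / len(b)
    return a_to_b + b_to_a
```

A single global score can hide localized failures, so thin structures such as hair may need to be scored on a separately sampled region or checked visually.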

Figures

Figures reproduced from arXiv: 2512.16767 by Alan Zhao, Houqiang Li, Jax Xiang, Ori Zhang, Wengang Zhou, Zhenxun Yuan, Zhiyang Guo.

Figure 1: Given a 3D humanoid model of arbitrary shape and initial pose, our method efficiently re-poses it in a single feed-forward pass.
Figure 2: Pipeline of our character posing framework. Given a source shape and source/target skeletons, we encode them into latent representations with dense correspondence. A latent posing transformer then predicts the target shape tokens, which are finally decoded into the posed mesh. This framework is trained in two stages. First, a latent loss is established to preserve geometric details. Second, an adaptive com…
Figure 3: Illustration of our key designs. (a) The skeleton encoder (Sec. 3.2) produces dense pose representations with latent-level one-to-one correspondence. (b) Latent-space supervision (Sec. 3.4) ensures a semantically meaningful token transformation path to preserve geometric details. (c) Adaptive tokens (Sec. 3.5) are introduced in the finetuning stage to handle newly exposed structures after deformation. …
Figure 4: Qualitative comparison on diverse characters and poses. We showcase results for re-posing each character into a widely-adopted T-pose and an additional random pose. Our method produces high-fidelity results across various cases. It robustly handles challenging inputs where MIA [6] and Puppeteer [25] produce significant artifacts, and gives better pose conformance and detail preservation compared to HY3D-O…
Figure 5: Our method enables various applications, …
Original abstract

Posing 3D characters is a fundamental task in computer graphics. However, existing paradigms, ranging from traditional auto-rigging to recent pose-conditioned generative models, frequently struggle with inaccurate skinning weights, fixed mesh topologies, and poor pose conformance. These challenges have become particularly pronounced with the recent explosion of AI-generated 3D assets, which often exhibit flawed structures and fused geometry. To address these issues, we introduce Make-It-Poseable, a novel feed-forward framework that reformulates character posing as a skinning-free latent-space transformation problem. By decoupling shape deformation from the constraints of fixed mesh connectivity, our method directly operates on compact latent representations to reconstruct characters in target poses. To achieve this, our framework integrates a latent posing transformer for shape manipulation, a dense pose representation for fine-grained control, and an adaptive completion module optimized via a bipartite-matched latent loss to robustly handle topological changes. Extensive experiments demonstrate that our method significantly outperforms existing baselines in posing quality. Furthermore, our skeleton-agnostic design exhibits remarkable zero-shot generalization to diverse morphologies including quadrupeds and seamlessly supports various 3D authoring applications such as part replacement and refinement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Make-It-Poseable, a feed-forward framework that reformulates 3D character posing as a skinning-free latent-space transformation problem. It integrates a latent posing transformer for shape manipulation, a dense pose representation for fine-grained control, and an adaptive completion module optimized via a bipartite-matched latent loss to handle topological changes. The central claims are that the method significantly outperforms existing baselines in posing quality, exhibits zero-shot generalization to diverse morphologies including quadrupeds, and supports 3D authoring applications such as part replacement and refinement, particularly for AI-generated assets with irregular structures.

Significance. If the empirical claims hold with proper validation, this work could meaningfully advance computer graphics by enabling robust posing of AI-generated 3D models without reliance on fixed topologies or accurate skinning weights. The skeleton-agnostic latent-space approach addresses a growing practical need and could influence downstream tasks in 3D content creation.

major comments (2)
  1. [Abstract] Abstract: The claim that the method 'significantly outperforms existing baselines in posing quality' and exhibits 'remarkable zero-shot generalization' is load-bearing for the contribution but is unsupported by any quantitative metrics, error bars, ablation details, or specific experimental results. This absence prevents assessment of the central empirical assertions.
  2. [Method] Method (latent posing transformer and adaptive completion): The assumption that compact latent representations preserve sufficient high-frequency geometric details to reconstruct posed characters without artifacts under arbitrary topological changes (e.g., fused AI-generated geometry) is central to the zero-shot and outperformance claims, yet the manuscript provides no direct evidence or analysis addressing the risk that the encoder discards such information.
minor comments (1)
  1. [Abstract] Abstract: Consider adding one sentence naming the primary baselines used for comparison to contextualize the outperformance claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential impact of our work on posing AI-generated 3D assets. We address each major comment below and have revised the manuscript to improve clarity and provide additional supporting analysis.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the method 'significantly outperforms existing baselines in posing quality' and exhibits 'remarkable zero-shot generalization' is load-bearing for the contribution but is unsupported by any quantitative metrics, error bars, ablation details, or specific experimental results. This absence prevents assessment of the central empirical assertions.

    Authors: We agree that the abstract would benefit from explicit quantitative anchors to allow immediate assessment of the claims. In the revised version we have updated the abstract to include key metrics (e.g., average Chamfer-distance reduction and zero-shot success rate on quadrupeds) drawn directly from the experimental tables, while preserving conciseness. Full results with error bars, statistical significance, and ablation details remain in Section 4. revision: yes

  2. Referee: [Method] Method (latent posing transformer and adaptive completion): The assumption that compact latent representations preserve sufficient high-frequency geometric details to reconstruct posed characters without artifacts under arbitrary topological changes (e.g., fused AI-generated geometry) is central to the zero-shot and outperformance claims, yet the manuscript provides no direct evidence or analysis addressing the risk that the encoder discards such information.

    Authors: The referee correctly identifies that the manuscript relies primarily on end-to-end empirical success rather than a direct information-preservation study. To address this, we have added a short analysis subsection and supplementary visualizations that compare high-frequency surface details before and after latent encoding/decoding on the most irregular AI-generated examples. We have also included a latent-dimension ablation that quantifies the point at which reconstruction artifacts appear. These additions provide the requested direct evidence without altering the core method. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independently trained modules and external baselines

full rationale

The paper presents a feed-forward latent posing framework with a latent posing transformer, dense pose representation, and adaptive completion module trained via bipartite-matched latent loss. These components are described as novel architectural choices optimized end-to-end, with performance evaluated against external baselines rather than internal fitted quantities. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described chain. The zero-shot generalization claims rest on empirical results for diverse morphologies, not on re-deriving inputs by construction. This is a standard non-circular design for a learned model.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the latent representation and bipartite matching are presented as core technical choices without further decomposition.

pith-pipeline@v0.9.0 · 5521 in / 987 out tokens · 34746 ms · 2026-05-16T21:27:44.744095+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 7.0

    AniGen directly generates animatable 3D assets with consistent shape, skeleton, and skinning from single images using unified S^3 fields and a two-stage flow-matching pipeline.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Adobe. Mixamo, 2024. https://www.mixamo.com

  2. [2]

    Ilya Baran and Jovan Popović. Automatic rigging and animation of 3D characters. ACM TOG, 26(3):72–es, 2007.

  3. [3]

    Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, and Mu Xu. HumanRig: Learning automatic rigging for humanoid character in a large scale dataset, 2024.

  4. [4]

    Ken Deng, Yuan-Chen Guo, Jingxiang Sun, Zi-Xin Zou, Yangguang Li, Xin Cai, Yan-Pei Cao, Yebin Liu, and Ding Liang. DetailGen3D: Generative 3D geometry enhancement via data-dependent flow, 2025.

  5. [5]

    Yufan Deng, Yuhao Zhang, Chen Geng, Shangzhe Wu, and Jiajun Wu. Anymate: A dataset and baselines for learning 3D object rigging. In SIGGRAPH Conference Proceedings, Vancouver, BC, Canada, 2025. Association for Computing Machinery.

  6. [6]

    Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, and Ran Zhang. Make-It-Animatable: An efficient framework for authoring animation-ready 3D characters. In CVPR, 2025.

  7. [7]

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. In ICLR, 2024.

  8. [8]

    Yukun Huang, Jianan Wang, Ailing Zeng, Zheng-Jun Zha, Lei Zhang, and Xihui Liu. DreamWaltz-G: Expressive 3D gaussian avatars from skeleton-guided 2D diffusion. IEEE TPAMI, 2025.

  9. [9]

    Zehuan Huang, Haoran Feng, Yangtian Sun, Yuanchen Guo, Yanpei Cao, and Lu Sheng. AnimaX: Animating the inanimate in 3D with joint video-pose diffusion models. arXiv preprint arXiv:2506.19851, 2025.

  10. [10]

    Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. LVSM: A large view synthesis model with minimal 3D inductive bias. In ICLR, 2025.

  11. [11]

    Lin Li, Zehuan Huang, Haoran Feng, Gengxiong Zhuang, Rui Chen, Chunchao Guo, and Lu Sheng. Voxhammer: Training-free precise and coherent 3D editing in native 3D space. arXiv preprint arXiv:2508.19247, 2025.

  12. [12]

    Xiao-Lei Li, Hao-Xiang Chen, Yanni Zhang, Kai Ma, Alan Zhao, Tai-Jiang Mu, Hao-Xiang Guo, and Ran Zhang. RELATE3D: Refocusing latent adapter for targeted local enhancement and editing in 3D generation. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages 1–12, 2025.

  13. [13]

    Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, et al. TripoSG: High-fidelity 3D shape synthesis using large-scale rectified flow models. arXiv preprint arXiv:2502.06608, 2025.

  14. [14]

    Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, and Michael J. Black. TADA! Text to animatable digital avatars. In 3DV, pages 1508–1519,

  15. [15]

    Isabella Liu, Zhan Xu, Wang Yifan, Hao Tan, Zexiang Xu, Xiaolong Wang, Hao Su, and Zifan Shi. RigAnything: Template-free autoregressive rigging for diverse 3D assets. ACM TOG, 44(4):1–12, 2025.

  16. [16]

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3D object. In ICCV, pages 9298–9309, 2023.

  17. [17]

    Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3D: Single image to 3D using cross-domain diffusion. In CVPR, pages 9970–9980, 2024.

  18. [18]

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM TOG, 34(6):248:1–248:16, 2015.

  19. [19]

    Jing Ma and Dongliang Zhang. TARig: Adaptive template-aware neural rigging for humanoid characters. Computers & Graphics, 114:158–167, 2023.

  20. [20]

    Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In CVPR, pages 10975–10985, 2019.

  21. [21]

    Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.

  22. [22]

    Cédric Portaneri, Mael Rouxel-Labbé, Michael Hemmer, David Cohen-Steiner, and Pierre Alliez. Alpha wrapping with an offset. ACM TOG, 41(4):1–22, 2022.

  23. [23]

    Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. XCube: Large-scale 3D generative modeling using sparse voxel hierarchies. In CVPR, pages 4209–4219, 2024.

  24. [24]

    Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, and Jun Gao. Flexible isosurface extraction for gradient-based mesh optimization. ACM TOG, 42(4):1–16, 2023.

  25. [25]

    Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, and Jianfeng Zhang. Puppeteer: Rig and animate your 3D models. NeurIPS, 2025.

  26. [26]

    Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, et al. MagicArticulate: Make your 3D models articulation-ready. In CVPR, pages 15998–16007, 2025.

  27. [27]

    Mingze Sun, Junhao Chen, Junting Dong, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, and Ruqi Huang. DRiVE: Diffusion-based rigging empowers generation of versatile and expressive characters. In CVPR, pages 21170–21180, 2025.

  28. [28]

    Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single-view 3D reconstruction. In CVPR, pages 10208–10217, 2024.

  29. [29]

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large multi-view gaussian model for high-resolution 3D content creation. In ECCV, pages 1–18. Springer, 2024.

  30. [30]

    Tencent Hunyuan3D Team. Hunyuan3D 2.1: From images to high-fidelity 3D assets with production-ready PBR material,

  31. [31]

    Tencent Hunyuan3D Team. Hunyuan3D-Omni: A unified framework for controllable generation of 3D assets, 2025.

  32. [32]

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, pages 5294–5306, 2025.

  33. [33]

    Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D vision made easy. In CVPR, pages 20697–20709, 2024.

  34. [34]

    Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, and Xiaohu Guo. WonderHuman: Hallucinating unseen parts in dynamic 3D human reconstruction. arXiv preprint arXiv:2502.01045, 2025.

  35. [35]

    Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, and Zexiang Xu. MeshLRM: Large reconstruction model for high-quality mesh. arXiv preprint arXiv:2404.12385, 2024.

  36. [36]

    Zijie Wu, Chaohui Yu, Fan Wang, and Xiang Bai. AnimateAnyMesh: A feed-forward 4D foundation model for text-driven universal mesh animation. In ICCV, 2025.

  37. [37]

    Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3D latents for scalable and versatile 3D generation. In CVPR, pages 21469–21480, 2025.

  38. [38]

    Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. InstantMesh: Efficient 3D mesh generation from a single image with sparse-view large reconstruction models. arXiv preprint arXiv:2404.07191,

  39. [39]

    Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. GRM: Large gaussian reconstruction model for efficient 3D reconstruction and generation. In ECCV, pages 1–20. Springer, 2024.

  40. [40]

    Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Chris Landreth, and Karan Singh. RigNet: Neural rigging for articulated characters. ACM TOG, 39(4):58:58:1–58:58:14, 2020.

  41. [41]

    Hongyu Yan, Kunming Luo, Weiyu Li, Yixun Liang, Shengming Li, Jingwei Huang, Chunchao Guo, and Ping Tan. PoseMaster: Generating 3D characters in arbitrary poses from a single image. arXiv preprint arXiv:2506.21076, 2025.

  42. [42]

    Xinhao Yan, Jiachen Xu, Yang Li, Changfeng Ma, Yunhan Yang, Chunshi Wang, Zibo Zhao, Zeqiang Lai, Yunfei Zhao, Zhuo Chen, et al. X-Part: high fidelity and structure coherent shape decomposition. arXiv preprint arXiv:2509.08643,

  43. [43]

    Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, and Xihui Liu. HoloPart: Generative 3D part amodal segmentation. arXiv preprint arXiv:2504.07943, 2025.

  44. [44]

    Chongjie Ye, Yushuang Wu, Ziteng Lu, Jiahao Chang, Xiaoyang Guo, Jiaqing Zhou, Hao Zhao, and Xiaoguang Han. Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging. arXiv preprint arXiv:2503.22236, 3:2, 2025.

  45. [45]

    Zhiyuan Yu, Zhe Li, Hujun Bao, Can Yang, and Xiaowei Zhou. HumanRAM: Feed-forward human reconstruction and animation model using transformers. In SIGGRAPH Conference Proceedings, 2025.

  46. [46]

    Biao Zhang, Jiapeng Tang, Matthias Nießner, and Peter Wonka. 3DShape2VecSet: A 3D shape representation for neural fields and generative diffusion models. ACM TOG, 42(4):92:1–92:16, 2023.

  47. [47]

    Jiahui Zhang, Yuelei Li, Anpei Chen, Muyu Xu, Kunhao Liu, Jianyuan Wang, Xiao-Xiao Long, Hanxue Liang, Zexiang Xu, Hao Su, et al. Advances in feed-forward 3D reconstruction and view synthesis: A survey. arXiv preprint arXiv:2507.14501, 2025.

  48. [48]

    Jia-Peng Zhang, Cheng-Feng Pu, Meng-Hao Guo, Yan-Pei Cao, and Shi-Min Hu. One model to rig them all: Diverse skeleton rigging with UniRig. ACM TOG, 44(4):1–18, 2025.

  49. [49]

    Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large reconstruction model for 3D gaussian splatting. In ECCV, pages 1–19. Springer, 2024.

  50. [50]

    Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, and Jingyi Yu. CLAY: A controllable large-scale generative model for creating high-quality 3D assets. ACM TOG, 43(4):1–20, 2024.

  51. [51]

    Longwen Zhang, Qixuan Zhang, Haoran Jiang, Yinuo Bai, Wei Yang, Lan Xu, and Jingyi Yu. BANG: Dividing 3D assets via generative exploded dynamics. ACM TOG, 44(4):1–21, 2025.