arXiv: 2605.19355 · v1 · pith:HEJG7CUTnew · submitted 2026-05-19 · 💻 cs.GR · cs.AI · cs.CV · cs.LG

Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance

Pith reviewed 2026-05-20 02:27 UTC · model grok-4.3

classification 💻 cs.GR · cs.AI · cs.CV · cs.LG
keywords motion retargeting · interaction preservation · adaptive anchors · transformer · graph autoencoder · proximity · skinned animation · spatial guidance

The pith

Motion retargeting preserves self-contacts using dynamically repositioned anchors

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a motion retargeting method that keeps interaction details like touches and close approaches when transferring movement between characters of very different shapes. It replaces fixed anchor points with ones that a Transformer moves to reachable spots on the new body, using a soft projection to stay on the surface. These moving anchors then steer a graph autoencoder that outputs the adjusted skeleton motion. Readers interested in animation would find this useful because it reduces the loss of meaning in interactions that happens with current techniques on exaggerated characters.

Core claim

By performing proximity matching over spatially adaptive anchors that are dynamically repositioned via a Transformer-based refinement strategy with differentiable soft projection, the method preserves interaction semantics such as self-contact and near-body proximity across characters with exaggerated body proportions, outperforming state-of-the-art approaches that use static correspondences.
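
To make "proximity matching" concrete, here is a minimal sketch of one plausible reading. It is not the paper's loss. It assumes anchors are plain 3D points, that distances are normalized by character height, and that a Gaussian weight (the hypothetical `sigma`) makes near pairs dominate. All function and variable names are illustrative.

```python
import math

def pairwise_distances(anchors, scale):
    """Scale-normalized Euclidean distance between every pair of anchors."""
    n = len(anchors)
    return [[math.dist(anchors[i], anchors[j]) / scale for j in range(n)]
            for i in range(n)]

def proximity_matching_loss(src_anchors, tgt_anchors, src_height, tgt_height, sigma=0.2):
    """Weighted squared difference between normalized anchor-distance matrices.

    Pairs that are close on the source get a weight near 1; far pairs get a
    weight near 0, so contacts and near-misses dominate the score.
    """
    d_src = pairwise_distances(src_anchors, src_height)
    d_tgt = pairwise_distances(tgt_anchors, tgt_height)
    n = len(src_anchors)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = math.exp(-(d_src[i][j] ** 2) / (2 * sigma ** 2))
            total += w * (d_src[i][j] - d_tgt[i][j]) ** 2
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0

# Made-up source pose: hand touching head on a 1.8-unit-tall character.
src = [(0.0, 1.7, 0.0), (0.0, 1.7, 0.0), (0.0, 0.0, 0.0)]  # hand, head, foot
# Target A: half the height, same pose scaled uniformly -> contact kept.
tgt_scaled = [(0.0, 0.85, 0.0), (0.0, 0.85, 0.0), (0.0, 0.0, 0.0)]
# Target B: the hand stops short of the head -> contact lost.
tgt_missed = [(0.3, 0.6, 0.0), (0.0, 0.85, 0.0), (0.0, 0.0, 0.0)]

loss_scaled = proximity_matching_loss(src, tgt_scaled, 1.8, 0.9)
loss_missed = proximity_matching_loss(src, tgt_missed, 1.8, 0.9)
print(loss_scaled, loss_missed)
```

The uniformly scaled target scores essentially zero, while the target that misses the hand-to-head contact is penalized, which is the behavior an interaction-preserving objective would need.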

What carries the argument

Spatially adaptive anchors repositioned by a Transformer-based strategy with differentiable soft projection that supply pose-dependent guidance to a graph autoencoder.
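
The page does not give the exact form of the soft projection (the referee report asks for it), so the following is only a sketch of one common differentiable choice: a softmax-weighted average of sampled surface points. The `temperature` parameter, the point-sample "surface", and all names are assumptions made for illustration.

```python
import math

def soft_project(point, surface_points, temperature=0.01):
    """Softmax-weighted average of surface samples, weighted by closeness to `point`.

    Low temperature approaches nearest-sample projection; high temperature
    drifts toward the centroid of all samples.
    """
    sq_dists = [sum((p - s) ** 2 for p, s in zip(point, sp)) for sp in surface_points]
    # Subtract the minimum before exponentiating for numerical stability.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / temperature) for d in sq_dists]
    total = sum(weights)
    dim = len(point)
    return tuple(
        sum(w * sp[k] for w, sp in zip(weights, surface_points)) / total
        for k in range(dim)
    )

# Hypothetical target "surface": samples along a segment of the x-axis.
surface = [(x / 10.0, 0.0, 0.0) for x in range(11)]
# An anchor that a predicted displacement has pushed off the surface.
displaced_anchor = (0.32, 0.4, 0.0)

sharp = soft_project(displaced_anchor, surface, temperature=0.001)
blurry = soft_project(displaced_anchor, surface, temperature=10.0)
print(sharp, blurry)
```

Because every step is a smooth function of `point`, gradients can flow back to whatever predicted the displacement. The two temperatures also illustrate a trade-off: a sharp projection lands near the closest sample (x close to 0.3), while a blurry one collapses toward the middle of the surface (x close to 0.5), which is one way spatial structure could be lost.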

If this is right

  • The approach handles exaggerated body proportions where static methods fail.
  • Alternating optimization aligns the anchor adaptation and motion retargeting tasks.
  • The graph autoencoder uses the adapted anchors to predict target motion while preserving spatial configurations.
  • Evaluations show improved interaction fidelity on diverse character geometries.
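
The alternating scheme mentioned in the list above can be illustrated with a toy problem. This is a sketch under strong assumptions: two scalars stand in for the anchor-adaptation and retargeting modules, and the coupled quadratic loss is invented for the example. Only the update pattern (one module steps while the other is frozen) reflects what the abstract describes.

```python
def loss(anchor_param, retarget_param):
    # Toy coupled objective (an assumption, not the paper's loss); minimum at (2, 1).
    return ((anchor_param + retarget_param - 3) ** 2
            + (anchor_param - 2 * retarget_param) ** 2)

def grad_anchor(a, r):
    # Partial derivative of the toy loss with respect to the anchor parameter.
    return 2 * (a + r - 3) + 2 * (a - 2 * r)

def grad_retarget(a, r):
    # Partial derivative of the toy loss with respect to the retarget parameter.
    return 2 * (a + r - 3) - 4 * (a - 2 * r)

def alternating_training(rounds=200, inner_steps=5, lr=0.05):
    a, r = 0.0, 0.0
    for _ in range(rounds):
        # Phase 1: update the anchor stand-in while the retarget stand-in is frozen.
        for _ in range(inner_steps):
            a -= lr * grad_anchor(a, r)
        # Phase 2: update the retarget stand-in while the anchor stand-in is frozen.
        for _ in range(inner_steps):
            r -= lr * grad_retarget(a, r)
    return a, r

anchor_param, retarget_param = alternating_training()
print(anchor_param, retarget_param, loss(anchor_param, retarget_param))
```

On this convex toy problem the alternation converges to the joint minimum. Real networks carry no such guarantee, so the example shows the mechanics of the schedule rather than evidence that it helps.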

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could simplify animation workflows for games featuring characters of varying scales and shapes.
  • The refinement strategy might extend to other tasks involving dynamic spatial relationships in 3D models.
  • Integrating the method with physics constraints could further improve realism in contact handling.

Load-bearing premise

The displacements predicted by the Transformer, once softly projected onto the target geometry, still capture the source pose's spatial structures effectively for guiding the retargeting.

What would settle it

Observe whether hand-to-head or hand-to-hand contacts in a source motion with a tall thin character are preserved when retargeted to a short stocky character, without new intersections or lost proximities.
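
A check along these lines could be scripted roughly as follows. This is a sketch, not the paper's evaluation protocol: it assumes labeled anchor points per character, a height-normalized contact threshold (the hypothetical `threshold=0.05`), and made-up coordinates. It only compares proximity pairs; detecting actual mesh intersections would need the character geometry, which is out of scope here.

```python
import math
from itertools import combinations

def contact_pairs(anchors, height, threshold=0.05):
    """Index pairs whose height-normalized distance is below `threshold`."""
    return {
        (i, j)
        for i, j in combinations(range(len(anchors)), 2)
        if math.dist(anchors[i], anchors[j]) / height < threshold
    }

def compare_contacts(src_anchors, tgt_anchors, src_height, tgt_height, threshold=0.05):
    """Split contact pairs into preserved, lost, and newly introduced sets."""
    src = contact_pairs(src_anchors, src_height, threshold)
    tgt = contact_pairs(tgt_anchors, tgt_height, threshold)
    return {"preserved": src & tgt, "lost": src - tgt, "spurious": tgt - src}

# Anchor order: 0 = left hand, 1 = right hand, 2 = head, 3 = hip (illustrative data).
tall_thin = [(0.0, 1.8, 0.1), (0.02, 1.8, 0.1), (0.0, 1.85, 0.05), (0.0, 1.0, 0.0)]
short_stocky = [(0.0, 0.9, 0.05), (0.01, 0.9, 0.05), (0.0, 0.95, 0.2), (0.0, 0.5, 0.0)]

report = compare_contacts(tall_thin, short_stocky, src_height=2.0, tgt_height=1.0)
print(report)
```

In this made-up case the hand-to-hand contact survives, both hand-to-head contacts are lost, and no new proximities appear, which is the kind of breakdown the test above is meant to surface.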

Figures

Figures reproduced from arXiv: 2605.19355 by Chaelin Kim, Junghyun Nam, Junhyuk Jeon, Junyong Noh, Seokhyeon Hong, Soojin Choi.

Figure 1 (p. 1). We propose a geometry-aware motion retargeting framework that transfers motion between skinned characters by leveraging proximity matching over …
Figure 2 (p. 4). Overview of our method. Pink lines indicate the input-output flow of the Adaptive Anchor Sampling module, whereas blue lines denote that of the …
Figure 3 (p. 5). Visualizations of initial anchors and their corresponding body parts.
Figure 4 (p. 6). Overview of Adaptive Anchor Sampling module.
Figure 5 (p. 6). Visualization of the initial (left) and adapted (right) anchors on the …
Figure 6 (p. 7). Overview of (a) anchor encoder, (b) retarget encoder, and (c) retarget …
Figure 7 (p. 12). Qualitative comparison with baselines on skinned motion retargeting. Each row presents a case where the source character performs a motion involving …
Figure 8 (p. 13). Qualitative comparison of the ablation study on the Adaptive Anchor Sampling module. The leftmost figure shows the source pose with its initial …
Figure 9 (p. 14). Qualitative results of the ablation study on the use of …
Figure 11 (p. 14). Qualitative results of the ablation study on the Proximity-based …
Figure 10 (p. 14). Qualitative results of the ablation on the parameter …
Figure 12 (p. 15). Qualitative comparison of training strategies. The leftmost figure …

Figure 1 (p. 19). Comparison of training loss curves under alternating optimization (red) and joint optimization (blue).
Figure 2 (p. 22). Qualitative results of adaptive anchor behavior on a fixed target character under varying source characters and poses. A query anchor is selected on …
Figure 4 (p. 23). Qualitative results on an out-of-domain character from the RigNet …
Figure 5 (p. 24). Qualitative comparison between direct optimization and our learning …
Figure 6 (p. 24). Qualitative results of the ablation on the learning rate (lr) for anchor …
Figure 6 (p. 25). Timing table captured with this entry:
  Methods                      Avg. Time / Frame (sec) ↓
  (a) Direct Optimization      6.237
  (b) Learning-based (Ours)    0.00545
In addition, our method avoids iterative per-sequence optimization at test time. For the sequence shown in …
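
The per-frame timings listed with the last figure entry above (6.237 s for direct optimization, 0.00545 s for the learning-based method) can be turned into a speedup and a frame rate with simple arithmetic. The snippet below uses only those two reported numbers; the variable names are illustrative.

```python
direct_sec_per_frame = 6.237      # (a) Direct Optimization, as reported
learned_sec_per_frame = 0.00545   # (b) Learning-based (Ours), as reported

speedup = direct_sec_per_frame / learned_sec_per_frame
learned_fps = 1.0 / learned_sec_per_frame
direct_fps = 1.0 / direct_sec_per_frame

print(f"speedup: {speedup:.0f}x")
print(f"learning-based: {learned_fps:.1f} frames/sec")
print(f"direct optimization: {direct_fps:.2f} frames/sec")
```

That works out to a speedup of roughly 1,144 times, or about 183 frames per second against about 0.16 frames per second, assuming both timings were measured under comparable conditions.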
Original abstract

Retargeting motion across characters with varying body shapes while preserving interaction semantics, such as self-contact and near-body proximity, remains a challenging problem. While recent geometry-aware approaches address this by maintaining spatial relationships between predefined corresponding regions, their reliance on static correspondences often struggles when the target character exhibits exaggerated body proportions. In this paper, we present a geometry-aware motion retargeting framework that preserves interaction semantics by performing proximity matching over spatially adaptive anchors. Unlike prior methods with static anchor definitions, the proposed method dynamically repositions anchors to reachable regions on the target character. This is achieved via a Transformer-based anchor refinement strategy that predicts anchor displacements and constrains the translated anchors to remain on the target character geometry through differentiable soft projection. By incorporating pose-dependent spatial structures from the source character, the adapted anchors provide structurally coherent guidance for interaction-aware retargeting. Conditioned on these anchors, a graph-based autoencoder predicts target skeletal motion that preserves the spatial configuration of the source. To encourage task-aligned optimization between anchor adaptation and motion retargeting, we adopt an alternating training scheme in which each module is optimized in turn. Through extensive evaluations, we demonstrate that our method outperforms state-of-the-art approaches in preserving interaction fidelity across diverse character geometries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a geometry-aware skinned motion retargeting method that dynamically repositions anchors on the target character using a Transformer to predict displacements, followed by differentiable soft projection to constrain anchors to the target geometry. These adapted anchors, which incorporate pose-dependent spatial structures from the source, then condition a graph autoencoder to predict target skeletal motion while preserving interaction semantics such as self-contact and near-body proximity. An alternating optimization scheme aligns the anchor adaptation and retargeting modules, with the abstract asserting outperformance over state-of-the-art methods across diverse character geometries based on extensive evaluations.

Significance. If the central claims hold with supporting quantitative evidence, the work would represent a meaningful advance in computer graphics for interaction-preserving retargeting, particularly for characters with exaggerated proportions where static correspondence methods fail. The dynamic anchor approach combined with alternating training could influence downstream applications in animation, games, and virtual reality by better maintaining spatial interaction fidelity.

major comments (2)
  1. [§3.2] Transformer-based anchor refinement and soft projection: The claim that adapted anchors 'provide structurally coherent guidance' for the graph autoencoder rests on the assumption that differentiable soft projection preserves source pose-dependent distances and contact semantics. However, when the target has exaggerated proportions, a small source displacement can map to collapsed or stretched configurations on the target mesh, potentially breaking the proximity matching that underpins the interaction preservation claim. This is load-bearing for the central contribution and requires either a formal analysis of distance preservation or targeted ablations.
  2. [Evaluation section (likely §5)] The abstract states that the method 'outperforms state-of-the-art approaches in preserving interaction fidelity' via 'extensive evaluations,' yet the provided text contains no quantitative metrics, error bars, ablation results on the soft-projection module, or dataset details. Without these, the outperformance claim cannot be assessed; specific tables or figures reporting interaction-error reductions (e.g., self-contact distance or proximity metrics) are needed to substantiate the result.
minor comments (2)
  1. [Method] Clarify the exact mathematical definition of the differentiable soft projection (e.g., whether it is barycentric interpolation or a learned weighted sum) and how gradients flow through it during alternating optimization.
  2. [Discussion or Conclusion] Add a brief discussion of failure cases or limitations when the target geometry deviates extremely from the source, to contextualize the scope of the spatially adaptive anchors.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, indicating where revisions have been made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] Transformer-based anchor refinement and soft projection: The claim that adapted anchors 'provide structurally coherent guidance' for the graph autoencoder rests on the assumption that differentiable soft projection preserves source pose-dependent distances and contact semantics. However, when the target has exaggerated proportions, a small source displacement can map to collapsed or stretched configurations on the target mesh, potentially breaking the proximity matching that underpins the interaction preservation claim. This is load-bearing for the central contribution and requires either a formal analysis of distance preservation or targeted ablations.

    Authors: We agree this is a load-bearing assumption and thank the referee for identifying the need for stronger justification. The soft projection is formulated to map source displacements onto the target surface while approximately preserving relative distances for small motions. In the revision, we have added a formal analysis in §3.2 bounding the distortion introduced by the projection operator in terms of target mesh curvature. We have also included targeted ablations in §5 that measure self-contact and proximity errors with and without soft projection on characters with exaggerated proportions, confirming improved preservation of interaction semantics.

    revision: yes

  2. Referee: [Evaluation section (likely §5)] The abstract states that the method 'outperforms state-of-the-art approaches in preserving interaction fidelity' via 'extensive evaluations,' yet the provided text contains no quantitative metrics, error bars, ablation results on the soft-projection module, or dataset details. Without these, the outperformance claim cannot be assessed; specific tables or figures reporting interaction-error reductions (e.g., self-contact distance or proximity metrics) are needed to substantiate the result.

    Authors: The full manuscript contains quantitative results in Section 5, including tables with self-contact distance and proximity errors (with standard deviations as error bars) and dataset details in Section 4. However, to directly address the referee's request for explicit support of the outperformance claim, we have added a new ablation subsection (§5.4) isolating the soft-projection module and additional figures comparing interaction-error reductions against baselines. These changes make the evidence more prominent and accessible.

    revision: partial

Circularity Check

0 steps flagged

No circularity: standard neural pipeline with independent components

full rationale

The derivation chain relies on a Transformer predicting anchor displacements, followed by differentiable soft projection onto target geometry, then conditioning a graph autoencoder on the resulting anchors for skeletal motion prediction, with alternating optimization. None of these steps reduce by construction to fitted parameters or self-citations within the paper's own equations; the components are standard architectures whose outputs are not tautologically defined by the inputs. The central claim of preserving interaction semantics is supported by external evaluations rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed; the method appears to build on existing neural architectures without introducing new postulated physical entities.


