Recognition: unknown
Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling
Pith reviewed 2026-05-08 17:53 UTC · model grok-4.3
The pith
A contact-aware diffusion model that jointly generates duet dance motions and an explicit interaction matrix produces more precise physical contacts and better rhythmic alignment than prior methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that jointly generating motion and a contact matrix between two individuals inside a contact-aware diffusion model supplies explicit interaction modeling that guides sampling toward more precise and constrained dynamics, producing lower FID_k and FID_cd scores together with higher BED scores than the Duolando baseline.
What carries the argument
The contact matrix, an explicit representation of pairwise body-part contacts that is produced simultaneously with the motion sequence inside the diffusion model and used to constrain the generation trajectory.
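The paper does not spell out the matrix's granularity or contact criterion here; a minimal sketch of what such a matrix could look like, assuming binary contacts defined by a hypothetical distance threshold over paired body parts (the threshold value and part count are illustrative, not the paper's):

```python
import numpy as np

def contact_matrix(parts_a, parts_b, threshold=0.12):
    """Binary contact matrix between two dancers.

    parts_a, parts_b: (J, 3) arrays of body-part positions in metres.
    threshold: hypothetical contact distance; the paper does not state
    the criterion or body-part granularity it actually uses.
    """
    # Pairwise Euclidean distances between every part of A and every part of B.
    diff = parts_a[:, None, :] - parts_b[None, :, :]   # (J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)               # (J, J)
    return (dist < threshold).astype(np.float32)       # C[i, j] = 1 iff contact

# Toy example: one hand pair touching, everything else apart.
a = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
b = np.array([[0.05, 0.0, 0.0], [2.0, 1.0, 0.0]])
C = contact_matrix(a, b)  # C[0, 0] == 1, all other entries 0
```

During sampling, such a matrix would act as an explicit target for which part pairs must touch, rather than leaving contact as an implicit by-product of the motion.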
If this is right
- The generated motions exhibit tighter physical contact fidelity as measured by reduced FID_cd.
- Rhythmic synchronization between dancers improves, reflected in elevated BED scores.
- The two-stage separation allows the contact signal to steer sampling even when high-quality duet data remains scarce.
- Body-part inconsistencies are reduced by the joint decoder in the first-stage VQ-VAE.
Where Pith is reading between the lines
- The same contact-matrix mechanism could be applied to other paired physical activities such as partner sports or object hand-offs.
- Extending the matrix to record forces or velocities rather than binary contacts might further tighten the interaction constraints.
- Because the matrix is generated at inference time, the model could accept user-specified contact patterns to control interaction style without retraining.
Load-bearing premise
Jointly generating the motion and contact matrix supplies enough guidance to enforce precise interactions without extra loss terms or larger datasets.
What would settle it
Retraining the diffusion stage without the contact-matrix output head and measuring whether interaction-specific metrics (FID_cd and BED) fall back to or below the Duolando baseline on the same test set would directly test the claim.
Original abstract
Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}_k$ (8.89 vs. 25.30) and $\text{FID}_{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.
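The abstract's first stage, separate body-part encoders with specialized codebooks feeding a joint decoder, centres on per-part vector quantisation. A minimal NumPy sketch of the lookup step, assuming an illustrative two-part split and codebook sizes the paper does not specify:

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour codebook lookup, the core VQ-VAE operation.

    z: (T, D) continuous latents for one body part.
    codebook: (K, D) learned code vectors for that part.
    Returns the quantised latents and their code indices.
    """
    d = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)  # (T, K)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
T, D = 8, 4                                      # illustrative sequence length, latent width
books = {part: rng.normal(size=(16, D))          # one specialized codebook per part
         for part in ("upper", "lower")}
latents = {part: rng.normal(size=(T, D)) for part in books}

# Each part is quantised against its own codebook...
quantised = {part: quantize(latents[part], books[part])[0] for part in books}
# ...but a joint decoder would consume the concatenation, letting it model
# cross-part dependencies and avoid inconsistent generated motions.
joint_input = np.concatenate([quantised["upper"], quantised["lower"]], axis=1)  # (T, 2*D)
```

The design trade sketched here is the one the abstract names: specialized codebooks raise per-part representation capacity, while decoding all parts together restores the dependencies that separate decoders would lose.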
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a two-stage framework improves reactive duet dance motion synthesis: a VQ-VAE with per-body-part encoders and joint decoder in stage 1, followed by a contact-aware diffusion model in stage 2 that jointly generates motion and an inter-person contact matrix to enforce precise interaction constraints. Experiments report quantitative gains over Duolando (FID_k 8.89 vs. 25.30, FID_cd 8.01 vs. 9.97, BED 0.4606 vs. 0.2858).
Significance. If the contact matrix genuinely supplies effective guidance during sampling, the work would meaningfully advance multi-person motion synthesis in data-scarce, high-constraint domains such as duet dance by addressing both representation capacity and interaction fidelity. The body-part codebook design is a concrete strength for handling complex dependencies.
major comments (2)
- [Abstract] The central claim, that jointly generating the contact matrix enables 'explicit interaction modeling' and provides 'guidance toward more precise and constrained interaction dynamics during sampling', is not backed by any described auxiliary contact loss, contact-masking schedule, or post-sampling enforcement. Without such a mechanism, the reported metric improvements cannot be confidently attributed to the contact matrix rather than to the VQ-VAE stage or other training choices.
- [Experiments] The assumption that the learned joint distribution over motion and contact matrix will produce accurate, feasible contacts is load-bearing for the interaction-fidelity claim, yet no quantitative evaluation of contact-matrix accuracy (e.g., contact prediction error or intersection volume) is provided to verify that the matrix actually steers samples into valid states.
minor comments (2)
- [Abstract] Please add a citation for the Duolando baseline in the abstract and methods.
- [Abstract] Notation for FID_k and FID_cd should be defined at first use or in a table caption.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and have revised the manuscript to incorporate clarifications and additional analyses where needed.
Point-by-point responses
-
Referee: [Abstract] The central claim, that jointly generating the contact matrix enables 'explicit interaction modeling' and provides 'guidance toward more precise and constrained interaction dynamics during sampling', is not backed by any described auxiliary contact loss, contact-masking schedule, or post-sampling enforcement. Without such a mechanism, the reported metric improvements cannot be confidently attributed to the contact matrix rather than to the VQ-VAE stage or other training choices.
Authors: The contact matrix is generated jointly with the motion as an additional output channel in the diffusion model. The training objective is the standard diffusion loss applied simultaneously to both modalities, enabling the network to learn their joint distribution and implicit dependencies directly from data. This provides guidance during sampling because each denoising step produces motion and contacts that are consistent with each other by construction, without requiring an auxiliary loss, masking schedule, or post-processing enforcement. We have expanded the abstract and Section 3 to describe this mechanism more explicitly. To strengthen attribution of the gains, we have added an ablation study (new Table X) comparing the full model against a motion-only diffusion variant; the results show clear degradation in FID_k, FID_cd, and BED when the contact matrix is removed, indicating its contribution beyond the VQ-VAE stage. revision: yes
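The mechanism the authors describe, one standard denoising objective over both modalities, can be sketched by concatenating motion features and the flattened contact matrix into a single tensor before noising. Shapes, the toy noise schedule, and the stand-in zero predictor are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_motion, J = 16, 32, 22            # illustrative: frames, motion dims, body parts

motion = rng.normal(size=(T, D_motion))
contact = rng.integers(0, 2, size=(T, J * J)).astype(np.float64)  # flattened matrix

# One joint sample: motion and contact share the same diffusion process.
x0 = np.concatenate([motion, contact], axis=1)          # (T, D_motion + J*J)

# Standard DDPM forward noising at step t.
abar = np.linspace(0.999, 0.01, 100)                    # toy cumulative-alpha schedule
t = 50
eps = rng.normal(size=x0.shape)
xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Stand-in for the network's noise prediction; in training this is eps_hat(xt, t).
eps_hat = np.zeros_like(eps)

# A single MSE over both channels: no auxiliary contact loss is needed,
# because the joint distribution couples motion and contacts by construction.
loss = np.mean((eps_hat - eps) ** 2)
```

The point of the sketch is the shape of `x0`: because every denoising step operates on the concatenated tensor, each predicted motion comes paired with contacts drawn from the same joint state.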
-
Referee: [Experiments] The assumption that the learned joint distribution over motion and contact matrix will produce accurate, feasible contacts is load-bearing for the interaction-fidelity claim, yet no quantitative evaluation of contact-matrix accuracy (e.g., contact prediction error or intersection volume) is provided to verify that the matrix actually steers samples into valid states.
Authors: We agree that direct quantitative validation of contact accuracy would provide stronger support for the claim. In the revised manuscript we have added a new evaluation subsection reporting contact prediction accuracy (fraction of correctly classified contact pairs) and average intersection volume (penetration depth) between the two dancers on generated samples. These metrics confirm low error rates and feasible contacts, consistent with the observed improvements in interaction fidelity. revision: yes
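The two added metrics can be sketched directly; the collision radius below is a hypothetical stand-in for a proper mesh-based penetration measure, and the binary-matrix accuracy assumes the paper's contact representation:

```python
import numpy as np

def contact_accuracy(pred, gt):
    """Fraction of correctly classified contact pairs (binary matrices)."""
    return float((pred == gt).mean())

def mean_penetration(dist, radius=0.05):
    """Average penetration depth between the two dancers.

    dist: (J, J) pairwise body-part distances in metres; any pair closer
    than the hypothetical collision radius counts as penetrating by the
    shortfall. Zero for well-separated poses.
    """
    return float(np.clip(radius - dist, 0.0, None).mean())

pred = np.array([[1, 0], [0, 0]])
gt   = np.array([[1, 0], [0, 1]])
acc = contact_accuracy(pred, gt)       # 3 of 4 pairs classified correctly

dist = np.array([[0.02, 1.0], [1.0, 1.0]])
pen = mean_penetration(dist)           # only the 0.02 m pair penetrates
```

Low penetration together with high contact accuracy is what would indicate the generated matrix steers samples into physically feasible contact states.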
Circularity Check
No circularity: two-stage model evaluated via independent empirical metrics
Full rationale
The paper's core contribution is a two-stage architecture (VQ-VAE followed by contact-aware diffusion) whose outputs are assessed on external metrics (FID_k, FID_cd, BED) against a named baseline (Duolando). No equation or claim reduces a result to its own inputs by definition, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on self-citation chains. The contact matrix is generated as an explicit joint output rather than presupposed, and performance differences are reported as measured quantities, not derived tautologies.
Axiom & Free-Parameter Ledger
free parameters (1)
- codebook sizes for body parts
axioms (1)
- domain assumption: joint decoding of body parts prevents motion inconsistencies
invented entities (1)
- contact matrix (no independent evidence)
Reference graph
Works this paper leans on
- [1] Kfir Aberman, Rundi Wu, Dani Lischinski, Baoquan Chen, and Daniel Cohen-Or. Learning character-agnostic motion for motion retargeting in 2D. arXiv preprint arXiv:1905.01680, 2019.
- [2] Kfir Aberman, Peizhuo Li, Dani Lischinski, Olga Sorkine-Hornung, Daniel Cohen-Or, and Baoquan Chen. Skeleton-aware networks for deep motion retargeting. ACM Transactions on Graphics (TOG), 39(4), 2020.
- [3] Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, and Yaser Sheikh. To react or not to react: End-to-end visual pose forecasting for personalized avatar during dyadic conversations. In 2019 International Conference on Multimodal Interaction, pages 74–84, 2019.
- [4] Ijaz Akhter and Michael J. Black. Pose-conditioned joint angle limits for 3D human pose reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1446–1455, 2015.
- [5] Simon Alexanderson, Rajmund Nagy, Jonas Beskow, and Gustav Eje Henter. Listen, denoise, action! Audio-driven motion synthesis with diffusion models. ACM Transactions on Graphics (TOG), 42(4):1–20, 2023.
- [6] German Barquero, Sergio Escalera, and Cristina Palmero. BeLFusion: Latent diffusion for behavior-driven human motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2317–2327, 2023.
- [7] Zhi Cen, Huaijin Pi, Sida Peng, Qing Shuai, Yujun Shen, Hujun Bao, Xiaowei Zhou, and Ruizhen Hu. Ready-to-React: Online reaction policy for two-character interaction generation. arXiv preprint arXiv:2502.20370, 2025.
- [8] Junuk Cha, Jihyeon Kim, Jae Shin Yoon, and Seungryul Baek. Text2HOI: Text-guided 3D motion generation for hand-object interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1577–1585, 2024.
- [9] Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, and Gang Yu. Executing your commands via motion diffusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18000–18010, 2023.
- [10] Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, and Nicu Sebe. Interaction transformer for human reaction generation. IEEE Transactions on Multimedia, 25:8842–8854, 2023.
- [11] Rishabh Dabral, Muhammad Hamza Mughal, Vladislav Golyanik, and Christian Theobalt. MoFusion: A framework for denoising-diffusion-based motion synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9760–9770, 2023.
- [12] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [13] Christian Diller and Angela Dai. CG-HOI: Contact-guided 3D human-object interaction generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19888–19901, 2024.
- [14] Arjan Egges, George Papagiannakis, and Nadia Magnenat-Thalmann. Presence and interaction in mixed reality environments. The Visual Computer, 23:317–333, 2007.
- [15] Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, and Philipp Slusallek. ReMoS: 3D motion-conditioned reaction synthesis for two-person interactions. In European Conference on Computer Vision. Springer, 2024.
- [16] Anindita Ghosh, Bing Zhou, Rishabh Dabral, Jian Wang, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek, and Chuan Guo. DuetGen: Music driven two-person dance generation via hierarchical masked modeling. In ACM SIGGRAPH Conference Papers, pages 1–11, 2025.
- [17] Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li, and Li Cheng. Generating diverse and natural 3D human motions from text. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5152–5161, 2022.
- [18] Eva Hanser, Paul Mc Kevitt, Tom Lunney, and Joan Condell. SceneMaker: Intelligent multimodal visualisation of natural language scripts. In Artificial Intelligence and Cognitive Science: 20th Irish Conference, AICS 2009, Revised Selected Papers, 2009.
- [19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [20] Jaewoo Jeong, Daehee Park, and Kuk-Jin Yoon. Multi-agent long-term 3D human pose forecasting via interaction-aware trajectory conditioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1617–1628, 2024.
- [21] Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, and Tao Chen. MotionGPT: Human motion as a foreign language. Advances in Neural Information Processing Systems, 36:20067–20079, 2023.
- [22] Jinwoo Kim, Heeseok Oh, Seongjean Kim, Hoseok Tong, and Sanghoon Lee. A brand new dance partner: Music-conditioned pluralistic dancing controlled by multiple dance genres. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3490–3500, 2022.
- [23] Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, and Anh Nguyen. Music-driven group choreography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8673–8682, 2023.
- [24] Ruilong Li, Shan Yang, David A. Ross, and Angjoo Kanazawa. AI Choreographer: Music conditioned 3D dance generation with AIST++. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13401–13412, 2021.
- [25] Zimo Li, Yi Zhou, Shuangjiu Xiao, Chong He, Zeng Huang, and Hao Li. Auto-conditioned recurrent networks for extended complex human motion synthesis. arXiv preprint arXiv:1707.05363, 2017.
- [26] Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, and Lan Xu. InterGen: Diffusion-based multi-human motion generation under complex interactions. International Journal of Computer Vision, 132(9):3463–3483, 2024.
- [27] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
- [28] Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, and Wanli Ouyang. Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 143–152, 2020.
- [29] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 851–866, 2023.
- [30] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- [31] Sihan Ma, Qiong Cao, Jing Zhang, and Dacheng Tao. Contact-aware human motion generation from textual descriptions. arXiv preprint arXiv:2403.15709, 2024.
- [32] Vongani Maluleke, Lea Müller, Jathushan Rajasegaran, Georgios Pavlakos, Shiry Ginosar, Angjoo Kanazawa, and Jitendra Malik. Synergy and synchrony in couple dances. arXiv preprint arXiv:2409.04440, 2024.
- [33] Wei Mao, Miaomiao Liu, Mathieu Salzmann, and Hongdong Li. Learning trajectory dependencies for human motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9489–9497, 2019.
- [34] Wei Mao, Miaomiao Liu, and Mathieu Salzmann. History repeats itself: Human motion prediction via motion attention. In European Conference on Computer Vision, pages 474–489. Springer, 2020.
- [35] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal analysis in Python. SciPy, 2015:18–24, 2015.
- [36] Christos Mousas. Performance-driven dance motion control of a virtual partner character. In 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 57–64. IEEE, 2018.
- [37] Meinard Müller, Tido Röder, and Michael Clausen. Efficient content-based retrieval of motion capture data. In ACM SIGGRAPH 2005 Papers, pages 677–685, 2005.
- [38] Kensuke Onuma, Christos Faloutsos, and Jessica K. Hodgins. FMDistance: A fast and effective distance function for motion capture data. Eurographics (Short Papers), 7(10), 2008.
- [39] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10975–10985, 2019.
- [40] Manoj M. Pawar, Gaurav N. Pradhan, Kang Zhang, and Balakrishnan Prabhakaran. Content based querying and searching for 3D human motions. In Advances in Multimedia Modeling: 14th International Multimedia Modeling Conference, MMM 2008, pages 446–455. Springer, 2008.
- [41] Huaijin Pi, Sida Peng, Minghui Yang, Xiaowei Zhou, and Hujun Bao. Hierarchical generation of human-object interactions with diffusion probabilistic models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15061–15073, 2023.
- [42] Gaurav N. Pradhan, Chuanjun Li, and Balakrishnan Prabhakaran. Hierarchical indexing structure for 3D human motions. In International Conference on Multimedia Modeling, pages 386–396. Springer, 2007.
- [43] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
- [44] Li Siyao, Weijiang Yu, Tianpei Gu, Chunze Lin, Quan Wang, Chen Qian, Chen Change Loy, and Ziwei Liu. Bailando: 3D dance generation by actor-critic GPT with choreographic memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11050–11059, 2022.
- [45] Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, and Chen Change Loy. Duolando: Follower GPT with off-policy reinforcement learning for dance accompaniment. arXiv preprint arXiv:2403.18811, 2024.
- [46] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
- [47] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
- [48] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
- [49] Sebastian Starke, Yiwei Zhao, Taku Komura, and Kazi Zaman. Local motion phases for learning multi-contact character movements. ACM Transactions on Graphics (TOG), 39(4), 2020.
- [50] Mikihiro Tanaka and Kent Fujiwara. Role-aware interaction generation from textual description. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15999–16009, 2023.
- [51] Aaron van den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.
- [52] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [53] Ruben Villegas, Jimei Yang, Duygu Ceylan, and Honglak Lee. Neural kinematic networks for unsupervised motion retargetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8639–8648, 2018.
- [54] Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, and Siyu Tang. SAGA: Stochastic whole-body grasping with contact. In European Conference on Computer Vision, pages 257–274. Springer, 2022.
- [55] Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, and Wenjun Zeng. ReGenNet: Towards human action-reaction synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1759–1769, 2024.
- [56] Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- [57] Hongdi Yang, Chengyang Li, Zhenxuan Wu, Gaozheng Li, Jingya Wang, Jingyi Yu, Zhuo Su, and Lan Xu. SMGDiff: Soccer motion generation using diffusion probabilistic models. arXiv preprint arXiv:2411.16216, 2024.
- [58] Youngwoo Yoon, Woo-Ri Ko, Minsu Jang, Jaeyeon Lee, Jaehong Kim, and Geehyuk Lee. Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots. In 2019 International Conference on Robotics and Automation (ICRA), pages 4303–4309. IEEE, 2019.
- [59] Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, and Changyou Chen. Structure-aware human-action generation. In Computer Vision – ECCV 2020, pages 18–34. Springer, 2020.
- [60] Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen, and Ying Shan. Generating human motion from textual descriptions with discrete representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14730–14740, 2023.
- [61] Pengfei Zhang, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jianru Xue, and Nanning Zheng. Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1112–1121, 2020.
- [62] Xiaohan Zhang, Bharat Lal Bhatnagar, Sebastian Starke, Vladimir Guzov, and Gerard Pons-Moll. COUCH: Towards controllable human-chair interactions. In European Conference on Computer Vision, pages 518–535. Springer, 2022.
- [63] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.