PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation
Pith reviewed 2026-05-09 19:34 UTC · model grok-4.3
The pith
PhysiGen reduces body interpenetration in AI-generated human interactions by using simplified geometric shapes to enforce physical collision constraints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that simplifying human body meshes into geometric primitives for collision detection, combined with identifying collision regions to guide optimization, creates an efficient and effective way to integrate physical constraints into human-human interaction generation models, leading to reduced interpenetration and better visual and physical quality.
What carries the argument
The PhysiGen optimization strategy, which approximates high-resolution meshes with geometric primitives to compute inter-person collisions efficiently and directs the generation process using collision-region information.
Load-bearing premise
Approximating detailed human body meshes with simple geometric primitives captures enough collision information to guide effective optimization without overlooking important contact details or creating false positives.
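This premise can be made concrete with a small sketch. Below, each body part is a capsule (a line segment with a radius), and penetration depth falls out of the minimum segment-to-segment distance. The sampling-based distance approximation, function names, and numbers are our illustration, not the paper's implementation; exact closed-form segment distances exist and would be used in practice.

```python
import math

def seg_points(a, b, n=50):
    """Sample n evenly spaced 3D points along the segment from a to b."""
    return [tuple(a[i] + (b[i] - a[i]) * t / (n - 1) for i in range(3))
            for t in range(n)]

def capsule_penetration(a1, b1, r1, a2, b2, r2, samples=50):
    """Approximate penetration depth between two capsules.

    A capsule is a segment (a, b) with radius r. Penetration depth is
    (r1 + r2) - d, where d is the minimum segment-segment distance,
    clamped to 0 when the capsules do not overlap. Here d is
    approximated by dense sampling; this is a sketch only.
    """
    d = min(math.dist(p, q)
            for p in seg_points(a1, b1, samples)
            for q in seg_points(a2, b2, samples))
    return max(0.0, (r1 + r2) - d)
```

For two parallel capsules of radius 0.1 whose axes are 0.15 apart, this reports a penetration depth of 0.05; how faithfully such depths track mesh-level contact is exactly what the premise assumes.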
What would settle it
Running the generation process with and without PhysiGen on the same model and inputs, then measuring the volume of interpenetrating body regions or counting collision events in the output sequences to see if the reduction is statistically significant.
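One dependency-free way to run that before/after comparison is an exact sign-flip permutation test on per-sequence interpenetration volumes. Everything below (data shapes, the function name, the illustrative numbers) is a hypothetical sketch, not the paper's evaluation code.

```python
import itertools
import statistics

def paired_permutation_test(baseline, treated):
    """Exact sign-flip permutation test on paired differences.

    baseline/treated: per-sequence interpenetration volumes from the
    same base model and prompts, without and with the post-processing
    step. Returns (mean reduction, one-sided p-value that the
    reduction exceeds chance). Exact enumeration over 2^n sign
    patterns; fine for small n.
    """
    diffs = [b - t for b, t in zip(baseline, treated)]
    observed = statistics.mean(diffs)
    count = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        m = statistics.mean(s * d for s, d in zip(signs, diffs))
        count += m >= observed
        total += 1
    return observed, count / total
```

With, say, six paired sequences whose volumes all drop after optimization, the test returns p = 1/64, the smallest value attainable at that sample size; larger studies would want more sequences or a t-test.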
Original abstract
Despite substantial progress in text-driven 3D human motion synthesis, generating realistic multi-person interaction sequences remains challenging. Notably, body inter-penetration is a pervasive issue from both data acquisition to the generated results, which significantly undermines the realism and usability. Previous generative models either ignored this issue or introduced computationally expensive mesh-level loss functions to alleviate inter-body collisions. In this paper, we propose a general-purpose and computationally efficient optimization strategy named PhysiGen to explicitly integrate collision-aware physical constraints for human-human interaction generation. Specifically, we simplify the high-resolution human body mesh into geometric primitives to greatly reduce the cost of inter-person collision detection. Moreover, we identify the collision regions as the guidance of the optimization directions. PhysiGen is plug-and-play and can be readily integrated into existing human interaction generation models. Extensive cross-dataset and cross-model experiments show that our method can effectively reduce interpenetration and significantly improve visual coherence and physical plausibility compared to the state-of-the-art methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PhysiGen, a plug-and-play optimization strategy for text-driven 3D human-human interaction generation. It simplifies high-resolution body meshes to geometric primitives to enable efficient collision detection, identifies collision regions to guide optimization directions, and integrates collision-aware physical constraints into existing generative models. Extensive cross-dataset and cross-model experiments are claimed to show reduced interpenetration and improved visual coherence and physical plausibility over state-of-the-art methods.
Significance. If the central claims hold, PhysiGen offers a computationally efficient, general-purpose approach to a pervasive problem in multi-person motion synthesis. The plug-and-play design and cross-model validation are strengths that could make physical plausibility improvements accessible without retraining or heavy mesh-level losses.
Major comments (2)
- §3.2 (Method, primitive approximation): The claim that simplifying high-resolution meshes to geometric primitives 'greatly reduce[s] the cost of inter-person collision detection' while still providing effective guidance relies on the unquantified assumption that the approximation preserves the locations and extents of actual penetrations (especially non-convex contacts involving hands/limbs). No error bounds, missed-collision rates, or ablation on primitive choice (e.g., spheres vs. capsules) are reported; this is load-bearing for the headline improvements in physical plausibility.
- §4 (Experiments): The cross-dataset and cross-model results are presented as showing 'significant' improvements, but without reported statistical significance tests, variance across runs, or direct comparison of collision metrics before/after PhysiGen on the same base-model outputs, it is difficult to isolate the contribution of the collision guidance from other optimization factors.
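The missed-collision rate the first comment asks for could be scored per frame against full-mesh detection as ground truth. The sketch below assumes hypothetical boolean detector outputs; the paper reports no such numbers.

```python
def collision_detection_rates(mesh_hits, prim_hits):
    """Missed-collision and false-positive rates for a primitive-level
    collision detector, scored against full-mesh detection as ground
    truth. mesh_hits / prim_hits are per-frame booleans (hypothetical
    inputs for illustration).
    """
    pairs = list(zip(mesh_hits, prim_hits))
    pos = [p for p in pairs if p[0]]       # frames where the mesh detects contact
    neg = [p for p in pairs if not p[0]]   # frames with no mesh-level contact
    missed = sum(1 for _, p in pos if not p) / len(pos) if pos else 0.0
    false_pos = sum(1 for _, p in neg if p) / len(neg) if neg else 0.0
    return missed, false_pos
```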
Minor comments (2)
- [§3] Notation for the collision threshold and optimization strength parameters should be explicitly listed as free parameters in the method section for reproducibility.
- [Figures] Figure captions for qualitative results should include the specific base model and dataset for each example to aid comparison.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. The comments highlight important aspects of rigor in our method and evaluation. We address each major comment point-by-point below, providing our response and indicating planned revisions to the manuscript.
Point-by-point responses
- Referee: §3.2 (Method, primitive approximation): The claim that simplifying high-resolution meshes to geometric primitives 'greatly reduce[s] the cost of inter-person collision detection' while still providing effective guidance relies on the unquantified assumption that the approximation preserves the locations and extents of actual penetrations (especially non-convex contacts involving hands/limbs). No error bounds, missed-collision rates, or ablation on primitive choice (e.g., spheres vs. capsules) are reported; this is load-bearing for the headline improvements in physical plausibility.
Authors: We appreciate the referee's emphasis on quantifying the fidelity of the primitive approximation. In the manuscript, we selected capsules as the primary primitive because they efficiently model the cylindrical geometry of limbs and torso while supporting fast signed-distance queries for collision detection. The collision region identification step further focuses optimization on detected contact areas rather than relying solely on global primitive overlap. While the original submission does not include explicit error bounds or missed-collision statistics, the cross-model and cross-dataset results show consistent reductions in interpenetration metrics, indicating practical effectiveness. To strengthen this, we will add in the revision: (1) an ablation comparing spheres, capsules, and ellipsoids on approximation error (Hausdorff distance to original mesh and penetration depth error), (2) missed-collision rates measured against full-mesh ground-truth detection on held-out interaction samples, and (3) visualizations of preserved vs. missed contacts, particularly for hand/limb regions. These additions will provide the requested error bounds and confirm that the approximation supports reliable guidance. revision: yes
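The approximation-error ablation promised here could use a symmetric Hausdorff distance between surface samples of the primitives and the original mesh vertices. Below is a brute-force sketch with illustrative point sets; real meshes would need an accelerated nearest-neighbor query.

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two 3D point sets, e.g.
    primitive-surface samples vs. original mesh vertices. Brute force:
    fine for a few thousand points, not for full-resolution meshes."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```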
- Referee: §4 (Experiments): The cross-dataset and cross-model results are presented as showing 'significant' improvements, but without reported statistical significance tests, variance across runs, or direct comparison of collision metrics before/after PhysiGen on the same base-model outputs, it is difficult to isolate the contribution of the collision guidance from other optimization factors.
Authors: We agree that additional statistical controls and direct before/after comparisons would better isolate PhysiGen's contribution. The current experiments apply PhysiGen as a post-processing step on outputs from multiple base models and report aggregate improvements, but do not include run-to-run variance or formal significance testing. In the revised manuscript we will: (1) report mean ± standard deviation for all quantitative metrics (including collision volume and contact ratio) across at least three random seeds, (2) add a dedicated table comparing collision metrics on identical base-model sequences before and after PhysiGen optimization, and (3) include paired t-test p-values to establish statistical significance of the observed reductions in interpenetration. These changes will clarify the specific impact of the collision-aware constraints. revision: yes
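The paired t-test proposed in (3) reduces to a one-line statistic on the per-sequence differences. The sketch below computes only the t value and degrees of freedom (a t-distribution lookup, e.g. scipy.stats.t.sf, would then give the p-value); the inputs are illustrative, not the paper's data.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic for before/after collision metrics measured
    on identical base-model sequences. Returns (t, degrees of freedom);
    a positive t supports a reduction from before to after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1
```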
Circularity Check
No circularity: PhysiGen is an independent optimization technique with no self-referential derivations.
Full rationale
The paper introduces PhysiGen as a plug-and-play optimization strategy that simplifies high-resolution meshes to geometric primitives for efficient collision detection and uses identified collision regions to guide optimization directions. This approach is presented as a general-purpose method integrable into existing models, with claims validated through cross-dataset and cross-model experiments rather than any internal derivation chain. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or described method. The central contribution reduces to a practical engineering choice (primitive approximation for speed) whose effectiveness is externally tested, not defined into existence by the inputs themselves. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Collision threshold or optimization strength parameters
Axioms (1)
- Domain assumption: geometric primitives can accurately represent human-body collisions for optimization purposes