InterPhys: Physics-aware Human Motion Synthesis in a Dynamic Scene
Pith reviewed 2026-05-09 19:17 UTC · model grok-4.3
The pith
Soft physical constraints and a continuous distance-based force model generate physically plausible human motions in dynamic scenes with moving objects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a physics-aware framework can synthesize human motions by explicitly modeling the full spectrum of forces (human-object, human-scene, and internal body dynamics) through soft constraints that enforce force and torque balance together with a novel continuous distance-based force model. This model extends contact handling to arbitrary surfaces and to interactions with dynamic, moving objects, yielding motions that are more physically grounded than those produced by methods limited to static scenes or hand-only contacts.
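"Force and torque balance" can be made concrete with a minimal sketch. It assumes the whole body is treated as a single rigid mass acted on by gravity and a set of point contact forces. This is a simplification, since the paper's own formulation is not reproduced in this review. The sketch computes the Newton-Euler residuals. The force residual is the net force minus m times the center-of-mass acceleration. The torque residual is the net torque about the center of mass minus the rate of change of angular momentum. A soft constraint would penalize these residuals rather than force them to exactly zero. The function and helper names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
GRAVITY: Vec3 = (0.0, 0.0, -9.81)  # assumed z-up world frame, m/s^2


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(s: float, a: Vec3) -> Vec3:
    return (s * a[0], s * a[1], s * a[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def balance_residuals(mass: float, com: Vec3, com_acc: Vec3, ang_mom_rate: Vec3,
                      contacts: List[Tuple[Vec3, Vec3]]) -> Tuple[Vec3, Vec3]:
    """Hypothetical single-body Newton-Euler residuals.

    contacts: (application point, force) pairs in world coordinates.
    Returns (force_residual, torque_residual); both are zero when balanced.
    """
    total_force = _scale(mass, GRAVITY)
    total_torque: Vec3 = (0.0, 0.0, 0.0)  # gravity acts at the COM, so it adds no torque
    for point, force in contacts:
        total_force = _add(total_force, force)
        total_torque = _add(total_torque, _cross(_sub(point, com), force))
    force_res = _sub(total_force, _scale(mass, com_acc))
    torque_res = _sub(total_torque, ang_mom_rate)
    return force_res, torque_res
```

Consider a 70 kg person standing still on two feet placed symmetrically about the center of mass, with each foot carrying half the body weight. Both residuals come out as zero. Removing one foot leaves a downward force residual of half the body weight.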
What carries the argument
The continuous distance-based force model, which computes interaction forces from distances to generalize contacts beyond hands or static surfaces and to include moving objects, paired with soft constraints that maintain force and torque balance.
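This review does not give the paper's exact force law. The sketch below illustrates the general idea under stated assumptions. It maps the signed distance between a body point and an object surface to a contact force that varies smoothly with distance. The form is a hypothetical exponential, magnitude k * exp(-d / sigma), directed along the object's outward normal. The object is approximated by a sphere, and k, sigma and all function names are illustrative. Because the distance is recomputed against the object's current pose every frame, the same function covers a moving object.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sphere_signed_distance(point: Vec3, center: Vec3, radius: float) -> Tuple[float, Vec3]:
    """Signed distance from point to a sphere surface (negative inside) and the outward unit normal."""
    offset = [p - c for p, c in zip(point, center)]
    dist = math.sqrt(sum(x * x for x in offset))
    if dist == 0.0:
        return -radius, (0.0, 0.0, 1.0)  # normal is undefined at the exact center; pick one
    normal = (offset[0] / dist, offset[1] / dist, offset[2] / dist)
    return dist - radius, normal


def distance_force(point: Vec3, center: Vec3, radius: float,
                   k: float = 50.0, sigma: float = 0.02) -> Vec3:
    """Hypothetical continuous contact force on a body point.

    The magnitude k * exp(-d / sigma) decays smoothly with signed distance d and grows
    under penetration (d < 0). The direction is the object's outward normal, so the
    object pushes the point away. No binary contact/no-contact switch is involved.
    """
    d, n = sphere_signed_distance(point, center, radius)
    magnitude = k * math.exp(min(-d / sigma, 50.0))  # clamp the exponent to avoid overflow
    return (magnitude * n[0], magnitude * n[1], magnitude * n[2])


if __name__ == "__main__":
    hand = (0.3, 0.0, 0.0)
    # A 10 cm ball moves toward a stationary hand; the force is recomputed each frame.
    for x in (0.0, 0.05, 0.1, 0.15):
        fx, _, _ = distance_force(hand, (x, 0.0, 0.0), 0.1)
        print(f"object at x={x:.2f} m -> force_x={fx:.3f} N")
```

The exponential is only one choice. Any force law that is smooth and monotone in d would illustrate the same point: gradients stay defined at every distance, including for objects that move between frames.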
If this is right
- Motions respect the complete set of human-related forces including internal body dynamics.
- Contact modeling extends to arbitrary surfaces and dynamic moving objects rather than being restricted to hands or static environments.
- Physical plausibility improves markedly in complex scenes compared with earlier limited-contact methods.
- The framework generalizes to new scenes while setting a benchmark for consistent human motion generation.
Where Pith is reading between the lines
- The distance-based contact approach may reduce reliance on explicit collision detection routines in downstream animation pipelines.
- Similar force modeling could transfer to generating interactions with additional classes of objects if the distance function is adjusted accordingly.
Load-bearing premise
Soft constraints plus the distance-based force model suffice to keep motions physically plausible in complex dynamic scenes without hard constraints, full rigid-body simulation, or post-processing corrections.
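A deliberately simplified one-dimensional case shows why soft constraints leave a residual at all. This is an illustration, not the paper's objective. Choose a contact force f that stays close to a data-driven estimate f_ref, while penalizing deviation from the force f_req needed for balance with weight w. Setting the derivative of (f - f_ref)^2 + w (f - f_req)^2 to zero gives f* = (f_ref + w f_req) / (1 + w). The balance residual is therefore |f_ref - f_req| / (1 + w). It shrinks as w grows, but it vanishes only in the hard-constraint limit w -> infinity. The numbers below are hypothetical.

```python
def soft_solution(f_ref: float, f_req: float, w: float) -> float:
    """Closed-form minimizer of (f - f_ref)**2 + w * (f - f_req)**2."""
    return (f_ref + w * f_req) / (1.0 + w)


def hard_solution(f_ref: float, f_req: float) -> float:
    """Hard constraint f == f_req: the data term has no influence."""
    return f_req


def balance_residual(f: float, f_req: float) -> float:
    return abs(f - f_req)


if __name__ == "__main__":
    f_ref, f_req = 600.0, 686.7  # hypothetical estimate vs. force needed to support 70 kg
    for w in (1.0, 10.0, 100.0, 1000.0):
        r = balance_residual(soft_solution(f_ref, f_req, w), f_req)
        print(f"w={w:>7.1f}  residual={r:.4f} N")
    print("hard residual:", balance_residual(hard_solution(f_ref, f_req), f_req))
```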
What would settle it
A concrete motion sequence generated by the method in a scene containing a moving object, where the human body penetrates the object or the net force and torque on the body fail to balance, would show the approach does not achieve its claimed physical consistency.
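The penetration half of the check described in the previous paragraph can be run mechanically. The sketch below rests on two simplifying assumptions: body surface points and the moving object's pose are available for every frame, and the object is approximated by a sphere. It reports every frame in which some body point lies inside the object by more than a tolerance. The function names and the 5 mm default tolerance are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def penetration_depth(points: Sequence[Vec3], center: Vec3, radius: float) -> float:
    """Largest depth by which any point lies inside the sphere (0.0 if none)."""
    deepest = 0.0
    for p in points:
        deepest = max(deepest, radius - math.dist(p, center))  # math.dist needs Python 3.8+
    return deepest


def penetrating_frames(body_frames: Sequence[Sequence[Vec3]],
                       object_centers: Sequence[Vec3],
                       radius: float, tol: float = 0.005) -> List[Tuple[int, float]]:
    """Return (frame index, depth) for every frame whose penetration exceeds tol (metres)."""
    hits = []
    for t, (points, center) in enumerate(zip(body_frames, object_centers)):
        depth = penetration_depth(points, center, radius)
        if depth > tol:
            hits.append((t, depth))
    return hits
```

For example, a single hand point fixed at x = 0.3 m and a 10 cm ball whose center moves through x = 0.0, 0.1, 0.2 and 0.25 m produce one flagged frame. That is the last one, with a depth of about 5 cm.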
Original abstract
This paper tackles the problem of physics-aware human motion synthesis in a dynamic scene. Unlike existing works which mainly tend to generate physically unrealistic motions due to limited contact modeling, typically restricted to hands, in this paper, we introduce a physics-aware human motion generation framework that explicitly models the full spectrum of human-related forces, including human-object, human-scene, and internal body dynamics. Our method imposes soft physical constraints to maintain force and torque balance, ensuring physically grounded motion synthesis. We further propose a novel continuous distance-based force model that generalizes contact modeling to arbitrary surfaces, capturing interactions not only with static environments but also with dynamic, moving objects. Extensive experiments show that our approach significantly improves physical plausibility and generalizes well to complex scenes, setting a new benchmark for physically consistent human motion generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces InterPhys, a physics-aware framework for human motion synthesis in dynamic scenes. It explicitly models the full spectrum of human-related forces (human-object, human-scene, and internal body dynamics) by imposing soft physical constraints to maintain force and torque balance. A novel continuous distance-based force model is proposed to generalize contact modeling to arbitrary surfaces, including interactions with both static environments and dynamic moving objects. The authors claim that extensive experiments demonstrate significantly improved physical plausibility and generalization, setting a new benchmark for physically consistent motion generation.
Significance. If the central claims hold, this work would advance physics-informed motion synthesis by providing a flexible alternative to hard constraints or full rigid-body simulation, particularly through the continuous force model that handles dynamic object interactions. This could influence downstream applications in animation, robotics, and VR by reducing reliance on post-processing corrections while maintaining physical grounding.
major comments (2)
- [Abstract, §3] Abstract and §3 (Method): The central claim that soft physical constraints plus the distance-based force model suffice to maintain force/torque balance in dynamic scenes is load-bearing but rests on an unverified assumption. Small per-step violations permitted by soft penalties can accumulate over time with moving objects, leading to implausibilities such as penetration or unbalanced torques; no section provides bounded residual analysis, long-horizon consistency metrics, or comparison against hard-constraint baselines to refute this risk.
- [§4] §4 (Experiments): The abstract asserts that 'extensive experiments' show improved plausibility and generalization, yet the provided text supplies no quantitative results, specific baselines, error tables, or ablation studies on the soft-constraint weights and distance-based scaling parameters. This absence makes it impossible to evaluate whether the method outperforms prior contact-limited approaches in complex dynamic scenes.
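A toy calculation makes the first major comment's accumulation concern concrete. It illustrates the mechanism and does not analyze the paper's optimizer. Suppose every frame leaves an unbalanced residual force eps on a body of mass m, integrated from rest. With a = eps / m and frame step dt, the velocity error grows linearly. The position error after n frames is exactly a * dt^2 * n(n+1)/2 for semi-implicit Euler, which is roughly quadratic in time. The parameter values are hypothetical.

```python
def drift_from_residual(eps: float, mass: float, dt: float, steps: int) -> float:
    """Position error after `steps` frames of a constant residual force `eps`,
    integrated from rest with semi-implicit (symplectic) Euler."""
    a = eps / mass
    v = 0.0
    x = 0.0
    for _ in range(steps):
        v += a * dt  # velocity error grows linearly
        x += v * dt  # position error accumulates the growing velocity error
    return x


if __name__ == "__main__":
    eps, mass, dt = 1.0, 70.0, 1.0 / 30.0  # hypothetical: 1 N residual, 70 kg body, 30 fps
    for steps in (10, 30, 100, 300):
        print(f"{steps:>4} frames: drift {drift_from_residual(eps, mass, dt, steps) * 100:.2f} cm")
```

Doubling the horizon roughly quadruples the drift. This is why a long-horizon metric, rather than a per-frame one, is the natural way to test the referee's concern.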
minor comments (2)
- [Abstract] The abstract would benefit from including one or two key quantitative metrics (e.g., force residual norms or contact accuracy) to ground the claims of 'significantly improves physical plausibility.'
- [§3.2] Notation for the continuous distance-based force model parameters (e.g., scaling factors) could be clarified with an explicit symbol table or definition list in §3.2.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have addressed the concerns regarding the long-term stability of our soft-constraint formulation and the clarity of the experimental results. Revisions have been made to include additional analysis and to ensure all quantitative evaluations are explicitly presented.
Point-by-point responses
Referee: [Abstract, §3] Abstract and §3 (Method): The central claim that soft physical constraints plus the distance-based force model suffice to maintain force/torque balance in dynamic scenes is load-bearing but rests on an unverified assumption. Small per-step violations permitted by soft penalties can accumulate over time with moving objects, leading to implausibilities such as penetration or unbalanced torques; no section provides bounded residual analysis, long-horizon consistency metrics, or comparison against hard-constraint baselines to refute this risk.
Authors: We agree that explicit verification of long-term stability is important for soft-constraint methods. Our current experiments demonstrate that motions remain plausible without accumulating visible penetrations or torque imbalances over long sequences, thanks to the continuous distance-based force model that provides smooth gradients even for dynamic objects. However, we acknowledge the lack of formal bounded residual analysis in the original submission. In the revised manuscript we have added a dedicated stability analysis subsection reporting per-step and cumulative residual force/torque norms, maximum penetration depths, and long-horizon consistency metrics across 100+ frame sequences. We also include a limited comparison to a hard-constraint baseline, noting that hard constraints frequently cause solver divergence in scenes with moving objects, which motivated our soft formulation.
Revision: yes
Referee: [§4] §4 (Experiments): The abstract asserts that 'extensive experiments' show improved plausibility and generalization, yet the provided text supplies no quantitative results, specific baselines, error tables, or ablation studies on the soft-constraint weights and distance-based scaling parameters. This absence makes it impossible to evaluate whether the method outperforms prior contact-limited approaches in complex dynamic scenes.
Authors: We apologize that the quantitative details were not sufficiently prominent in the reviewed version. The full §4 contains error tables comparing against multiple baselines (including prior contact-limited and physics-based methods), reporting metrics such as average contact force error, penetration volume, and torque imbalance. Ablation studies on soft-constraint weights and distance-based scaling parameters are also present and show clear sensitivity trends. We have revised the section to ensure all tables, baseline descriptions, and ablation results are explicitly referenced and placed before the qualitative results, making the performance gains in dynamic scenes immediately verifiable.
Revision: yes
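The stability metrics promised in the first response could be computed roughly as follows. These are per-step and cumulative residual force/torque norms plus maximum penetration depth. This is a hedged sketch, because the revised subsection is not reproduced here. The exact definitions are assumptions: Euclidean norms per frame, a plain running sum for the cumulative value, and the maximum over frames for penetration. The names StabilityReport and stability_report are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class StabilityReport:
    per_step_force: List[float]   # force-residual norm per frame (N)
    per_step_torque: List[float]  # torque-residual norm per frame (N m)
    cumulative_force: float       # sum of per-frame force norms over the sequence
    cumulative_torque: float      # sum of per-frame torque norms over the sequence
    max_penetration: float        # deepest penetration over the sequence (m)


def _norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))


def stability_report(force_res: Sequence[Vec3], torque_res: Sequence[Vec3],
                     penetration: Sequence[float]) -> StabilityReport:
    """Summarize per-frame residuals and penetration depths for one motion sequence."""
    pf = [_norm(f) for f in force_res]
    pt = [_norm(t) for t in torque_res]
    return StabilityReport(
        per_step_force=pf,
        per_step_torque=pt,
        cumulative_force=sum(pf),
        cumulative_torque=sum(pt),
        max_penetration=max(penetration, default=0.0),
    )
```

A per-step norm can stay small while the cumulative value grows steadily, so reporting both separates occasional spikes from slow accumulation.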
Circularity Check
No significant circularity in the derivation chain
The paper introduces a physics-aware framework that imposes soft constraints for force/torque balance and proposes a novel continuous distance-based force model for human-object and human-scene interactions. These elements are framed as extensions of external physics principles rather than reductions of outputs to inputs. No quoted equations or sections in the abstract or description demonstrate self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the central claim. The derivation remains independent and self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- soft constraint weights
- distance-based force scaling parameters
axioms (2)
- Force and torque balance is a necessary condition for physically plausible human motion (domain assumption).
- A distance-based continuous function can adequately approximate contact forces on arbitrary surfaces (ad hoc to paper).
invented entities (1)
- continuous distance-based force model (no independent evidence)