OmniContact: Chaining Meta-Skills via Contact Flow for Generalizable Humanoid Loco-Manipulation
Pith reviewed 2026-06-26 02:00 UTC · model grok-4.3
The pith
Contact flow representation lets humanoid robots chain meta-skills for long-horizon loco-manipulation with high success.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OmniContact centers on contact flow, a compact representation consisting of key body trajectories and time-series binary contact signals. This shared interface supports a low-level policy called CF-Track that learns a unified library of loco-manipulation skills and a high-level module called CF-Gen that heuristically synthesizes future contact-flow sequences. Together with the collected OmniContact MoCap-based dataset, the framework enables robust execution, autonomous failure recovery, and flexible composition of meta-skills, demonstrated by 98.7 percent success on Carry Box and 76.5 percent on Push-Stack Boxes while outperforming baselines.
What carries the argument
Contact flow (CF), the compact representation of key body trajectories and time-series binary contact signals that acts as the shared interface between low-level skill execution and high-level sequence composition.
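To make the representation concrete, here is a minimal sketch of how a contact-flow sequence could be stored, assuming one 3D position and one 0/1 contact flag per key body per frame. The class name `ContactFlow`, its fields, and the `contact_onsets` helper are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ContactFlow:
    """Hypothetical container: key-body positions plus binary contact flags."""
    body_names: List[str]                # e.g. ["left_hand", "right_hand"]
    trajectories: Dict[str, List[Vec3]]  # body name -> one 3D position per frame
    contacts: Dict[str, List[int]]       # body name -> one 0/1 flag per frame

    def __post_init__(self) -> None:
        # Every body must cover the same number of frames in both streams.
        lengths = {len(self.trajectories[n]) for n in self.body_names}
        lengths |= {len(self.contacts[n]) for n in self.body_names}
        if len(lengths) != 1:
            raise ValueError("all bodies must share the same number of frames")
        for name in self.body_names:
            if any(flag not in (0, 1) for flag in self.contacts[name]):
                raise ValueError(f"contact signal for {name} must be binary")

    @property
    def num_frames(self) -> int:
        return len(self.trajectories[self.body_names[0]])

    def contact_onsets(self, name: str) -> List[int]:
        """Frames where the named body switches from no contact to contact."""
        sig = self.contacts[name]
        return [t for t in range(len(sig))
                if sig[t] == 1 and (t == 0 or sig[t - 1] == 0)]
```

The two streams are kept separate so that a high-level module could edit contact timing without touching the trajectories, which is one plausible reading of why the interface is described as compact and interpretable.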
If this is right
- The low-level policy learns a unified library of loco-manipulation skills from the contact flow interface.
- The high-level module can synthesize sequences that include autonomous recovery from failures.
- The framework integrates directly with vision-language models for semantic task decomposition into meta-skills.
- Complex behaviors become possible, such as arranging scattered boxes into specified shapes like a heart.
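The chaining and recovery implications listed here can be illustrated with a small retry loop. This is a sketch under strong assumptions: each meta-skill is modeled as a plain callable that returns True on success, and recovery is simplified to retrying the failed skill. The names `run_chain` and `max_retries` are hypothetical and do not describe the paper's CF-Gen heuristics.

```python
from typing import Callable, List, Tuple

# Illustrative stand-in for a meta-skill: a callable returning True on success.
Skill = Callable[[], bool]


def run_chain(skills: List[Tuple[str, Skill]], max_retries: int = 2) -> List[str]:
    """Run skills in order; retry a failed skill, abort the chain if it keeps failing."""
    log: List[str] = []
    for name, skill in skills:
        for attempt in range(max_retries + 1):
            if skill():
                log.append(f"{name}: ok (attempt {attempt + 1})")
                break
            log.append(f"{name}: failed (attempt {attempt + 1})")
        else:
            # The inner loop never hit `break`, so every attempt failed.
            log.append(f"{name}: aborted")
            return log
    return log
```

A real system would presumably re-plan from the observed state rather than blindly repeat the same skill; the loop only shows where such a recovery hook would sit in a chained execution.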
Where Pith is reading between the lines
- Contact flow might transfer to non-humanoid robots if their bodies can produce analogous trajectory and contact signals.
- Binary contact signals could prove sufficient for bridging perception and planning in other contact-rich manipulation domains.
- The dataset collection method suggests a scalable way to gather training data for similar hierarchical skill systems.
- Extending the approach to fully dynamic scenes with moving obstacles would test whether the representation remains stable.
Load-bearing premise
Contact flow serves as a sufficient shared interface that preserves enough information for both robust low-level execution and reliable high-level composition with autonomous recovery.
What would settle it
A long-horizon task where contact flow sequences lose critical object interaction details, causing the high-level module to produce compositions with success rates no higher than prior baselines.
Original abstract
Learning long-horizon humanoid loco-manipulation poses a dual challenge: it requires not only the robust execution of meta-skills but also their seamless, closed-loop chaining equipped with autonomous recovery. Existing approaches remain limited: explicit humanoid-object interaction representations offer precision but are notoriously difficult for high-level planning, whereas implicit skill embeddings are compact but lack the interpretability required for reliable composition. We propose OmniContact, a hierarchical framework centered on contact flow (CF), a compact representation consisting of key body trajectories and time-series binary contact signals. Leveraging this shared interface, our low-level policy CF-Track learns a unified library of loco-manipulation skills, while our high-level module CF-Gen heuristically synthesizes future contact-flow sequences. To support this setting, we additionally collect the OmniContact dataset, a MoCap-based HOI corpus for humanoid loco-manipulation (Appendix A). Together, they enable robust execution, autonomous failure recovery, and flexible composition of meta-skills for long-horizon tasks. Experiments show that OmniContact achieves 98.7% success on Carry Box and 76.5% on Push-Stack Boxes, outperforming prior baselines by average margins of 40.9% in meta-skill and 66.5% in skill chaining. Besides, our framework naturally integrates with VLMs for semantic task decomposition, enabling complex, semantically grounded loco-manipulation behaviors, such as arranging scattered boxes into a heart shape.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OmniContact, a hierarchical framework for humanoid loco-manipulation that centers on contact flow (CF)—a representation of key body trajectories plus binary contact time-series—as a shared interface. CF-Track learns a library of meta-skills for low-level execution while CF-Gen heuristically synthesizes CF sequences for high-level chaining with autonomous recovery; a new MoCap-based OmniContact dataset supports training. Experiments report 98.7% success on Carry Box and 76.5% on Push-Stack Boxes, with average gains of 40.9% (meta-skill) and 66.5% (chaining) over baselines, plus natural VLM integration for semantic decomposition.
Significance. If the reported performance holds under rigorous evaluation and the binary-contact representation is shown to suffice for recovery, the work would provide a concrete, interpretable bridge between low-level control and compositional planning that improves on both explicit HOI models and opaque skill embeddings. The release of the OmniContact dataset constitutes a clear positive contribution to the community.
major comments (2)
- [§5] §5 (Experiments): The central empirical claims rest on specific success rates (98.7%, 76.5%) and improvement margins (40.9%, 66.5%), yet the section supplies no information on trial counts, variance or error bars, baseline implementations or hyperparameters, statistical tests, or failure-mode analysis. Without these, the data cannot be assessed as support for the claim that CF is the enabling factor.
- [§3.1] §3.1 (Contact Flow definition): CF is defined using binary contact signals that discard force magnitude, friction coefficients, and continuous velocities. The manuscript provides no ablation replacing binary contacts with richer signals nor any analysis showing that the omitted dynamics are unnecessary for autonomous recovery on Push-Stack Boxes; this leaves the sufficiency of the interface for closed-loop chaining unverified.
minor comments (1)
- [§4.2] The description of CF-Gen heuristics could include a pseudocode listing or explicit decision rules to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. The two major comments highlight important aspects of experimental reporting and the design choices in the contact flow representation. We address each point below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [§5] §5 (Experiments): The central empirical claims rest on specific success rates (98.7%, 76.5%) and improvement margins (40.9%, 66.5%), yet the section supplies no information on trial counts, variance or error bars, baseline implementations or hyperparameters, statistical tests, or failure-mode analysis. Without these, the data cannot be assessed as support for the claim that CF is the enabling factor.
Authors: We agree that the current presentation of results in §5 lacks sufficient detail for rigorous evaluation. In the revised manuscript we will expand the experimental section to report the exact number of trials per task (100 independent rollouts), standard deviations across trials, full baseline implementation details and hyperparameter settings, results of statistical significance tests, and a categorized failure-mode analysis. These additions will make the contribution of contact flow clearer and allow direct assessment of the reported margins.
Revision: yes
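As an illustration of the kind of uncertainty reporting requested here, the sketch below computes a Wilson score interval for a success rate measured over a fixed number of rollouts. The function name `wilson_interval` is hypothetical, and the counts in the usage note are illustrative only; they are not results from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `wilson_interval(77, 100)` gives an interval of roughly 0.68 to 0.84, which shows how wide the uncertainty around a single 100-rollout success rate can be.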
-
Referee: [§3.1] §3.1 (Contact Flow definition): CF is defined using binary contact signals that discard force magnitude, friction coefficients, and continuous velocities. The manuscript provides no ablation replacing binary contacts with richer signals nor any analysis showing that the omitted dynamics are unnecessary for autonomous recovery on Push-Stack Boxes; this leaves the sufficiency of the interface for closed-loop chaining unverified.
Authors: Binary contact signals were selected to maintain compactness and interpretability for high-level chaining. The 76.5% success rate on Push-Stack Boxes, which includes autonomous recovery, offers task-level evidence that the representation is sufficient for the evaluated scenarios. To directly address the concern we will add a dedicated paragraph in §3.1 explaining the design rationale and will include, where data permits, a brief comparison of binary versus richer contact signals in the revision.
Revision: partial
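The referee's concern about discarded information can be shown with a small sketch that thresholds per-frame contact force magnitudes into 0/1 flags. The function `binarize_contacts`, the 5 N threshold, and the force values in the usage note are assumptions for illustration; the source does not say how the binary signals are derived.

```python
from typing import List


def binarize_contacts(forces: List[float], threshold: float = 5.0) -> List[int]:
    """Map per-frame contact force magnitudes (newtons) to 0/1 contact flags.

    The threshold is an assumed value, not one reported in the paper.
    """
    return [1 if f >= threshold else 0 for f in forces]
```

Two different force profiles, such as `[0.0, 6.0, 40.0, 3.0]` and `[0.0, 20.0, 8.0, 1.0]`, both map to `[0, 1, 1, 0]`. That loss of magnitude is what an ablation against richer contact signals would need to show is harmless for recovery.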
Circularity Check
No circularity: experimental claims rest on reported outcomes, not self-referential definitions or fits
The paper presents a hierarchical framework using contact flow as a shared interface between low-level CF-Track policies and high-level CF-Gen synthesis, supported by a new MoCap dataset. All load-bearing claims (98.7% Carry Box success, 76.5% Push-Stack success, 40.9% and 66.5% margins) are stated as direct experimental measurements rather than derived quantities. No equations, parameter fits, uniqueness theorems, or self-citations appear in the provided text that would reduce any prediction to an input by construction. The representation choice and dataset collection are presented as design decisions validated externally by task performance, with no self-definitional loops or renamed empirical patterns.
Axiom & Free-Parameter Ledger
invented entities (1)
- contact flow (CF): no independent evidence
Appendix A. Dataset
We introduce the OmniContact dataset, a comprehensive human-object interaction (HOI) corpus tailored specifically for humanoid loco-manipulation. It captures object-const...
- Walking stability: balance and gait quality
- Box contact: whether the hands/body contact the box in a plausible carrying pose
- Box stability: whether the box moves smoothly without obvious sliding, bouncing, penetration, or falling
- Motion smoothness: absence of sudden jitter, joint twitching, or velocity discontinuities
- Task-level naturalness: whether the robot moves the box near the target in a reasonable way.
Output a table in the following format: Video ID | Success Valid | Naturalness Score | Main Reason
F. Compatibility with VLMs
The compact and structured representation of contact flow provides a natural interface for high-level semantic planners, such as vision-...
- A top-down image of the scene with movable objects
- A natural-language task instruction
Available meta-skills: pick-place, push, kick, and spatial rearrangement.
Your job:
- Identify the task-relevant objects from the image.
- Convert the instruction into object-level subgoals.
- For each subgoal, choose a meta-skill and specify the target pose or target region.
- Do not output humanoid joint motions, contact timings, or low-level controls...
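One way to consume a plan produced from such a prompt is to validate it before passing subgoals downstream. The sketch below assumes the VLM returns a JSON list of subgoals. The field names `object`, `skill`, and `target` and the function `parse_plan` are hypothetical, since the excerpt is truncated before the output schema. Only the four meta-skill names come from the prompt text.

```python
import json
from typing import Any, Dict, List

# Meta-skill names as listed in the prompt excerpt.
ALLOWED_SKILLS = {"pick-place", "push", "kick", "spatial rearrangement"}
# Hypothetical field names; the source does not show the output schema.
REQUIRED_FIELDS = {"object", "skill", "target"}


def parse_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON plan and reject subgoals with missing fields or unknown skills."""
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON list of subgoals")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"subgoal {i} must be a JSON object")
        missing = REQUIRED_FIELDS - set(step)
        if missing:
            raise ValueError(f"subgoal {i} is missing fields: {sorted(missing)}")
        if step["skill"] not in ALLOWED_SKILLS:
            raise ValueError(f"subgoal {i} uses unknown meta-skill {step['skill']!r}")
    return plan
```

Rejecting unknown meta-skills at this boundary keeps the planner's output at the object level, in line with the prompt's instruction not to emit joint motions, contact timings, or low-level controls.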