MAPL trains quadruped locomotion policies from LLM-generated multi-objective trajectory preferences and matches or exceeds expert-designed reward performance in four environments without manual reward engineering.
Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Quadrupedal loco-manipulation is commonly built on visual perception and proprioception. Yet reliable contact-rich manipulation remains difficult: vision and proprioception alone cannot resolve uncertain, evolving interactions with the environment. Tactile sensing offers direct contact observability, but scalable tactile-aware learning framework for quadrupedal loco-manipulation is still underexplored. In this paper, we present a tactile-aware loco-manipulation policy learning pipeline with a hierarchical structure. Our approach has two key components. First, we leverage real-world human demonstrations to train a tactile-conditioned visuotactile high-level policy. This policy predicts not only end-effector trajectories for manipulation, but also the evolving tactile interaction cues that characterize how contact should develop over time. Second, we perform large-scale reinforcement learning in simulation to learn a tactile-aware whole-body control policy that tracks diverse commanded trajectories and tactile interaction cues, and transfers zero-shot to the real world. Together, these components enable coordinated locomotion and manipulation under contact-rich scenarios. We evaluate the system on real-world contact-rich tasks, including in-hand reorientation with insertion, valve tightening, and delicate object manipulation. Compared to vision-only and visuotactile baselines, our method improves performance by 28.54% on average across these tasks.
fields
cs.RO 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MAPL: Multi-Objective Preference Learning for Robot Locomotion
MAPL trains quadruped locomotion policies from LLM-generated multi-objective trajectory preferences and matches or exceeds expert-designed reward performance in four environments without manual reward engineering.