See, hear, and feel: Smart sensory fusion for robotic manipulation
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (3)
Years: 2025 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
- M2R2: MultiModal Robotic Representation for Temporal Action Segmentation
  M2R2 proposes a multimodal robotic representation for temporal action segmentation that combines proprioceptive and exteroceptive sensors with a novel training strategy enabling feature reuse across models, achieving new state-of-the-art results on three robotic datasets.
- Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
  DreamTacVLA grounds VLA models in contact physics by aligning multi-scale vision-tactile inputs and predicting future tactile states, reaching up to 95% success on contact-rich tasks.
- Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
  MSDP pretrains a transformer encoder via masked multisensory reconstruction and feeds the embeddings into an asymmetric actor-critic RL setup, yielding faster learning and high real-robot success rates with only 6,000 interactions.