See, hear, and feel: Smart sensory fusion for robotic manipulation

2022 · arXiv 2212.03858

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

M2R2: MultiModal Robotic Representation for Temporal Action Segmentation

cs.RO · 2025-04-25 · unverdicted · novelty 7.0

M2R2 proposes a multimodal robotic representation for temporal action segmentation that combines proprioceptive and exteroceptive sensors with a novel training strategy enabling feature reuse across models, achieving new state-of-the-art results on three robotic datasets.

Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation

cs.RO · 2025-12-29 · unverdicted · novelty 6.0

DreamTacVLA grounds VLA models in contact physics by aligning multi-scale vision-tactile inputs and predicting future tactile states, reaching up to 95% success on contact-rich tasks.

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

cs.RO · 2025-11-18 · unverdicted · novelty 6.0

MSDP pretrains a transformer encoder via masked multisensory reconstruction and feeds the embeddings into an asymmetric actor-critic RL setup, yielding faster learning and high real-robot success rates with only 6,000 interactions.

citing papers explorer

Showing 3 of 3 citing papers.

M2R2: MultiModal Robotic Representation for Temporal Action Segmentation · cs.RO · 2025-04-25 · unverdicted · none · ref 24
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation · cs.RO · 2025-12-29 · unverdicted · none · ref 19
Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning · cs.RO · 2025-11-18 · unverdicted · none · ref 11
