Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.
Learning visuotactile skills with two multifingered hands
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.RO 7roles
background 4representative citing papers
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
TeleGate achieves high-precision real-time whole-body teleoperation of humanoid robots by dynamically gating between expert policies and using a VAE motion prior to infer future intent from history, outperforming distillation baselines on dynamic motions with only 2.5 hours of mocap data.
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
A hybrid event-driven switching system pairs VLA models with lightweight dexterous policies on a compliant anthropomorphic hand to perform language-conditioned multi-finger tasks with cross-embodiment modularity.
FlexiTac is a scalable piezoresistive tactile sensing system with flexible FPC-Velostat-FPC pads and a 100 Hz multi-channel readout board that mounts on rigid or soft grippers and supports visuo-tactile learning.
citing papers explorer
-
Multimodal Diffusion Forcing for Forceful Manipulation
Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.
-
FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
-
TeleGate: Whole-Body Humanoid Teleoperation via Gated Expert Selection with Motion Prior
TeleGate achieves high-precision real-time whole-body teleoperation of humanoid robots by dynamically gating between expert policies and using a VAE motion prior to infer future intent from history, outperforming distillation baselines on dynamic motions with only 2.5 hours of mocap data.
-
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
-
FAST: Efficient Action Tokenization for Vision-Language-Action Models
FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
-
Language Conditioned Multi-Finger Dexterous Manipulation Enabled by Physical Compliance and Switching of Controllers
A hybrid event-driven switching system pairs VLA models with lightweight dexterous policies on a compliant anthropomorphic hand to perform language-conditioned multi-finger tasks with cross-embodiment modularity.
-
FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems
FlexiTac is a scalable piezoresistive tactile sensing system with flexible FPC-Velostat-FPC pads and a 100 Hz multi-channel readout board that mounts on rigid or soft grippers and supports visuo-tactile learning.