MonoDuo generates synthetic bimanual demonstrations from single-arm teleoperation plus human collaboration to train policies achieving up to 70% zero-shot success on five manipulation tasks, with 65-70% gains from 25-shot fine-tuning.
Open-television: Teleoperation with immersive active visual feedback
Canonical reference. 71% of citing Pith papers cite this work as background.
representative citing papers
DexTwist detects tripod pinches, estimates the intended screw axis and twist magnitude, and then applies real-time joint refinement to track turning progress while stabilizing the robot's tripod geometry.
DexSynRefine couples HOI motion manifold flow primitives with task-space residual RL and proprioceptive adaptation to convert human-object interaction data into executable dexterous robot motions, reporting real-world success-rate gains of 50-70 percentage points over kinematic retargeting on five tasks.
Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
ActiveGlasses learns robot manipulation from egocentric human demonstrations captured with active vision via smart glasses and achieves zero-shot transfer using object-centric point-cloud policies.
EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.
EgoVLA pretrains VLA models on egocentric human videos, retargets predicted actions to robots via inverse kinematics (IK), and fine-tunes on a few robot demonstrations to improve bimanual manipulation performance on a new simulation benchmark.
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
DexWild co-trains dexterous robot policies on in-the-wild human hand interactions recorded with a low-cost system and limited robot data, achieving 68.5% success in unseen environments and 5.8x better cross-embodiment generalization.
FAST applies the discrete cosine transform (DCT) to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and to scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
WARP is an offline retargeting method using a SEW geometric solver to produce consistent whole-body robot trajectories from human demonstrations for zero-shot mobile manipulation.
Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, a DRL tracking policy, and a real-time graph-search scheduler.
HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-rich humanoid loco-manipulation tasks.
A multi-view point-cloud VR system with wrist RGB detail outperforms RGB streams and stereo views in robot teleoperation tasks, according to a 31-participant user study.
An open-source teleoperation framework enables intuitive whole-body control of mobile manipulators using a commodity smartphone, leader arms, and foot pedals instead of costly VR equipment.
A two-room Wizard-of-Oz pilot collected 53 multimodal trials from five users to capture dialogue ambiguities for training ambiguity-aware assistive robot controllers.
The GAM framework uses arc-length parameterization for temporal invariance and schema-affine factorization for geometric invariance to build a covariant action manifold, which is integrated into VLA models to improve generalization from sparse data.
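The GAM summary mentions arc-length parameterization for temporal invariance. Below is a minimal sketch of that general idea, not GAM's implementation. It assumes each trajectory is a polyline of positions and resamples it at points equally spaced along its path length. As a result, two demonstrations that trace the same path at different speeds produce the same sequence. The function names are hypothetical, and the schema-affine factorization and VLA integration are not modeled.

```python
import math
from typing import List, Sequence


def cumulative_arc_length(path: Sequence[Sequence[float]]) -> List[float]:
    """s[i] = length of the polyline from path[0] to path[i]."""
    s = [0.0]
    for a, b in zip(path[:-1], path[1:]):
        s.append(s[-1] + math.dist(a, b))
    return s


def resample_by_arc_length(path: Sequence[Sequence[float]], num_points: int) -> List[List[float]]:
    """Resample a polyline at num_points positions equally spaced in arc length.

    Hypothetical helper: recordings of the same geometric path executed at
    different speeds map to the same output (the timing invariance sketched here).
    """
    if num_points < 2 or len(path) < 2:
        raise ValueError("need at least two input points and two output points")
    s = cumulative_arc_length(path)
    total = s[-1]
    if total == 0.0:  # degenerate path: every sample is the same point
        return [list(path[0]) for _ in range(num_points)]
    out = []
    j = 0  # index of the current segment path[j] -> path[j + 1]
    for m in range(num_points):
        target = total * (m / (num_points - 1))
        # Advance to the segment whose arc-length interval contains the target.
        while j < len(path) - 2 and s[j + 1] < target:
            j += 1
        seg_len = s[j + 1] - s[j]
        alpha = 0.0 if seg_len == 0.0 else (target - s[j]) / seg_len
        # Linear interpolation inside the segment.
        out.append([pa + alpha * (pb - pa) for pa, pb in zip(path[j], path[j + 1])])
    return out
```

Resampling by arc length discards timing entirely. A real system that still needs velocity information would have to store it separately, for example as a time-versus-arc-length profile.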
citing papers explorer
- Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
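As a rough illustration of the DCT idea behind FAST, the sketch below compresses one action chunk in three steps. First, each action dimension is transformed along time with an orthonormal DCT-II. Next, the coefficients are scaled and rounded to integers. Finally, the result is flattened low-frequency-first across dimensions. This is a sketch under stated assumptions, not FAST's implementation. The helper names (`dct_ii`, `idct_ii`, `tokenize_chunk`, `detokenize_chunk`), the `scale` value, and the flattening order are illustrative. Inputs are assumed to be pre-normalized, and any further compression of the integer sequence (for example, byte-pair encoding) is omitted.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (naive O(n^2) version)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(norm * total)
    return coeffs


def idct_ii(c: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (equivalently, an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        total += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(total)
    return out


def tokenize_chunk(chunk: List[List[float]], scale: float = 10.0) -> List[int]:
    """Map a T x D action chunk (assumed pre-normalized) to T*D integer tokens."""
    T, D = len(chunk), len(chunk[0])
    # DCT along time, separately for each action dimension -> D rows of T coefficients.
    coeffs = [dct_ii([chunk[t][d] for t in range(T)]) for d in range(D)]
    # Scale-and-round quantization; small high-frequency coefficients become 0.
    quantized = [[round(c * scale) for c in row] for row in coeffs]
    # Flatten frequency-major: every dimension's k=0 term first, then k=1, and so on.
    return [quantized[d][k] for k in range(T) for d in range(D)]


def detokenize_chunk(tokens: List[int], T: int, D: int, scale: float = 10.0) -> List[List[float]]:
    """Approximately invert tokenize_chunk (lossy because of rounding)."""
    coeffs = [[0.0] * T for _ in range(D)]
    for idx, tok in enumerate(tokens):
        k, d = divmod(idx, D)  # undo the frequency-major flattening
        coeffs[d][k] = tok / scale
    per_dim = [idct_ii(row) for row in coeffs]
    return [[per_dim[d][t] for d in range(D)] for t in range(T)]
```

For smooth motions, most high-frequency coefficients round to zero, which is what makes the integer sequence compressible. A larger `scale` keeps more precision at the cost of more nonzero tokens.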