On pre-training for visuo-motor control: Re- visiting a learning-from-scratch baseline.arXiv preprint arXiv:2212.05749, 2022

Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang · 2022 · arXiv 2212.05749

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

cs.CV · 2025-11-22 · unverdicted · novelty 7.0

SFHand presents the first streaming language-guided autoregressive framework for 3D hand forecasting, achieving up to 35.8% gains over prior methods and 13.4% better downstream embodied task performance.

IGen: Scalable Data Generation for Robot Learning from Open-World Images

cs.RO · 2025-12-01 · unverdicted · novelty 6.0

IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

cs.RO · 2024-09-24 · unverdicted · novelty 6.0

Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.

Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels

cs.RO · 2026-02-06 · unverdicted · novelty 5.0

An end-to-end policy learns robust humanoid locomotion directly from noisy depth images via high-fidelity sensor simulation, vision-aware distillation from privileged maps, and terrain-specific multi-critic reward shaping.

citing papers explorer

Showing 4 of 4 citing papers.

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting cs.CV · 2025-11-22 · unverdicted · none · ref 19
SFHand presents the first streaming language-guided autoregressive framework for 3D hand forecasting, achieving up to 35.8% gains over prior methods and 13.4% better downstream embodied task performance.
IGen: Scalable Data Generation for Robot Learning from Open-World Images cs.RO · 2025-12-01 · unverdicted · none · ref 21
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation cs.RO · 2024-09-24 · unverdicted · none · ref 38
Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.
Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels cs.RO · 2026-02-06 · unverdicted · none · ref 10
An end-to-end policy learns robust humanoid locomotion directly from noisy depth images via high-fidelity sensor simulation, vision-aware distillation from privileged maps, and terrain-specific multi-critic reward shaping.

On pre-training for visuo-motor control: Re- visiting a learning-from-scratch baseline.arXiv preprint arXiv:2212.05749, 2022

fields

years

verdicts

representative citing papers

citing papers explorer