Where are we in the search for an artificial visual cortex for embodied intelligence?

2023 · arXiv 2303.18240

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

cs.RO · 2025-11-18 · unverdicted · novelty 6.0

MSDP pretrains a transformer encoder via masked multisensory reconstruction and feeds the embeddings into an asymmetric actor-critic RL setup, yielding faster learning and high real-robot success rates with only 6,000 interactions.

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

cs.RO · 2024-09-24 · unverdicted · novelty 6.0

Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.

Octo: An Open-Source Generalist Robot Policy

cs.RO · 2024-05-20 · unverdicted · novelty 6.0

Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

cs.LG · 2025-06-11 · unverdicted · novelty 5.0

BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.

Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

cs.RO · 2023-10-04 · unverdicted · novelty 5.0

RLFP and the FAC algorithm combine foundation-model priors for policy, value, and rewards to produce sample-efficient robotic RL that reaches 86% real-robot success after one hour and 100% success on 7/8 Meta-world tasks in under 100k frames.

citing papers explorer

Open X-Embodiment: Robotic Learning Datasets and RT-X Models · cs.RO · 2023-10-13 · unverdicted · none · ref 56
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations · cs.RO · 2026-04-12 · unverdicted · none · ref 71
Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning · cs.RO · 2025-11-18 · unverdicted · none · ref 51
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation · cs.RO · 2024-09-24 · unverdicted · none · ref 6
Octo: An Open-Source Generalist Robot Policy · cs.RO · 2024-05-20 · unverdicted · none · ref 58
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning · cs.LG · 2025-06-11 · unverdicted · none · ref 33
Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own · cs.RO · 2023-10-04 · unverdicted · none · ref 45
