hub Canonical reference

RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot

· 2023 · arXiv 2307.00595

Canonical reference. 73% of citing Pith papers cite this work as background.

17 Pith papers citing it

Background 73% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 dataset 4 method 1

citation-polarity summary

background 8 use dataset 2 use method 1

representative citing papers

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

cs.CR · 2026-05-19 · unverdicted · novelty 7.0

RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.

Beyond Isolation: A Unified Benchmark for General-Purpose Navigation

cs.RO · 2026-05-10 · unverdicted · novelty 7.0

OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

cs.RO · 2026-04-23 · unverdicted · novelty 7.0

VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.

RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation

cs.RO · 2025-11-21 · accept · novelty 7.0

RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.

3D-VLA: A 3D Vision-Language-Action Generative World Model

cs.CV · 2024-03-14 · unverdicted · novelty 7.0

3D-VLA is a new embodied foundation model that uses a 3D LLM plus aligned diffusion models to generate future images and point clouds for improved reasoning and action planning in 3D environments.

Spacetime Optimal-Transport Attention for Visuo-Haptic Imitation Learning of Contact-Rich Manipulation

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

SO-TA replaces standard attention with optimal-transport alignment across vision, force/torque, and proprioception to improve diffusion-policy performance on real-robot insertion and wiping tasks.

HumanNet: Scaling Human-centric Video Learning to One Million Hours

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.

MolmoAct2: Action Reasoning Models for Real-world Deployment

cs.RO · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture changes for lower latency.

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.

Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

cs.RO · 2026-03-02 · unverdicted · novelty 6.0

Robometer combines intra-trajectory progress supervision with inter-trajectory preference supervision on a 1M-trajectory dataset to learn more generalizable robotic reward functions than prior methods.

IGen: Scalable Data Generation for Robot Learning from Open-World Images

cs.RO · 2025-12-01 · unverdicted · novelty 6.0

IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.

ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation

cs.RO · 2026-05-09 · unverdicted · novelty 5.0

ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04

citing papers explorer

Showing 17 of 17 citing papers.

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents cs.CR · 2026-05-19 · unverdicted · none · ref 9
RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.
Beyond Isolation: A Unified Benchmark for General-Purpose Navigation cs.RO · 2026-05-10 · unverdicted · none · ref 9
OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction cs.RO · 2026-04-30 · unverdicted · none · ref 9
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
Being-H0.7: A Latent World-Action Model from Egocentric Videos cs.RO · 2026-04-30 · unverdicted · none · ref 21
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis cs.RO · 2026-04-23 · unverdicted · none · ref 12
VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation cs.RO · 2025-11-21 · accept · none · ref 12
RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.
3D-VLA: A 3D Vision-Language-Action Generative World Model cs.CV · 2024-03-14 · unverdicted · none · ref 17
3D-VLA is a new embodied foundation model that uses a 3D LLM plus aligned diffusion models to generate future images and point clouds for improved reasoning and action planning in 3D environments.
Spacetime Optimal-Transport Attention for Visuo-Haptic Imitation Learning of Contact-Rich Manipulation cs.RO · 2026-05-19 · unverdicted · none · ref 10
SO-TA replaces standard attention with optimal-transport alignment across vision, force/torque, and proprioception to improve diffusion-policy performance on real-robot insertion and wiping tasks.
HumanNet: Scaling Human-centric Video Learning to One Million Hours cs.CV · 2026-05-07 · unverdicted · none · ref 10
HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.
MolmoAct2: Action Reasoning Models for Real-world Deployment cs.RO · 2026-05-04 · unverdicted · none · ref 14 · 2 links
MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture changes for lower latency.
EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World cs.RO · 2026-04-08 · unverdicted · none · ref 18
EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons cs.RO · 2026-03-02 · unverdicted · none · ref 65
Robometer combines intra-trajectory progress supervision with inter-trajectory preference supervision on a 1M-trajectory dataset to learn more generalizable robotic reward functions than prior methods.
IGen: Scalable Data Generation for Robot Learning from Open-World Images cs.RO · 2025-12-01 · unverdicted · none · ref 18
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation cs.RO · 2026-05-09 · unverdicted · none · ref 17
ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.
Embody4D: A Generalist 4D World Model for Embodied AI cs.CV · 2026-05-03 · unverdicted · none · ref 15
Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 131
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data cs.RO · 2026-04-04 · unreviewed · ref 33

RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer