https://arxiv.org/abs/2308.10901

Mendonca, R · 2023 · arXiv 2308.10901

7 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

RotVLA: Rotational Latent Action for Vision-Language-Action Model

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

GraspDreamer synthesizes human functional grasping demonstrations with visual generative models to enable zero-shot robot grasping with improved data efficiency and generalization.

Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints

cs.CV · 2026-03-12 · unverdicted · novelty 6.0

A new occlusion-aware control module generates high-fidelity egocentric videos from sparse 3D hand joints, supported by a million-clip dataset and cross-embodiment benchmark.

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

cs.RO · 2023-12-20 · conditional · novelty 6.0

A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04 · accept · novelty 5.0

A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer


RotVLA: Rotational Latent Action for Vision-Language-Action Model · cs.RO · 2026-05-13 · unverdicted · none · ref 84
Latent State Design for World Models under Sufficiency Constraints · cs.AI · 2026-05-03 · unverdicted · none · ref 45
Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations · cs.RO · 2026-04-08 · unverdicted · none · ref 27
Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints · cs.CV · 2026-03-12 · unverdicted · none · ref 29
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation · cs.RO · 2023-12-20 · conditional · none · ref 145
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data · cs.RO · 2026-04-04 · accept · none · ref 68
World Action Models: The Next Frontier in Embodied AI · cs.RO · 2026-05-12 · unverdicted · none · ref 38
