hub Canonical reference

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Canonical reference. 86% of citing Pith papers cite this work as background.

11 Pith papers citing it
Background 86% of classified citations

citation-role summary: background 6, other 1

citation-polarity summary

fields: cs.RO 7, cs.CV 4

years: 2026 (11)

polarities: background 6, unclear 1

representative citing papers

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

Causal World Modeling for Robot Control

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
