hub Canonical reference

Videovla: Video generators can be generalizable robot manipulators. arXiv preprint arXiv:2512.06963

Canonical reference. 86% of citing Pith papers cite this work as background.

19 Pith papers citing it
Background 86% of classified citations

citation-role summary

background 6 · other 1

citation-polarity summary

years

2026 19

polarities

background 6 · unclear 1

representative citing papers

Next Forcing: Causal World Modeling with Multi-Chunk Prediction

cs.CV · 2026-06-09 · unverdicted · novelty 6.0

Next Forcing augments video generation models with auxiliary multi-chunk prediction modules to achieve faster training convergence, higher accuracy at high frame rates, and 2x faster inference on world modeling benchmarks.

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

cs.CV · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.

Causal World Modeling for Robot Control

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
