How much 3D do video foundation models encode?

5 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CV 4 cs.RO 1

years

2026 5

verdicts

UNVERDICTED 5

roles

background 2

polarities

background 1 support 1

representative citing papers

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

WALL-WM: Carving World Action Modeling at the Event Joints

cs.RO · 2026-06-01 · unverdicted · novelty 4.0

WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

cs.CV · 2026-04-27 · unverdicted · novelty 4.0 · 3 refs

World-R1 applies reinforcement learning via Flow-GRPO and a text dataset to align text-to-video models with 3D constraints from pre-trained foundation models, improving consistency while keeping original visual quality.
