Video diffusion models

Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David Fleet

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

cs.RO · 2026-05-15 · unverdicted · novelty 7.0

WorldVLN proposes the first autoregressive world action model for aerial vision-language navigation that predicts short-horizon latent world states, decodes them to waypoints in closed loop, and uses two-stage training with Action-aware GRPO to achieve over 12% success-rate gains on benchmarks plus零

citing papers explorer

Showing 1 of 1 citing paper.

WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation cs.RO · 2026-05-15 · unverdicted · none · ref 16
WorldVLN proposes the first autoregressive world action model for aerial vision-language navigation that predicts short-horizon latent world states, decodes them to waypoints in closed loop, and uses two-stage training with Action-aware GRPO to achieve over 12% success-rate gains on benchmarks plus零

Video diffusion models

fields

years

verdicts

representative citing papers

citing papers explorer