Masked world models for visual control

Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

cs.CV · 2026-05-08 · conditional · novelty 7.0 · 3 refs

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.

Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

cs.RO · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.

citing papers explorer

Showing 3 of 3 citing papers.

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy cs.CV · 2026-05-08 · conditional · none · ref 36 · 3 links
Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models cs.RO · 2026-05-11 · unverdicted · none · ref 39 · 2 links
ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control cs.RO · 2026-05-02 · unverdicted · none · ref 22 · 2 links
Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.

Masked world models for visual control

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer