hub Canonical reference

Adaworld: Learning adaptable world models with latent actions

· 2025 · arXiv 2503.18938

Canonical reference. 100% of citing Pith papers cite this work as background.

24 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 24 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

cs.CV · 2026-05-17 · unverdicted · novelty 7.0

Soap2Soap uses a multi-agent system with dual-bridge consistency via JSON screenplays and visual anchors plus batch keyframe generation to achieve better long-term consistency in cinematic video remaking than commercial APIs.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

cs.RO · 2025-05-19 · unverdicted · novelty 7.0

DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.

Flow Matching in Feature Space for Stochastic World Modeling

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.

PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

PoLAR imposes radial structure on latent actions in hyperbolic space to factorize extent and mode, improving robot policy performance over baselines.

CLAW: Learning Continuous Latent Action World Models via Adversarial Latent Regularization

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

CLAW is an end-to-end self-supervised method that learns semantically meaningful continuous latent actions and predictive world models from action-free videos to support imitation learning and goal-directed planning.

VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.

LA-Pose: Latent Action Pretraining Meets Pose Estimation

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.

RISE: Self-Improving Robot Policy with Compositional World Model

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

cs.RO · 2025-07-01 · unverdicted · novelty 6.0

RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.

UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

cs.RO · 2025-05-09 · unverdicted · novelty 6.0

UniVLA trains cross-embodiment vision-language-action policies from unlabeled videos via a latent action model in DINO space, beating OpenVLA on benchmarks with 1/20th pretraining compute and 1/10th downstream data.

LaST-HD: Learning Latent Physical Reasoning from Scalable Human Data for Robot Manipulation

cs.RO · 2026-06-22 · unverdicted · novelty 5.0

LaST-HD creates a shared latent dynamics space via a world model to transfer physical reasoning from scalable human-hand demonstrations to robots, achieving over 90% accuracy with 20 minutes of new data after mixed training.

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.

UniCanvas: A Diffusion-base Unified Model for Text-in-Image Joint Generation

cs.CV · 2026-06-02 · unverdicted · novelty 5.0

UniCanvas introduces a diffusion-based approach for unified multimodal generation by embedding text as visual patterns within images on a shared canvas.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

WALL-WM: Carving World Action Modeling at the Event Joints

cs.RO · 2026-06-01 · unverdicted · novelty 4.0

WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.

World Action Models: A Survey

cs.RO · 2026-06-18 · unverdicted · novelty 3.0

A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

From World Models to World Action Models: A Concise Tutorial for Robotics

cs.RO · 2026-07-01 · unverdicted · novelty 2.0 · 2 refs

A tutorial taxonomizes world models for robotics into observation-space and state-space types and introduces world action models via four paradigms linking predictions to executable actions.

citing papers explorer

Showing 24 of 24 citing papers after filters.

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration cs.CV · 2026-05-17 · unverdicted · none · ref 9
Soap2Soap uses a multi-agent system with dual-bridge consistency via JSON screenplays and visual anchors plus batch keyframe generation to achieve better long-term consistency in cinematic video remaking than commercial APIs.
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement cs.CV · 2026-05-07 · unverdicted · none · ref 6 · 2 links
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 20
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models cs.RO · 2026-02-23 · unverdicted · none · ref 37
UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.
DreamGen: Unlocking Generalization in Robot Learning through Video World Models cs.RO · 2025-05-19 · unverdicted · none · ref 76
DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.
Flow Matching in Feature Space for Stochastic World Modeling cs.CV · 2026-06-27 · unverdicted · none · ref 53
FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.
PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning cs.RO · 2026-06-19 · unverdicted · none · ref 20
PoLAR imposes radial structure on latent actions in hyperbolic space to factorize extent and mode, improving robot policy performance over baselines.
CLAW: Learning Continuous Latent Action World Models via Adversarial Latent Regularization cs.RO · 2026-06-02 · unverdicted · none · ref 21
CLAW is an end-to-end self-supervised method that learns semantically meaningful continuous latent actions and predictive world models from action-free videos to support imitation learning and goal-directed planning.
VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers cs.CV · 2026-05-17 · unverdicted · none · ref 6
VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models cs.RO · 2026-05-11 · unverdicted · none · ref 17 · 2 links
ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
LA-Pose: Latent Action Pretraining Meets Pose Estimation cs.CV · 2026-04-30 · unverdicted · none · ref 15
LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.
Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 50
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots cs.RO · 2026-04-03 · unverdicted · none · ref 15
DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.
RISE: Self-Improving Robot Policy with Compositional World Model cs.RO · 2026-02-11 · unverdicted · none · ref 24
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation cs.RO · 2025-10-11 · unverdicted · none · ref 15
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations cs.RO · 2025-07-01 · unverdicted · none · ref 36
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions cs.RO · 2025-05-09 · unverdicted · none · ref 27
UniVLA trains cross-embodiment vision-language-action policies from unlabeled videos via a latent action model in DINO space, beating OpenVLA on benchmarks with 1/20th pretraining compute and 1/10th downstream data.
LaST-HD: Learning Latent Physical Reasoning from Scalable Human Data for Robot Manipulation cs.RO · 2026-06-22 · unverdicted · none · ref 39
LaST-HD creates a shared latent dynamics space via a world model to transfer physical reasoning from scalable human-hand demonstrations to robots, achieving over 90% accuracy with 20 minutes of new data after mixed training.
World Pilot: Steering Vision-Language-Action Models with World-Action Priors cs.RO · 2026-06-10 · unverdicted · none · ref 37
World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.
UniCanvas: A Diffusion-base Unified Model for Text-in-Image Joint Generation cs.CV · 2026-06-02 · unverdicted · none · ref 14
UniCanvas introduces a diffusion-based approach for unified multimodal generation by embedding text as visual patterns within images on a shared canvas.
PhyWorld: Physics-Faithful World Model for Video Generation cs.CV · 2026-05-19 · unverdicted · none · ref 36
PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.
WALL-WM: Carving World Action Modeling at the Event Joints cs.RO · 2026-06-01 · unverdicted · none · ref 23
WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.
World Action Models: A Survey cs.RO · 2026-06-18 · unverdicted · none · ref 40
A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.
From World Models to World Action Models: A Concise Tutorial for Robotics cs.RO · 2026-07-01 · unverdicted · none · ref 5 · 2 links
A tutorial taxonomizes world models for robotics into observation-space and state-space types and introduces world action models via four paradigms linking predictions to executable actions.

Adaworld: Learning adaptable world models with latent actions

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer