//arxiv.org/abs/2512.24766

Karthik Dharmarajan, Wenlong Huang, Jiajun Wu, Li Fei-Fei, Ruohan Zhang · 2025 · arXiv 2512.24766

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation

cs.RO · 2026-05-17 · unverdicted · novelty 7.0

RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

DSSP: Diffusion State Space Policy with Full-History Encoding

cs.RO · 2026-05-14 · conditional · novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

cs.CV · 2026-05-14 · conditional · novelty 7.0

CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight bimanual manipulation tasks.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

LAMP extracts continuous 3D inter-object transformations from image editing to serve as geometry-aware priors for zero-shot open-world robotic manipulation.

X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

X-Imitator is a bidirectional action-pose interaction framework for spatial-aware imitation learning that outperforms vanilla policies and explicit pose guidance on 24 simulated and 3 real-world robotic tasks.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04 · accept · novelty 5.0

A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer

Showing 9 of 9 citing papers.

Point Tracking Improves World Action Models
cs.RO · 2026-05-22 · unverdicted · none · ref 50
JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation
cs.RO · 2026-05-17 · unverdicted · none · ref 5
RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

DSSP: Diffusion State Space Policy with Full-History Encoding
cs.RO · 2026-05-14 · conditional · none · ref 9
DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL
cs.CV · 2026-05-14 · conditional · none · ref 10
CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight bimanual manipulation tasks.

Human Cognition in Machines: A Unified Perspective of World Models
cs.RO · 2026-04-17 · unverdicted · none · ref 39
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation
cs.CV · 2026-04-09 · unverdicted · none · ref 20
LAMP extracts continuous 3D inter-object transformations from image editing to serve as geometry-aware priors for zero-shot open-world robotic manipulation.

X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction
cs.RO · 2026-05-12 · unverdicted · none · ref 17
X-Imitator is a bidirectional action-pose interaction framework for spatial-aware imitation learning that outperforms vanilla policies and explicit pose guidance on 24 simulated and 3 real-world robotic tasks.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data
cs.RO · 2026-04-04 · accept · none · ref 25
A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

World Action Models: The Next Frontier in Embodied AI
cs.RO · 2026-05-12 · unverdicted · none · ref 78
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
