pith. sign in

hub Canonical reference

A comprehensive survey on world models for embodied AI

Canonical reference. 89% of citing Pith papers cite this work as background.

16 Pith papers citing it
Background 89% of classified citations

hub tools

citation-role summary

background 9

citation-polarity summary

years

2026 16

verdicts

UNVERDICTED 16

roles

background 8

polarities

background 7 unclear 1

representative citing papers

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

cs.AI · 2026-04-24 · unverdicted · novelty 7.0

Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.

The DAWN of World-Action Interactive Models

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Safety, Security, and Cognitive Risks in World Models

cs.CR · 2026-04-01 · unverdicted · novelty 6.0

World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.

RISE: Self-Improving Robot Policy with Compositional World Model

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer

Showing 16 of 16 citing papers.

  • Point Tracking Improves World Action Models cs.RO · 2026-05-22 · unverdicted · none · ref 29

    JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

  • Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond cs.AI · 2026-04-24 · unverdicted · none · ref 227

    Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.

  • The DAWN of World-Action Interactive Models cs.CV · 2026-05-12 · unverdicted · none · ref 28

    DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.

  • CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving cs.CV · 2026-05-11 · unverdicted · none · ref 28 · 2 links

    CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

  • Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 101

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

  • WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models cs.RO · 2026-04-13 · unverdicted · none · ref 9

    WM-DAgger uses world models with corrective action synthesis and consistency-guided filtering to aggregate OOD recovery data for imitation learning, reporting 93.3% success in soft bag pushing with five demonstrations.

  • Safety, Security, and Cognitive Risks in World Models cs.CR · 2026-04-01 · unverdicted · none · ref 11

    World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.

  • Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses cs.CR · 2026-03-28 · unverdicted · none · ref 205

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  • RISE: Self-Improving Robot Policy with Compositional World Model cs.RO · 2026-02-11 · unverdicted · none · ref 57

    RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.

  • PhyWorld: Physics-Faithful World Model for Video Generation cs.CV · 2026-05-19 · unverdicted · none · ref 8

    PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

  • WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform cs.RO · 2026-05-18 · unverdicted · none · ref 13

    WorldArena 2.0 extends embodied world model benchmarks to visuotactile perception, interactive policy training, and diverse real and simulated robotic platforms under a unified protocol.

  • STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation cs.RO · 2026-04-29 · unverdicted · none · ref 23

    STARRY uses unified diffusion to align spatial-temporal world predictions with action generation plus GASAM for geometry-aware attention, reaching 93.82%/93.30% success on 50 bimanual tasks in simulation and raising real-world success from 42.5% to 70.8%.

  • OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering cs.CV · 2026-04-09 · unverdicted · none · ref 17

    OmniJigsaw is a self-supervised proxy task that reconstructs shuffled audio-visual clips via joint integration, sample-level selection, and clip-level masking strategies, yielding gains on 15 video, audio, and reasoning benchmarks.

  • World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 274

    The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

  • Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making cs.LG · 2026-04-08 · unverdicted · none · ref 20

    An event-centric framework encodes environments as semantic events and retrieves weighted prior maneuvers from a knowledge bank to enable interpretable, physics-aware decision-making for UAVs.

  • OpenWorldLib: A Unified Codebase and Definition of Advanced World Models cs.CV · 2026-04-06 · unverdicted · none · ref 70

    OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.