pith. sign in

super hub Canonical reference

World Models

Canonical reference. 88% of citing Pith papers cite this work as background.

141 Pith papers citing it
Background 88% of classified citations
abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/

hub tools

citation-role summary

background 36 method 3 other 1

citation-polarity summary

claims ledger

  • abstract We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is

authors

co-cited works

representative citing papers

From Generalist to Specialist Representation

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

A Model-Free Universal AI

cs.AI · 2026-02-26 · unverdicted · novelty 8.0

AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

MemGym: a Long-Horizon Memory Environment for LLM Agents

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.

citing papers explorer

Showing 50 of 141 citing papers.

  • From Generalist to Specialist Representation cs.LG · 2026-05-12 · unverdicted · none · ref 2 · internal anchor

    Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

  • EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 61 · internal anchor

    EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

  • A Model-Free Universal AI cs.AI · 2026-02-26 · unverdicted · none · ref 2 · internal anchor

    AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

  • CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models cs.CV · 2026-05-22 · unverdicted · none · ref 19 · internal anchor

    CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.

  • MemGym: a Long-Horizon Memory Environment for LLM Agents cs.CL · 2026-05-20 · unverdicted · none · ref 14 · internal anchor

    MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.

  • Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation cs.RO · 2026-05-20 · unverdicted · none · ref 45 · internal anchor

    Demo-JEPA enables one-shot cross-embodiment imitation by mapping visual demonstrations to shared latent future trajectories that serve as subgoals for the target agent's own forward dynamics planning.

  • Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models cs.AI · 2026-05-16 · unverdicted · none · ref 6 · internal anchor

    Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.

  • Learning POMDP World Models from Observations with Language-Model Priors cs.LG · 2026-05-13 · unverdicted · none · ref 2 · internal anchor

    Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.

  • JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 2 · internal anchor

    JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

  • Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic cs.LG · 2026-05-12 · unverdicted · none · ref 84 · 2 links · internal anchor

    Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.

  • Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 2 · internal anchor

    VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

  • The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark cs.AI · 2026-05-11 · unverdicted · none · ref 5 · internal anchor

    KnotBench benchmark shows state-of-the-art VLMs perform near random on diagrammatic knot reasoning tasks and lack ability to simulate structural moves.

  • ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models cs.CV · 2026-05-09 · unverdicted · none · ref 4 · 2 links · internal anchor

    ACWM-Phys is a controllable simulator benchmark with in- and out-of-distribution protocols for evaluating action-conditioned world models across rigid, kinematic, deformable, and particle dynamics.

  • SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding cs.CV · 2026-05-08 · unverdicted · none · ref 10 · internal anchor

    SYNCR benchmark shows leading MLLMs reach only 52.5% average accuracy on cross-video reasoning tasks against an 89.5% human baseline, with major weaknesses in physical and spatial reasoning.

  • Learning Visual Feature-Based World Models via Residual Latent Action cs.CV · 2026-05-08 · unverdicted · none · ref 58 · internal anchor

    RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.

  • Operator-Guided Invariance Learning for Continuous Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 7 · internal anchor

    VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

  • Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement cs.CV · 2026-05-07 · unverdicted · none · ref 9 · 2 links · internal anchor

    NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

  • Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models cs.LG · 2026-05-06 · unverdicted · none · ref 14 · internal anchor

    Non-monotone triangular SCMs with mechanism-wise invertibility and context-independent inverse transport are equivalent to exogenous isomorphism and achieve complete counterfactual identifiability, with supporting experiments on synthetic data and MuJoCo tasks.

  • Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 24 · internal anchor

    World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

  • Graph World Models: Concepts, Taxonomy, and Future Directions cs.AI · 2026-04-30 · unverdicted · none · ref 1 · internal anchor

    The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.

  • Exploring Spatial Intelligence from a Generative Perspective cs.CV · 2026-04-22 · unverdicted · none · ref 11 · internal anchor

    Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

  • Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training cs.LG · 2026-04-20 · unverdicted · none · ref 5 · internal anchor

    Curiosity-Critic rewards the improvement in cumulative prediction error via a tractable per-step surrogate (current error minus learned asymptotic baseline), outperforming prior curiosity methods in a stochastic grid world.

  • GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models cs.CV · 2026-04-12 · unverdicted · none · ref 24 · internal anchor

    GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth that outperforms neural video generators in physical and semantic validity while enabling new probes of video encoders.

  • EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks cs.CV · 2026-04-10 · unverdicted · none · ref 18 · internal anchor

    EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.

  • MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models cs.CV · 2026-04-09 · unverdicted · none · ref 10 · internal anchor

    MotionScape is a large-scale UAV video dataset with highly dynamic 6-DoF motions, geometric trajectories, and semantic annotations to train world models that better simulate complex 3D dynamics under large viewpoint changes.

  • SCP: Spatial Causal Prediction in Video cs.CV · 2026-03-04 · unverdicted · none · ref 23 · internal anchor

    SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.

  • Joint Embedding Variational Bayes cs.LG · 2026-02-05 · unverdicted · none · ref 5 · internal anchor

    VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

  • Neural Neural Scaling Laws cs.LG · 2026-01-27 · conditional · none · ref 4 · internal anchor

    NeuNeu, a neural network trained on HuggingFace checkpoints, predicts language model accuracy on 66 downstream tasks at 1.99% MAE by extrapolating trajectories, outperforming logistic scaling laws by 44% and generalizing zero-shot to new models and tasks.

  • Latent Chain-of-Thought World Modeling for End-to-End Driving cs.CV · 2025-12-11 · unverdicted · none · ref 8 · internal anchor

    LCDrive unifies chain-of-thought reasoning and action selection for end-to-end driving by interleaving action-proposal tokens and latent world-model tokens that predict action outcomes, yielding faster inference and better trajectories than text-based or non-reasoning baselines.

  • Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 63 · internal anchor

    Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

  • Mastering Diverse Domains through World Models cs.AI · 2023-01-10 · unverdicted · none · ref 16 · internal anchor

    DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

  • Mastering Atari with Discrete World Models cs.LG · 2020-10-05 · accept · none · ref 22 · internal anchor

    DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

  • Dream to Control: Learning Behaviors by Latent Imagination cs.LG · 2019-12-03 · accept · none · ref 18 · internal anchor

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  • Learning the Arrow of Time cs.LG · 2019-07-02 · unverdicted · none · ref 13 · internal anchor

    Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.

  • Exploring Model-based Planning with Policy Networks cs.LG · 2019-06-20 · unverdicted · none · ref 13 · internal anchor

    POPLIN combines policy networks with model-predictive planning by optimizing either action sequences or policy parameters, yielding 3x better sample efficiency than PETS, TD3 and SAC on MuJoCo locomotion tasks.

  • SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models cs.CV · 2026-05-22 · unverdicted · none · ref 19 · internal anchor

    SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.

  • Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 30 · internal anchor

    SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

  • FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching cs.CV · 2026-05-20 · unverdicted · none · ref 8 · internal anchor

    FlowLong generates videos several times longer than native model windows by blending adjacent predictions with Tweedie matching to enforce manifold and temporal consistency while using stochastic noise injection early and deterministic sampling later.

  • Do Vision--Language Models Understand 3D Scenes or Just Catalogue Objects? cs.CV · 2026-05-19 · accept · none · ref 15 · internal anchor

    VLMs achieve 53-97% on volumetric rearrangement planning but only 6-45% on occlusion and under 7% on reflections in a new 3,034-sample benchmark, with white-box analysis localizing the failure to visual-token merger in Qwen3-VL-8B-Thinking.

  • Latent Video Prediction Learns Better World Models cs.CV · 2026-05-15 · unverdicted · none · ref 11 · internal anchor

    Latent prediction video models exhibit a distinct robustness profile across corruption, occlusion, fine-grained discrimination, and temporal sensitivity compared to other self-supervised video models when used as world models.

  • Neural Point-Forms cs.LG · 2026-05-15 · unverdicted · none · ref 51 · internal anchor

    Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.

  • ReactiveGWM: Steering NPC in Reactive Game World Models cs.CV · 2026-05-14 · unverdicted · none · ref 12 · internal anchor

    ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.

  • PriorZero: Bridging Language Priors and World Models for Decision Making cs.LG · 2026-05-12 · unverdicted · none · ref 1 · internal anchor

    PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.

  • WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views cs.CV · 2026-05-12 · unverdicted · none · ref 34 · internal anchor

    WorldComp2D explicitly structures latent space geometry by object identity and spatial proximity via a proximity-dependent encoder and localizer, cutting parameters up to 4X and FLOPs 2.2X versus state-of-the-art lightweight models on facial landmark localization while staying real-time on CPU.

  • Network-Efficient World Model Token Streaming cs.RO · 2026-05-11 · unverdicted · none · ref 1 · internal anchor

    An adaptive delta-prioritization algorithm using cosine distance and Hamming-drift thresholds improves embedding distortion by 4.8-7.2% and next-token perplexity by 2.1-6.3% over periodic keyframing at matched low bitrates for tokenized driving world models.

  • Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search cs.CV · 2026-05-09 · unverdicted · none · ref 92 · internal anchor

    Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.

  • MolWorld: Molecule World Models for Actionable Molecular Optimization cs.LG · 2026-05-09 · unverdicted · none · ref 17 · internal anchor

    MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.

  • Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners cs.AI · 2026-05-08 · unverdicted · none · ref 6 · internal anchor

    Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.

  • Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 15 · internal anchor

    RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

  • Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention cs.AI · 2026-05-08 · unverdicted · none · ref 8 · internal anchor

    A DBM-based architecture learns consumer beliefs to enable consistent prediction and counterfactual inference for marketing interventions, outperforming baselines on heterogeneous treatment effects in simulation.