JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
Temporal straightening for latent planning
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8roles
background 2polarities
background 2representative citing papers
STEP embeds progressive time series into a manifold between orthogonal prototypes so that polar angle tracks irreversible state progression and radius tracks mode via self-supervised contrastive learning.
Slot-MPC learns slot representations to build a differentiable object-centric dynamics model that supports efficient gradient-based MPC for robotic manipulation in novel situations.
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-based policy optimization.
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.
citing papers explorer
-
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
-
STEP: Learning STructured Embeddings for Progressive Time Series
STEP embeds progressive time series into a manifold between orthogonal prototypes so that polar angle tracks irreversible state progression and radius tracks mode via self-supervised contrastive learning.
-
Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations
Slot-MPC learns slot representations to build a differentiable object-centric dynamics model that supports efficient gradient-based MPC for robotic manipulation in novel situations.
-
Predictive but Not Plannable: RC-aux for Latent World Models
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
-
On Training in Imagination
The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-based policy optimization.
-
Grounded World Model for Semantically Generalizable Planning
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
-
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.
- LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels