Causal-JEPA: Learning World Models through Object-Level Latent Masking

Heejeong Nam; Lucas Maes; Quentin Le Lidec; Randall Balestriero; Yann LeCun

arxiv: 2602.11389 · v2 · pith:IGW7QBTSnew · submitted 2026-02-11 · 💻 cs.AI


classification 💻 cs.AI

keywords c-jepa · masking · object-level · prediction · world · models · object-centric · control


World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient to capture interaction-dependent dynamics. We therefore propose C-JEPA, a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric representations. By masking object-level latents and requiring each masked object state to be inferred from the surrounding context, C-JEPA imposes structured partial observability during training, creating counterfactual-like prediction queries that discourage shortcut solutions and make interaction-dependent prediction necessary under the learning objective. Empirically, C-JEPA leads to consistent gains in visual question answering, with an absolute improvement of about 20% in counterfactual reasoning over the same architecture without object-level masking. On agent control tasks, C-JEPA enables substantially more efficient planning by using only 1% of the total latent input features required by patch-based world models, while achieving comparable performance. Finally, we provide a formal analysis demonstrating that object-level masking induces a useful inductive bias by controlling observability. Our code is available at https://github.com/galilai-group/cjepa.
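The masking idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal pure-Python illustration under stated assumptions. The function names `mask_object_latents`, `context_mean_predictor`, and `masked_prediction_loss`, the fixed `mask_token`, the `mask_ratio` parameter, and the mean-of-context predictor are all hypothetical stand-ins. In a real model the predictor would be a learned network and the latents would come from an object-centric encoder.

```python
import random


def mask_object_latents(slots, mask_ratio, mask_token, rng):
    """Replace a random subset of object latents with a mask token.

    slots: list of N object latent vectors (each a list of floats).
    Returns (masked_slots, masked_idx). At least one object is masked
    and at least one object always stays visible as context.
    """
    n = len(slots)
    k = min(n - 1, max(1, round(mask_ratio * n)))
    masked_idx = sorted(rng.sample(range(n), k))
    masked_slots = [
        list(mask_token) if i in masked_idx else list(s)
        for i, s in enumerate(slots)
    ]
    return masked_slots, masked_idx


def context_mean_predictor(masked_slots, masked_idx):
    """Hypothetical placeholder predictor (assumption, not from the paper):
    fills each masked slot with the mean of the visible slots."""
    visible = [s for i, s in enumerate(masked_slots) if i not in masked_idx]
    dim = len(masked_slots[0])
    mean = [sum(s[d] for s in visible) / len(visible) for d in range(dim)]
    return [
        list(mean) if i in masked_idx else list(s)
        for i, s in enumerate(masked_slots)
    ]


def masked_prediction_loss(pred, target, masked_idx):
    """Mean squared error computed over the masked objects only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy example: four objects with 2-dimensional latents (made-up values).
slots = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
mask_token = [0.0, 0.0]  # a learned embedding in practice; fixed here
masked, masked_idx = mask_object_latents(slots, 0.5, mask_token, random.Random(0))
pred = context_mean_predictor(masked, masked_idx)
loss = masked_prediction_loss(pred, slots, masked_idx)
```

Because the loss is taken only over masked objects, the predictor cannot score well by copying its own input for those objects; it has to use the visible objects as context, which is the shortcut-discouraging property the abstract describes.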


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Latent State Design for World Models under Sufficiency Constraints
cs.AI 2026-05 unverdicted novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
cs.LG 2026-03 unverdicted novelty 6.0

LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.

CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics
cs.LG 2026-04 unverdicted novelty 5.0

A CausalVAE plug-in for world models preserves factual prediction and boosts counterfactual retrieval, with large gains on physics benchmarks and recovered physical interaction trends.