Masked autoencoders are scalable vision learners
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction
IA-JEPA applies motion-centric masking within JEPA to focus on entity interactions. It reports 14.26% causal reasoning accuracy on CLEVRER versus 3.22% for standard baselines, along with higher latent entropy and R²=0.43 for energy linearization.