Cross-modal contrastive masked autoencoder for compressed video pre-training. IEEE Transactions on Image Processing (TIP), 2025
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction
IA-JEPA applies interaction-aware masking to JEPA, raising causal reasoning accuracy on CLEVRER from 3.22% to 14.26% while producing a higher-entropy latent space that better aligns with physical energy.