stable-pretraining-v1: Foundation model research made simple. arXiv preprint arXiv:2511.19484, 2025
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
FF-JEPA introduces a two-model hierarchical structure with an action-free latent planner to decompose long-horizon planning into short subgoals in latent world models.
Matching in semantic SSL feature space via Sinkhorn divergence enables effective one-step generation on ImageNet by inducing compact geometry for distribution matching, with training and evaluation features best kept distinct.
citing papers explorer
LeVLJEPA: End-to-End Vision-Language Pretraining Without Negatives
LeVLJEPA is the first non-contrastive vision-language pretraining method that learns via cross-modal prediction without negatives, producing stronger dense features than contrastive baselines on VQA and segmentation tasks.