LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Randall Balestriero, Yann LeCun · 2025 · cs.LG · arXiv 2511.08544

31 Pith papers cite this work.

abstract

Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but a lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA, with numerous theoretical and practical benefits: (i) a single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyperparameters, architectures (ResNets, ViTs, ConvNets), and domains, (iv) no heuristics, e.g., no stop-gradient, no teacher-student, and no hyperparameter schedulers, and (v) a distributed-training-friendly implementation requiring only about 50 lines of code. Our empirical validation covers 10+ datasets and 60+ architectures, all with varying scales and domains. As an example, using ImageNet-1k for pretraining and linear evaluation with a frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pretraining as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
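The abstract describes SIGReg and the combined objective only at a high level. The following is a minimal NumPy sketch of one way to read it, not the authors' code (their implementation is in the linked GitHub repo). It assumes that "sketching" means projecting the embeddings onto random unit directions and that each 1-D projection is compared to N(0, 1) with an Epps-Pulley-style characteristic-function statistic. The function names (epps_pulley_1d, sigreg, lejepa_style_loss), the frequency grid (t_max, n_points), the number of directions, the "pull every view toward the per-sample mean of the views" predictive term, and the (1 - lam) / lam weighting are all hypothetical choices made for illustration.

```python
import numpy as np


def epps_pulley_1d(x, t_max=3.0, n_points=17):
    """Epps-Pulley-style distance between the empirical characteristic
    function (ECF) of 1-D samples x and the CF of N(0, 1).
    t_max and n_points are assumed quadrature settings."""
    t = np.linspace(-t_max, t_max, n_points)
    tx = np.outer(x, t)                       # shape (N, n_points)
    ecf_re = np.cos(tx).mean(axis=0)          # real part of the ECF
    ecf_im = np.sin(tx).mean(axis=0)          # imaginary part of the ECF
    target = np.exp(-0.5 * t ** 2)            # CF of N(0, 1), purely real
    weight = np.exp(-0.5 * t ** 2)            # Gaussian weight over frequencies
    integrand = ((ecf_re - target) ** 2 + ecf_im ** 2) * weight
    # Trapezoidal rule written out explicitly (avoids np.trapz/np.trapezoid naming).
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))


def sigreg(z, num_directions=64, seed=0):
    """Hypothetical sketched isotropic-Gaussian penalty for (N, D) embeddings:
    project onto random unit directions, average the 1-D statistic."""
    rng = np.random.default_rng(seed)
    d = z.shape[1]
    dirs = rng.standard_normal((d, num_directions))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)   # unit-norm columns
    proj = z @ dirs                                         # shape (N, num_directions)
    stats = [epps_pulley_1d(proj[:, m]) for m in range(num_directions)]
    return float(np.mean(stats))


def lejepa_style_loss(view_embeddings, lam, seed=0):
    """Hypothetical combination of a JEPA-style predictive term and SIGReg.
    view_embeddings has shape (V, N, D): V views of the same N samples.
    lam is the single trade-off hyperparameter (value left to the caller)."""
    center = view_embeddings.mean(axis=0, keepdims=True)    # per-sample mean over views
    pred = float(np.mean((view_embeddings - center) ** 2))  # pull each view toward it
    reg = float(np.mean([sigreg(v, seed=seed) for v in view_embeddings]))
    return (1.0 - lam) * pred + lam * reg
```

In this sketch a batch drawn from an isotropic standard Gaussian gets a near-zero penalty, while collapsed embeddings (all zeros) or a wrongly scaled Gaussian are penalized, so the regularizer pushes against representation collapse without a stop-gradient or teacher network. Each direction costs O(N × n_points) work after an O(N × D) projection, which is in line with the linear time and memory claim in the abstract.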

citation-role summary

background: 2 · method: 2

citation-polarity summary

background: 2 · use method: 2

representative citing papers

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

PEIRA learns predictive encoders by optimizing the trace of the optimal inter-view linear regressor, with only nontrivial global minimizers as stable equilibria that recover leading nonlinear canonical correlation subspaces.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

ProteinJEPA: Latent prediction complements protein language models

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.

Joint Embedding Variational Bayes

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

SpectralEarth-FM is a multisensor hierarchical transformer pretrained on a 40TB co-located HSI-MSI-SAR dataset using a JEPA-style objective and reports state-of-the-art results on hyperspectral and standard EO benchmarks.

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Fixed isotropic marginals in JEPAs can be maximally misaligned with unknown structured geometries, and HamJEPA using symplectic Hamiltonian leapfrog maps improves kNN and linear-probe performance on CIFAR-100 and ImageNet-100.

LACE: Latent Visual Representation for Cross-Embodiment Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.

Latent Geometry Beyond Search: Amortizing Planning in World Models

cs.RO · 2026-05-09 · unverdicted · novelty 6.0

In regularized latent spaces of world models, planning can be amortized into a goal-conditioned inverse dynamics model that matches CEM performance at 100-130x lower per-decision cost.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.

Why Self-Supervised Encoders Want to Be Normal

cs.IT · 2026-04-30 · unverdicted · novelty 6.0

Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.

Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

physics.data-an · 2026-04-27 · unverdicted · novelty 6.0

DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

Self-supervised pretraining for an iterative image size agnostic vision transformer

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.

Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.

REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

cs.LG · 2026-03-13 · unverdicted · novelty 6.0

LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

cs.CV · 2026-02-04 · unverdicted · novelty 6.0

PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

F2G improves video temporal grounding accuracy by decoupling event identification from boundary measurement using predictive temporal perception to create citable evidence segments for LLM reasoning.

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Self-supervised pre-training delivers large gains up to 375% on time series anomaly detection and classification but only marginal benefits for forecasting, driven by a precision-invariance trade-off in the learned representations.

Factorized Latent Dynamics for Video JEPA: An Empirical Study of Auxiliary Objectives

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

Empirical tests show that factorized world-model with hard-region-weighted latent dynamics improves ImageNet-100 by 5.92 and SSv2 by 3.21 points over baseline in mixed-dataset pretraining while staying within 0.3 points on Diving-48.

Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

cs.LG · 2026-05-14 · conditional · novelty 5.0

An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.

citing papers explorer

Showing 31 of 31 citing papers.

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment cs.LG · 2026-05-17 · unverdicted · none · ref 6
PEIRA learns predictive encoders by optimizing the trace of the optimal inter-view linear regressor, with only nontrivial global minimizers as stable equilibria that recover leading nonlinear canonical correlation subspaces.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 41
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
ProteinJEPA: Latent prediction complements protein language models cs.LG · 2026-05-08 · unverdicted · none · ref 5
Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.
Joint Embedding Variational Bayes cs.LG · 2026-02-05 · unverdicted · none · ref 2
VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.
Uncovering the Latent Potential of Deep Intermediate Representations cs.LG · 2026-05-21 · unverdicted · none · ref 27
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining cs.CV · 2026-05-20 · unverdicted · none · ref 6
SpectralEarth-FM is a multisensor hierarchical transformer pretrained on a 40TB co-located HSI-MSI-SAR dataset using a JEPA-style objective and reports state-of-the-art results on hyperspectral and standard EO benchmarks.
Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction cs.LG · 2026-05-19 · unverdicted · none · ref 3
Fixed isotropic marginals in JEPAs can be maximally misaligned with unknown structured geometries, and HamJEPA using symplectic Hamiltonian leapfrog maps improves kNN and linear-probe performance on CIFAR-100 and ImageNet-100.
LACE: Latent Visual Representation for Cross-Embodiment Learning cs.RO · 2026-05-16 · unverdicted · none · ref 63
LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.
Latent Geometry Beyond Search: Amortizing Planning in World Models cs.RO · 2026-05-09 · unverdicted · none · ref 2
In regularized latent spaces of world models, planning can be amortized into a goal-conditioned inverse dynamics model that matches CEM performance at 100-130x lower per-decision cost.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 4
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling cs.LG · 2026-05-07 · unverdicted · none · ref 5
AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.
Why Self-Supervised Encoders Want to Be Normal cs.IT · 2026-04-30 · unverdicted · none · ref 6
Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.
Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data physics.data-an · 2026-04-27 · unverdicted · none · ref 40
DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.
Self-supervised pretraining for an iterative image size agnostic vision transformer cs.CV · 2026-04-22 · unverdicted · none · ref 4
A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity cs.LG · 2026-04-20 · unverdicted · none · ref 64
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception cs.CV · 2026-04-19 · unverdicted · none · ref 3
Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning cs.CL · 2026-04-19 · unverdicted · none · ref 38
REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels cs.LG · 2026-03-13 · unverdicted · none · ref 27
LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization cs.CV · 2026-02-04 · unverdicted · none · ref 4
PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.
Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding cs.CV · 2026-05-21 · unverdicted · none · ref 40
F2G improves video temporal grounding accuracy by decoupling event identification from boundary measurement using predictive temporal perception to create citable evidence segments for LLM reasoning.
stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation cs.LG · 2026-05-20 · unverdicted · none · ref 20
The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.
Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models cs.LG · 2026-05-19 · unverdicted · none · ref 2
Self-supervised pre-training delivers large gains up to 375% on time series anomaly detection and classification but only marginal benefits for forecasting, driven by a precision-invariance trade-off in the learned representations.
Factorized Latent Dynamics for Video JEPA: An Empirical Study of Auxiliary Objectives cs.CV · 2026-05-16 · unverdicted · none · ref 33
Empirical tests show that factorized world-model with hard-region-weighted latent dynamics improves ImageNet-100 by 5.92 and SSv2 by 3.21 points over baseline in mixed-dataset pretraining while staying within 0.3 points on Diving-48.
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning cs.LG · 2026-05-14 · conditional · none · ref 16
An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.
MultiMedVision: Multi-Modal Medical Vision Framework cs.CV · 2026-05-09 · unverdicted · none · ref 2
A unified Sparse Vision Transformer learns joint 2D/3D medical image representations via self-supervision and achieves competitive AUROC on chest X-ray and CT benchmarks with 5x less data than modality-specific models.
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning cs.LG · 2026-04-10 · unverdicted · none · ref 27
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
Position: agentic AI orchestration should be Bayes-consistent cs.AI · 2026-05-01 · unverdicted · none · ref 6
Agentic AI orchestration should apply Bayesian principles for belief maintenance, updating from interactions, and utility-based action selection.
JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning cs.LG · 2026-04-22 · unverdicted · none · ref 3
JEPAMatch augments FlexMatch with LeJEPA-derived latent regularization to produce better-structured representations, yielding higher accuracy and faster convergence on CIFAR-100, STL-10, and Tiny-ImageNet.
HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series cs.LG · 2026-05-11 · unreviewed · ref 19 · 2 links
Understanding Self-Supervised Learning via Latent Distribution Matching cs.LG · 2026-05-05 · unreviewed · ref 3 · 2 links
Statistical Consistency and Generalization of Contrastive Representation Learning cs.LG · 2026-05-04 · unreviewed · ref 84