LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

31 Pith papers cite this work.

abstract

Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but the lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution for JEPA embeddings to follow in order to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), that constrains embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA, which has numerous theoretical and practical benefits: (i) a single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyperparameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) no reliance on heuristics, e.g., no stop-gradient, no teacher-student setup, and no hyperparameter schedulers, and (v) a distributed-training-friendly implementation requiring only about 50 lines of code. Our empirical validation covers 10+ datasets and 60+ architectures, all with varying scales and domains. For example, with ImageNet-1K pretraining and linear evaluation on a frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
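The abstract names SIGReg but does not spell out how it works. What follows is a minimal sketch of the general idea behind a "sketched" isotropic-Gaussian regularizer, under stated assumptions. Embeddings are projected onto random unit directions (the sketch), and each 1-D projection is penalized for deviating from N(0, 1). The deviation is measured here with an Epps-Pulley-style statistic: a weighted distance between the empirical characteristic function and that of a standard normal. The choice of test statistic, the integration grid, and the names `sigreg_loss`, `epps_pulley_statistic`, `num_directions`, and `t_max` are all assumptions made for illustration. This is not the authors' code from the linked repository. The sketch uses plain Python to stay self-contained; a real training loop would use a tensor library with autodiff.

```python
import math
import random


def _normalize(v):
    # Scale a vector to unit Euclidean norm.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def epps_pulley_statistic(samples, t_max=3.0, n_points=17):
    """Hypothetical helper: n times the weighted squared L2 distance between
    the empirical characteristic function of 1-D samples and the CF of
    N(0, 1), using weight exp(-t^2 / 2) and the trapezoid rule on
    [-t_max, t_max]. Larger values mean "less Gaussian"."""
    n = len(samples)
    ts = [-t_max + 2 * t_max * i / (n_points - 1) for i in range(n_points)]
    vals = []
    for t in ts:
        re = sum(math.cos(t * x) for x in samples) / n  # Re of empirical CF
        im = sum(math.sin(t * x) for x in samples) / n  # Im of empirical CF
        target = math.exp(-t * t / 2)  # CF of N(0, 1) is real-valued
        weight = math.exp(-t * t / 2)
        vals.append(((re - target) ** 2 + im ** 2) * weight)
    h = ts[1] - ts[0]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return n * integral


def sigreg_loss(embeddings, num_directions=16, seed=0):
    """Hypothetical SIGReg-style penalty: average the 1-D Gaussianity
    statistic over random unit projections of a batch of embeddings
    (a list of equal-length lists of floats)."""
    rng = random.Random(seed)  # assumption: fixed directions for reproducibility
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_directions):
        direction = _normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        projected = [sum(a * b for a, b in zip(z, direction)) for z in embeddings]
        total += epps_pulley_statistic(projected)
    return total / num_directions
```

For example, a batch drawn from an isotropic standard Gaussian should score low. A collapsed batch, where every embedding is the same vector, should score much higher, which is the failure mode such a regularizer is meant to prevent. Cost grows linearly with batch size, embedding dimension, and number of directions, consistent with the linear-complexity claim above. Resampling directions at every step, rather than fixing a seed, is another plausible design and is also an assumption here.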

citation-role summary

background: 2, method: 2

years

2026: 31

representative citing papers

Joint Embedding Variational Bayes

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

Why Self-Supervised Encoders Want to Be Normal

cs.IT · 2026-04-30 · unverdicted · novelty 6.0

Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.

Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

cs.LG · 2026-05-14 · conditional · novelty 5.0

An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.