LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Randall Balestriero, Yann LeCun · 2025 · cs.LG · arXiv 2511.08544

74 Pith papers cite this work. Polarity classification is still indexing.

abstract

Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyperparameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyperparameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using ImageNet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
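
To make the abstract's recipe concrete (a predictive loss between views plus a regularizer that pushes embeddings toward an isotropic Gaussian, combined through a single trade-off weight), here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which statistic SIGReg uses, so this sketch assumes one for illustration: it projects embeddings onto random unit directions and compares each projection's empirical characteristic function with that of N(0, 1). The function names, the grid of evaluation points, and the `pred + lam * reg` combination are all hypothetical.

```python
import numpy as np


def sketched_gaussian_penalty(z, num_directions=64, t_points=None, seed=0):
    """Illustrative penalty: distance of random 1-D projections of z from N(0, 1).

    Assumption: compares empirical characteristic functions on a fixed grid.
    This is a stand-in, not the statistic used in the paper.
    """
    rng = np.random.default_rng(seed)
    _, dim = z.shape
    if t_points is None:
        t_points = np.linspace(-3.0, 3.0, 13)
    # Random unit directions ("sketches"), shape (dim, num_directions).
    directions = rng.standard_normal((dim, num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    proj = z @ directions  # shape (n, num_directions)
    # Empirical characteristic function of each projection at each t.
    phase = proj[:, :, None] * t_points[None, None, :]
    ecf_real = np.cos(phase).mean(axis=0)
    ecf_imag = np.sin(phase).mean(axis=0)
    # The characteristic function of N(0, 1) is real: exp(-t^2 / 2).
    target = np.exp(-0.5 * t_points**2)
    err = (ecf_real - target[None, :]) ** 2 + ecf_imag**2
    return float(err.mean())


def lejepa_style_loss(z_a, z_b, lam=0.05):
    """Hypothetical combination: predictive term plus lam times the penalty."""
    pred = float(np.mean((z_a - z_b) ** 2))  # agreement between two views
    reg = 0.5 * (sketched_gaussian_penalty(z_a) + sketched_gaussian_penalty(z_b))
    return pred + lam * reg


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gauss = rng.standard_normal((2000, 16))
    collapsed = np.zeros((2000, 16))
    print("gaussian :", sketched_gaussian_penalty(gauss))
    print("collapsed:", sketched_gaussian_penalty(collapsed))
```

In this sketch the penalty is close to zero for standard Gaussian embeddings and clearly larger for collapsed (all-zero) or rescaled embeddings. Its cost grows linearly with the number of samples for a fixed number of directions and evaluation points.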

citation-role summary

background: 2 · method: 2

citation-polarity summary

background: 2 · use method: 2

representative citing papers

A Generalization Theory for JEPA-Based World Models

cs.LG · 2026-06-25 · unverdicted · novelty 8.0

The paper formulates JEPA pretraining as conditional spectral graph learning equivalent to low-rank factorization of an action-conditioned co-occurrence matrix and derives a finite-sample generalization bound connecting pretraining error to downstream planning regret.

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

LeVLJEPA: End-to-End Vision-Language Pretraining Without Negatives

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

LeVLJEPA is the first non-contrastive vision-language pretraining method that learns via cross-modal prediction without negatives, producing stronger dense features than contrastive baselines on VQA and segmentation tasks.

FlexTab: A Flexible Encoder-Decoder Architecture for In-Context Learning Across Diverse Tabular Tasks

cs.LG · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

FlexTab shows a shared encoder with task-specific decoders trained on unlabeled tables can achieve SOTA on classification, regression, anomaly detection and entity matching while staying competitive on relational entity classification.

Equilibrium World Models

econ.GN · 2026-06-22 · unverdicted · novelty 7.0

Equilibrium World Models are a deep-learning solver that enforces exact equilibrium conditions on broad model-generated state distributions to globally solve dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states.

SkyJEPA: Learning Long-Horizon World Models for Zero-Shot Sim-to-Real Control of Quadrotors

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

SkyJEPA learns long-horizon latent dynamics for quadrotors via JEPA plus a physics prober, enabling zero-shot sim-to-real control with sampling-based MPC and automated sim data generation.

S-JEPA: Soft Clustering Anchors for Self-Supervised Speech Representation Learning

cs.SD · 2026-06-17 · unverdicted · novelty 7.0

S-JEPA uses soft GMM posteriors in a JEPA framework for self-supervised speech learning, achieving the lowest WER among models below 90M parameters without offline re-clustering.

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

stat.ML · 2026-06-09 · unverdicted · novelty 7.0

PGSA achieves exact linear identifiability and near-infinite temporal consistency for non-Gaussian regimes via symbolic causal grounding, with four theorems formalized in Lean 4.

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

cs.LG · 2026-06-09 · accept · novelty 7.0 · 2 refs

A spiked signal-plus-noise model yields separation ratios that partition multimodal problems into four regimes where alignment, prediction, both, or neither succeed.

A Unifying Framework for Concept-Based Representational Similarity

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

A unifying framework decomposes concept alignment into instance-wise and distributional translation and concept consistency, introduces the InterVenchA benchmark, and shows that joint optimization via CoSAE recovers strong alignment even with 0.1% paired data.

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

Attention sinks reflect either adaptive nop or broadcast mechanisms, with distinct traces, synthetic diagnostics, and complementary interventions via gating plus registers.

Contrast encodes inductive bias: separating slow noise from dynamics in predictive representation learning

cs.LG · 2026-06-05 · conditional · novelty 7.0

Cross-trajectory negative sampling in contrastive predictive objectives causes encoding of slow noise over dynamics; intra-trajectory sampling eliminates the shortcut and recovers dynamical variables even under strong noise.

Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group

cs.LG · 2026-06-02 · unverdicted · novelty 7.0 · 2 refs

Exact equivariance preserved through training renders one-step relMSE invariant across the symmetry group, enabling zero-shot generalization from a restricted training slice.

UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

cs.LG · 2026-05-31 · unverdicted · novelty 7.0

UR-JEPA applies uniform rectifiability regularization via a smoothed Carleson square function to JEPA training, producing embeddings whose PCA spectrum drops by 4-5 orders of magnitude at dimension 20-25, with lower seed variance than Gaussian regularization on Inet10, Galaxy10, and EuroSAT.

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

PEIRA learns predictive encoders by optimizing the trace of the optimal inter-view linear regressor, with only nontrivial global minimizers as stable equilibria that recover leading nonlinear canonical correlation subspaces.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 3 refs

HEPA pretrains via horizon-conditioned JEPA on unlabeled data then fine-tunes only the predictor for event survival CDFs, outperforming PatchTST, iTransformer, MAE and Chronos-2 on at least 10 of 14 benchmarks with fixed hyperparameters, an order of magnitude fewer tuned parameters and less labeled

ProteinJEPA: Latent prediction complements protein language models

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.

Statistical Consistency and Generalization of Contrastive Representation Learning

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

The paper proves statistical consistency of contrastive loss to optimal ranking via an AUC criterion and derives generalization bounds O(1/m + 1/sqrt(n)) for supervised and O(1/sqrt(m) + 1/sqrt(n)) for self-supervised CRL that explain benefits of large negative sets.

Joint Embedding Variational Bayes

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

ACID: Action Consistency via Inverse Dynamics for Planning with World Models

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

ACID improves decision-time planning in world models by adding per-step action consistency residuals from an inverse dynamics model to the planning cost via an adaptive weight, yielding better performance with less compute across manipulation and navigation tasks.

Delta-JEPA: Learning Action-Sensitive World Models via Latent Difference Decoding

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

Delta-JEPA augments latent forward prediction with a Latent Difference Action Decoder that reconstructs actions from embedding displacements, yielding action-sensitive world models that improve planning on four visual continuous-control tasks over JEPA baselines.

ScaleAware-JEPA: Latent Representation for Discovery in Multiscale Physical Fields

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

ScaleAware-JEPA combines Constrained Diffusion Decomposition with a scale-tied JEPA objective to learn label-free latent coordinates that recover coherent morphology in multiscale fields such as MHD turbulence and interstellar gas.

Domain-Informed Multi-View Self-Distillation for Astronomical Light-Curve Representation Learning with JEPA

astro-ph.IM · 2026-06-26 · unverdicted · novelty 6.0

A JEPA-based model with domain-informed multi-view self-distillation learns light-curve representations that outperform hand-crafted features on 15 of 16 StarEmbed metrics and adapts competitively to other irregular time-series datasets.
