LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

74 Pith papers cite this work. Polarity classification is still indexing.

abstract

Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using ImageNet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
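The idea of regularizing embeddings toward an isotropic Gaussian through random one-dimensional projections ("sketches") can be illustrated with a short NumPy sketch. This is not the authors' implementation. The function names `sigreg_sketch` and `lejepa_style_loss`, the characteristic-function comparison used as the one-dimensional Gaussianity test, the weighting, and all default values are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sigreg_sketch(z, num_directions=64, num_points=17, t_max=3.0, seed=0):
    """Illustrative sliced Gaussianity penalty for embeddings z of shape (n, d).

    Assumption: each 1-D projection is compared to N(0, 1) through its
    empirical characteristic function. This is one possible univariate test,
    chosen here for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape

    # Random unit-norm directions, one per column: shape (d, num_directions).
    directions = rng.standard_normal((d, num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # Project the embeddings onto each direction: shape (n, num_directions).
    proj = z @ directions

    # Evaluation points for the characteristic function.
    t = np.linspace(-t_max, t_max, num_points)

    # Empirical characteristic function per direction: (num_directions, num_points).
    ecf = np.exp(1j * proj[:, :, None] * t[None, None, :]).mean(axis=0)

    # Characteristic function of a standard normal, also reused as the weight.
    target = np.exp(-0.5 * t**2)
    weight = np.exp(-0.5 * t**2)

    # Weighted squared distance, integrated over t with a Riemann sum.
    dt = t[1] - t[0]
    per_direction = (np.abs(ecf - target) ** 2 * weight).sum(axis=1) * dt
    return float(per_direction.mean())


def lejepa_style_loss(prediction_loss, z, lam=0.05):
    """Hypothetical combination with a single trade-off weight `lam`."""
    return prediction_loss + lam * sigreg_sketch(z)
```

Under these assumptions, the penalty is close to zero for embeddings drawn from a standard normal and grows for collapsed or badly scaled embeddings. Its cost is linear in both the number of samples and the embedding dimension.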

hub tools

citation-role summary

background 2 method 2

citation-polarity summary

years

2026 74


representative citing papers

A Generalization Theory for JEPA-Based World Models

cs.LG · 2026-06-25 · unverdicted · novelty 8.0

The paper formulates JEPA pretraining as conditional spectral graph learning equivalent to low-rank factorization of an action-conditioned co-occurrence matrix and derives a finite-sample generalization bound connecting pretraining error to downstream planning regret.

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

LeVLJEPA: End-to-End Vision-Language Pretraining Without Negatives

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

LeVLJEPA is the first non-contrastive vision-language pretraining method that learns via cross-modal prediction without negatives, producing stronger dense features than contrastive baselines on VQA and segmentation tasks.

Equilibrium World Models

econ.GN · 2026-06-22 · unverdicted · novelty 7.0

Equilibrium World Models are a deep-learning solver that enforces exact equilibrium conditions on broad model-generated state distributions to globally solve dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states.

A Unifying Framework for Concept-Based Representational Similarity

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

A unifying framework decomposes concept alignment into instance-wise and distributional translation and concept consistency, introduces the InterVenchA benchmark, and shows that joint optimization via CoSAE recovers strong alignment even with 0.1% paired data.

HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 3 refs

HEPA pretrains via horizon-conditioned JEPA on unlabeled data then fine-tunes only the predictor for event survival CDFs, outperforming PatchTST, iTransformer, MAE and Chronos-2 on at least 10 of 14 benchmarks with fixed hyperparameters, an order of magnitude fewer tuned parameters and less labeled

Joint Embedding Variational Bayes

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

ACID: Action Consistency via Inverse Dynamics for Planning with World Models

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

ACID improves decision-time planning in world models by adding per-step action consistency residuals from an inverse dynamics model to the planning cost via an adaptive weight, yielding better performance with less compute across manipulation and navigation tasks.
