pith. machine review for the scientific record.

arxiv: 2511.08544 · v3 · submitted 2025-11-11 · 💻 cs.LG · cs.AI · cs.CV · stat.ML

Recognition: 3 theorem links


LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 07:18 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · stat.ML
keywords self-supervised learning · JEPA · isotropic Gaussian · regularization · representation learning · ViT · ImageNet · predictive architectures

The pith

LeJEPA shows that self-supervised embeddings reach minimal downstream prediction risk when constrained to an isotropic Gaussian distribution via sketched regularization added to the JEPA loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first derives that an isotropic Gaussian is the optimal distribution for embeddings in Joint-Embedding Predictive Architectures to reduce prediction risk on downstream tasks. It then proposes Sketched Isotropic Gaussian Regularization (SIGReg) as a practical way to enforce this distribution. When SIGReg is combined with the standard JEPA predictive loss, the resulting LeJEPA objective requires only one trade-off hyperparameter, runs in linear time and memory, and eliminates common training heuristics such as stop-gradients and teacher-student setups. Experiments across more than ten datasets and sixty architectures confirm stable performance, including 79 percent top-1 accuracy on ImageNet-1k pretraining followed by linear evaluation of a frozen ViT-H/14 backbone.
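The single-objective form described above can be sketched in a few lines. This is an illustration only: the `isotropy_penalty` below uses the full empirical covariance for clarity, whereas the paper's SIGReg works through sketched one-dimensional projections to stay linear in time and memory, and all names and the value of `lam` are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def jepa_predictive_loss(pred, target):
    # Mean squared error between predicted and target embeddings.
    return float(np.mean((pred - target) ** 2))

def isotropy_penalty(z):
    # Crude stand-in for SIGReg: push the embedding mean to zero and the
    # covariance toward the identity. This uses the full covariance
    # (quadratic in dimension); SIGReg's sketched projections avoid that.
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    return float(mu @ mu + np.sum((cov - np.eye(z.shape[1])) ** 2))

def lejepa_objective(pred, target, z, lam=0.1):
    # The single trade-off hyperparameter lam balances prediction vs. isotropy.
    return jepa_predictive_loss(pred, target) + lam * isotropy_penalty(z)

z = rng.standard_normal((4000, 4))             # near-isotropic embeddings
pred = z + 0.01 * rng.standard_normal(z.shape)  # near-perfect predictor
loss = lejepa_objective(pred, z, z, lam=0.1)
```

A near-isotropic batch incurs almost no penalty, while stretching one embedding axis makes the penalty dominate, which is the behavior the regularizer is meant to exploit.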

Core claim

Identifying the isotropic Gaussian as the distribution that minimizes downstream prediction risk, and introducing SIGReg to enforce it, lets the JEPA predictive loss be augmented into LeJEPA: a single-objective training method that is theoretically grounded, linearly scalable, and free of ad hoc heuristics while remaining stable across architectures and data domains.

What carries the argument

Sketched Isotropic Gaussian Regularization (SIGReg) approximates the constraint that embeddings follow an isotropic Gaussian distribution when added to the JEPA predictive loss.
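A minimal sketch of the "sketched" idea: if embeddings are isotropic Gaussian, every one-dimensional projection onto a unit vector is standard normal, so Gaussianity can be scored direction-by-direction with a univariate test at cost linear in the number of samples. The statistic below is a Cramér–von Mises-style distance between each projection's empirical CDF and the standard normal CDF; the paper's specific univariate test and number of directions may differ, so treat this as an assumption-laden stand-in.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def sketched_gaussianity_stat(z, num_directions=16):
    # Project embeddings onto random unit vectors and score each 1D
    # projection against N(0, 1); average the per-direction distances.
    n, d = z.shape
    dirs = rng.standard_normal((num_directions, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    total = 0.0
    for u in dirs:
        proj = np.sort(z @ u)
        ecdf = (np.arange(1, n + 1) - 0.5) / n
        ncdf = np.array([0.5 * (1.0 + erf(x / sqrt(2.0))) for x in proj])
        total += float(np.mean((ecdf - ncdf) ** 2))
    return total / num_directions

z_iso = rng.standard_normal((2000, 16))   # isotropic Gaussian embeddings
z_skew = np.abs(z_iso)                    # clearly non-Gaussian marginals
stat_iso = sketched_gaussianity_stat(z_iso)
stat_skew = sketched_gaussianity_stat(z_skew)
```

The statistic is near zero for an isotropic Gaussian batch and visibly larger for the skewed one, which is what lets a differentiable version of it serve as a training penalty.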

If this is right

  • Only one trade-off hyperparameter controls the entire training process
  • Time and memory scale linearly with dataset size
  • Training remains stable without stop-gradients, teacher-student networks, or learning-rate schedulers
  • The method works across ResNets, ViTs, and ConvNets on multiple domains
  • ImageNet-1k pretraining followed by linear evaluation of a frozen ViT-H/14 reaches 79 percent top-1 accuracy

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-objective form could simplify large-scale distributed pretraining pipelines by removing many implementation choices
  • Similar sketched regularization might extend to other embedding-based objectives beyond JEPA
  • The Gaussian optimality result may link to broader information-theoretic views of representation learning
  • Implementation requiring roughly fifty lines of code suggests the method is immediately usable in standard frameworks

Load-bearing premise

The derivation that the isotropic Gaussian distribution minimizes downstream prediction risk must hold under the stated conditions on the embedding space and loss.
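The premise can be probed with a toy experiment (not the paper's proof): fix a linear downstream head with ridge regularization and compare test risk when the same latent signal is exposed through an isotropic embedding versus an ill-conditioned one. With a fixed ridge penalty, the squashed signal direction is over-shrunk, so the anisotropic embedding pays extra bias; the dimensions, scales, and regularization strength below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def ridge_test_risk(A, n_train=200, n_test=5000, lam=1.0, noise=0.1):
    # Latent z is isotropic Gaussian; the linear probe only sees h = z @ A.T.
    # The downstream target depends on the last latent coordinate.
    d = A.shape[1]
    def make(n):
        z = rng.standard_normal((n, d))
        y = z[:, -1] + noise * rng.standard_normal(n)
        return z @ A.T, y
    H, y = make(n_train)
    w = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
    H_test, y_test = make(n_test)
    return float(np.mean((H_test @ w - y_test) ** 2))

risk_iso = ridge_test_risk(np.eye(2))               # isotropic embedding
risk_aniso = ridge_test_risk(np.diag([10.0, 0.1]))  # squashed signal axis
```

With an unregularized head and enough data the two parameterizations would be equivalent, so the gap here depends on the fixed-penalty probe; this is exactly why the precise conditions on the predictor class matter for the formal claim.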

What would settle it

If LeJEPA requires extra heuristics or underperforms competitive baselines on a large new dataset where the optimality conditions are met, the claimed benefits of the Gaussian target and SIGReg would not generalize.

read the original abstract

Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in {\bf LeJEPA}, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective--{\bf Sketched Isotropic Gaussian Regularization} (SIGReg)--to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only $\approx$50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using imagenet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79\% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (\href{https://github.com/rbalestr-lab/lejepa}{GitHub repo}).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to deliver a comprehensive theory of Joint-Embedding Predictive Architectures (JEPAs) by identifying the isotropic Gaussian as the unique distribution that minimizes downstream prediction risk, then instantiating this via Sketched Isotropic Gaussian Regularization (SIGReg) to produce LeJEPA: a single-hyperparameter objective with linear time/memory, no stop-gradients or teacher-student mechanisms, and stable performance across 60+ architectures and 10+ datasets. The empirical highlight is 79% top-1 linear-evaluation accuracy on ImageNet-1k using a ViT-H/14 backbone.

Significance. If the optimality derivation is rigorous and the empirical claims hold under the stated conditions, the work would be significant for replacing heuristic-heavy SSL pipelines with a provably motivated, implementation-light alternative. The single trade-off parameter, distributed-training compatibility, and broad architecture coverage are attractive; the open GitHub repo further strengthens reproducibility.

major comments (2)
  1. [Theory] Optimality derivation: the claim that the isotropic Gaussian uniquely minimizes downstream risk under the JEPA loss is load-bearing for SIGReg and for the 'provable' and 'heuristics-free' assertions, yet the provided sketch appears to assume a linear downstream head and independence between embedding covariances and targets. These assumptions must be stated explicitly, with precise conditions on the data distribution and predictor class; otherwise the risk-reduction guarantee does not follow for general nonlinear heads.
  2. [§4] Empirical validation: the 79% ImageNet-1k result with ViT-H/14 is concrete, but the stability claim across 60+ architectures requires an ablation table showing performance variance when the single trade-off hyperparameter is varied by ±50% and when the sketching dimension in SIGReg is reduced. Without these controls, the 'linear complexity' and 'no scheduler' benefits cannot be isolated from implementation details.
minor comments (2)
  1. [Abstract and §3] The phrase 'parameter-free' for the Gaussian target is imprecise once the trade-off hyperparameter and sketching dimension are introduced; clarify the exact count of free parameters.
  2. [Tables/Figures] Figure captions and Table 1 should state explicitly whether accuracies are top-1 or top-5 and whether the backbone is frozen.
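The requested control is easy to specify. A hypothetical sweep grid for the ablation (the base value and sketching dimensions are placeholders, not values from the paper) would look like:

```python
# Hypothetical ablation grid for the referee's request; base_lam and the
# sketching dimensions are illustrative placeholders.
base_lam = 0.1
lam_values = [base_lam * f for f in (0.5, 0.75, 1.0, 1.25, 1.5)]  # ±50% around base
sketch_dims = [256, 128, 64, 32]   # progressively reduced sketching dimension
grid = [(lam, m) for lam in lam_values for m in sketch_dims]
```

One pretraining run per grid cell, reported as linear-evaluation accuracy, would be enough to show whether performance is flat in both directions.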

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, providing clarifications on the theoretical assumptions and committing to additional empirical controls in the revision.

read point-by-point responses
  1. Referee: [Theory] Optimality derivation: the claim that the isotropic Gaussian uniquely minimizes downstream risk under the JEPA loss is load-bearing for SIGReg and for the 'provable' and 'heuristics-free' assertions, yet the provided sketch appears to assume a linear downstream head and independence between embedding covariances and targets. These assumptions must be stated explicitly, with precise conditions on the data distribution and predictor class; otherwise the risk-reduction guarantee does not follow for general nonlinear heads.

    Authors: We agree that the optimality derivation relies on a linear downstream head and assumes independence between embedding covariances and targets. These conditions will be stated explicitly in the revised theory section, together with the precise requirements on the data distribution (bounded second moments) and the predictor class (linear functions). Under these assumptions the isotropic Gaussian is the unique minimizer of downstream risk. For general nonlinear heads the strict guarantee does not follow from the current analysis; we will add a remark acknowledging this limitation while noting that the empirical results across diverse architectures remain consistent with the proposed objective. revision: partial

  2. Referee: [§4] Empirical validation: the 79% ImageNet-1k result with ViT-H/14 is concrete, but the stability claim across 60+ architectures requires an ablation table showing performance variance when the single trade-off hyperparameter is varied by ±50% and when the sketching dimension in SIGReg is reduced. Without these controls, the 'linear complexity' and 'no scheduler' benefits cannot be isolated from implementation details.

    Authors: We accept that stronger controls are needed to isolate the claimed benefits. In the revised manuscript we will add an ablation table in §4 that reports linear-evaluation accuracy for the trade-off hyperparameter varied by ±50% and for reduced sketching dimensions, using a representative subset of the 60+ architectures (including ResNets and ViTs). The full set of results will be summarized with references to the new table, thereby supporting the stability, linear-complexity, and scheduler-free claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

full rationale

The paper presents a first-principles derivation identifying the isotropic Gaussian as the distribution minimizing downstream prediction risk under its stated conditions on embeddings and loss, then introduces SIGReg as a new constraint to enforce it. None of the quoted steps reduces, by the paper's own equations, to a fitted input renamed as a prediction, a self-definitional loop, or a load-bearing self-citation chain. The central claim (the LeJEPA objective) retains independent content from the new regularization rather than collapsing to its inputs by construction. This is the expected non-finding for a theory-driven contribution with external empirical validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method depends on one explicit trade-off hyperparameter and the domain assumption that isotropic Gaussian embeddings minimize downstream risk; no new particles or dimensions are postulated.

free parameters (1)
  • trade-off hyperparameter
    Single scalar balancing the predictive JEPA loss against SIGReg; its value is chosen once per run rather than fitted per dataset.
axioms (1)
  • domain assumption: the isotropic Gaussian is the optimal distribution for JEPAs' embeddings to minimize downstream prediction risk.
    Identified as the first theoretical step before introducing SIGReg.

pith-pipeline@v0.9.0 · 5622 in / 1326 out tokens · 24985 ms · 2026-05-16T07:18:53.098609+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk... SIGReg... to constrain embeddings to reach that ideal distribution

  • Foundation.LogicAsFunctionalEquation rcl_polynomial_closure_theorem echoes


    Combining the JEPA predictive loss with SIGReg yields LeJEPA with... heuristics-free... single trade-off hyperparameter... linear time and memory complexity

  • Foundation.LogicAsFunctionalEquation operative_to_laws_of_logic refines

    Relation between the paper passage and the cited Recognition theorem.

    finite pairwise polynomial closure... RCL family

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...

  2. ProteinJEPA: Latent prediction complements protein language models

    cs.LG 2026-05 unverdicted novelty 7.0

    Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.

  3. Joint Embedding Variational Bayes

    cs.LG 2026-02 unverdicted novelty 7.0

    VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

  4. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...

  5. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...

  6. Latent Geometry Beyond Search: Amortizing Planning in World Models

    cs.RO 2026-05 unverdicted novelty 6.0

    In regularized latent spaces of world models, planning can be amortized into a goal-conditioned inverse dynamics model that matches CEM performance at 100-130x lower per-decision cost.

  7. Predictive but Not Plannable: RC-aux for Latent World Models

    cs.LG 2026-05 unverdicted novelty 6.0

    RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

  8. AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling

    cs.LG 2026-05 unverdicted novelty 6.0

    AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.

  9. Understanding Self-Supervised Learning via Latent Distribution Matching

    cs.LG 2026-05 unverdicted novelty 6.0

    Self-supervised learning is cast as latent distribution matching that aligns representations to a model while enforcing uniformity, unifying multiple SSL families and proving identifiability for predictive variants ev...

  10. Why Self-Supervised Encoders Want to Be Normal

    cs.IT 2026-04 unverdicted novelty 6.0

    Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.

  11. Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

    physics.data-an 2026-04 unverdicted novelty 6.0

    DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

  12. Self-supervised pretraining for an iterative image size agnostic vision transformer

    cs.CV 2026-04 unverdicted novelty 6.0

    A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.

  13. Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

    cs.LG 2026-04 unverdicted novelty 6.0

    Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...

  14. Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

    cs.CV 2026-04 unverdicted novelty 6.0

    Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.

  15. REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning

    cs.CL 2026-04 unverdicted novelty 6.0

    REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.

  16. LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

    cs.LG 2026-03 unverdicted novelty 6.0

    LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.

  17. PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

    cs.CV 2026-02 unverdicted novelty 6.0

    PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.

  18. MultiMedVision: Multi-Modal Medical Vision Framework

    cs.CV 2026-05 unverdicted novelty 5.0

    A unified Sparse Vision Transformer learns joint 2D/3D medical image representations via self-supervision and achieves competitive AUROC on chest X-ray and CT benchmarks with 5x less data than modality-specific models.

  19. Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.

  20. Position: agentic AI orchestration should be Bayes-consistent

    cs.AI 2026-05 unverdicted novelty 4.0

    Agentic AI orchestration should apply Bayesian principles for belief maintenance, updating from interactions, and utility-based action selection.

  21. JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning

    cs.LG 2026-04 unverdicted novelty 4.0

    JEPAMatch augments FlexMatch with LeJEPA-derived latent regularization to produce better-structured representations, yielding higher accuracy and faster convergence on CIFAR-100, STL-10, and Tiny-ImageNet.

Reference graph

Works this paper leans on

144 extracted references · 144 canonical work pages · cited by 20 Pith papers · 11 internal anchors

  1. [1]

    Biometrika , volume=

    An analysis of variance test for normality (complete samples) , author=. Biometrika , volume=. 1965 , publisher=

  2. [2]

    Biometrika , volume=

    Goodness-of-fit tests on a circle , author=. Biometrika , volume=. 1961 , publisher=

  3. [3]

    2010 , publisher=

    Digital nets and sequences: discrepancy theory and quasi--Monte Carlo integration , author=. 2010 , publisher=

  4. [4]

    NA , year=

    Characteristic functions , author=. NA , year=

  5. [5]

    Econometric reviews , volume=

    Empirical characteristic function estimation and its applications , author=. Econometric reviews , volume=. 2004 , publisher=

  6. [6]

    Les Fonctions quasi analytiques: le

    Carleman, Torsten , year=. Les Fonctions quasi analytiques: le

  7. [7]

    The annals of Statistics , pages=

    The empirical characteristic function and its applications , author=. The annals of Statistics , pages=. 1977 , publisher=

  8. [8]

    1987 , publisher=

    Real and complex analysis , author=. 1987 , publisher=

  9. [9]

    2, 2022-06-27 , author=

    A path towards autonomous machine intelligence version 0.9. 2, 2022-06-27 , author=. Open Review , volume=

  10. [10]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Momentum contrast for unsupervised visual representation learning , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  11. [11]

    International conference on machine learning , pages=

    A simple framework for contrastive learning of visual representations , author=. International conference on machine learning , pages=. 2020 , organization=

  12. [12]

    International conference on machine learning , pages=

    Whitening for self-supervised representation learning , author=. International conference on machine learning , pages=. 2021 , organization=

  13. [13]

    VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

    Vicreg: Variance-invariance-covariance regularization for self-supervised learning , author=. arXiv preprint arXiv:2105.04906 , year=

  14. [14]

    1981 , publisher=

    Probability, statistics, and truth , author=. 1981 , publisher=

  15. [15]

    IEEE transactions on neural networks and learning systems , volume=

    Efficient kNN classification with different numbers of nearest neighbors , author=. IEEE transactions on neural networks and learning systems , volume=. 2017 , publisher=

  16. [16]

    Theory of Probability & Its Applications , volume=

    On estimating regression , author=. Theory of Probability & Its Applications , volume=. 1964 , publisher=

  17. [17]

    Journal of Mathematical Imaging and Vision , volume=

    Sliced and radon wasserstein barycenters of measures , author=. Journal of Mathematical Imaging and Vision , volume=. 2015 , publisher=

  18. [18]

    2019 , eprint=

    Averaging Weights Leads to Wider Optima and Better Generalization , author=. 2019 , eprint=

  19. [19]

    Uncertainty in artificial intelligence , pages=

    Sliced score matching: A scalable approach to density and score estimation , author=. Uncertainty in artificial intelligence , pages=. 2020 , organization=

  20. [20]

    Software Impacts , volume=

    The t-digest: Efficient estimates of distributions , author=. Software Impacts , volume=. 2021 , publisher=

  21. [21]

    Computing Extremely Accurate Quantiles Using t-Digests

    Computing extremely accurate quantiles using t-digests , author=. arXiv preprint arXiv:1902.04023 , year=

  22. [22]

    arXiv preprint arXiv:1908.10693 , year=

    Ddsketch: A fast and fully-mergeable quantile sketch with relative-error guarantees , author=. arXiv preprint arXiv:1908.10693 , year=

  23. [23]

    Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units , pages=

    Comparison based sorting for systems with multiple GPUs , author=. Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units , pages=

  24. [24]

    Proceedings of the 2022 International Conference on Management of Data , pages=

    Evaluating multi-GPU sorting with modern interconnects , author=. Proceedings of the 2022 International Conference on Management of Data , pages=

  25. [25]

    Advances in Neural Information Processing Systems , volume=

    Energy-based sliced wasserstein distance , author=. Advances in Neural Information Processing Systems , volume=

  26. [26]

    Smooth regression analysis , author=. Sankhy. 1964 , publisher=

  27. [27]

    2019 international conference on intelligent computing and control systems (ICCS) , pages=

    A brief review of nearest neighbor algorithm for learning and classification , author=. 2019 international conference on intelligent computing and control systems (ICCS) , pages=. 2019 , organization=

  28. [28]

    2010 seventh international conference on fuzzy systems and knowledge discovery , volume=

    An adaptive k-nearest neighbor algorithm , author=. 2010 seventh international conference on fuzzy systems and knowledge discovery , volume=. 2010 , organization=

  29. [29]

    2006 , publisher=

    Pattern recognition and machine learning , author=. 2006 , publisher=

  30. [30]

    SIAM journal on matrix analysis and applications , volume=

    Tikhonov regularization and total least squares , author=. SIAM journal on matrix analysis and applications , volume=. 1999 , publisher=

  31. [31]

    Neural computation , volume=

    Training with noise is equivalent to Tikhonov regularization , author=. Neural computation , volume=. 1995 , publisher=

  32. [32]

    Big data , volume=

    Effects of distance measure choice on k-nearest neighbor classifier performance: a review , author=. Big data , volume=. 2019 , publisher=

  33. [33]

    International Conference on Machine Learning , pages=

    An alternative softmax operator for reinforcement learning , author=. International Conference on Machine Learning , pages=. 2017 , organization=

  34. [34]

    International Journal of Mathematical and Statistical Sciences , volume=

    Nonparametric entropy estimation: An overview , author=. International Journal of Mathematical and Statistical Sciences , volume=. 1997 , publisher=

  35. [35]

    2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003

    A new class of entropy estimators for multi-dimensional densities , author=. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP'03). , volume=. 2003 , organization=

  36. [36]

    Annals of the Institute of Statistical Mathematics , volume=

    Estimation of entropy and other functionals of a multivariate density , author=. Annals of the Institute of Statistical Mathematics , volume=. 1989 , publisher=

  37. [37]

    2018 , publisher=

    Density estimation for statistics and data analysis , author=. 2018 , publisher=

  38. [38]

    Vision Transformers Need Registers

    Vision transformers need registers , author=. arXiv preprint arXiv:2309.16588 , year=

  39. [39]

    DINOv3

    Dinov3 , author=. arXiv preprint arXiv:2508.10104 , year=

  40. [40]

    Advances in neural information processing systems , volume=

    Bootstrap your own latent-a new approach to self-supervised learning , author=. Advances in neural information processing systems , volume=

  41. [41]

    arXiv preprint arXiv:2308.00566 , year=

    Stochastic positional embeddings improve masked image modeling , author=. arXiv preprint arXiv:2308.00566 , year=

  42. [42]

    The Journal of Machine Learning Research , volume=

    Hilbert space embeddings and metrics on probability measures , author=. The Journal of Machine Learning Research , volume=. 2010 , publisher=

  43. [43]

    International conference on machine learning , pages=

    A kernel test of goodness of fit , author=. International conference on machine learning , pages=. 2016 , organization=

  44. [44]

    The journal of machine learning research , volume=

    A kernel two-sample test , author=. The journal of machine learning research , volume=. 2012 , publisher=

  45. [45]

    A. N. Kolmogorov , title =. Giornale dell'Istituto Italiano degli Attuari , volume =

  46. [46]

    1990 , publisher=

    How to test normality and other distributional assumptions , author=. 1990 , publisher=

  47. [47]

    goodness of fit

    Asymptotic theory of certain" goodness of fit" criteria based on stochastic processes , author=. The annals of mathematical statistics , pages=. 1952 , publisher=

  48. [48]

    Scandinavian Actuarial Journal , volume=

    On the composition of elementary errors: First paper: Mathematical deductions , author=. Scandinavian Actuarial Journal , volume=. 1928 , publisher=

  49. [49]

    Journal of the London Mathematical Society , volume=

    Some theorems on distribution functions , author=. Journal of the London Mathematical Society , volume=. 1936 , publisher=

  50. [50]

    Advances in Neural Information Processing Systems , volume=

    Implicit variance regularization in non-contrastive SSL , author=. Advances in Neural Information Processing Systems , volume=

  51. [51]

    DINOv2: Learning Robust Visual Features without Supervision

    Dinov2: Learning robust visual features without supervision , author=. arXiv preprint arXiv:2304.07193 , year=

  52. [52]

    arXiv preprint arXiv:2504.01017 , year=

    Scaling language-free visual representation learning , author=. arXiv preprint arXiv:2504.01017 , year=

  53. [53]

    Advances in neural information processing systems , volume=

    Big self-supervised models are strong semi-supervised learners , author=. Advances in neural information processing systems , volume=

  54. [54]

    Proceedings of the ieee/cvf International Conference on computer vision , pages=

    Scaling and benchmarking self-supervised visual representation learning , author=. Proceedings of the ieee/cvf International Conference on computer vision , pages=

  55. [55]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  56. [56]

    arXiv preprint arXiv:2405.15613 , year=

    Automatic data curation for self-supervised learning: A clustering-based approach , author=. arXiv preprint arXiv:2405.15613 , year=

  57. [57] Matrix information theory for self-supervised learning. arXiv preprint arXiv:2305.17326, 2023.

  58. [58] X. Liu et al. Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering, 2021.

  59. [59] R. Shwartz-Ziv, R. Balestriero, and Y. LeCun. What do we maximize in self-supervised learning? arXiv preprint arXiv:2207.10081, 2022.

  60. [60] R. Shwartz-Ziv and Y. LeCun. To compress or not to compress—self-supervised learning and information theory: A review. Entropy, 2024.

  61. [61] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 2010.

  62. [62] G. Van Horn et al. The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

  63. [63] V. Papyan, X. Y. Han, and D. L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 2020.

  64. [64] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling. Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems, 2014.

  65. [65] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 1986.

  66. [66] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 2015.

  67. [67] A. Khazatsky et al. DROID: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024.

  68. [68] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin. Emerging properties in self-supervised vision transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.

  69. [69] M. Assran et al. Self-supervised learning from images with a joint-embedding predictive architecture. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.

  70. [70] How JEPA avoids noisy features: The implicit bias of deep linear self distillation networks. Advances in Neural Information Processing Systems.

  71. [71] J. S. Bruner and L. Postman. On the perception of incongruity: A paradigm. Journal of Personality, 1949.

  72. [72] H. von Helmholtz. Handbook of Physiological Optics. Voss, Leipzig.

  73. [73] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. Signature verification using a "siamese" time delay neural network. Advances in Neural Information Processing Systems, 1993.

  74. [74] X. Wang, H. Fan, Y. Tian, D. Kihara, and X. Chen. On the importance of asymmetry for siamese representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.

  75. [75] X. Chen, S. Xie, and K. He. An empirical study of training self-supervised vision transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.

  76. [76] Y. Tian, X. Chen, and S. Ganguli. Understanding self-supervised learning dynamics without contrastive pairs. International Conference on Machine Learning, 2021.

  77. [77] T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 1965.

  78. [78] L. Jing, P. Vincent, Y. LeCun, and Y. Tian. Understanding dimensional collapse in contrastive self-supervised learning. arXiv preprint arXiv:2110.09348, 2021.

  79. [79] H. von Helmholtz. Handbuch der physiologischen Optik. Voss, Leipzig, 1867.

  80. [80] R. L. Gregory. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 1980.

Showing first 80 references.