LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
Recognition: 3 theorem links
Pith reviewed 2026-05-16 07:18 UTC · model grok-4.3
The pith
LeJEPA shows that self-supervised embeddings minimize downstream prediction risk when constrained to an isotropic Gaussian distribution, a constraint enforced by sketched regularization added to the JEPA loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Identifying the isotropic Gaussian as the distribution that minimizes downstream prediction risk, and introducing SIGReg to enforce it, lets the JEPA predictive loss be augmented into LeJEPA: a single-objective training method that is theoretically grounded, linearly scalable, free of ad-hoc heuristics, and stable across architectures and data domains.
What carries the argument
Sketched Isotropic Gaussian Regularization (SIGReg), added to the JEPA predictive loss, approximately enforces the constraint that embeddings follow an isotropic Gaussian distribution.
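As a rough illustration (ours, not the authors' code), a SIGReg-style penalty can be sketched as random one-dimensional projections of the embeddings, each scored against N(0, 1) with an Epps-Pulley-type characteristic-function statistic. The closed form below is quadratic in batch size for clarity, whereas the paper reports linear time and memory, so the actual estimator presumably differs; the function name and the number of directions are our placeholders.

```python
import math
import torch

def sigreg_sketch(z: torch.Tensor, num_directions: int = 64) -> torch.Tensor:
    """Illustrative SIGReg-style penalty (not the authors' implementation).

    Projects embeddings z (batch n x dim d) onto random unit directions and
    scores each 1-D marginal against N(0, 1) with an Epps-Pulley-type
    characteristic-function statistic, whose closed form is
        T = (1/n) * sum_{j,k} exp(-(x_j - x_k)^2 / 2)
            - sqrt(2) * sum_j exp(-x_j^2 / 4) + n / sqrt(3).
    """
    n, d = z.shape
    dirs = torch.randn(d, num_directions, device=z.device, dtype=z.dtype)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)   # random unit directions (the "sketch")
    x = z @ dirs                                   # (n, num_directions) 1-D projections
    diff = x.unsqueeze(0) - x.unsqueeze(1)         # pairwise differences, (n, n, K)
    t1 = torch.exp(-0.5 * diff.pow(2)).sum(dim=(0, 1)) / n
    t2 = math.sqrt(2.0) * torch.exp(-0.25 * x.pow(2)).sum(dim=0)
    t3 = n / math.sqrt(3.0)
    return (t1 - t2 + t3).mean()                   # average statistic over directions
```

The statistic is differentiable and nonnegative, so it can be minimized directly as a regularizer alongside the predictive loss.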
If this is right
- Only one trade-off hyperparameter controls the entire training process
- Time and memory scale linearly with dataset size
- Training remains stable without stop-gradients, teacher-student networks, or learning-rate schedulers
- The method works across ResNets, ViTs, and ConvNets on multiple domains
- ImageNet-1k pretraining followed by linear evaluation of a frozen ViT-H/14 reaches 79 percent top-1 accuracy
Where Pith is reading between the lines
- The single-objective form could simplify large-scale distributed pretraining pipelines by removing many implementation choices
- Similar sketched regularization might extend to other embedding-based objectives beyond JEPA
- The Gaussian optimality result may link to broader information-theoretic views of representation learning
- Implementation requiring roughly fifty lines of code suggests the method is immediately usable in standard frameworks (a minimal training-step sketch follows this list)
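Taken together with the single trade-off hyperparameter, that suggests a training step of roughly the following shape. This is a minimal sketch assuming an encoder/predictor pair and two augmented views, reusing the sigreg_sketch helper from the earlier block; the names and the lambda value are illustrative, not the authors' API.

```python
# Minimal single-objective step, assuming `encoder`, `predictor`, and two
# augmented views; reuses sigreg_sketch from the earlier block. Illustrative only.
def lejepa_step(encoder, predictor, view_a, view_b, lam: float = 0.05):
    za, zb = encoder(view_a), encoder(view_b)        # one network: no teacher copy,
    pred_loss = (predictor(za) - zb).pow(2).mean()   # no stop-gradient on the target
    reg = 0.5 * (sigreg_sketch(za) + sigreg_sketch(zb))
    return pred_loss + lam * reg                     # lam: the single trade-off knob
```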
Load-bearing premise
The derivation that the isotropic Gaussian distribution minimizes downstream prediction risk must hold under the stated conditions on the embedding space and loss.
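One toy, minimax-flavored version of what such a statement can look like, in our notation and under assumptions the referee flags below (fixed variance budget, linear readout), not the paper's actual derivation:

```latex
% Toy statement (our notation, not the paper's): with a fixed variance
% budget tr(Sigma) = d, isotropy maximizes the variance available along
% the worst-case downstream readout direction u.
\[
  \operatorname{Cov}(z) = \Sigma, \quad \operatorname{tr}(\Sigma) = d
  \;\Longrightarrow\;
  \min_{\|u\|=1} \operatorname{Var}\!\left(u^{\top} z\right)
  = \lambda_{\min}(\Sigma) \;\le\; \frac{\operatorname{tr}(\Sigma)}{d} = 1,
\]
% with equality iff Sigma = I_d: the smallest eigenvalue can only reach
% the average eigenvalue when all eigenvalues are equal.
```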
What would settle it
If LeJEPA requires extra heuristics or underperforms competitive baselines on a large new dataset where the optimality conditions are met, the claimed benefits of the Gaussian target and SIGReg would not generalize.
read the original abstract
Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using imagenet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver a comprehensive theory of Joint-Embedding Predictive Architectures (JEPAs) by identifying the isotropic Gaussian as the unique distribution that minimizes downstream prediction risk, then instantiating this via Sketched Isotropic Gaussian Regularization (SIGReg) to produce LeJEPA: a single-hyperparameter objective with linear time/memory, no stop-gradients or teacher-student mechanisms, and stable performance across 60+ architectures and 10+ datasets. The empirical highlight is 79% top-1 linear-evaluation accuracy on ImageNet-1k with a ViT-H/14 backbone.
Significance. If the optimality derivation is rigorous and the empirical claims hold under the stated conditions, the work would be significant for replacing heuristic-heavy SSL pipelines with a provably motivated, implementation-light alternative. The single trade-off parameter, distributed-training compatibility, and broad architecture coverage are attractive; the open GitHub repo further strengthens reproducibility.
major comments (2)
- [Theory] Optimality derivation: the claim that the isotropic Gaussian uniquely minimizes downstream risk under the JEPA loss is load-bearing for SIGReg and for the 'provable' and 'heuristics-free' assertions, yet the provided sketch appears to assume a linear downstream head and independence between embedding covariances and targets. These assumptions must be stated explicitly, with the precise conditions on the data distribution and predictor class; otherwise the risk-reduction guarantee does not follow for general nonlinear heads.
- [§4] Empirical validation: the 79% ImageNet-1k result with ViT-H/14 is concrete, but the stability claim across 60+ architectures requires an ablation table showing performance variance when the single trade-off hyperparameter is varied by ±50% and when the sketching dimension in SIGReg is reduced. Without these controls, the 'linear complexity' and 'no scheduler' benefits cannot be isolated from implementation details.
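The controls this comment asks for amount to a small grid. A hypothetical sketch of it; train_and_eval and the base values are placeholders, not from the paper:

```python
# Hypothetical ablation grid for the requested controls; `train_and_eval`
# and the base values are placeholders, not from the paper.
base_lam, base_sketch_dim = 0.05, 64
for lam in (0.5 * base_lam, base_lam, 1.5 * base_lam):          # +/-50% sweep
    for sketch_dim in (base_sketch_dim // 4, base_sketch_dim // 2, base_sketch_dim):
        top1 = train_and_eval(lam=lam, sketch_dim=sketch_dim)   # frozen-backbone linear eval
        print(f"lam={lam:.3f} sketch_dim={sketch_dim} top1={top1:.1f}")
```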
minor comments (2)
- [Abstract, §3] The phrase 'parameter-free' for the Gaussian target is imprecise once the trade-off hyperparameter and sketching dimension are introduced; clarify the exact count of free parameters.
- [Tables/Figures] Figure captions and Table 1: axis labels and column headers should state explicitly whether accuracies are top-1 or top-5 and whether the backbone is frozen.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, providing clarifications on the theoretical assumptions and committing to additional empirical controls in the revision.
read point-by-point responses
- Referee: [Theory] Optimality derivation: the claim that the isotropic Gaussian uniquely minimizes downstream risk under the JEPA loss is load-bearing for SIGReg and for the 'provable' and 'heuristics-free' assertions, yet the provided sketch appears to assume a linear downstream head and independence between embedding covariances and targets. These assumptions must be stated explicitly, with the precise conditions on the data distribution and predictor class; otherwise the risk-reduction guarantee does not follow for general nonlinear heads.
  Authors: We agree that the optimality derivation relies on a linear downstream head and assumes independence between embedding covariances and targets. These conditions will be stated explicitly in the revised theory section, together with the precise requirements on the data distribution (bounded second moments) and the predictor class (linear functions). Under these assumptions the isotropic Gaussian is the unique minimizer of downstream risk. For general nonlinear heads the strict guarantee does not follow from the current analysis; we will add a remark acknowledging this limitation while noting that the empirical results across diverse architectures remain consistent with the proposed objective. Revision: partial.
- Referee: [§4] Empirical validation: the 79% ImageNet-1k result with ViT-H/14 is concrete, but the stability claim across 60+ architectures requires an ablation table showing performance variance when the single trade-off hyperparameter is varied by ±50% and when the sketching dimension in SIGReg is reduced. Without these controls, the 'linear complexity' and 'no scheduler' benefits cannot be isolated from implementation details.
  Authors: We accept that stronger controls are needed to isolate the claimed benefits. In the revised manuscript we will add an ablation table in §4 that reports linear-evaluation accuracy for the trade-off hyperparameter varied by ±50% and for reduced sketching dimensions, using a representative subset of the 60+ architectures (including ResNets and ViTs). The full set of results will be summarized with references to the new table, thereby supporting the stability, linear-complexity, and scheduler-free claims. Revision: yes.
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper presents a first-principles derivation identifying the isotropic Gaussian as the distribution minimizing downstream prediction risk under its stated conditions on embeddings and loss, then introduces SIGReg as a new constraint to enforce it. None of the quoted steps reduces, under the paper's own equations, to a fitted input renamed as a prediction, to a self-definitional loop, or to a load-bearing self-citation chain. The central claim (the LeJEPA objective) retains independent content from the new regularization rather than collapsing to its inputs by construction. This is the expected non-finding for a theory-driven contribution with external empirical validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- trade-off hyperparameter
axioms (1)
- Domain assumption: the isotropic Gaussian is the optimal distribution for JEPA embeddings to minimize downstream prediction risk.
Lean theorems connected to this paper
- Cost.FunctionalEquation.washburn_uniqueness_aczel (echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  "we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk... SIGReg... to constrain embeddings to reach that ideal distribution"
- Foundation.LogicAsFunctionalEquation.rcl_polynomial_closure_theorem (echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  "Combining the JEPA predictive loss with SIGReg yields LeJEPA with... heuristics-free... single trade-off hyperparameter... linear time and memory complexity"
- Foundation.LogicAsFunctionalEquation.operative_to_laws_of_logic (refines)
  REFINES: relation between the paper passage and the cited Recognition theorem.
  "finite pairwise polynomial closure... RCL family"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
- JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
  JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...
- ProteinJEPA: Latent prediction complements protein language models
  Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.
- Joint Embedding Variational Bayes
  VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.
- HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series
  HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...
- HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series
  HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...
- Latent Geometry Beyond Search: Amortizing Planning in World Models
  In regularized latent spaces of world models, planning can be amortized into a goal-conditioned inverse dynamics model that matches CEM performance at 100-130x lower per-decision cost.
- Predictive but Not Plannable: RC-aux for Latent World Models
  RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
- AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling
  AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.
- Understanding Self-Supervised Learning via Latent Distribution Matching
  Self-supervised learning is cast as latent distribution matching that aligns representations to a model while enforcing uniformity, unifying multiple SSL families and proving identifiability for predictive variants ev...
- Why Self-Supervised Encoders Want to Be Normal
  Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.
- Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data
  DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.
- Self-supervised pretraining for an iterative image size agnostic vision transformer
  A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.
- Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
  Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...
- Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
  Infrastructure-centric world models use roadside sensors' temporal depth to complement vehicle spatial breadth for better traffic simulation and prediction.
- REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning
  REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
- LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
  LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.
- PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
  PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.
- MultiMedVision: Multi-Modal Medical Vision Framework
  A unified Sparse Vision Transformer learns joint 2D/3D medical image representations via self-supervision and achieves competitive AUROC on chest X-ray and CT benchmarks with 5x less data than modality-specific models.
- Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning
  Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
- Position: agentic AI orchestration should be Bayes-consistent
  Agentic AI orchestration should apply Bayesian principles for belief maintenance, updating from interactions, and utility-based action selection.
- JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning
  JEPAMatch augments FlexMatch with LeJEPA-derived latent regularization to produce better-structured representations, yielding higher accuracy and faster convergence on CIFAR-100, STL-10, and Tiny-ImageNet.