UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

Triet M. Le

arXiv: 2606.01443 · v1 · pith:SCHYGDPKnew · submitted 2026-05-31 · 💻 cs.LG · cs.AI · cs.CV

Pith reviewed 2026-06-28 17:13 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CV

keywords: uniform rectifiability · joint embedding predictive architecture · representation collapse · self-supervised learning · manifold hypothesis · PCA spectrum analysis · Carleson square function

The pith

Targeting uniform rectifiability in JEPA training produces embeddings concentrated on low-dimensional manifolds with comparable accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UR-JEPA to address representation collapse in joint-embedding predictive architectures by targeting uniformly rectifiable measures instead of isotropic Gaussians. This choice aligns the regularization with the manifold hypothesis that data embeddings should lie on low-dimensional subsets. Experiments across several datasets show that UR-JEPA achieves similar or better accuracy than the prior LeJEPA method while exhibiting lower variance across random seeds. The learned embeddings under UR-JEPA display a pronounced drop in their PCA spectrum, indicating concentration in fewer dimensions, whereas the Gaussian approach yields a flatter spectrum. Readers interested in self-supervised representation learning would care because this provides a geometrically motivated way to prevent collapse without forcing full-dimensional isotropy.

Core claim

UR-JEPA targets a uniformly n-rectifiable measure of local tangent dimension n at small scales, realized through a Gaussian-kernel smoothed Carleson-type square function L^CGLT, with a complementary Jones β-number formulation. On Inet10, UR-JEPA(L^CGLT) attains 0.9141 ± 0.0014 for a +0.83 pp gain over LeJEPA(L^SIGReg) with ~30% lower seed standard deviation. On matched-recipe Galaxy10 SDSS, a single-seed ImageNet-100 run, and a 3-seed EuroSAT remote-sensing run, the two methods lie in the same peak-accuracy band at convergence, with UR-JEPA retaining its lower-seed-variance signature. The distinction is geometric: UR-JEPA(L^CGLT) produces a global PCA spectrum with a 4 to 5 order-of-magnitude drop at index ~20 to 25 out of D = 32, while LeJEPA's spectrum is near-flat (top-to-bottom ratio at most 3.6).
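As a quick sanity check on the headline Inet10 numbers, the baseline figures they imply can be backed out with simple arithmetic. The sketch below assumes the ± value is the seed standard deviation and treats "~30%" as exactly 30%; both are assumptions made for illustration, and the derived LeJEPA values are not quoted from the paper.

```python
# Figures quoted in the core claim.
ur_acc = 0.9141        # UR-JEPA(L^CGLT) top-1 accuracy on Inet10
ur_std = 0.0014        # reported spread (ASSUMED to be the seed standard deviation)
gain_pp = 0.83         # reported gain over LeJEPA, in percentage points
std_reduction = 0.30   # "~30% lower" seed std, ASSUMED to be exactly 30%

# Implied baseline numbers (derived here, not reported in the text).
lejepa_acc = ur_acc - gain_pp / 100.0
lejepa_std = ur_std / (1.0 - std_reduction)

# Size of the gain measured in units of the UR-JEPA seed spread.
gain_in_stds = (ur_acc - lejepa_acc) / ur_std

print(f"implied LeJEPA accuracy : {lejepa_acc:.4f}")
print(f"implied LeJEPA seed std : {lejepa_std:.4f}")
print(f"gain / UR-JEPA seed std : {gain_in_stds:.1f}")
```

Under these assumptions the gain is several seed standard deviations wide, which is why the matched-recipe question matters more than seed noise for this result.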

What carries the argument

Gaussian-kernel smoothed Carleson-type square function L^CGLT that targets a uniformly n-rectifiable measure of local tangent dimension n
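For orientation, the standard (unsmoothed) L^2 Jones β-number from the geometric measure theory literature can be written as follows. This is the textbook-style quantity, not the paper's L^CGLT, whose exact Gaussian-smoothed form is not reproduced on this page. The symbols μ (a Radon measure on R^d), B(x, r) (the ball of radius r at x), and J_μ (a name chosen here for the truncated square function) are notation introduced for this sketch.

```latex
\beta_{\mu,2}^{n}(x,r)
  \;=\; \inf_{L}\left( \frac{1}{r^{n}} \int_{B(x,r)}
        \left( \frac{\operatorname{dist}(y,L)}{r} \right)^{2} d\mu(y) \right)^{1/2},
\qquad
J_{\mu}(x) \;=\; \int_{0}^{1} \beta_{\mu,2}^{n}(x,r)^{2}\,\frac{dr}{r},
```

where the infimum runs over all affine n-planes L in R^d. A small β at (x, r) means the mass of μ near x sits close to a single n-plane at scale r; a finite J_μ(x) controls that flatness across all small scales at once.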

If this is right

UR-JEPA achieves a small accuracy gain on Inet10 with substantially lower variance across seeds.
The embeddings exhibit a sharp PCA spectrum drop indicating low-dimensional concentration.
Per-dimension marginals are near-Gaussian for both regularizers as a consequence of the Diaconis-Freedman theorem.
Performance stays competitive on Galaxy10 SDSS, ImageNet-100, and EuroSAT; on EuroSAT the in-domain models reach 96.0 to 96.1%, competitive with large remote-sensing foundation-model transfer at a 25× smaller backbone.
The two methods produce structurally distinct projected representations at matched accuracy.
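The spectral diagnostic behind these claims can be sketched in a few lines. The code below is an illustrative check on synthetic data, not the paper's evaluation script: `pca_spectrum` and `spectrum_summary` are hypothetical helper names, and the two point clouds are stand-ins for a low-dimensional and an isotropic projector output with D = 32.

```python
import numpy as np

def pca_spectrum(z):
    """Eigenvalues of the sample covariance of z (shape [N, D]), largest first."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (len(z) - 1)
    return np.linalg.eigvalsh(cov)[::-1]     # eigvalsh returns ascending order

def spectrum_summary(eigvals, floor=1e-30):
    """Top-to-bottom ratio and the location/size of the largest log10 drop."""
    ev = np.maximum(eigvals, floor)          # guard against tiny negative round-off
    log_ev = np.log10(ev)
    drops = log_ev[:-1] - log_ev[1:]         # drop between consecutive eigenvalues
    k = int(np.argmax(drops))
    return {"top_to_bottom": float(ev[0] / ev[-1]),
            "drop_index": k + 1,             # number of eigenvalues before the drop
            "drop_orders": float(drops[k])}

rng = np.random.default_rng(0)
N, D, n = 4096, 32, 20                       # hypothetical sizes; D = 32 as in the text

# Synthetic low-dimensional cloud: n strong directions plus faint ambient noise.
basis, _ = np.linalg.qr(rng.standard_normal((D, D)))
low_dim = (rng.standard_normal((N, n)) @ basis[:, :n].T
           + 1e-2 * rng.standard_normal((N, D)))

# Synthetic isotropic cloud, standing in for a near-flat spectrum.
isotropic = rng.standard_normal((N, D))

low_summary = spectrum_summary(pca_spectrum(low_dim))
iso_summary = spectrum_summary(pca_spectrum(isotropic))
print("low-dim  :", low_summary)
print("isotropic:", iso_summary)
```

On real checkpoints, `z` would be the projector outputs on a held-out split; the same two numbers (top-to-bottom ratio and largest drop) are the ones the text compares between regularizers.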

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Uniform rectifiability may provide a more natural target for self-supervised learning when data is assumed to lie on manifolds.
The reduced seed variance could lead to more reliable training in practice.
Alternative implementations using Jones β-numbers could be explored for computational efficiency.
This geometric regularization might apply to other predictive or contrastive learning setups to encourage manifold-like representations.
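To make the Jones β-number idea concrete, here is a minimal sketch of an empirical flatness score for a point cloud. It is a simplified stand-in, not the paper's formulation: the function name `empirical_beta2` is hypothetical, the mass normalisation of the measure-theoretic β-number is dropped, and the best-fit plane is found by local PCA.

```python
import numpy as np

def empirical_beta2(points, center, radius, n):
    """Scale-normalised L^2 flatness of the points inside B(center, radius).

    Returns sqrt(mean squared distance to the best-fit affine n-plane) / radius,
    or None if the ball holds too few points. This is a simplified variant:
    the r^{-n} mass normalisation of the usual beta-number is deliberately omitted.
    """
    local = points[np.linalg.norm(points - center, axis=1) <= radius]
    if len(local) <= n + 1:
        return None
    centred = local - local.mean(axis=0)
    # The least-squares n-plane passes through the centroid and spans the top-n
    # principal directions, so the residual is the sum of the other eigenvalues.
    eigvals = np.linalg.eigvalsh(centred.T @ centred / len(local))  # ascending
    residual = eigvals[: points.shape[1] - n].sum()
    return float(np.sqrt(max(residual, 0.0)) / radius)

rng = np.random.default_rng(0)
D, n, N = 8, 2, 20000                         # hypothetical sizes for the demo
basis, _ = np.linalg.qr(rng.standard_normal((D, D)))

flat = rng.uniform(-1, 1, size=(N, n)) @ basis[:, :n].T   # points on a 2-plane in R^8
cloud = rng.uniform(-1, 1, size=(N, D))                   # full-dimensional cloud

origin = np.zeros(D)
beta_flat = empirical_beta2(flat, origin, radius=0.5, n=n)
beta_cloud = empirical_beta2(cloud, origin, radius=1.0, n=n)
print("flat :", beta_flat)    # essentially zero: the points lie on an n-plane
print("cloud:", beta_cloud)   # clearly positive: no n-plane fits well
```

The cost is one small eigendecomposition per (center, radius) pair, which is the kind of trade-off any comparison against a kernel-smoothed square function would need to weigh.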

Load-bearing premise

The PCA spectral drop and variance reduction are caused by the L^CGLT loss enforcing uniform rectifiability rather than by unspecified differences in training procedure, hyperparameters, or data processing.

What would settle it

An experiment that exactly matches the training procedures, hyperparameters, and data processing for both UR-JEPA and LeJEPA and then checks whether the 4-5 order-of-magnitude PCA drop at index 20-25 and the 30% lower seed standard deviation still appear.

Figures

Figures reproduced from arXiv: 2606.01443 by Triet M. Le.

**Figure 2.** Per-epoch online linear-probe top-1 accuracy (left) and training-regularizer loss (right)

**Figure 3.** Inet100 single-seed training: linear-probe test accuracy (left) and probe loss (right) across

**Figure 4.** Projector geometry diagnostics on Inet10 at the seed-0 matched-recipe checkpoints. Top

**Figure 5.** Projector geometry diagnostics on Galaxy10 SDSS at the seed-0 matched-recipe check

**Figure 6.** Projector geometry diagnostics on the Inet100 validation split at the seed-0 matched

**Figure 7.** Six-way projector-geometry comparison on Galaxy10 SDSS at the seed-0 matched-recipe

**Figure 8.** Projector-geometry overlay for the EuroSAT matched-recipe checkpoints: all 3 seeds

Original abstract

A central difficulty in training Joint-Embedding Predictive Architectures (JEPAs) is preventing representation collapse. LeJEPA addresses this by enforcing an isotropic Gaussian target on the embeddings via Sketched Isotropic Gaussian Regularization (SIGReg). This target is in tension with the manifold hypothesis, which expects embeddings to concentrate on a low-dimensional subset of the ambient space. We propose \emph{UR-JEPA}, which targets a uniformly $n$-rectifiable measure of local tangent dimension $n$ at small scales, realized through a Gaussian-kernel smoothed Carleson-type square function $\mathcal{L}^{\text{CGLT}}$, with a complementary Jones $\beta$-number formulation. On Inet10, UR-JEPA($\mathcal{L}^{\text{CGLT}}$) attains $0.9141 \pm 0.0014$ for a $+0.83$\,pp gain over LeJEPA($\mathcal{L}^{\text{SIGReg}}$) with $\sim 30\%$ lower seed standard deviation; on matched-recipe Galaxy10~SDSS, a single-seed ImageNet-$100$ run, and a $3$-seed EuroSAT remote-sensing run, the two methods lie in the same peak-accuracy band at convergence, with UR-JEPA retaining its lower-seed-variance signature. On EuroSAT the in-domain pair is competitive at $96.0$ to $96.1\%$ with large remote-sensing foundation-model transfer at a $25\times$ smaller backbone. The distinction is geometric: direct visualization of the projector output distribution shows that on all four datasets UR-JEPA($\mathcal{L}^{\text{CGLT}}$) produces a global PCA spectrum with a $4$ to $5$ order-of-magnitude drop at index $\sim 20$ to $25$ out of $D = 32$, while LeJEPA's spectrum is near-flat (top-to-bottom ratio at most $3.6$). Per-dimension marginals are simultaneously near-Gaussian for both methods (mean Shapiro-Wilk $W \in [0.992, 0.996]$) as a Diaconis-Freedman consequence. At matched accuracy the two regularizers therefore yield structurally distinct projected representations.
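The claim that per-dimension marginals can look near-Gaussian even when the embedding is low-dimensional can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the paper's measurement: it uses uniform latent factors, a random orthonormal frame, and hypothetical sizes, and reports the mean Shapiro-Wilk W statistic per coordinate via `scipy.stats.shapiro`.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
N, D, n = 2000, 32, 20          # hypothetical sizes; D = 32 matches the abstract

# Clearly non-Gaussian latent factors: independent uniforms on [-1, 1].
latent = rng.uniform(-1.0, 1.0, size=(N, n))

# Embed into R^D through a random orthonormal frame, so each output coordinate
# is a weighted sum of many latent coordinates while the cloud stays n-dimensional.
frame, _ = np.linalg.qr(rng.standard_normal((D, D)))
embedded = latent @ frame[:, :n].T

def mean_shapiro_w(x):
    """Mean Shapiro-Wilk W statistic across the columns of x."""
    return float(np.mean([shapiro(x[:, j])[0] for j in range(x.shape[1])]))

w_latent = mean_shapiro_w(latent)
w_embedded = mean_shapiro_w(embedded)
rank = int(np.linalg.matrix_rank(embedded - embedded.mean(axis=0)))

print(f"mean W, raw uniform factors : {w_latent:.4f}")
print(f"mean W, embedded coordinates: {w_embedded:.4f}")
print(f"rank of embedded cloud      : {rank} of {D}")
```

The embedded coordinates score closer to W = 1 than the raw factors even though the cloud occupies only n of the D directions, which is the sense in which near-Gaussian marginals and a sharp PCA drop are compatible.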

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UR-JEPA adds a rectifiability regularizer to JEPA that produces lower seed variance and a sharper PCA spectrum, but the abstract leaves open whether training differences explain the geometric shift.

UR-JEPA replaces the isotropic Gaussian target from LeJEPA with a new regularizer L^CGLT based on a Gaussian-smoothed Carleson square function and Jones beta numbers, aiming for uniformly n-rectifiable measures on the embeddings.

The formulation itself is new and not in the prior work. The paper documents two consistent observations: seed-to-seed variance drops by about 30 percent, and the global PCA spectrum of the projector outputs shows a 4-5 order-of-magnitude drop after dimension 20-25 (D=32), while the baseline stays nearly flat. Both methods keep near-Gaussian marginals per dimension. Those structural differences are the most concrete output.

Accuracy improvements are modest and dataset-dependent, appearing as a +0.83 pp gain only on Inet10 while the other three runs sit in the same performance band. The main soft spot is that the abstract flags matched recipes only for Galaxy10 SDSS, ImageNet-100, and EuroSAT; it does not make the same statement for the Inet10 result where the accuracy edge is reported. Without explicit confirmation that every hyperparameter, augmentation, and schedule is identical, the PCA drop and variance reduction cannot be isolated to the rectifiability target. The stress-test concern therefore stands on the information given.

This is for people working on JEPA-style self-supervised models who want to explore geometric constraints beyond isotropy. A reader focused on regularization design would get value from the spectral comparison even if the accuracy lift stays small.

The paper has a distinct idea and some empirical signal, so it deserves peer review to check the methods and ablations in full.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes UR-JEPA, which replaces the isotropic Gaussian target of LeJEPA (via SIGReg) with a uniformly n-rectifiable target realized by the Gaussian-kernel smoothed Carleson square function loss L^CGLT (and a complementary Jones beta-number formulation). It reports that UR-JEPA(L^CGLT) achieves a +0.83 pp accuracy gain on Inet10 (0.9141 ± 0.0014) with ~30% lower seed standard deviation, lies in the same accuracy band as LeJEPA on matched-recipe runs of Galaxy10 SDSS, ImageNet-100, and EuroSAT, and produces projector outputs whose global PCA spectrum exhibits a 4-5 order-of-magnitude drop at index ~20-25 (D=32) while LeJEPA spectra remain near-flat; both yield near-Gaussian marginals.

Significance. If the PCA spectral collapse and variance reduction can be isolated to the rectifiability regularizer, the work supplies a geometrically principled alternative to isotropic regularization that aligns with the manifold hypothesis while preserving the Diaconis-Freedman near-Gaussian marginal property. The explicit grounding in Gaussian-smoothed Carleson and Jones beta objects from geometric measure theory is a methodological strength, as is the consistent lower seed variance across four datasets.

major comments (2)

[Abstract] The Inet10 accuracy and variance results are presented without the 'matched-recipe' qualifier that is explicitly attached to the Galaxy10 SDSS, ImageNet-100, and EuroSAT runs. Because the central claim attributes the 4-5 order-of-magnitude PCA drop and ~30% seed-std reduction to L^CGLT enforcing uniform rectifiability, the absence of explicit confirmation that every hyperparameter, augmentation, optimizer schedule, and data pipeline is identical on Inet10 leaves the attribution open to the alternative explanation of uncontrolled procedural differences.
[Abstract and experimental section] No statistical tests on the seed variances, no ablation removing only the L^CGLT term while holding all other factors fixed, and no verification that the PCA spectra were computed on identically trained models are reported. These omissions are load-bearing for the claim that the observed geometric distinction (top-to-bottom ratio 4-5 orders vs. at most 3.6) is caused by the rectifiability target rather than by other factors.

minor comments (1)

[Abstract] The notation L^CGLT and L^SIGReg is introduced without an explicit equation reference in the abstract; a pointer to the defining equations would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments regarding clarity and statistical support in our experimental claims. We address each major comment below.

Referee: [Abstract] The Inet10 accuracy and variance results are presented without the 'matched-recipe' qualifier that is explicitly attached to the Galaxy10 SDSS, ImageNet-100, and EuroSAT runs. Because the central claim attributes the 4-5 order-of-magnitude PCA drop and ~30% seed-std reduction to L^CGLT enforcing uniform rectifiability, the absence of explicit confirmation that every hyperparameter, augmentation, optimizer schedule, and data pipeline is identical on Inet10 leaves the attribution open to the alternative explanation of uncontrolled procedural differences.

Authors: We agree that consistency in qualifiers improves clarity. The Inet10 experiments followed the identical matched-recipe protocol (same hyperparameters, augmentations, optimizer schedule, and data pipeline) as the other datasets, with the sole difference being the regularization term. We will revise the abstract to attach the 'matched-recipe' qualifier to the Inet10 results, thereby making the attribution to the rectifiability target explicit and uniform across all experiments. Revision: yes.

Referee: [Abstract and experimental section] No statistical tests on the seed variances, no ablation removing only the L^CGLT term while holding all other factors fixed, and no verification that the PCA spectra were computed on identically trained models are reported. These omissions are load-bearing for the claim that the observed geometric distinction (top-to-bottom ratio 4-5 orders vs. at most 3.6) is caused by the rectifiability target rather than by other factors.

Authors: We will incorporate formal statistical tests (e.g., Levene's test) on the reported seed variances in the revised experimental section. The direct head-to-head comparison of LeJEPA(L^SIGReg) versus UR-JEPA(L^CGLT) with every other factor held fixed already functions as the requested ablation isolating the rectifiability regularizer. We will add an explicit verification statement confirming that all PCA spectra were computed on the projector outputs of models trained under these matched conditions. These textual and statistical additions will be included in the revision. Revision: yes.
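The variance test named in the rebuttal could look roughly like the following. This is a minimal sketch using `scipy.stats.levene` with its median-centered (Brown-Forsythe) option; the per-seed accuracies are made-up placeholders, not the paper's numbers, and the seed count of five is an assumption.

```python
import numpy as np
from scipy.stats import levene

# HYPOTHETICAL per-seed top-1 accuracies (not taken from the paper), five seeds each.
ur_jepa = np.array([0.9130, 0.9155, 0.9141, 0.9128, 0.9151])
lejepa = np.array([0.9030, 0.9085, 0.9058, 0.9041, 0.9076])

# Levene's test for equal variances; center="median" is the robust variant.
stat, p_value = levene(ur_jepa, lejepa, center="median")

ur_std = ur_jepa.std(ddof=1)
le_std = lejepa.std(ddof=1)

print(f"seed std UR-JEPA : {ur_std:.4f}")
print(f"seed std LeJEPA  : {le_std:.4f}")
print(f"relative reduction in std: {1.0 - ur_std / le_std:.0%}")
print(f"Levene W = {stat:.3f}, p = {p_value:.3f}")
```

With only a handful of seeds per arm, a test like this has limited power, so a non-significant p-value would not by itself rule out a real difference in seed variance.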

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper defines the core L^CGLT loss directly from standard geometric measure theory primitives (Gaussian-smoothed Carleson square function and Jones beta numbers) applied to projector outputs, without any reduction to fitted parameters, self-referential equations, or load-bearing self-citations. Empirical claims (accuracy gains, PCA spectral drop, variance reduction) are presented as observed outcomes on specific datasets rather than predictions forced by construction from the inputs. No ansatz smuggling, renaming of known results, or uniqueness theorems imported from prior author work appear in the provided text. The central geometric distinction is therefore independent of the reported measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of the newly proposed L^CGLT loss in producing uniformly rectifiable embeddings; the main addition is the application of Carleson square function and Jones beta numbers as regularizers, with the manifold hypothesis serving as background motivation.

axioms (1)

domain assumption: The manifold hypothesis applies to learned embeddings in JEPA models.
Invoked to motivate moving from an isotropic Gaussian to a rectifiable target measure.

invented entities (1)

L^CGLT regularizer (no independent evidence)
purpose: Enforce a uniformly n-rectifiable measure on embeddings via a smoothed Carleson square function.
Newly introduced loss term realizing the uniform rectifiability target.

Reference graph

Works this paper leans on

45 extracted references · 24 canonical work pages · 8 internal anchors

[1] R. Balestriero and Y. LeCun, LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics, arXiv:2511.08544, 2025.

[2] M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture, ICCV 2023; arXiv:2301.08243.

[3] A. Bardes, Q. Garrido, J. Ponce, X. Chen, M. Rabbat, Y. LeCun, M. Assran, and N. Ballas, Revisiting Feature Prediction for Learning Visual Representations from Video, arXiv:2404.08471, 2024.

[4] M. Assran, A. Bardes, D. Fan, Q. Garrido et al., V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction, and Planning, arXiv:2506.09985, 2025.

[5] A. Bardes, J. Ponce, and Y. LeCun, MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features, arXiv:2307.12698, 2023.

[6] D. Chen, M. Shukor, T. Moutakanni, W. Chung, J. Yu, T. Kasarla, Y. Bang, A. Bolourchi, Y. LeCun, and P. Fung, VL-JEPA: Joint-Embedding Predictive Architecture for Vision–Language, arXiv:2512.10942, 2025.

[7] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A Simple Framework for Contrastive Learning of Visual Representations, ICML 2020; arXiv:2002.05709.

[8] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020; arXiv:1911.05722.

[9] J.-B. Grill, F. Strub, F. Altché et al., Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS 2020; arXiv:2006.07733.

[10] X. Chen and K. He, Exploring Simple Siamese Representation Learning, CVPR 2021; arXiv:2011.10566.

[11] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, Emerging Properties in Self-Supervised Vision Transformers, ICCV 2021; arXiv:2104.14294.

[12] M. Oquab, T. Darcet et al., DINOv2: Learning Robust Visual Features without Supervision, TMLR 2023; arXiv:2304.07193.
[13] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, Barlow Twins: Self-Supervised Learning via Redundancy Reduction, ICML 2021; arXiv:2103.03230.

[14] A. Bardes, J. Ponce, and Y. LeCun, VICReg: Variance–Invariance–Covariance Regularization for Self-Supervised Learning, ICLR 2022; arXiv:2105.04906.

[15] A. Ermolov, A. Siarohin, E. Sangineto, and N. Sebe, Whitening for Self-Supervised Representation Learning, ICML 2021; arXiv:2007.06346.

[16] T. Wang and P. Isola, Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML 2020; arXiv:2005.10242.

[17] T. Yerxa, Y. Kuang, E. Simoncelli, and S. Chung, Learning Efficient Coding of Natural Images with Maximum Manifold Capacity Representations, NeurIPS 2023; arXiv:2303.03307.

[18] L. Jing, P. Vincent, Y. LeCun, and Y. Tian, Understanding Dimensional Collapse in Contrastive Self-Supervised Learning, ICLR 2022; arXiv:2110.09348.

[19] Y. Tian, X. Chen, and S. Ganguli, Understanding Self-Supervised Learning Dynamics without Contrastive Pairs, ICML 2021; arXiv:2102.06810.

[20] C. Zhang, K. Zhang, T. X. Pham, A. Niu, Z. Qiao, C. D. Yoo, and I. S. Kweon, How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-Supervised Contrastive Learning, ICLR 2022; arXiv:2203.16262.

[21] P. Pope, C. Zhu, A. Abdelkader, M. Goldblum, and T. Goldstein, The Intrinsic Dimension of Images and Its Impact on Learning, ICLR 2021; arXiv:2104.08894.

[22] A. Ansuini, A. Laio, J. H. Macke, and D. Zoccolan, Intrinsic Dimension of Data Representations in Deep Neural Networks, NeurIPS 2019; arXiv:1905.12784.

[23] E. Facco, M. d’Errico, A. Rodriguez, and A. Laio, Estimating the Intrinsic Dimension of Datasets by a Minimal Neighborhood Information, Sci. Rep. 7, 12140 (2017).

[24] V. Chousionis, J. Garnett, T. Le, and X. Tolsa, Square functions and uniform rectifiability, arXiv:1401.3382, 2014; Trans. Amer. Math. Soc. 368 (2016), no. 8, 6063–6102.

[25] P. W. Jones, Rectifiable sets and the traveling salesman problem, Invent. Math. 102 (1990), no. 1, 1–15.

[26] G. Lerman, How to partition a low-dimensional data set into disjoint clusters of different geometric structures, Workshop on Clustering High-Dimensional Data and its Applications, SIAM International Conference on Data Mining, Arlington, VA, 2002. https://www-users.cse.umn.edu/~lerman/reports/geo_clust.pdf
[27] G. Lerman, Quantifying curvelike structures of measures by using $L_2$ Jones quantities, Comm. Pure Appl. Math. 56 (2003), no. 9, 1294–1365.

[28] X. Tolsa, Characterization of n-rectifiability in terms of Jones’ square function: Part I, Calc. Var. Partial Differential Equations 54 (2015), no. 4, 3643–3665.

[29] J. Azzam and X. Tolsa, Characterization of n-rectifiability in terms of Jones’ square function: Part II, Geom. Funct. Anal. 25 (2015), no. 5, 1371–1412.

[30] H. Martikainen and T. Orponen, Boundedness of the density-normalized Jones’ square function does not imply 1-rectifiability, J. Math. Pures Appl. (9) 110 (2018), 71–92.

[31] P. Helber, B. Bischke, A. Dengel, and D. Borth, EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 12 (2019), no. 7, 2217–2226.

[32] I. Corley, C. Robinson, A. Ortiz, and J. Lavista Ferres, Revisiting Pre-trained Remote Sensing Model Benchmarks: Resizing and Normalization Matters, CVPR 2024 Workshop on Perception Beyond the Visible Spectrum (PBVS), 2024.

[33] G. David and S. Semmes, Analysis of and on Uniformly Rectifiable Sets, Math. Surveys Monogr. 38, AMS, 1993.

[34] H. Pajot, Analytic Capacity, Rectifiability, Menger Curvature and the Cauchy Integral, Lecture Notes in Math. 1799, Springer, 2002.

[35] X. Tolsa, Analytic Capacity, the Cauchy Transform, and Non-homogeneous Calderón–Zygmund Theory, Progress in Math. 307, Birkhäuser, 2014.

[36] L. Le Cam, An approximation theorem for the Poisson binomial distribution, Pacific J. Math. 10 (1960), 1181–1197.

[37] C. Davis and W. M. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal. 7 (1970), 1–46.

[38] Y. Yu, T. Wang, and R. J. Samworth, A useful variant of the Davis–Kahan theorem for statisticians, Biometrika 102 (2015), no. 2, 315–323.

[39] P. Diaconis and D. Freedman, Asymptotics of graphical projection pursuit, Ann. Statist. 12 (1984), no. 3, 793–815.

[40] S. S. Shapiro and M. B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (1965), no. 3/4, 591–611.

[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database, CVPR 2009.

[42] J. Howard, Imagenette: A smaller subset of 10 easily classified classes from ImageNet, GitHub, 2019. https://github.com/fastai/imagenette

[43] Y. Tian, D. Krishnan, and P. Isola, Contrastive Multiview Coding, ECCV 2020; arXiv:1906.05849.

[44] H. W. Leung and J. Bovy, Galaxy10 SDSS Dataset, astroNN documentation, 2018. https://astronn.readthedocs.io/en/latest/galaxy10sdss.html

[45] C. J. Lintott, K. Schawinski, A. Slosar, K. Land, S. Bamford, D. Thomas, M. J. Raddick, R. C. Nichol, A. Szalay, D. Andreescu, P. Murray, and J. Vandenberg, Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey, Mon. Not. R. Astron. Soc. 389 (2008), no. 3, 1179–1189.

[1] [1]

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

R. Balestriero and Y. LeCun,LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics, arXiv:2511.08544, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Assran, Q

M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture, ICCV 2023; arXiv:2301.08243

work page arXiv 2023

[3] [3]

Revisiting Feature Prediction for Learning Visual Representations from Video

A. Bardes, Q. Garrido, J. Ponce, X. Chen, M. Rabbat, Y. LeCun, M. Assran, and N. Ballas,Re- visiting Feature Prediction for Learning Visual Representations from Video, arXiv:2404.08471, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

M. Assran, A. Bardes, D. Fan, Q. Garrido et al.,V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction, and Planning, arXiv:2506.09985, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

Bardes, J

A. Bardes, J. Ponce, and Y. LeCun,MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features, arXiv:2307.12698, 2023

work page arXiv 2023

[6] [6]

D. Chen, M. Shukor, T. Moutakanni, W. Chung, J. Yu, T. Kasarla, Y. Bang, A. Bolourchi, Y. LeCun, and P. Fung,VL-JEPA: Joint-Embedding Predictive Architecture for Vision– Language, arXiv:2512.10942, 2025

work page arXiv 2025

[7] [7]

T. Chen, S. Kornblith, M. Norouzi, and G. Hinton,A Simple Framework for Contrastive Learn- ing of Visual Representations, ICML 2020; arXiv:2002.05709

work page internal anchor Pith review Pith/arXiv arXiv 2020

[8] [8]

K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick,Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020; arXiv:1911.05722

work page arXiv 2020

[9] [9]

Grill, F

J.-B. Grill, F. Strub, F. Altch´ e et al.,Bootstrap Your Own Latent: A New Approach to Self- Supervised Learning, NeurIPS 2020; arXiv:2006.07733

work page arXiv 2020

[10] [10]

Chen and K

X. Chen and K. He,Exploring Simple Siamese Representation Learning, CVPR 2021; arXiv:2011.10566

work page arXiv 2021

[11] [11]

Emerging Properties in Self-Supervised Vision Transformers

M. Caron, H. Touvron, I. Misra, H. J´ egou, J. Mairal, P. Bojanowski, and A. Joulin,Emerging Properties in Self-Supervised Vision Transformers, ICCV 2021; arXiv:2104.14294

work page internal anchor Pith review Pith/arXiv arXiv 2021

[12] [12]

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, T. Darcet et al.,DINOv2: Learning Robust Visual Features without Supervision, TMLR 2023; arXiv:2304.07193. 1https://access-ci.org/ 2https://www.rcac.purdue.edu/anvil 32

work page internal anchor Pith review Pith/arXiv arXiv 2023

[13] [13]

Zbontar, L

J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny,Barlow Twins: Self-Supervised Learning via Redundancy Reduction, ICML 2021; arXiv:2103.03230

work page arXiv 2021

[14] [14]

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

A. Bardes, J. Ponce, and Y. LeCun,VICReg: Variance–Invariance–Covariance Regularization for Self-Supervised Learning, ICLR 2022; arXiv:2105.04906

work page internal anchor Pith review Pith/arXiv arXiv 2022

[15] [15]

Ermolov, A

A. Ermolov, A. Siarohin, E. Sangineto, and N. Sebe,Whitening for Self-Supervised Represen- tation Learning, ICML 2021; arXiv:2007.06346

work page arXiv 2021

[16] [16]

Wang and P

T. Wang and P. Isola,Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML 2020; arXiv:2005.10242

work page arXiv 2020

[17] [17]

Yerxa, Y

T. Yerxa, Y. Kuang, E. Simoncelli, and S. Chung,Learning Efficient Coding of Natural Images with Maximum Manifold Capacity Representations, NeurIPS 2023; arXiv:2303.03307

work page arXiv 2023

[18] [18]

L. Jing, P. Vincent, Y. LeCun, and Y. Tian,Understanding Dimensional Collapse in Con- trastive Self-Supervised Learning, ICLR 2022; arXiv:2110.09348

work page arXiv 2022

[19] [19]

Y. Tian, X. Chen, and S. Ganguli,Understanding Self-Supervised Learning Dynamics without Contrastive Pairs, ICML 2021; arXiv:2102.06810

work page arXiv 2021

[20] [20]

Zhang, K

C. Zhang, K. Zhang, T. X. Pham, A. Niu, Z. Qiao, C. D. Yoo, and I. S. Kweon,How Does Sim- Siam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-Supervised Contrastive Learning, ICLR 2022; arXiv:2203.16262

work page arXiv 2022

[21] [21]

P. Pope, C. Zhu, A. Abdelkader, M. Goldblum, and T. Goldstein,The Intrinsic Dimension of Images and Its Impact on Learning, ICLR 2021; arXiv:2104.08894

work page arXiv 2021

[22] [22]

Ansuini, A

A. Ansuini, A. Laio, J. H. Macke, and D. Zoccolan,Intrinsic Dimension of Data Representa- tions in Deep Neural Networks, NeurIPS 2019; arXiv:1905.12784

work page arXiv 2019

[23] [23]

Facco, M

E. Facco, M. d’Errico, A. Rodriguez, and A. Laio,Estimating the Intrinsic Dimension of Datasets by a Minimal Neighborhood Information, Sci. Rep.7, 12140 (2017)

2017

[24] [24]

Square functions and uniform rectifiability

V. Chousionis, J. Garnett, T. Le, and X. Tolsa,Square functions and uniform rectifiability, arXiv:1401.3382, 2014; Trans. Amer. Math. Soc.368(2016), no. 8, 6063–6102

work page internal anchor Pith review Pith/arXiv arXiv 2014

[25] [25]

P. W. Jones,Rectifiable sets and the traveling salesman problem, Invent. Math.102(1990), no. 1, 1–15

1990

[26] G. Lerman, How to partition a low-dimensional data set into disjoint clusters of different geometric structures, Workshop on Clustering High-Dimensional Data and its Applications, SIAM International Conference on Data Mining, Arlington, VA, 2002. https://www-users.cse.umn.edu/~lerman/reports/geo_clust.pdf

[27] G. Lerman, Quantifying curvelike structures of measures by using L_2 Jones quantities, Comm. Pure Appl. Math. 56 (2003), no. 9, 1294–1365

[28] X. Tolsa, Characterization of n-rectifiability in terms of Jones’ square function: Part I, Calc. Var. Partial Differential Equations 54 (2015), no. 4, 3643–3665

[29] J. Azzam and X. Tolsa, Characterization of n-rectifiability in terms of Jones’ square function: Part II, Geom. Funct. Anal. 25 (2015), no. 5, 1371–1412

[30] H. Martikainen and T. Orponen, Boundedness of the density-normalized Jones’ square function does not imply 1-rectifiability, J. Math. Pures Appl. (9) 110 (2018), 71–92

[31] P. Helber, B. Bischke, A. Dengel, and D. Borth, EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 12 (2019), no. 7, 2217–2226

[32] I. Corley, C. Robinson, A. Ortiz, and J. Lavista Ferres, Revisiting Pre-trained Remote Sensing Model Benchmarks: Resizing and Normalization Matters, CVPR 2024 Workshop on Perception Beyond the Visible Spectrum (PBVS), 2024

[33] G. David and S. Semmes, Analysis of and on Uniformly Rectifiable Sets, Math. Surveys Monogr. 38, AMS, 1993

[34] H. Pajot, Analytic Capacity, Rectifiability, Menger Curvature and the Cauchy Integral, Lecture Notes in Math. 1799, Springer, 2002

[35] X. Tolsa, Analytic Capacity, the Cauchy Transform, and Non-homogeneous Calderón–Zygmund Theory, Progress in Math. 307, Birkhäuser, 2014

[36] L. Le Cam, An approximation theorem for the Poisson binomial distribution, Pacific J. Math. 10 (1960), 1181–1197

[37] C. Davis and W. M. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal. 7 (1970), 1–46

[38] Y. Yu, T. Wang, and R. J. Samworth, A useful variant of the Davis–Kahan theorem for statisticians, Biometrika 102 (2015), no. 2, 315–323

[39] P. Diaconis and D. Freedman, Asymptotics of graphical projection pursuit, Ann. Statist. 12 (1984), no. 3, 793–815

[40] S. S. Shapiro and M. B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (1965), no. 3/4, 591–611

[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database, CVPR 2009

[42] J. Howard, Imagenette: A smaller subset of 10 easily classified classes from ImageNet, GitHub, 2019. https://github.com/fastai/imagenette

[43] Y. Tian, D. Krishnan, and P. Isola, Contrastive Multiview Coding, ECCV 2020; arXiv:1906.05849

[44] H. W. Leung and J. Bovy, Galaxy10 SDSS Dataset, astroNN documentation, 2018. https://astronn.readthedocs.io/en/latest/galaxy10sdss.html

[45] C. J. Lintott, K. Schawinski, A. Slosar, K. Land, S. Bamford, D. Thomas, M. J. Raddick, R. C. Nichol, A. Szalay, D. Andreescu, P. Murray, and J. Vandenberg, Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey, Mon. Not. R. Astron. Soc. 389 (2008), no. 3, 1179–1189