pith. machine review for the scientific record. sign in

arxiv: 2604.21046 · v2 · submitted 2026-04-22 · 💻 cs.LG

Recognition: unknown

JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning

Authors on Pith no claims yet

Pith reviewed 2026-05-10 00:35 UTC · model grok-4.3

classification 💻 cs.LG
keywords semi-supervised learningrepresentation shapingpseudo-labelinglatent space regularizationisotropic Gaussianimage classificationFixMatchgeometric priors
0
0 comments X

The pith

Regularizing latent representations to isotropic Gaussians alongside pseudo-labeling boosts accuracy and speeds convergence in semi-supervised image classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix two problems in FixMatch-style semi-supervised methods: majority classes dominating due to incorrect pseudo-labels and noisy early labels forcing long training before clear decision boundaries form. It does so by adding a latent-space regularization term, drawn from the principle that useful representations should form an isotropic Gaussian structure, to the loss of FlexMatch. A reader would care if this produces better-structured representations that improve results on scarce-label image tasks while cutting total training compute. The approach keeps the original pseudo-labeling machinery intact and simply shapes the geometry of the learned features.

Core claim

The central claim is that augmenting the adaptive pseudo-labeling loss of FlexMatch with a latent regularization term that enforces isotropic Gaussian structure in the representation space yields well-structured features, higher classification accuracy, and markedly faster convergence than standard FixMatch-based pipelines on image benchmarks.

What carries the argument

The JEPAMatch objective, formed by combining FlexMatch's semi-supervised loss with a LeJEPA-derived term that regularizes the latent space toward an isotropic Gaussian distribution.

If this is right

  • The combined objective consistently produces higher accuracy than existing baselines on CIFAR-100, STL-10, and Tiny-ImageNet.
  • Convergence occurs in fewer epochs than standard FixMatch pipelines, directly lowering total computational cost.
  • The method reduces dominance by majority classes and mitigates the effect of noisy early pseudo-labels while retaining pseudo-labeling benefits.
  • Representations remain compatible with the original adaptive threshold mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same regularization term could be tested inside other pseudo-labeling frameworks to check whether geometric shaping is broadly additive.
  • Measuring the actual deviation from isotropy in the learned latents after training would provide a direct diagnostic for whether the claimed structure is achieved.
  • If the acceleration holds, the method could be applied to larger-scale unlabeled image collections where compute savings matter most.

Load-bearing premise

That enforcing an isotropic Gaussian structure in latent space will be compatible with confidence-based pseudo-labeling dynamics and will improve representation quality without new failure modes or offsetting hyper-parameter costs.

What would settle it

Running the same experiments on CIFAR-100, STL-10, and Tiny-ImageNet and finding that JEPAMatch matches or underperforms FlexMatch in final accuracy or shows no reduction in epochs or wall-clock time to convergence would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.21046 by Ali Aghababaei-Harandi, Aude Sportisse, Massih-Reza Amini.

Figure 1
Figure 1. Figure 1: Multi-view Augmentation Strategy for JEPAMatch. The Original image is pro￾cessed into global context views (Weak and Strong) and structurally isolated patches (Local views) to enforce geometric consistency. 4.1 Curriculum Level: Dynamic Pseudo-Labeling At the Curriculum Level, JEPAMatch is supervised using both ground-truth labels and high-confidence pseudo-labels generated by the classifier head hϕ. For l… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the JEPAMatch architecture. The backbone encoder ( [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Convergence speed and evolution of accuracy of JEPAMatch and Flex [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of pseudo-labeling accuracy between JEPAMatch and Flex [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effect of different hyper-parameters λrep and β on CIFAR-100 with 10000 labeled images. The best performance is achieved with λrep = 0.5 and β = 0.1. their true labels. JEPAMatch consistently maintains more pseudo-labels passing the threshold while also achieving higher correctness, allowing the network to leverage a larger portion of unlabeled samples for longer during training. This results in a stronger… view at source ↗
Figure 6
Figure 6. Figure 6: Data Utilization in Pseudo Labeling on CIFAR-100 with 400 labeled im [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Maximum class count evolution. complexity and number of classes across our benchmarks, the final output di￾mension is set to 128 for CIFAR-100 and STL-10, and increased to 256 for Tiny-ImageNet [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
read the original abstract

Semi-supervised learning has emerged as a powerful paradigm for leveraging large amounts of unlabeled data to improve the performance of machine learning models when labeled data are scarce. Among existing approaches, methods derived from FixMatch have achieved state-of-the-art results in image classification by combining weak and strong data augmentations with confidence-based pseudo-labeling. Despite their strong empirical performance, these methods typically struggle with two critical bottlenecks: majority classes tend to dominate the learning process, which is amplified by incorrect pseudo-labels, leading to biased models. Furthermore, noisy early pseudo-labels prevent the model from forming clear decision boundaries, requiring prolonged training to learn informative representation. In this paper, we introduce a paradigm shift from conventional logical output threshold base, toward an explicit shaping of geometric representations. Our approach is inspired by the recently proposed Latent-Euclidean Joint-Embedding Predictive Architectures (LeJEPA), a theoretically grounded framework asserting that meaningful representations should exhibit an isotropic Gaussian structure in latent space. Building on this principle, we propose a new training objective that combines the classical semi-supervised loss used in FlexMatch, an adaptive extension of FixMatch, with a latent-space regularization term derived from LeJEPA. Our proposed approach, encourages well-structured representations while preserving the advantages of pseudo-labeling strategies. Through extensive experiments on CIFAR-100, STL-10 and Tiny-ImageNet, we demonstrate that the proposed method consistently outperforms existing baselines. In addition, our method significantly accelerates the convergence, drastically reducing the overall computational cost compared to standard FixMatch-based pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes JEPAMatch, a semi-supervised image classification method that augments the FlexMatch objective (adaptive pseudo-labeling with weak/strong augmentations) with a latent-space regularization term drawn from LeJEPA to enforce isotropic Gaussian structure in the learned representations. The central claim is that this geometric shaping mitigates majority-class bias and noisy early pseudo-labels, yielding consistent accuracy gains and substantially faster convergence on CIFAR-100, STL-10, and Tiny-ImageNet relative to FixMatch-based baselines.

Significance. If the empirical results are reproducible and the regularization term proves compatible with adaptive thresholding, the work would offer a practical route to more efficient SSL by importing a theoretically motivated geometric prior. The explicit focus on representation geometry rather than output-space heuristics is a clear conceptual contribution, though its value hinges on whether the added term delivers gains beyond what careful hyper-parameter tuning of existing methods already achieves.

major comments (3)
  1. [§3] §3 (Method): The combined training objective is described only at a high level; the explicit mathematical form of the LeJEPA regularization term, its scaling coefficient relative to the FlexMatch loss, and any scheduling or annealing schedule are not provided. Without these details it is impossible to determine whether the reported accuracy and convergence improvements are attributable to the isotropic-Gaussian prior or to an additional tuned hyper-parameter.
  2. [§4] §4 (Experiments): No ablation isolating the LeJEPA term is reported, nor is any analysis given of how the fixed isotropic-Gaussian constraint interacts with FlexMatch’s adaptive confidence threshold. This omission is load-bearing because the skeptic’s concern—that the regularization may conflict with pseudo-label selection dynamics when labels are still noisy—cannot be evaluated from the given results.
  3. [§4] §4 (Experiments): The convergence-acceleration claim lacks quantitative support such as epochs-to-target-accuracy curves, wall-clock times, or statistical significance across multiple random seeds. The abstract’s assertion of “drastically reducing the overall computational cost” therefore rests on qualitative statements rather than verifiable measurements.
minor comments (1)
  1. [Abstract] The abstract states that the method “consistently outperforms existing baselines” but does not list the precise baselines, the labeled-data regimes (e.g., 4 labels per class), or the number of runs used for the reported numbers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments highlight important areas where additional detail and analysis will strengthen the manuscript. We address each major comment below and commit to incorporating the requested clarifications and experiments in the revised version.

read point-by-point responses
  1. Referee: [§3] §3 (Method): The combined training objective is described only at a high level; the explicit mathematical form of the LeJEPA regularization term, its scaling coefficient relative to the FlexMatch loss, and any scheduling or annealing schedule are not provided. Without these details it is impossible to determine whether the reported accuracy and convergence improvements are attributable to the isotropic-Gaussian prior or to an additional tuned hyper-parameter.

    Authors: We agree that the original submission presented the combined objective at a high level. In the revised manuscript we will explicitly state the full training objective as L = L_FlexMatch + λ L_LeJEPA, where L_LeJEPA is the negative log-likelihood of the latent representations under an isotropic unit Gaussian prior (i.e., (1/2)‖z‖² + const). The coefficient λ is fixed at 0.1 throughout training with no annealing schedule. These additions will make clear that the geometric prior is a constant, lightweight regularizer whose contribution can be directly compared to the FlexMatch term. revision: yes

  2. Referee: [§4] §4 (Experiments): No ablation isolating the LeJEPA term is reported, nor is any analysis given of how the fixed isotropic-Gaussian constraint interacts with FlexMatch’s adaptive confidence threshold. This omission is load-bearing because the skeptic’s concern—that the regularization may conflict with pseudo-label selection dynamics when labels are still noisy—cannot be evaluated from the given results.

    Authors: We acknowledge that an explicit ablation isolating the LeJEPA term and a direct analysis of its interaction with the adaptive threshold were missing. The revised version will include a new ablation table (FlexMatch vs. JEPAMatch) on all three datasets and a short discussion explaining that the isotropic-Gaussian constraint improves representation quality from the earliest epochs, thereby reducing the incidence of low-confidence noisy pseudo-labels and allowing the adaptive threshold to operate on more reliable features. This will directly address the potential conflict concern. revision: yes

  3. Referee: [§4] §4 (Experiments): The convergence-acceleration claim lacks quantitative support such as epochs-to-target-accuracy curves, wall-clock times, or statistical significance across multiple random seeds. The abstract’s assertion of “drastically reducing the overall computational cost” therefore rests on qualitative statements rather than verifiable measurements.

    Authors: We agree that the convergence-acceleration claim requires quantitative backing. In the revision we will add (i) accuracy-versus-epochs curves for JEPAMatch and the FlexMatch baseline on CIFAR-100 and STL-10, (ii) the number of epochs required to reach 80 % and 85 % accuracy, and (iii) all main results reported as mean ± standard deviation over three independent random seeds. Where feasible we will also report approximate wall-clock times per epoch on the same hardware. These additions will replace the qualitative statements with verifiable measurements. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is a combination of existing components

full rationale

The paper proposes JEPAMatch by combining the FlexMatch loss (an adaptive extension of FixMatch) with a latent-space regularization term explicitly derived from the cited LeJEPA framework, which asserts an isotropic Gaussian structure in latent space. This is presented as an empirical combination rather than an internal derivation that reduces to the inputs by construction. No equations, fitted parameters, or self-citations are shown that force the new objective or performance claims to be tautological. The claims rest on experimental results on CIFAR-100, STL-10, and Tiny-ImageNet rather than on a self-referential definition or imported uniqueness theorem. The regularization is treated as an external principle, not smuggled in as an unverified ansatz within the current derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no equations or implementation details available to enumerate free parameters, axioms, or invented entities. The central claim rests on the unverified transfer of the LeJEPA isotropic-Gaussian principle to the SSL setting.

pith-pipeline@v0.9.0 · 5587 in / 1196 out tokens · 31724 ms · 2026-05-10T00:35:21.681433+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y., Ballas, N.: Self-supervised learning from images with a joint-embedding predic- tive architecture. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15619–15629 (2023)

  2. [2]

    Advances in Neural In- formation Processing Systems35, 26671–26685 (2022)

    Balestriero, R., LeCun, Y.: Contrastive and non-contrastive self-supervised learn- ing recover global and local spectral embedding methods. Advances in Neural In- formation Processing Systems35, 26671–26685 (2022)

  3. [3]

    Lejepa: Provable and scalable self-supervised learning without the heuristics, 2025

    Balestriero, R., LeCun, Y.: Lejepa: Provable and scalable self-supervised learning without the heuristics. arXiv preprint arXiv:2511.08544 (2025)

  4. [4]

    International Conference on Learning Representations (2019a) JEPAMatch 15

    Berthelot, D., Carlini, N., Cubuk, E.D., Kurakin, A., Sohn, K., Zhang, H., Raf- fel, C.: Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. International Conference on Learning Representations (2019a) JEPAMatch 15

  5. [5]

    In: Advances in Neural Information Processing Systems (NeurIPS)

    Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.: Mixmatch: A holistic approach to semi-supervised learning. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 32 (2019)

  6. [6]

    In: International Conferenceon LearningRepresentations(2022),https://openreview.net/forum? id=Q5uh1Nvv5dm

    Berthelot, D., Roelofs, R., Sohn, K., Carlini, N., Kurakin, A.: Adamatch: A unified approach to semi-supervised learning and domain adaptation. In: International Conferenceon LearningRepresentations(2022),https://openreview.net/forum? id=Q5uh1Nvv5dm

  7. [7]

    (eds.): Semi-Supervised Learning

    Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. The MIT Press (2006)

  8. [8]

    In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=ymt1zQXBDiF

    Chen, H., Tao, R., Fan, Y., Wang, Y., Wang, J., Schiele, B., Xie, X., Raj, B., Sav- vides, M.: Softmatch: Addressing the quantity-quality tradeoff in semi-supervised learning. In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=ymt1zQXBDiF

  9. [9]

    In: Proceedings of the 58th annual meeting of the association for computational linguistics

    Chen, J., Yang, Z., Yang, D.: Mixtext: Linguistically-informed interpolation of hidden space for semi-supervised text classification. In: Proceedings of the 58th annual meeting of the association for computational linguistics. pp. 2147–2157 (2020)

  10. [10]

    In: Proceedings of the fourteenth international conference on ar- tificial intelligence and statistics

    Coates, A., Ng, A., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: Proceedings of the fourteenth international conference on ar- tificial intelligence and statistics. pp. 215–223. JMLR Workshop and Conference Proceedings (2011)

  11. [11]

    In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

    Da Costa, V.G.T., Zara, G., Rota, P., Oliveira-Santos, T., Sebe, N., Murino, V., Ricci, E.: Dual-head contrastive domain adaptation for video action recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1181–1190 (2022)

  12. [12]

    Frontiers in oncology12, 960984 (2022)

    Eckardt, J.N., Bornhäuser, M., Wendt, K., Middeke, J.M.: Semi-supervised learn- ing in cancer diagnostics. Frontiers in oncology12, 960984 (2022)

  13. [13]

    In: International Conference on Learning Representations (ICLR) / or Relevant Venue (2023)

    Fan, Y., et al.: Crmatch: Feature-level consistency approach for semi-supervised learning. In: International Conference on Learning Representations (ICLR) / or Relevant Venue (2023)

  14. [14]

    In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Fini, E., Astolfi, P., Alahari, K., Alameda-Pineda, X., Mairal, J., Nabi, M., Ricci, E.: Semi-supervised learning made simple with self-supervised clustering. In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3187–3197 (2023)

  15. [15]

    Advances in neural information processing systems17(2004)

    Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. Advances in neural information processing systems17(2004)

  16. [16]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Han, H., Yuan, J., Wei, C., Yu, Z.: Regmixmatch: Optimizing mixup utilization in semi-supervised learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 17032–17040 (2025)

  17. [17]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp. 770–778 (2016)

  18. [18]

    Master’s thesis, Department of Computer Science, University of Toronto (2009)

    Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto (2009)

  19. [19]

    CS 231N7(7), 3 (2015)

    Le, Y., Yang, X.: Tiny imagenet visual recognition challenge. CS 231N7(7), 3 (2015)

  20. [20]

    In: ICML workshop on Challenges in Representation Learning (WREPL) (2013) 16 Ali Aghababaei-Harandi, Aude Sportisse and Massih-Reza Amini

    Lee, D.H.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: ICML workshop on Challenges in Representation Learning (WREPL) (2013) 16 Ali Aghababaei-Harandi, Aude Sportisse and Massih-Reza Amini

  21. [21]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Li, J., Xiong, C., Hoi, S.C.: Comatch: Semi-supervised learning with contrastive graph regularization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9475–9484 (2021)

  22. [22]

    In: Advances in Neural Information Processing Systems (NeurIPS) / or Relevant Venue (2023)

    Liu, A., et al.: Flatmatch: Bridging the gap between labeled and unlabeled data in semi-supervised learning. In: Advances in Neural Information Processing Systems (NeurIPS) / or Relevant Venue (2023)

  23. [23]

    Bioinformatics37(21), 3744– 3751 (2021)

    Moffat, L., Jones, D.T.: Increasing the accuracy of single sequence prediction meth- ods using a deep semi-supervised learning framework. Bioinformatics37(21), 3744– 3751 (2021)

  24. [24]

    International Conference on Learning Representations (2021)

    Rizve, M.N., Duarte, K., Rawat, Y.S., Shah, M.: In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning. International Conference on Learning Representations (2021)

  25. [25]

    In: Inter- national Conference on Learning Representations (ICLR)

    Samuli, L., Timo, A.: Temporal ensembling for semi-supervised learning. In: Inter- national Conference on Learning Representations (ICLR). vol. 4, p. 6 (2017)

  26. [26]

    Advances in Neural Information Processing Systems 33, 596–608 (2020)

    Sohn, K., Berthelot, D., Li, C.L., Zhang, Z., Carlini, N., Cubuk, E.D., Kurakin, A., Zhang, H., Raffel, C.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems 33, 596–608 (2020)

  27. [27]

    IEEE Transactions on Neural Networks and Learning Systems 34(11), 8174–8194 (2022)

    Song, Z., Yang, X., Xu, Z., King, I.: Graph-based semi-supervised learning: A com- prehensive review. IEEE Transactions on Neural Networks and Learning Systems 34(11), 8174–8194 (2022)

  28. [28]

    Advances in Neural Information Processing Systems (2017)

    Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems (2017)

  29. [29]

    Wang, Y., Chen, H., Fan, Y., SUN, W., Tao, R., Hou, W., Wang, R., Yang, L., Zhou, Z., Guo, L.Z., Qi, H., Wu, Z., Li, Y.F., Nakamura, S., Ye, W., Sav- vides, M., Raj, B., Shinozaki, T., Schiele, B., Wang, J., Xie, X., Zhang, Y.: Usb: A unified semi-supervised learning benchmark for classification35, 3938– 3961 (2022),https://proceedings.neurips.cc/paper_fi...

  30. [30]

    In: International Conference on Learning Representations (ICLR) (2023)

    Wang, Y., Chen, H., Heng, Q., Hou, W., Fan, Y., Wu, Z., Wang, J., Savvides, M., Shinozaki, T., Raj, B., et al.: Freematch: Self-adaptive thresholding for semi- supervised learning. In: International Conference on Learning Representations (ICLR) (2023)

  31. [31]

    Advances in Neural Information Processing Systems (2019)

    Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmenta- tion for consistency training. Advances in Neural Information Processing Systems (2019)

  32. [32]

    In: Advances in Neural Information Processing Systems (NeurIPS)

    Xie, Q., Dai, Z., Hovy, E., Luong, T., Le, Q.: Unsupervised data augmentation for consistency training. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 33, pp. 6256–6268 (2020)

  33. [33]

    In: International Conference on Machine Learning

    Xu, Y., Shang, L., Ye, J., Qian, Q., Li, Y.F., Sun, B., Li, H., Jin, R.: Dash: Semi- supervised learning with dynamic thresholding. In: International Conference on Machine Learning. pp. 11525–11536. PMLR (2021)

  34. [34]

    Wide Residual Networks

    Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

  35. [35]

    Advances in Neural Information Processing Systems (2021)

    Zhang, B., Wang, Y., Hou, W., Wu, H., Wang, J., Okumura, M., Shinozaki, T.: FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling. Advances in Neural Information Processing Systems (2021)

  36. [36]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Zheng, M., You, S., Huang, L., Wang, F., Qian, C., Xu, C.: Simmatch: Semi- supervised learning with similarity matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14471–14481 (2022) JEPAMatch 17 A Extra Experiments A.1 Tiny-ImageNet: We extended our evaluation to the Tiny-ImageNet dataset by testing JEPA- M...