Radial Basis Function Networks as Projection Heads in Self-Supervised Learning

Andreas Schliebitz; Heiko Tapken; Martin Atzmueller

arxiv: 2606.21590 · v1 · pith:H56UYU5Xnew · submitted 2026-06-19 · 💻 cs.CV · cs.LG

Radial Basis Function Networks as Projection Heads in Self-Supervised Learning

Andreas Schliebitz , Heiko Tapken , Martin Atzmueller This is my paper

Pith reviewed 2026-06-26 14:53 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords self-supervised learningradial basis function networkprojection headrepresentation qualitylabel-free metricimage classificationMoCoSimCLR

0 comments

The pith

Radial basis function networks can replace MLP projection heads in self-supervised learning while supplying a label-free metric for backbone quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that an RBFN projection head performs comparably to a standard MLP across five common SSL methods on four image datasets. Its learned centers and shape parameters yield a metric called Scale-Normalized Separation that tracks representation quality without labels or extra classifiers. SNS shows strong positive correlation with logistic regression accuracy, so the projector itself can serve as a quality proxy. Three Gaussian RBF layers are recommended as the practical construction. The work also releases a new Open Images-derived dataset to aid reproducible experiments.

Core claim

In self-supervised learning pipelines, replacing the conventional MLP projection head with a radial basis function network maintains competitive downstream performance on image classification tasks. The RBFN's interpretable centers and shape parameters directly support computation of Scale-Normalized Separation, a metric that exhibits strong to very strong correlation with logistic regression metrics and therefore functions as a reliable label-free indicator of backbone representation quality.

What carries the argument

Radial basis function network (RBFN) projection head with three Gaussian-activated layers, from whose learned centers and shapes Scale-Normalized Separation is computed as a label-free quality metric.

If this is right

RBFN heads with three Gaussian layers serve as drop-in replacements that match MLP performance in MoCo, SimCLR, BYOL, SwAV, and SimSiam.
SNS derived solely from RBFN parameters acts as a reliable proxy for backbone quality without requiring labeled data.
The approach avoids the computational waste of training then discarding an MLP head.
The released Open Images V7-derived dataset enables further reproducible studies of representation quality metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

SNS could be monitored continuously during SSL training to detect representation collapse without pausing for labeled probes.
The same RBFN construction might be tested in non-image domains such as audio or graph SSL to check whether the correlation pattern persists.
Because the centers are explicit, one could inspect which regions of the representation space the backbone emphasizes most strongly.

Load-bearing premise

The centers and shape parameters learned by the RBFN during training capture backbone representation quality in a manner that generalizes beyond the specific training runs and datasets examined.

What would settle it

Compute SNS on RBFN heads trained with a new architecture or dataset outside the five SSL methods and four datasets tested, then measure its correlation with linear probing accuracy; a collapse to weak or negative correlation would refute the proxy claim.

Figures

Figures reproduced from arXiv: 2606.21590 by Andreas Schliebitz, Heiko Tapken, Martin Atzmueller.

**Figure 1.** Figure 1: RBFN projection head with stacked linear and RBF layers featuring RBF nonlinearities, preceded by an optional batch normalization step (disabled by default). z ∈ R 128 nn.Linear (128→2048) nn.BatchNorm1d RBFLayer(K, fd, fb, n) nn.Linear (2048→128) p ∈ R 128 [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗

read the original abstract

Self-supervised learning (SSL) typically relies on a backbone encoder followed by a small multilayer perceptron (MLP) projection head, which is conventionally discarded after training, while backbone quality is assessed via costly linear probing on labeled data. We argue that this approach including discarding the projector is rather computationally wasteful. Instead, we propose replacing the MLP head with a radial basis function network (RBFN), whose interpretable center and shape parameters can be exploited to judge representation quality without labels or a separate classifier. To this end, we introduce Scale-Normalized Separation (SNS), a novel label-free quality metric derived solely from the kernel centers and shapes learned during training. Across five canonical SSL architectures (MoCo, SimCLR, BYOL, SwAV and SimSiam) and four image classification datasets, we show that RBFN projection heads are competitive drop-in replacements for standard MLP projectors. We recommend constructing them with three RBF layers activated by the Gaussian radial basis function. Moreover, SNS exhibits strong to very strong positive correlation with established logistic regression metrics, demonstrating that a trained RBFN projector can act as a reliable proxy for backbone representation quality. We additionally publish a novel PyTorch compatible image classification dataset based on Google's Open Images V7 to facilitate reproducible research into representation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RBFN heads are competitive with MLPs across five SSL methods and SNS from their parameters tracks linear probe accuracy in the reported runs.

read the letter

RBFN projection heads can replace the usual MLP ones in MoCo, SimCLR, BYOL, SwAV and SimSiam without hurting downstream accuracy on the four datasets tested. The paper also extracts a Scale-Normalized Separation score straight from the learned RBF centers and shapes and shows it lines up well with logistic regression probes.

The systematic swap across five standard frameworks plus the release of a new Open Images subset are the concrete contributions. Recommending three Gaussian RBF layers gives readers something they can try directly. The correlations between SNS and the labeled metric are the part that could matter for people who want to skip the probe step.

The main limitation is that the abstract and reported results give no error bars, run counts, or significance numbers, so the competitiveness claim is harder to weigh. SNS is built from parameters that are optimized inside the same SSL loss, which makes the independence of the proxy less obvious and leaves open how well it travels to new datasets or training regimes. The work stays inside image classification, so nothing is claimed beyond that.

Researchers who build or evaluate SSL pipelines would find the replacement experiment and the SNS idea useful to check. It is worth sending out for peer review because the experiments hit the usual baselines and the metric is simple enough to reproduce or refute quickly.

Referee Report

2 major / 1 minor

Summary. The paper proposes replacing standard MLP projection heads with radial basis function networks (RBFNs) in self-supervised learning frameworks (MoCo, SimCLR, BYOL, SwAV, SimSiam). It introduces Scale-Normalized Separation (SNS), a label-free metric derived from the learned RBF centers and shape parameters, as a proxy for backbone representation quality. Empirical results across four image classification datasets claim that RBFN heads (recommended with three layers and Gaussian activation) are competitive drop-in replacements for MLPs, with SNS exhibiting strong to very strong positive correlations to logistic regression probing accuracies. The work also releases a new PyTorch-compatible image classification dataset derived from Open Images V7.

Significance. If the empirical claims hold under rigorous validation, the work could reduce computational waste in SSL pipelines by retaining and exploiting the projection head for evaluation rather than discarding it, while providing an interpretable, label-free alternative to linear probing. The SNS metric and the released dataset represent concrete contributions to reproducibility and efficiency in representation learning.

major comments (2)

[Abstract / Experimental results] Abstract and experimental results: the central claim of competitive performance across five SSL architectures and four datasets, plus strong SNS-logistic regression correlations, is reported without details on training hyperparameters, number of runs, error bars, statistical significance tests, or controls for post-hoc selection. This leaves the empirical support for the drop-in replacement and proxy-metric claims only moderately substantiated.
[SNS metric definition] SNS definition and evaluation: SNS is computed directly from the RBF centers and shape parameters that are optimized as part of the same SSL training objective, creating dependence on the learned parameters. The manuscript presents SNS as an independent proxy, but additional validation (e.g., sensitivity analysis or comparison to external benchmarks) is needed to confirm it generalizes beyond the specific training runs.

minor comments (1)

[Method / Recommendations] The recommendation to use three RBF layers with Gaussian activation should be accompanied by an ablation study or justification if not already included, to clarify sensitivity to these architectural choices.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying our position and outlining revisions where appropriate to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract / Experimental results] Abstract and experimental results: the central claim of competitive performance across five SSL architectures and four datasets, plus strong SNS-logistic regression correlations, is reported without details on training hyperparameters, number of runs, error bars, statistical significance tests, or controls for post-hoc selection. This leaves the empirical support for the drop-in replacement and proxy-metric claims only moderately substantiated.

Authors: We agree that expanded reporting of experimental details will improve rigor. In the revised manuscript we will add full hyperparameter tables, specify the number of independent runs, include error bars or standard deviations, report statistical significance where relevant, and discuss controls for post-hoc selection. These additions will provide stronger substantiation while preserving the existing empirical trends across architectures and datasets. revision: yes
Referee: [SNS metric definition] SNS definition and evaluation: SNS is computed directly from the RBF centers and shape parameters that are optimized as part of the same SSL training objective, creating dependence on the learned parameters. The manuscript presents SNS as an independent proxy, but additional validation (e.g., sensitivity analysis or comparison to external benchmarks) is needed to confirm it generalizes beyond the specific training runs.

Authors: SNS is explicitly derived from the learned RBF parameters, which is the intended mechanism for obtaining a label-free signal; we will revise the text to avoid any implication of full independence from the training process. To address the request for further validation, we will incorporate a sensitivity analysis (perturbations to centers and shapes) and comparisons against additional external benchmarks in the revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims are empirical: RBFN heads are shown competitive with MLP heads across five SSL methods and four datasets via standard training and evaluation protocols, and SNS is introduced as a metric computed from the learned RBF parameters with its utility demonstrated by observed correlations to logistic regression accuracy. No derivation reduces a result to its inputs by construction, no self-citation chain supports a uniqueness claim, and no fitted parameter is relabeled as an independent prediction. The SNS definition is explicit about its dependence on training outputs, and the correlation evidence is external to that definition.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claim rests on the empirical claim that RBFN heads match MLP performance and that SNS derived from their parameters correlates with labeled metrics; this depends on the unstated assumption that RBFN training dynamics remain compatible with existing SSL losses and that the correlation is not an artifact of the chosen datasets.

free parameters (2)

number of RBF layers
Set to three in the recommendation; chosen after experiments.
RBF shape and center parameters
Learned during training; used directly to compute SNS.

axioms (1)

domain assumption RBFN can be substituted for MLP projection heads without materially changing the learned backbone representations under standard SSL objectives.
Invoked when claiming drop-in replacement status across MoCo, SimCLR, BYOL, SwAV and SimSiam.

invented entities (1)

Scale-Normalized Separation (SNS) no independent evidence
purpose: Label-free proxy for backbone quality derived from RBF centers and shapes.
New metric introduced in the paper; no independent evidence outside the reported correlations is described.

pith-pipeline@v0.9.1-grok · 5761 in / 1439 out tokens · 28208 ms · 2026-06-26T14:53:26.209621+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 36 canonical work pages · 1 internal anchor

[1]

In: Proc

Agrawal, K.K., Mondal, A.K., Ghosh, A., Richards, B.A.:α-ReQ: Assessing rep- resentation quality by measuring eigenspectrum decay. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’22, Curran Asso- ciates Inc., Red Hook, NY, USA (2022), https://dl.acm.org/doi/10.5555/3600270. 3601551

work page doi:10.5555/3600270 2022
[2]

Amirian, M., Schwenker, F.: Radial Basis Function Networks for Convolutional Neural Networks to Learn Similarity Distance Metric and Improve Interpretabil- ity.IEEEAccess8,123087–123097(2020).https://doi.org/10.1109/ACCESS.2020. 3007337

work page doi:10.1109/access.2020 2020
[3]

In: Proc

Bardes, A., Ponce, J., LeCun, Y.: VICReg: Variance-Invariance-Covariance Reg- ularization for Self-Supervised Learning. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2022). https://doi.org/10. 48550/arXiv.2105.04906

Pith/arXiv arXiv 2022
[4]

In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

Ben-Shaul,I.,Shwartz-Ziv,R.,Galanti,T.,Dekel,S.,LeCun,Y.:ReverseEngineer- ing Self-Supervised Learning. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Proc. International Conference on Neural Information Processing Systems (NeurIPS) (2023). https://doi.org/10.48550/arXiv.2305.15614

work page doi:10.48550/arxiv.2305.15614 2023
[5]

Transactions on Machine Learning Research2023(2023)

Bordes, F., Balestriero, R., Garrido, Q., Bardes, A., Vincent, P.: Guillotine Reg- ularization: Why removing layers is needed to improve generalization in Self- Supervised Learning. Transactions on Machine Learning Research2023(2023). https://doi.org/10.48550/arXiv.2206.13378

work page doi:10.48550/arxiv.2206.13378 2023
[6]

In: Advances in Neural Informa- tion Processing Systems

Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature Verification using a ”Siamese” Time Delay Neural Network. In: Advances in Neural Informa- tion Processing Systems. vol. 6 (1993), https://dl.acm.org/doi/10.5555/2987189. 2987282

work page doi:10.5555/2987189 1993
[7]

Complex Systems2(3) (1988), https://www.complex-systems.com/ abstracts/v02_i03_a05/, accessed: 2026-06-02

Broomhead, D.S., Lowe, D.: Multivariable Functional Interpolation and Adap- tive Networks. Complex Systems2(3) (1988), https://www.complex-systems.com/ abstracts/v02_i03_a05/, accessed: 2026-06-02

1988
[8]

In: Proc

Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsuper- vised Learning of Visual Features by Contrasting Cluster Assignments. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020), https://dl.acm.org/doi/10. 5555/3495724.3496555

arXiv 2020
[9]

Deepsd: Automatic deep skinning and pose space deformation for 3d garment animation

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging Properties in Self-Supervised Vision Transformers. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Mon- treal, QC, Canada, October 10-17, 2021. pp. 9630–9640. IEEE (2021). https: //doi.org/10.1109/ICCV48922.2021.00951

work page doi:10.1109/iccv48922.2021.00951 2021
[10]

2020, in Proceedings of the 37th International Conference on Machine Learning, ICML’20 (JMLR.org), doi: 10.5555/3524938.3525087

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Con- trastive Learning of Visual Representations. In: Proceedings of the 37th Inter- national Conference on Machine Learning. ICML’20, JMLR.org (2020), https: //dl.acm.org/doi/10.5555/3524938.3525087

work page doi:10.5555/3524938.3525087 2020
[11]

In: Proc

Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big Self-Supervised Models are Strong Semi-Supervised Learners. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020), https://dl.acm.org/doi/10.5555/3495724.3497589

work page doi:10.5555/3495724.3497589 2020
[12]

CoRRabs/2003.04297(2020)

Chen, X., Fan, H., Girshick, R.B., He, K.: Improved Baselines with Momentum Contrastive Learning. CoRRabs/2003.04297(2020). https://doi.org/10.48550/ arXiv.2003.04297 18 A. Schliebitz et al

Pith/arXiv arXiv 2003
[13]

Otaduy, and Dan Casas

Chen, X., He, K.: Exploring Simple Siamese Representation Learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15745–15753 (2021). https://doi.org/10.1109/CVPR46437.2021.01549

work page doi:10.1109/cvpr46437.2021.01549 2021
[14]

Chun-Hsiao Yeh, Y.C.: IN100pytorch: PyTorch Implementation: Training ResNets on ImageNet-100 (2022), https://github.com/danielchyeh/ImageNet-100-Pytorch, accessed: 2026-06-02

2022
[15]

In: Proc

Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., Le, Q.V.: AutoAugment: Learn- ing Augmentation Strategies From Data. In: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 113–123 (2019). https: //doi.org/10.1109/CVPR.2019.00020

work page doi:10.1109/cvpr.2019.00020 2019
[16]

Cubuk, E.D., Zoph, B., Shlens, J., Le, Q.V.: RandAugment: Practical automated dataaugmentationwithareducedsearchspace.In:2020IEEE/CVFConferenceon Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 3008–3017 (2020). https://doi.org/10.1109/CVPRW50498.2020.00359

work page doi:10.1109/cvprw50498.2020.00359 2020
[17]

Deep Residual Learning for Image Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large- Scale Hierarchical Image Database. In: Proc. IEEE Conference on Computer Vi- sion and Pattern Recognition. pp. 248–255 (2009). https://doi.org/10.1109/CVPR. 2009.5206848

work page doi:10.1109/cvpr 2009
[18]

Otaduy, and Dan Casas

Deng, W., Zheng, L.: Are Labels Always Necessary for Classifier Accuracy Eval- uation? In: Proc. IEEE Conference on Computer Vision and Pattern Recog- nition, CVPR. pp. 15069–15078. Computer Vision Foundation / IEEE (2021). https://doi.org/10.1109/CVPR46437.2021.01482

work page doi:10.1109/cvpr46437.2021.01482 2021
[19]

In: Proc

Garrido, Q., Balestriero, R., Najman, L., LeCun, Y.: RankMe: Assessing the Down- stream Performance of Pretrained Self-supervised Representations by Their Rank. In: Proc. International Conference on Machine Learning. ICML’23, JMLR.org (2023), http://dl.acm.org/doi/10.5555/3618408.3618848

work page doi:10.5555/3618408.3618848 2023
[20]

Google LLC: Open Images V7 (2022), https://storage.googleapis.com/ openimages/web/factsfigures_v7.html, accessed: 2026-06-02

2022
[21]

In: Proc

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch,C.,Pires,B.A.,Guo,Z.D.,Azar,M.G.,Piot,B.,Kavukcuoglu,K.,Munos, R., Valko, M.: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Ho...

work page doi:10.5555/3495724.3497510 2020
[22]

CoRR abs/2212.11491(2022)

Gupta, K., Ajanthan, T., van den Hengel, A., Gould, S.: Understanding and Improving the Role of Projection Head in Self-Supervised Learning. CoRR abs/2212.11491(2022). https://doi.org/10.48550/arXiv.2212.11491

work page doi:10.48550/arxiv.2212.11491 2022
[23]

Deep Residual Learning for Image Recognition

He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016
[24]

In: Proc

Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: A Simple Data Processing Method to Improve Robustness and Un- certainty. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2020). https://doi.org/10.48550/arXiv.1912.02781

work page doi:10.48550/arxiv.1912.02781 2020
[25]

IEEE Transactions on Neural Networks4(1), 156–159 (1993)

Jang, J.S., Sun, C.T.: Functional Equivalence Between Radial Basis Function Net- works and Fuzzy Inference Systems. IEEE Transactions on Neural Networks4(1), 156–159 (1993). https://doi.org/10.1109/72.182710

work page doi:10.1109/72.182710 1993
[26]

In: Proc

Jing, L., Vincent, P., LeCun, Y., Tian, Y.: Understanding Dimensional Collapse in Contrastive Self-supervised Learning. In: Proc. International Conference on Learn- ing Representations, ICLR. OpenReview.net (2022). https://doi.org/10.48550/ arXiv.2110.09348 RBFNs as Projection Heads in Self-Supervised Learning 19

arXiv 2022
[27]

In: Proc

Johnson, D.D., Hanchi, A.E., Maddison, C.J.: Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions. In: Proc. Interna- tional Conference on Learning Representations, ICLR. OpenReview.net (2023). https://doi.org/10.48550/arXiv.2210.01883

work page doi:10.48550/arxiv.2210.01883 2023
[28]

Kornblith, S., Shlens, J., Le, Q.V.: Do Better ImageNet Models Transfer Better? In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2656–2666 (2019). https://doi.org/10.1109/CVPR.2019.00277

work page doi:10.1109/cvpr.2019.00277 2019
[29]

In: Proc

Lee, J., Bahri, Y., Novak, R., Schoenholz, S.S., Pennington, J., Sohl-Dickstein, J.: Deep Neural Networks as Gaussian Processes. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2018). https://doi.org/10. 48550/arXiv.1711.00165

Pith/arXiv arXiv 2018
[30]

In: Proc

Li, Y., Pogodin, R., Sutherland, D.J., Gretton, A.: Self-Supervised Learning with Kernel Dependence Maximization. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’21, Curran Associates Inc., Red Hook, NY, USA (2021), http://dl.acm.org/doi/10.5555/3540261.3541451

work page doi:10.5555/3540261.3541451 2021
[31]

In: Proc

Lundberg, S.M., Lee, S.I.: A Unified Approach to Interpreting Model Predictions. In: Proc. International Conference on Neural Information Processing Systems. p. 4768–4777. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (2017), https: //dl.acm.org/doi/10.5555/3295222.3295230

work page doi:10.5555/3295222.3295230 2017
[32]

In: Proc

Ma, J., Hu, T., Wang, W.: Deciphering the Projection Head: Representation Eval- uation Self-supervised Learning. In: Proc. International Joint Conference on Arti- ficial Intelligence. IJCAI ’24 (2024). https://doi.org/10.24963/ijcai.2024/522

work page doi:10.24963/ijcai.2024/522 2024
[33]

Constructive Approximation2(1), 11–22 (1986)

Micchelli, C.A.: Interpolation of Scattered Data: Distance Matrices and Condition- ally Positive Definite Functions. Constructive Approximation2(1), 11–22 (1986). https://doi.org/10.1007/BF01893414

work page doi:10.1007/bf01893414 1986
[34]

Nandam, S.R., Atito, S., Feng, Z., Kittler, J., Awais, M.: Investigating Self- Supervised Methods for Label-Efficient Learning. Int. J. Comput. Vision133(7), 4522–4537 (Mar 2025). https://doi.org/10.1007/s11263-025-02397-4

work page doi:10.1007/s11263-025-02397-4 2025
[35]

CoRRabs/1807.03748(2018)

van den Oord, A., Li, Y., Vinyals, O.: Representation Learning with Contrastive Predictive Coding. CoRRabs/1807.03748(2018). https://doi.org/10.48550/ arXiv.1807.03748

Pith/arXiv arXiv 2018
[36]

Orr, M.J.L.: Introduction to Radial Basis Function Networks. Tech. rep., Centre for Cognitive Science, University of Edinburgh (April 1996), https://faculty.cc.gatech. edu/~isbell/tutorials/rbf-intro.pdf, accessed: 2026-06-05

1996
[37]

Neural Computation3(2), 246–257 (06 1991)

Park, J., Sandberg, I.W.: Universal Approximation Using Radial-Basis-Function Networks. Neural Computation3(2), 246–257 (06 1991). https://doi.org/10.1162/ neco.1991.3.2.246

1991
[38]

In: Neu, G., Rosasco, L

Parulekar, A., Collins, L., Shanmugam, K., Mokhtari, A., Shakkottai, S.: InfoNCE Loss Provably Learns Cluster-Preserving Representations. In: Neu, G., Rosasco, L. (eds.) Proc. International Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 195, pp. 1914–1961. PMLR (12–15 Jul 2023). https://doi. org/10.48550/arXiv.2302.07920

work page doi:10.48550/arxiv.2302.07920 1914
[39]

Note on regression and inheritance in the case of two parents , author =

Pearson, K.: VII. Note on Regression and Inheritance in the Case of Two Parents. Proceedings of the Royal Society of London58(347-352), 240–242 (12 1895). https: //doi.org/10.1098/rspl.1895.0041

work page doi:10.1098/rspl.1895.0041
[40]

Schliebitz et al

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) (2013), https: //www.image-net.org/challenges/LSVRC/2013/index, accessed: 2026-06-02 20 A. Schliebitz et al

2013
[41]

Bernstein, Alexander C

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV)115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

work page doi:10.1007/s11263-015-0816-y 2015
[42]

Russo, A.: Pytorch RBF Layer (2021), https://github.com/rssalessio/ PytorchRBFLayer, accessed: 2026-06-02

2021
[43]

Schliebitz, A., Tapken, H., Atzmueller, M.: The OpenImagesV7-100 Dataset (Dec 2025), https://github.com/andreas-schliebitz/open-images-v7-100, accessed: 2026-06-02

2025
[44]

CoRRabs/2307.08913(2023)

Song, Z., Su, X., Wang, J., Qiang, W., Zheng, C., Sun, F.: Towards the Sparseness of Projection Head in Self-Supervised Learning. CoRRabs/2307.08913(2023). https://doi.org/10.48550/arXiv.2307.08913

work page doi:10.48550/arxiv.2307.08913 2023
[45]

The American Journal of Psychology15(1), 72–101 (1904), http://www.jstor.org/ stable/1412159

Spearman, C.: The Proof and Measurement of Association between Two Things. The American Journal of Psychology15(1), 72–101 (1904), http://www.jstor.org/ stable/1412159

arXiv 1904
[46]

Susmelj, I., Heller, M., Wirth, P., Prescott, J., Ebner, M., et al.: Lightly, https: //github.com/lightly-ai/lightly, accessed: 2026-06-02

2026
[47]

In: Proc

Thilak, V., Huang, C., Saremi, O., Dinh, L., Goh, H., Nakkiran, P., Susskind, J.M., Littwin, E.: LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2024). https://doi.org/10.48550/arXiv.2312.04000

work page doi:10.48550/arxiv.2312.04000 2024
[48]

In: Proc Euro- peanConferenceonComputerVision(ECCV).p.776–794.Springer-Verlag,Berlin, Heidelberg (2020)

Tian, Y., Krishnan, D., Isola, P.: Contrastive Multiview Coding. In: Proc Euro- peanConferenceonComputerVision(ECCV).p.776–794.Springer-Verlag,Berlin, Heidelberg (2020). https://doi.org/10.1007/978-3-030-58621-8_45

work page doi:10.1007/978-3-030-58621-8_45 2020
[49]

TorchVision maintainers and contributors: TorchVision: PyTorch’s Computer Vi- sion library (Nov 2016), https://github.com/pytorch/vision, accessed: 2026-06-02

2016
[50]

In: Proc

Wang, T., Isola, P.: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. In: Proc. International Confer- ence on Machine Learning. ICML’20, JMLR.org (2020), https://dl.acm.org/doi/ 10.5555/3524938.3525859

work page doi:10.5555/3524938.3525859 2020
[51]

Deep Kernel Learning

Wilson, A.G., Hu, Z., Salakhutdinov, R., Xing, E.P.: Deep Kernel Learning. In: Gretton, A., Robert, C.C. (eds.) Proc. International Conference on Artificial In- telligence and Statistics. vol. 51, pp. 370–378. PMLR, Cadiz, Spain (09–11 May 2016). https://doi.org/10.48550/arXiv.1511.02222

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1511.02222 2016
[52]

Artificial Intelligence Review59(4), 112 (Feb 2026)

Wittscher, L.: A survey on design choices for self-supervised learning in computer vision. Artificial Intelligence Review59(4), 112 (Feb 2026). https://doi.org/10. 1007/s10462-026-11506-9

2026
[53]

In: Proc

Xue, Y., Gan, E., Ni, J., Joshi, S., Mirzasoleiman, B.: Investigating the Benefits of Projection Head for Representation Learning. In: Proc. International Confer- ence on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net (2024). https://doi.org/10.48550/arXiv.2403.11391

work page doi:10.48550/arxiv.2403.11391 2024
[54]

CoRRabs/1812.03190(2018)

Zadeh, P.H., Hosseini, R., Sra, S.: Deep-RBF Networks Revisited: Robust Classifi- cation with Rejection. CoRRabs/1812.03190(2018). https://doi.org/10.48550/ arXiv.1812.03190

Pith/arXiv arXiv 2018
[55]

Proceedings of Machine Learning Research, vol

Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow Twins: Self-Supervised LearningviaRedundancyReduction.In:Meila,M.,Zhang,T.(eds.)Proceedingsof the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 12310–12320. PMLR (18–24 Jul 2021). https: //doi.org/10.48550/arXiv.2103.03230

work page doi:10.48550/arxiv.2103.03230 2021

[1] [1]

In: Proc

Agrawal, K.K., Mondal, A.K., Ghosh, A., Richards, B.A.:α-ReQ: Assessing rep- resentation quality by measuring eigenspectrum decay. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’22, Curran Asso- ciates Inc., Red Hook, NY, USA (2022), https://dl.acm.org/doi/10.5555/3600270. 3601551

work page doi:10.5555/3600270 2022

[2] [2]

Amirian, M., Schwenker, F.: Radial Basis Function Networks for Convolutional Neural Networks to Learn Similarity Distance Metric and Improve Interpretabil- ity.IEEEAccess8,123087–123097(2020).https://doi.org/10.1109/ACCESS.2020. 3007337

work page doi:10.1109/access.2020 2020

[3] [3]

In: Proc

Bardes, A., Ponce, J., LeCun, Y.: VICReg: Variance-Invariance-Covariance Reg- ularization for Self-Supervised Learning. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2022). https://doi.org/10. 48550/arXiv.2105.04906

Pith/arXiv arXiv 2022

[4] [4]

In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

Ben-Shaul,I.,Shwartz-Ziv,R.,Galanti,T.,Dekel,S.,LeCun,Y.:ReverseEngineer- ing Self-Supervised Learning. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Proc. International Conference on Neural Information Processing Systems (NeurIPS) (2023). https://doi.org/10.48550/arXiv.2305.15614

work page doi:10.48550/arxiv.2305.15614 2023

[5] [5]

Transactions on Machine Learning Research2023(2023)

Bordes, F., Balestriero, R., Garrido, Q., Bardes, A., Vincent, P.: Guillotine Reg- ularization: Why removing layers is needed to improve generalization in Self- Supervised Learning. Transactions on Machine Learning Research2023(2023). https://doi.org/10.48550/arXiv.2206.13378

work page doi:10.48550/arxiv.2206.13378 2023

[6] [6]

In: Advances in Neural Informa- tion Processing Systems

Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature Verification using a ”Siamese” Time Delay Neural Network. In: Advances in Neural Informa- tion Processing Systems. vol. 6 (1993), https://dl.acm.org/doi/10.5555/2987189. 2987282

work page doi:10.5555/2987189 1993

[7] [7]

Complex Systems2(3) (1988), https://www.complex-systems.com/ abstracts/v02_i03_a05/, accessed: 2026-06-02

Broomhead, D.S., Lowe, D.: Multivariable Functional Interpolation and Adap- tive Networks. Complex Systems2(3) (1988), https://www.complex-systems.com/ abstracts/v02_i03_a05/, accessed: 2026-06-02

1988

[8] [8]

In: Proc

Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsuper- vised Learning of Visual Features by Contrasting Cluster Assignments. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020), https://dl.acm.org/doi/10. 5555/3495724.3496555

arXiv 2020

[9] [9]

Deepsd: Automatic deep skinning and pose space deformation for 3d garment animation

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging Properties in Self-Supervised Vision Transformers. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Mon- treal, QC, Canada, October 10-17, 2021. pp. 9630–9640. IEEE (2021). https: //doi.org/10.1109/ICCV48922.2021.00951

work page doi:10.1109/iccv48922.2021.00951 2021

[10] [10]

2020, in Proceedings of the 37th International Conference on Machine Learning, ICML’20 (JMLR.org), doi: 10.5555/3524938.3525087

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Con- trastive Learning of Visual Representations. In: Proceedings of the 37th Inter- national Conference on Machine Learning. ICML’20, JMLR.org (2020), https: //dl.acm.org/doi/10.5555/3524938.3525087

work page doi:10.5555/3524938.3525087 2020

[11] [11]

In: Proc

Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big Self-Supervised Models are Strong Semi-Supervised Learners. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020), https://dl.acm.org/doi/10.5555/3495724.3497589

work page doi:10.5555/3495724.3497589 2020

[12] [12]

CoRRabs/2003.04297(2020)

Chen, X., Fan, H., Girshick, R.B., He, K.: Improved Baselines with Momentum Contrastive Learning. CoRRabs/2003.04297(2020). https://doi.org/10.48550/ arXiv.2003.04297 18 A. Schliebitz et al

Pith/arXiv arXiv 2003

[13] [13]

Otaduy, and Dan Casas

Chen, X., He, K.: Exploring Simple Siamese Representation Learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15745–15753 (2021). https://doi.org/10.1109/CVPR46437.2021.01549

work page doi:10.1109/cvpr46437.2021.01549 2021

[14] [14]

Chun-Hsiao Yeh, Y.C.: IN100pytorch: PyTorch Implementation: Training ResNets on ImageNet-100 (2022), https://github.com/danielchyeh/ImageNet-100-Pytorch, accessed: 2026-06-02

2022

[15] [15]

In: Proc

Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., Le, Q.V.: AutoAugment: Learn- ing Augmentation Strategies From Data. In: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 113–123 (2019). https: //doi.org/10.1109/CVPR.2019.00020

work page doi:10.1109/cvpr.2019.00020 2019

[16] [16]

Cubuk, E.D., Zoph, B., Shlens, J., Le, Q.V.: RandAugment: Practical automated dataaugmentationwithareducedsearchspace.In:2020IEEE/CVFConferenceon Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 3008–3017 (2020). https://doi.org/10.1109/CVPRW50498.2020.00359

work page doi:10.1109/cvprw50498.2020.00359 2020

[17] [17]

Deep Residual Learning for Image Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large- Scale Hierarchical Image Database. In: Proc. IEEE Conference on Computer Vi- sion and Pattern Recognition. pp. 248–255 (2009). https://doi.org/10.1109/CVPR. 2009.5206848

work page doi:10.1109/cvpr 2009

[18] [18]

Otaduy, and Dan Casas

Deng, W., Zheng, L.: Are Labels Always Necessary for Classifier Accuracy Eval- uation? In: Proc. IEEE Conference on Computer Vision and Pattern Recog- nition, CVPR. pp. 15069–15078. Computer Vision Foundation / IEEE (2021). https://doi.org/10.1109/CVPR46437.2021.01482

work page doi:10.1109/cvpr46437.2021.01482 2021

[19] [19]

In: Proc

Garrido, Q., Balestriero, R., Najman, L., LeCun, Y.: RankMe: Assessing the Down- stream Performance of Pretrained Self-supervised Representations by Their Rank. In: Proc. International Conference on Machine Learning. ICML’23, JMLR.org (2023), http://dl.acm.org/doi/10.5555/3618408.3618848

work page doi:10.5555/3618408.3618848 2023

[20] [20]

Google LLC: Open Images V7 (2022), https://storage.googleapis.com/ openimages/web/factsfigures_v7.html, accessed: 2026-06-02

2022

[21] [21]

In: Proc

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch,C.,Pires,B.A.,Guo,Z.D.,Azar,M.G.,Piot,B.,Kavukcuoglu,K.,Munos, R., Valko, M.: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Ho...

work page doi:10.5555/3495724.3497510 2020

[22] [22]

CoRR abs/2212.11491(2022)

Gupta, K., Ajanthan, T., van den Hengel, A., Gould, S.: Understanding and Improving the Role of Projection Head in Self-Supervised Learning. CoRR abs/2212.11491(2022). https://doi.org/10.48550/arXiv.2212.11491

work page doi:10.48550/arxiv.2212.11491 2022

[23] [23]

Deep Residual Learning for Image Recognition

He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016

[24] [24]

In: Proc

Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: A Simple Data Processing Method to Improve Robustness and Un- certainty. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2020). https://doi.org/10.48550/arXiv.1912.02781

work page doi:10.48550/arxiv.1912.02781 2020

[25] [25]

IEEE Transactions on Neural Networks4(1), 156–159 (1993)

Jang, J.S., Sun, C.T.: Functional Equivalence Between Radial Basis Function Net- works and Fuzzy Inference Systems. IEEE Transactions on Neural Networks4(1), 156–159 (1993). https://doi.org/10.1109/72.182710

work page doi:10.1109/72.182710 1993

[26] [26]

In: Proc

Jing, L., Vincent, P., LeCun, Y., Tian, Y.: Understanding Dimensional Collapse in Contrastive Self-supervised Learning. In: Proc. International Conference on Learn- ing Representations, ICLR. OpenReview.net (2022). https://doi.org/10.48550/ arXiv.2110.09348 RBFNs as Projection Heads in Self-Supervised Learning 19

arXiv 2022

[27] [27]

In: Proc

Johnson, D.D., Hanchi, A.E., Maddison, C.J.: Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions. In: Proc. Interna- tional Conference on Learning Representations, ICLR. OpenReview.net (2023). https://doi.org/10.48550/arXiv.2210.01883

work page doi:10.48550/arxiv.2210.01883 2023

[28] [28]

Kornblith, S., Shlens, J., Le, Q.V.: Do Better ImageNet Models Transfer Better? In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2656–2666 (2019). https://doi.org/10.1109/CVPR.2019.00277

work page doi:10.1109/cvpr.2019.00277 2019

[29] [29]

In: Proc

Lee, J., Bahri, Y., Novak, R., Schoenholz, S.S., Pennington, J., Sohl-Dickstein, J.: Deep Neural Networks as Gaussian Processes. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2018). https://doi.org/10. 48550/arXiv.1711.00165

Pith/arXiv arXiv 2018

[30] [30]

In: Proc

Li, Y., Pogodin, R., Sutherland, D.J., Gretton, A.: Self-Supervised Learning with Kernel Dependence Maximization. In: Proc. International Conference on Neural Information Processing Systems. NIPS ’21, Curran Associates Inc., Red Hook, NY, USA (2021), http://dl.acm.org/doi/10.5555/3540261.3541451

work page doi:10.5555/3540261.3541451 2021

[31] [31]

In: Proc

Lundberg, S.M., Lee, S.I.: A Unified Approach to Interpreting Model Predictions. In: Proc. International Conference on Neural Information Processing Systems. p. 4768–4777. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (2017), https: //dl.acm.org/doi/10.5555/3295222.3295230

work page doi:10.5555/3295222.3295230 2017

[32] [32]

In: Proc

Ma, J., Hu, T., Wang, W.: Deciphering the Projection Head: Representation Eval- uation Self-supervised Learning. In: Proc. International Joint Conference on Arti- ficial Intelligence. IJCAI ’24 (2024). https://doi.org/10.24963/ijcai.2024/522

work page doi:10.24963/ijcai.2024/522 2024

[33] [33]

Constructive Approximation2(1), 11–22 (1986)

Micchelli, C.A.: Interpolation of Scattered Data: Distance Matrices and Condition- ally Positive Definite Functions. Constructive Approximation2(1), 11–22 (1986). https://doi.org/10.1007/BF01893414

work page doi:10.1007/bf01893414 1986

[34] [34]

Nandam, S.R., Atito, S., Feng, Z., Kittler, J., Awais, M.: Investigating Self- Supervised Methods for Label-Efficient Learning. Int. J. Comput. Vision133(7), 4522–4537 (Mar 2025). https://doi.org/10.1007/s11263-025-02397-4

work page doi:10.1007/s11263-025-02397-4 2025

[35] [35]

CoRRabs/1807.03748(2018)

van den Oord, A., Li, Y., Vinyals, O.: Representation Learning with Contrastive Predictive Coding. CoRRabs/1807.03748(2018). https://doi.org/10.48550/ arXiv.1807.03748

Pith/arXiv arXiv 2018

[36] [36]

Orr, M.J.L.: Introduction to Radial Basis Function Networks. Tech. rep., Centre for Cognitive Science, University of Edinburgh (April 1996), https://faculty.cc.gatech. edu/~isbell/tutorials/rbf-intro.pdf, accessed: 2026-06-05

1996

[37] [37]

Neural Computation3(2), 246–257 (06 1991)

Park, J., Sandberg, I.W.: Universal Approximation Using Radial-Basis-Function Networks. Neural Computation3(2), 246–257 (06 1991). https://doi.org/10.1162/ neco.1991.3.2.246

1991

[38] [38]

In: Neu, G., Rosasco, L

Parulekar, A., Collins, L., Shanmugam, K., Mokhtari, A., Shakkottai, S.: InfoNCE Loss Provably Learns Cluster-Preserving Representations. In: Neu, G., Rosasco, L. (eds.) Proc. International Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 195, pp. 1914–1961. PMLR (12–15 Jul 2023). https://doi. org/10.48550/arXiv.2302.07920

work page doi:10.48550/arxiv.2302.07920 1914

[39] [39]

Note on regression and inheritance in the case of two parents , author =

Pearson, K.: VII. Note on Regression and Inheritance in the Case of Two Parents. Proceedings of the Royal Society of London58(347-352), 240–242 (12 1895). https: //doi.org/10.1098/rspl.1895.0041

work page doi:10.1098/rspl.1895.0041

[40] [40]

Schliebitz et al

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) (2013), https: //www.image-net.org/challenges/LSVRC/2013/index, accessed: 2026-06-02 20 A. Schliebitz et al

2013

[41] [41]

Bernstein, Alexander C

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV)115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

work page doi:10.1007/s11263-015-0816-y 2015

[42] [42]

Russo, A.: Pytorch RBF Layer (2021), https://github.com/rssalessio/ PytorchRBFLayer, accessed: 2026-06-02

2021

[43] [43]

Schliebitz, A., Tapken, H., Atzmueller, M.: The OpenImagesV7-100 Dataset (Dec 2025), https://github.com/andreas-schliebitz/open-images-v7-100, accessed: 2026-06-02

2025

[44] [44]

CoRRabs/2307.08913(2023)

Song, Z., Su, X., Wang, J., Qiang, W., Zheng, C., Sun, F.: Towards the Sparseness of Projection Head in Self-Supervised Learning. CoRRabs/2307.08913(2023). https://doi.org/10.48550/arXiv.2307.08913

work page doi:10.48550/arxiv.2307.08913 2023

[45] [45]

The American Journal of Psychology15(1), 72–101 (1904), http://www.jstor.org/ stable/1412159

Spearman, C.: The Proof and Measurement of Association between Two Things. The American Journal of Psychology15(1), 72–101 (1904), http://www.jstor.org/ stable/1412159

arXiv 1904

[46] [46]

Susmelj, I., Heller, M., Wirth, P., Prescott, J., Ebner, M., et al.: Lightly, https: //github.com/lightly-ai/lightly, accessed: 2026-06-02

2026

[47] [47]

In: Proc

Thilak, V., Huang, C., Saremi, O., Dinh, L., Goh, H., Nakkiran, P., Susskind, J.M., Littwin, E.: LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures. In: Proc. International Conference on Learning Representations, ICLR. OpenReview.net (2024). https://doi.org/10.48550/arXiv.2312.04000

work page doi:10.48550/arxiv.2312.04000 2024

[48] [48]

In: Proc Euro- peanConferenceonComputerVision(ECCV).p.776–794.Springer-Verlag,Berlin, Heidelberg (2020)

Tian, Y., Krishnan, D., Isola, P.: Contrastive Multiview Coding. In: Proc Euro- peanConferenceonComputerVision(ECCV).p.776–794.Springer-Verlag,Berlin, Heidelberg (2020). https://doi.org/10.1007/978-3-030-58621-8_45

work page doi:10.1007/978-3-030-58621-8_45 2020

[49] [49]

TorchVision maintainers and contributors: TorchVision: PyTorch’s Computer Vi- sion library (Nov 2016), https://github.com/pytorch/vision, accessed: 2026-06-02

2016

[50] [50]

In: Proc

Wang, T., Isola, P.: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. In: Proc. International Confer- ence on Machine Learning. ICML’20, JMLR.org (2020), https://dl.acm.org/doi/ 10.5555/3524938.3525859

work page doi:10.5555/3524938.3525859 2020

[51] [51]

Deep Kernel Learning

Wilson, A.G., Hu, Z., Salakhutdinov, R., Xing, E.P.: Deep Kernel Learning. In: Gretton, A., Robert, C.C. (eds.) Proc. International Conference on Artificial In- telligence and Statistics. vol. 51, pp. 370–378. PMLR, Cadiz, Spain (09–11 May 2016). https://doi.org/10.48550/arXiv.1511.02222

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1511.02222 2016

[52] [52]

Artificial Intelligence Review59(4), 112 (Feb 2026)

Wittscher, L.: A survey on design choices for self-supervised learning in computer vision. Artificial Intelligence Review59(4), 112 (Feb 2026). https://doi.org/10. 1007/s10462-026-11506-9

2026

[53] [53]

In: Proc

Xue, Y., Gan, E., Ni, J., Joshi, S., Mirzasoleiman, B.: Investigating the Benefits of Projection Head for Representation Learning. In: Proc. International Confer- ence on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net (2024). https://doi.org/10.48550/arXiv.2403.11391

work page doi:10.48550/arxiv.2403.11391 2024

[54] [54]

CoRRabs/1812.03190(2018)

Zadeh, P.H., Hosseini, R., Sra, S.: Deep-RBF Networks Revisited: Robust Classifi- cation with Rejection. CoRRabs/1812.03190(2018). https://doi.org/10.48550/ arXiv.1812.03190

Pith/arXiv arXiv 2018

[55] [55]

Proceedings of Machine Learning Research, vol

Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow Twins: Self-Supervised LearningviaRedundancyReduction.In:Meila,M.,Zhang,T.(eds.)Proceedingsof the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 12310–12320. PMLR (18–24 Jul 2021). https: //doi.org/10.48550/arXiv.2103.03230

work page doi:10.48550/arxiv.2103.03230 2021