arxiv: 2502.09252 · v2 · pith:72XLCBZUnew · submitted 2025-02-13 · 💻 cs.LG

On the Importance of Embedding Norms in Self-Supervised Learning

classification 💻 cs.LG
keywords embedding · norms · convergence · learning · network · confidence · data · role

Self-supervised learning (SSL) allows training data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms have properties related to network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm's role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.
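
One way to see why norms can matter even under a cosine-similarity objective is a standard calculus fact: the gradient of cos(a, b) with respect to a is (1/‖a‖)(b/‖b‖ − cos(a, b) · a/‖a‖), so its magnitude shrinks as 1/‖a‖ while the similarity value itself is unchanged by rescaling a. The sketch below checks this numerically. It is an illustration under our own assumptions, not the paper's analysis or code; the function names and the example vectors are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.hypot(*v)


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))


def cosine_grad(a, b):
    """Analytic gradient of cosine_similarity(a, b) with respect to a.

    Uses d cos / d a = (b / |b| - cos * a / |a|) / |a|.
    """
    norm_a, norm_b = norm(a), norm(b)
    cos = cosine_similarity(a, b)
    return [(y / norm_b - cos * x / norm_a) / norm_a for x, y in zip(a, b)]


# Hypothetical embedding and target, chosen only for illustration.
z = [1.0, 2.0, -0.5]
target = [0.3, -1.0, 2.0]

for scale in (1.0, 2.0, 10.0):
    scaled = [scale * x for x in z]
    grad = cosine_grad(scaled, target)
    # The similarity stays fixed; the gradient norm falls as 1 / scale.
    print(f"scale={scale:5.1f}  cos={cosine_similarity(scaled, target):+.4f}  "
          f"|grad|={norm(grad):.4f}")
```

Under plain gradient descent with a fixed learning rate, this suggests that an embedding with a larger norm receives a proportionally smaller update, which is consistent with the abstract's claim that norms affect convergence speed. How this plays out in full SSL training is what the paper itself addresses.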

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

    stat.ML · 2026-06 · unverdicted · novelty 7.0

    Embedding norms in contrastive models encode semantic properties via optimization dynamics under scale-invariant losses.

  2. Improving Relative Representations with Learned Anchors and Whitened Inner Products

    cs.LG · 2026-05 · unverdicted · novelty 5.0

    Learned anchors as semantic prototypes combined with whitened inner products improve relative representations, enabling nearly lossless zero-shot communication between heterogeneous neural models on vision and language tasks.