In generalized contrastive learning with imbalanced classes, optimal representations collapse to class means whose angular geometry is determined by class proportions via convex optimization, and extreme imbalance causes all minority classes to collapse to one vector.
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred Warmuth
11 Pith papers cite this work.
representative citing papers
RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.
DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
A self-supervised neural operator uses Bayesian PINNs to generate training data and a Transformer to learn PDE operators, achieving high accuracy on 1D/2D reaction-diffusion and fluid vibration problems with optional lightweight fine-tuning.
Proves learnability of ordered multiple smooth boundaries in pairwise binary classification via localized deep ReLU networks.
ECG-JEPA applies a joint-embedding predictive architecture with Cross-Pattern Attention to learn semantic representations from unlabeled 12-lead ECG data and reports state-of-the-art results on diagnostic classification, feature extraction, and segmentation.
Domain-specific augmentations and plant-only training data produce stronger self-supervised representations for fine-grained plant recognition than standard SSL pipelines or ImageNet pretraining.
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.