A cookbook of self-supervised learning
11 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets
In generalized contrastive learning with imbalanced classes, optimal representations collapse to class means whose angular geometry is determined by class proportions via convex optimization, and extreme imbalance causes all minority classes to collapse to one vector.
- RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
- Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment
A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.
- Rapidly deploying on-device eye tracking by distilling visual foundation models
DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
- Self-supervised neural operator for solving partial differential equations
Self-supervised neural operator uses Bayesian PINNs to generate training data and a Transformer to learn PDE operators, achieving high accuracy on 1D/2D reaction-diffusion and fluid vibration problems with optional lightweight finetuning.
- Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks
Proves learnability of ordered multiple smooth boundaries in pairwise binary classification via localized deep ReLU networks.
- Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
ECG-JEPA applies a joint-embedding predictive architecture with Cross-Pattern Attention to learn semantic representations from unlabeled 12-lead ECG data and reports state-of-the-art results on diagnostic classification, feature extraction, and segmentation.
- Self-Supervised Learning of Plant Image Representations
Domain-specific augmentations and plant-only training data produce stronger self-supervised representations for fine-grained plant recognition than standard SSL pipelines or ImageNet pretraining.
- There Will Be a Scientific Theory of Deep Learning
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.
- Next-Latent Prediction Transformers Learn Compact World Models