Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred Warmuth
14 Pith papers cite this work.
representative citing papers
A two-stage self-supervised protocol using SimCLR on generic then target unlabeled data, followed by generic-label fine-tuning, reaches 97.8% average accuracy for parking occupancy across three public datasets without target labels.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
citing papers explorer
- Learn from your own latents and not from tokens: A sample-complexity theory
  Latent prediction SSL recovers latent trees from PCFG data with sample complexity constant in hierarchy depth L (up to logs), unlike token-level or supervised methods, whose sample complexity is exponential in L.
- Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets
  In generalized contrastive learning with imbalanced classes, optimal representations collapse to class means whose angular geometry is determined by class proportions via convex optimization, and extreme imbalance causes all minority classes to collapse to one vector.
- RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
  RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
- Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment
  A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.
- Rapidly deploying on-device eye tracking by distilling visual foundation models
  DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
- Self-supervised neural operator for solving partial differential equations
  Self-supervised neural operator uses Bayesian PINNs to generate training data and a Transformer to learn PDE operators, achieving high accuracy on 1D/2D reaction-diffusion and fluid vibration problems with optional lightweight finetuning.
- Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks
  Proves learnability of ordered multiple smooth boundaries in pairwise binary classification via localized deep ReLU networks.
- Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
  ECG-JEPA applies a joint-embedding predictive architecture with Cross-Pattern Attention to learn semantic representations from unlabeled 12-lead ECG data and reports state-of-the-art results on diagnostic classification, feature extraction, and segmentation.
- Self-Supervised Learning of Plant Image Representations
  Domain-adapted augmentations and plant-specific training data improve self-supervised representations for fine-grained plant species recognition over standard SSL pipelines.
- Robust Cross-Domain Generalization Using Unlabeled Target Data with Source-Domain Supervision
  Target-informed self-supervised pretraining via masked image modeling and contrastive learning, plus a confidence-aware infusion head, yields over 6% Dice improvement on unlabeled target-domain POCUS images for pediatric fracture assessment.
- There Will Be a Scientific Theory of Deep Learning
  A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.