Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preserve regular simplex or unique optima.
Learning deep representations by mutual information estimation and maximization
23 Pith papers cite this work.
abstract
In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality of the input to the objective can greatly influence a representation's suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and competes with fully-supervised learning on several classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation-learning objectives for specific end-goals.
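The abstract's central idea, maximizing mutual information between an input and its encoding, is usually made tractable through a lower bound computed over a batch. The following is a minimal sketch of one such bound, InfoNCE, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the dot-product critic, the noisy-copy "encoder", and the names `info_nce_bound`, `dot`, `inputs`, and `codes` are all hypothetical, and the locality structure and adversarial prior matching mentioned in the abstract are not modeled.

```python
import math
import random


def info_nce_bound(scores):
    """InfoNCE lower bound on mutual information, in nats.

    scores[i][j] is a critic score for pairing input i with encoding j.
    The diagonal holds the true (positive) pairs; every other entry in
    a row acts as a negative sample. The result never exceeds log(n).
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_sum
    return math.log(n) + total / n


def dot(u, v):
    """Dot-product critic (an assumption made for this sketch)."""
    return sum(a * b for a, b in zip(u, v))


random.seed(0)
inputs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
# Hypothetical stand-in for an encoder: a noisy copy of each input,
# so each input and its code remain statistically dependent.
codes = [[x + 0.1 * random.gauss(0, 1) for x in row] for row in inputs]
shuffled = codes[1:] + codes[:1]  # rotate codes to break the pairing

paired = info_nce_bound([[dot(x, z) for z in codes] for x in inputs])
broken = info_nce_bound([[dot(x, z) for z in shuffled] for x in inputs])
print(f"paired: {paired:.3f} nats, broken pairing: {broken:.3f} nats")
```

In a training loop the encoder and critic would be neural networks and the bound would be maximized by gradient ascent. One would expect the correctly paired batch to score higher than the rotated one, since rotating the codes removes the dependence the bound is meant to detect.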
representative citing papers
A framework with TOPPing source selection and VACAI-Bowl dual-branch model yields 54.62% average improvement in dependency parsing across 10 low-resource varieties.
FF-TRUST delivers state-of-the-art sleep staging performance across domain shifts and both symmetric and asymmetric label noise by jointly regularizing temporal and spectral consistency on five public datasets.
SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.
CLDG and CLDG++ learn node representations on dynamic graphs by contrasting timespan views under temporal translation invariance, with extensions for global context via diffusion and integration into anomaly detection.
Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
RCL adds similarity-based weak positive samples to supervised contrastive learning in sequential recommendation and reports an average 4.88% improvement over state-of-the-art methods across six datasets.
A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
TaDSE learns dialogue sentence embeddings via template-guided self-supervised contrastive learning plus synthetic slot-filling augmentation and reports gains on five downstream benchmarks.
LLMs exhibit mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
M-IDoL learns modality-specific and diverse representations by maximizing inter-modality entropy and minimizing intra-modality uncertainty through information decomposition in MoE subspaces.
ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retrieval, and generative tasks.
GMAE learns disentangled view-specific and view-common embeddings via dual-path autoencoders and cross-view adversarial training to boost performance on complete and incomplete multi-view clustering tasks.
An IFM loss regularizer is introduced for CNNs to learn correlated discriminative features and is tested on the shiftedMNIST dataset.
DVSA improves zero-shot learning under ambiguous labels by mutually calibrating visual features and attributes with attention and dynamic disambiguation.
citing papers explorer
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
- Relative Contrastive Learning for Sequential Recommendation with Similarity-based Positive Pair Selection
RCL adds similarity-based weak positive samples to supervised contrastive learning in sequential recommendation and reports an average 4.88% improvement over state-of-the-art methods across six datasets.