Learning deep representations by mutual information estimation and maximization
21 Pith papers cite this work.
abstract
In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality of the input to the objective can greatly influence a representation's suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and competes with fully-supervised learning on several classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation-learning objectives for specific end-goals.
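As a rough illustration of what "maximizing mutual information between an input and the output of an encoder" can look like in practice, the sketch below computes an InfoNCE-style lower bound on mutual information from paired samples. This is an assumption made for illustration: the abstract does not say which estimator DIM uses, and the dot-product critic and the function names `dot` and `info_nce_bound` are hypothetical, not taken from the paper.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_bound(inputs, codes):
    """InfoNCE-style lower bound (in nats) on the mutual information
    between paired samples inputs[i] and codes[i].

    Assumption: the critic is a plain dot product, so inputs and codes
    must live in the same vector space. A learned critic would replace
    dot() in a real training setup.
    """
    n = len(inputs)
    total = 0.0
    for i in range(n):
        # Score the true pair (i, i) against every code in the batch.
        scores = [dot(inputs[i], codes[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_sum
    # The bound can never exceed log(n), the log of the batch size.
    return math.log(n) + total / n


xs = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
aligned = info_nce_bound(xs, xs)                         # codes match inputs
shuffled = info_nce_bound(xs, [xs[1], xs[2], xs[0]])     # pairing broken
```

With correctly paired samples the bound approaches its ceiling of log(n); breaking the pairing drives it down, which is the signal an encoder trained on such an objective would be pushed to increase.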
citing papers explorer
- A Unified Geometric Framework for Weighted Contrastive Learning
Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preserve regular simplex or unique optima.
- Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
A framework with TOPPing source selection and VACAI-Bowl dual-branch model yields 54.62% average improvement in dependency parsing across 10 low-resource varieties.
- Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels
FF-TRUST delivers state-of-the-art sleep staging performance across domain shifts and both symmetric and asymmetric label noise by jointly regularizing temporal and spectral consistency on five public datasets.
- A Simple Framework for Contrastive Learning of Visual Representations
SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
- Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle
Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
- Relative Contrastive Learning for Sequential Recommendation with Similarity-based Positive Pair Selection
RCL adds similarity-based weak positive samples to supervised contrastive learning in sequential recommendation and reports an average 4.88% improvement over state-of-the-art methods across six datasets.
- Multi-Scale Contrastive Learning for Video Temporal Grounding
A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
- Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings
TaDSE learns dialogue sentence embeddings via template-guided self-supervised contrastive learning plus synthetic slot-filling augmentation and reports gains on five downstream benchmarks.
- Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations
LLMs exhibit a mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.
- Revisiting Feature Prediction for Learning Visual Representations from Video
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
- HuggingFace's Transformers: State-of-the-art Natural Language Processing
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
- Information theoretic underpinning of self-supervised learning by clustering
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
- M-IDoL: Information Decomposition for Modality-Specific and Diverse Representation Learning in Medical Foundation Model
M-IDoL learns modality-specific and diverse representations by maximizing inter-modality entropy and minimizing intra-modality uncertainty through information decomposition in MoE subspaces.
- ID-Sim: An Identity-Focused Similarity Metric
ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retrieval, and generative tasks.
- Learning Disentangled Representations for Generalized Multi-view Clustering
GMAE learns disentangled view-specific and view-common embeddings via dual-path autoencoders and cross-view adversarial training to boost performance on complete and incomplete multi-view clustering tasks.
- Learning to Find Correlated Features by Maximizing Information Flow in Convolutional Neural Networks
Introduces an IFM loss regularization for CNNs to learn correlated discriminative features, tested on the shiftedMNIST dataset.
- Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels
DVSA improves zero-shot learning under ambiguous labels by mutually calibrating visual features and attributes with attention and dynamic disambiguation.
- Information-Theoretic Measures in AI: A Practical Decision Guide
A practical guide that organizes seven IT measures around three questions each—what it answers in AI, suitable estimators, and dangerous misuses—complete with flowchart, table, and worked examples.
- InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts