Learning deep representations by mutual information estimation and maximization
In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality of the input to the objective can greatly influence a representation's suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and competes with fully-supervised learning on several classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation-learning objectives for specific end-goals.
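The core objective can be illustrated with a minimal NumPy sketch of the Jensen-Shannon mutual-information lower bound that DIM maximizes. The toy linear encoder and bilinear discriminator below are hypothetical stand-ins (DIM's actual networks are convolutional); the point is only the shape of the estimator, which scores matched (input, code) pairs against shuffled ones.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def jsd_mi_lower_bound(scores_joint, scores_marginal):
    """Jensen-Shannon MI lower bound (the estimator family used by DIM).

    scores_joint:    discriminator scores T(x, E(x)) on matched pairs
    scores_marginal: scores T(x', E(x)) on mismatched (shuffled) pairs
    """
    return (-softplus(-scores_joint).mean()
            - softplus(scores_marginal).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
encode = lambda v: np.tanh(v @ rng.normal(size=(8, 4)))  # toy "encoder"
y = encode(x)

# Toy bilinear "discriminator": one score per (input, code) pair.
W = rng.normal(size=(8, 4))
score = lambda a, b: np.sum((a @ W) * b, axis=1)

joint = score(x, y)                           # samples of the joint p(x, y)
marginal = score(x[rng.permutation(256)], y)  # product of marginals
print(jsd_mi_lower_bound(joint, marginal))
```

In training, the encoder and discriminator would be optimized jointly to push this bound up; DIM's local variant applies the same estimator between the code and each spatial patch of an intermediate feature map.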
Forward citations
Cited by 15 Pith papers
- A Unified Geometric Framework for Weighted Contrastive Learning. Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preser...
- Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties. A framework with TOPPing source selection and a VACAI-Bowl dual-branch model yields a 54.62% average improvement in dependency parsing across 10 low-resource varieties.
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts. DETR-ViP boosts visual-prompted detection performance by learning globally discriminative prompts through integration and distillation on top of image-text contrastive learning, with a selective fusion step for stability.
- Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels. FF-TRUST delivers state-of-the-art sleep staging performance across domain shifts and both symmetric and asymmetric label noise by jointly regularizing temporal and spectral consistency on five public datasets.
- A Simple Framework for Contrastive Learning of Visual Representations. SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
- Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle. Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
- Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations. LLMs exhibit a mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.
- Revisiting Feature Prediction for Learning Visual Representations from Video. V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
- HuggingFace's Transformers: State-of-the-art Natural Language Processing. Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
- Information theoretic underpinning of self-supervised learning by clustering. SSL clustering is derived as KL-divergence optimization in which a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
- InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization. InfoGeo reformulates cross-view geo-localization as an information bottleneck that aligns object-centric structural relations across views while minimizing view-specific noise.
- M-IDoL: Information Decomposition for Modality-Specific and Diverse Representation Learning in Medical Foundation Model. M-IDoL learns modality-specific and diverse representations by maximizing inter-modality entropy and minimizing intra-modality uncertainty through information decomposition in MoE subspaces.
- ID-Sim: An Identity-Focused Similarity Metric. ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retri...
- Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels. DVSA improves zero-shot learning under ambiguous labels by mutually calibrating visual features and attributes with attention and dynamic disambiguation.
- Information-Theoretic Measures in AI: A Practical Decision Guide. A practical guide that organizes seven information-theoretic measures around three questions each—what it answers in AI, suitable estimators, and dangerous misuses—complete with a flowchart, a table, and worked examples.