Learning deep representations by mutual information estimation and maximization
In this work, we perform unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality of the input to the objective can greatly influence a representation's suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and competes with fully-supervised learning on several classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation-learning objectives for specific end-goals.
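The core objective can be illustrated with a minimal NumPy sketch of the Jensen-Shannon mutual-information lower bound that DIM maximizes. The toy linear encoder and bilinear discriminator below are hypothetical stand-ins (DIM's actual networks are convolutional); the point is only the shape of the estimator, which scores matched (input, code) pairs against shuffled ones.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def jsd_mi_lower_bound(scores_joint, scores_marginal):
    """Jensen-Shannon MI lower bound (the estimator family used by DIM).

    scores_joint:    discriminator scores T(x, E(x)) on matched pairs
    scores_marginal: scores T(x', E(x)) on mismatched (shuffled) pairs
    """
    return (-softplus(-scores_joint).mean()
            - softplus(scores_marginal).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
encode = lambda v: np.tanh(v @ rng.normal(size=(8, 4)))  # toy "encoder"
y = encode(x)

# Toy bilinear "discriminator": one score per (input, code) pair.
W = rng.normal(size=(8, 4))
score = lambda a, b: np.sum((a @ W) * b, axis=1)

joint = score(x, y)                           # samples of the joint p(x, y)
marginal = score(x[rng.permutation(256)], y)  # product of marginals
print(jsd_mi_lower_bound(joint, marginal))
```

In training, the encoder and discriminator would be optimized jointly to push this bound up; DIM's local variant applies the same estimator between the code and each spatial patch of an intermediate feature map.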
Forward citations
Cited by 15 Pith papers
- A Unified Geometric Framework for Weighted Contrastive Learning. Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preser...
- Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties. A framework with TOPPing source selection and a VACAI-Bowl dual-branch model yields a 54.62% average improvement in dependency parsing across 10 low-resource varieties.
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts. DETR-ViP boosts visual-prompted detection performance by learning globally discriminative prompts through integration and distillation on top of image-text contrastive learning, with a selective fusion step for stability.
- Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels. FF-TRUST delivers state-of-the-art sleep staging performance across domain shifts and both symmetric and asymmetric label noise by jointly regularizing temporal and spectral consistency on five public datasets.
- A Simple Framework for Contrastive Learning of Visual Representations. SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
- Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle. Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
- Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations. LLMs exhibit a mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.
- Revisiting Feature Prediction for Learning Visual Representations from Video. V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
- HuggingFace's Transformers: State-of-the-art Natural Language Processing. Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
- Information theoretic underpinning of self-supervised learning by clustering. SSL clustering is derived as KL-divergence optimization in which a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
- InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization. InfoGeo reformulates cross-view geo-localization as an information bottleneck that aligns object-centric structural relations across views while minimizing view-specific noise.
- M-IDoL: Information Decomposition for Modality-Specific and Diverse Representation Learning in Medical Foundation Model. M-IDoL learns modality-specific and diverse representations by maximizing inter-modality entropy and minimizing intra-modality uncertainty through information decomposition in MoE subspaces.
- ID-Sim: An Identity-Focused Similarity Metric. ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retri...
- Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels. DVSA improves zero-shot learning under ambiguous labels by mutually calibrating visual features and attributes with attention and dynamic disambiguation.
- Information-Theoretic Measures in AI: A Practical Decision Guide. A practical guide that organizes seven information-theoretic measures around three questions each—what it answers in AI, suitable estimators, and dangerous misuses—complete with a flowchart, a table, and worked examples.