Deep Learning and the Information Bottleneck Principle

Naftali Tishby; Noga Zaslavsky

arXiv: 1503.02406 · v1 · pith:FIU7OEFA · submitted 2015-03-09 · cs.LG




keywords: information, bottleneck, deep, layer, bounds, generalization, input, layers


Abstract

Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite-sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, namely the number of layers and the number of features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, i.e., the relevant compression of the input layer with respect to the output layer. The hierarchical representations in the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
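The abstract characterizes each layer as a representation T described by two numbers, I(X;T) and I(T;Y), and relates the architecture to the tradeoff between them along the information curve. The sketch below is not the authors' DNN analysis; it is a minimal illustration, assuming a small discrete joint distribution p(x,y) with made-up numbers and a stochastic encoder p(t|x) standing in for a layer. It uses the iterative information bottleneck updates of Tishby, Pereira and Bialek (1999), which seek an encoder minimizing I(X;T) - β·I(T;Y), and reports one (I(X;T), I(T;Y)) point per β. The function names (`ib_iterate`, `information_point`), the cluster count, the β values and the iteration count are hypothetical choices for illustration, not values from the paper.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(p_joint):
    """I(A;B) in bits for a joint distribution given as rows p_joint[a][b]."""
    p_a = [sum(row) for row in p_joint]
    p_b = [sum(col) for col in zip(*p_joint)]
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for a, row in enumerate(p_joint)
        for b, p in enumerate(row)
        if p > 0
    )


def ib_iterate(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Hypothetical helper: iterative IB updates; returns p(x), p(y|x), p(t|x)."""
    rng = random.Random(seed)
    nx, ny = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_given_x = [[p / p_x[x] for p in p_xy[x]] for x in range(nx)]
    # Random soft initial encoder p(t|x); each row sums to 1.
    enc = []
    for _ in range(nx):
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(w)
        enc.append([v / s for v in w])
    for _ in range(n_iter):
        # Marginal p(t) = sum_x p(x) p(t|x).
        p_t = [sum(p_x[x] * enc[x][t] for x in range(nx)) for t in range(n_clusters)]
        # Decoder p(y|t) = sum_x p(x) p(t|x) p(y|x) / p(t).
        p_y_given_t = [
            [sum(p_x[x] * enc[x][t] * p_y_given_x[x][y] for x in range(nx)) / p_t[t]
             for y in range(ny)]
            for t in range(n_clusters)
        ]
        # Encoder update: p(t|x) proportional to p(t) * exp(-beta * D(p(y|x) || p(y|t))).
        new_enc = []
        for x in range(nx):
            logits = [math.log(p_t[t]) - beta * kl(p_y_given_x[x], p_y_given_t[t])
                      for t in range(n_clusters)]
            m = max(logits)  # subtract the max for numerical stability
            # A tiny floor keeps every p(t) strictly positive so log p(t) stays defined.
            w = [math.exp(l - m) + 1e-12 for l in logits]
            s = sum(w)
            new_enc.append([v / s for v in w])
        enc = new_enc
    return p_x, p_y_given_x, enc


def information_point(p_x, p_y_given_x, enc):
    """Hypothetical helper: returns (I(X;T), I(T;Y)) in bits for encoder p(t|x)."""
    nx, nt, ny = len(p_x), len(enc[0]), len(p_y_given_x[0])
    p_xt = [[p_x[x] * enc[x][t] for t in range(nt)] for x in range(nx)]
    # Markov chain T <- X -> Y: p(t, y) = sum_x p(x) p(t|x) p(y|x).
    p_ty = [[sum(p_x[x] * enc[x][t] * p_y_given_x[x][y] for x in range(nx))
             for y in range(ny)] for t in range(nt)]
    return mutual_information(p_xt), mutual_information(p_ty)


# Toy joint distribution p(x, y): 4 input symbols, 2 labels (made-up numbers).
p_xy = [[0.20, 0.05], [0.15, 0.10], [0.05, 0.20], [0.10, 0.15]]
print(f"I(X;Y) = {mutual_information(p_xy):.3f} bits")
for beta in [0.5, 2.0, 5.0, 20.0]:
    p_x, p_y_given_x, enc = ib_iterate(p_xy, n_clusters=3, beta=beta)
    i_xt, i_ty = information_point(p_x, p_y_given_x, enc)
    print(f"beta={beta:5.1f}  I(X;T)={i_xt:.3f}  I(T;Y)={i_ty:.3f}")
```

Sweeping β moves the point along the curve: at β = 0 the encoder ignores the input, so I(X;T) is essentially zero, and larger β buys more relevant information I(T;Y) at a higher compression cost I(X;T). Because T depends on Y only through X, the data processing inequality guarantees I(T;Y) never exceeds I(X;Y) or I(X;T). The encoder update uses natural-log KL divergences, as in the standard exponential form, while the reported mutual informations are in bits.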



Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle
q-bio.NC 2026-05 unverdicted novelty 6.0

Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
cs.CV 2026-05 unverdicted novelty 5.0

Constraining visual token budget per observation during VLM training forces genuine active perception and delivers 5% average relative improvement without auxiliary losses or architecture changes.
Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
cs.CL 2026-04 unverdicted novelty 5.0

SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.
Informationally Compressive Anonymization: Non-Degrading Sensitive Input Protection for Privacy-Preserving Supervised Machine Learning
cs.LG 2026-03 unverdicted novelty 5.0

ICA and VEIL enable privacy-preserving supervised ML by producing structurally non-invertible encodings aligned with downstream tasks while maintaining predictive utility.
Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
cs.CL 2026-04 unverdicted novelty 4.0

wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LL...
Lecture Notes on Statistical Physics and Neural Networks
cond-mat.dis-nn 2026-05 unverdicted novelty 2.0

Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.