arXiv: 1612.00410 · v7 · submitted 2016-12-01 · cs.LG · cs.IT · math.IT

Deep Variational Information Bottleneck

Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy

Pith reviewed 2026-05-18 06:18 UTC · model grok-4.3

classification cs.LG · cs.IT · math.IT

keywords information bottleneck · variational inference · neural network regularization · representation learning · adversarial robustness · generalization · mutual information

The pith

A variational approximation to the information bottleneck lets neural networks learn compressed yet predictive representations that generalize better and resist adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a variational method for optimizing the information bottleneck objective inside deep neural networks. The objective pushes the model to compress the input while keeping enough information to predict the target, with a single parameter β setting the trade-off. Variational bounds and the reparameterization trick make the approach trainable end-to-end with standard gradient methods. The payoff is that an information-theoretic principle becomes a practical regularizer which, in the paper's experiments, improves both accuracy on new examples and resistance to small input perturbations crafted to fool the model.

Core claim

We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method Deep Variational Information Bottleneck, or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

What carries the argument

The Deep VIB objective: a variational bound on the information-bottleneck Lagrangian that replaces the intractable mutual-information terms with expectations under parameterized encoder and decoder distributions. Minimizing the resulting loss maximizes a lower bound on I(Z;Y) − βI(X;Z).
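
The two bounds behind this objective can be written out as a short derivation. This is a sketch in the notation used on this page (encoder q(z|x), decoder p(y|z), variational prior r(z)), assuming β ≥ 0; the label J_VIB is introduced here for illustration only, and H(Y) is the label entropy, which does not depend on the model parameters.

```latex
\begin{align*}
I(Z;Y) &\ge \mathbb{E}_{p(x,y)}\,\mathbb{E}_{q(z|x)}\big[\log p(y|z)\big] + H(Y), \\
I(X;Z) &\le \mathbb{E}_{p(x)}\,\mathrm{KL}\big(q(z|x)\,\|\,r(z)\big), \\
I(Z;Y) - \beta\, I(X;Z) &\ge
  -\Big(\underbrace{\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q(z|x)}\big[-\log p(y|z)\big]
  + \beta\,\mathbb{E}_{p(x)}\,\mathrm{KL}\big(q(z|x)\,\|\,r(z)\big)}_{J_{\mathrm{VIB}}}\Big) + H(Y).
\end{align*}
```

Minimizing J_VIB therefore maximizes a lower bound on the IB objective. The slack in the second line equals KL(q(z) || r(z)), where q(z) is the encoder's aggregate marginal, which is why the choice of r(z) affects how tight the bound is.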

If this is right

Networks trained with the VIB objective achieve higher accuracy on held-out test data than networks trained with dropout or weight decay.
The learned representations exhibit greater robustness to adversarial perturbations crafted to maximize prediction error.
A single scalar beta directly controls the amount of compression applied to the input representation.
The method supports fully end-to-end training of deep architectures without requiring separate pre-training stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same variational bound could be adapted to sequential or graph-structured data where explicit compression of history or neighborhood information is desirable.
Success of VIB on adversarial robustness suggests that many existing regularizers may be implicitly performing a similar information-compression role.
Combining the objective with modern data-augmentation pipelines might further widen the robustness gap observed in the paper.

Load-bearing premise

The variational bounds on the mutual information terms stay tight enough during training that the learned representation actually realizes the intended compression-prediction trade-off.

What would settle it

Measure the true mutual informations I(X;Z) and I(Z;Y) after training and check whether they vary with the beta parameter exactly as the information-bottleneck curve predicts, or run head-to-head comparisons on multiple datasets where VIB fails to beat standard regularizers.
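
As a rough illustration of the first check, the sketch below compares a Monte Carlo estimate of I(X;Z) with the variational upper bound in a toy setting. Everything here is an assumption made for illustration and is not taken from the paper: X is uniform over three discrete inputs, the encoder is a one-dimensional Gaussian with hypothetical means `mus` and a shared `sigma`, and r(z) is a standard normal.

```python
import math
import random


def log_normal_pdf(z, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)


def log_mixture_pdf(z, mus, sigma):
    """Log density of the aggregate marginal q(z), a uniform mixture of the encoders."""
    logs = [log_normal_pdf(z, m, sigma) for m in mus]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(mus))


def mc_mutual_information(mus, sigma, samples_per_x=5000, seed=0):
    """Monte Carlo estimate of I(X;Z) = E_x E_{q(z|x)}[log q(z|x) - log q(z)]."""
    rng = random.Random(seed)
    total = 0.0
    for mu in mus:
        for _ in range(samples_per_x):
            z = rng.gauss(mu, sigma)
            total += log_normal_pdf(z, mu, sigma) - log_mixture_pdf(z, mus, sigma)
    return total / (len(mus) * samples_per_x)


def kl_upper_bound(mus, sigma):
    """Average closed-form KL(N(mu, sigma^2) || N(0, 1)) over the inputs."""
    per_x = [0.5 * (sigma**2 + m**2 - 1.0 - math.log(sigma**2)) for m in mus]
    return sum(per_x) / len(per_x)


# Hypothetical toy encoder: three inputs mapped to shifted, overlapping Gaussians.
mus, sigma = [2.0, 3.0, 4.0], 0.5
mi = mc_mutual_information(mus, sigma)
bound = kl_upper_bound(mus, sigma)
print(f"estimated I(X;Z) = {mi:.3f} nats")
print(f"variational bound = {bound:.3f} nats")
print(f"gap (estimate of KL(q(z) || r(z))) = {bound - mi:.3f} nats")
```

Because the encoder means sit far from the origin, r(z) matches the aggregate marginal poorly and the bound is loose by several nats, even though I(X;Z) cannot exceed log 3 for three equiprobable inputs. Tracking this gap across β values is one way to test whether the KL term actually follows the intended compression.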


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a variational approximation to the Information Bottleneck (IB) principle of Tishby et al. (1999), called Deep Variational Information Bottleneck (Deep VIB). It parameterizes the IB model with neural networks for the encoder q(z|x) and decoder p(y|z), applies the reparameterization trick, and optimizes a variational surrogate for the IB objective, which maximizes I(Y;Z) - β I(X;Z). The central claim is that VIB-trained models outperform those trained with other regularizers on generalization and adversarial robustness.

Significance. If the variational bounds remain sufficiently tight and the method truly implements the intended IB trade-off, this supplies a practical, scalable realization of information-theoretic regularization for deep networks. The approach could influence regularization techniques and robustness research by providing a principled alternative to ad-hoc penalties.

major comments (3)

[§2, Eq. (3)] The derivation in §2 applies the standard variational lower bound to the IB objective, yielding the loss E_{q(z|x)}[-log p(y|z)] + β KL(q(z|x)||r(z)). However, no diagnostic is provided (e.g., estimated mutual information curves or bound-gap plots) to confirm that the upper bound on I(X;Z) and lower bound on I(Y;Z) stay tight throughout optimization; if loose, the reported gains may arise from the specific KL regularizer rather than IB compression.
[Table 1, §4.1] Table 1 and §4.1 report superior MNIST generalization for VIB over dropout and weight decay, but the comparison does not control for hyper-parameter search budget across methods. Without this, it is unclear whether the advantage is attributable to the IB principle or to differences in tuning effort.
[§4.3] §4.3 claims improved adversarial robustness, yet the evaluation uses a fixed attack strength without reporting sensitivity to stronger attacks or providing the exact attack parameters. This leaves open whether the robustness is a genuine consequence of the information bottleneck or an artifact of the chosen evaluation.
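
For concreteness, the loss quoted in the first major comment can be sketched for a single example as follows. This is a minimal illustration rather than the authors' implementation: it assumes a diagonal-Gaussian encoder output (`mu`, `log_var`), a standard-normal r(z), and a hypothetical linear softmax decoder given by `weights` and `bias`.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def decode(z, weights, bias):
    """Hypothetical linear decoder: one logit per class."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


def vib_loss(mu, log_var, label, weights, bias, beta, rng, num_samples=8):
    """Monte Carlo estimate of E_q[-log p(y|z)] plus beta times the KL term."""
    nll = 0.0
    for _ in range(num_samples):
        z = sample_latent(mu, log_var, rng)
        nll -= log_softmax(decode(z, weights, bias))[label]
    nll /= num_samples
    return nll + beta * kl_to_standard_normal(mu, log_var)


# Toy usage: two latent dimensions, three classes, arbitrary decoder weights.
rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
loss = vib_loss([1.0, -1.0], [0.0, 0.0], 0, weights, bias, beta=0.5, rng=rng)
print(f"VIB loss for one example: {loss:.3f}")
```

Because the KL term has a closed form under these assumptions, only the prediction term needs sampling. Logging the two terms separately during training is the kind of diagnostic this comment asks for.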

minor comments (2)

[§2] The notation for the variational prior r(z) versus the marginal p(z) should be made consistent across equations to avoid reader confusion.
[Figure 1] Figure 1 would benefit from axis labels that explicitly state the quantities plotted (e.g., estimated I(X;Z) versus β) and from reporting results over multiple random seeds.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below, indicating where revisions will be made.


Referee: [§2, Eq. (3)] The derivation in §2 applies the standard variational lower bound to the IB objective, yielding the loss E_{q(z|x)}[-log p(y|z)] + β KL(q(z|x)||r(z)). However, no diagnostic is provided (e.g., estimated mutual information curves or bound-gap plots) to confirm that the upper bound on I(X;Z) and lower bound on I(Y;Z) stay tight throughout optimization; if loose, the reported gains may arise from the specific KL regularizer rather than IB compression.

Authors: We agree that diagnostics on bound tightness would strengthen the paper. In the revision we will add plots of the estimated mutual information terms I(X;Z) and I(Y;Z) together with the variational gap throughout training. These will be computed using the same Monte-Carlo estimators already present in the code and will help confirm that the reported gains track the intended IB trade-off.
Revision: yes
Referee: [Table 1, §4.1] Table 1 and §4.1 report superior MNIST generalization for VIB over dropout and weight decay, but the comparison does not control for hyper-parameter search budget across methods. Without this, it is unclear whether the advantage is attributable to the IB principle or to differences in tuning effort.

Authors: This is a fair criticism. While we performed grid searches of comparable size for all methods, we did not explicitly equalize total wall-clock budget. In the revised manuscript we will report the exact hyper-parameter ranges explored for each baseline and add a short discussion of search effort. A fully re-tuned matched-budget experiment is beyond the scope of a minor revision but can be noted as future work if the referee requests it.
Revision: partial
Referee: [§4.3] §4.3 claims improved adversarial robustness, yet the evaluation uses a fixed attack strength without reporting sensitivity to stronger attacks or providing the exact attack parameters. This leaves open whether the robustness is a genuine consequence of the information bottleneck or an artifact of the chosen evaluation.

Authors: We accept the point. The original experiments used FGSM with ε = 0.3 (standard at the time) but omitted full parameter disclosure and sensitivity curves. The revision will state the precise attack parameters, include results for a range of ε values, and add a brief comparison with PGD to show that the robustness advantage persists under stronger attacks.
Revision: yes
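
To make the ε-sensitivity point concrete, here is a small sketch of an untargeted FGSM sweep, x_adv = x + ε · sign(∇_x loss). It uses a toy logistic-regression classifier instead of the networks evaluated in the paper; the weights `w`, bias `b`, and input `x` are hypothetical values chosen for illustration, and no clipping to a valid input range is applied.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_gradient_wrt_input(x, y, w, b):
    """Gradient of binary cross-entropy with respect to x: (p - y) * w."""
    p = predict_proba(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    """Returns -1, 0, or 1."""
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """One-step attack: move each input coordinate by eps in the loss-increasing direction."""
    grad = loss_gradient_wrt_input(x, y, w, b)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and a correctly classified input with true label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

# Sweep the attack strength instead of reporting a single epsilon.
for eps in (0.0, 0.1, 0.3, 0.5):
    x_adv = fgsm(x, y, w, b, eps)
    p = predict_proba(x_adv, w, b)
    print(f"eps={eps:.1f}  p(y=1)={p:.3f}  predicted={int(p >= 0.5)}")
```

In this toy setup the logit falls linearly with ε, so the prediction flips once ε is large enough. A robustness claim based on a single ε says little about where that crossover lies, which is why a sweep of this kind is more informative.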

Circularity Check

0 steps flagged

No significant circularity: standard variational approximation to external IB objective


The paper starts from the information bottleneck Lagrangian of Tishby et al. (1999), an external reference, and applies the standard variational upper bound on I(X;Z) via KL(q(z|x)||r(z)) together with a lower bound on I(Y;Z) via the decoder expectation. This produces a tractable objective that is then optimized with neural networks and the reparameterization trick. Neither the derivation nor the empirical performance claims reduce to a fitted parameter, self-definition, or self-citation chain; the bounds are explicit approximations whose tightness is an empirical question rather than a definitional identity. Experiments compare against other regularizers on held-out data, providing independent content.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of the variational approximation to mutual information and on the choice of a scalar trade-off parameter.

free parameters (1)

beta
Scalar multiplier on the compression term in the IB Lagrangian; its value is chosen by the user or by validation.

axioms (1)

domain assumption: Variational distributions yield tractable bounds on the mutual-information terms: the marginal approximation r(z) gives an upper bound on I(X;Z), and the decoder p(y|z) gives a lower bound on I(Z;Y).
Invoked when replacing the exact IB objective with the variational surrogate.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DAlembert.Inevitability · bilinear_family_forced · tag: unclear

Paper passage: I(Z,Y;θ) − βI(Z,X;θ)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
cs.LG · 2026-05 · unverdicted · novelty 7.0
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection
cs.CV · 2026-05 · unverdicted · novelty 7.0
A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.

From Observations to States: Latent Time Series Forecasting
cs.LG · 2026-01 · conditional · novelty 7.0
LatentTSF improves time series forecasting accuracy and representation quality by shifting prediction from observation space to a learned latent state space via autoencoding.

Information Filtering via Variational Regularization for Robot Manipulation
cs.RO · 2026-01 · unverdicted · novelty 7.0
Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld whil...

Dream to Control: Learning Behaviors by Latent Imagination
cs.LG · 2019-12 · accept · novelty 7.0
Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
cs.AI · 2026-05 · unverdicted · novelty 6.0
A new RL method called MoCA with Perception Verification rewards perceptual fidelity independently to improve both seeing and thinking in VLMs.

Hypergraph and Latent ODE Learning for Multimodal Root Cause Localization in Microservices
cs.LG · 2026-05 · unverdicted · novelty 6.0
HyperODE RCA integrates hypergraph learning with latent ODEs and cross-modal attention to improve root cause localization in microservice architectures on the Tianchi AIOps benchmark.

Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data
physics.data-an · 2026-04 · unverdicted · novelty 6.0
DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

Variational Feature Compression for Model-Specific Representations
cs.CV · 2026-04 · unverdicted · novelty 6.0
A variational latent bottleneck with KL regularization and a dynamic binary mask based on saliency produces model-specific features that keep high accuracy for one classifier but drop others below 2% on CIFAR-100 with...

Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction
cs.LG · 2026-04 · unverdicted · novelty 6.0
Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.

TabTransformer: Tabular Data Modeling Using Contextual Embeddings
cs.LG · 2020-12 · unverdicted · novelty 6.0
TabTransformer uses Transformer self-attention to generate contextual embeddings from categorical features in tabular data, outperforming prior deep learning methods by at least 1% mean AUC and matching tree-based ens...

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
cs.LG · 2026-04 · unverdicted · novelty 5.0
The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...

URMF: Uncertainty-aware Robust Multimodal Fusion for Multimodal Sarcasm Detection
cs.CV · 2026-04 · unverdicted · novelty 5.0
URMF uses learnable Gaussian posteriors to estimate modality-specific uncertainty and adjust fusion weights for improved multimodal sarcasm detection on MSD and MMSD2 benchmarks.

DRAFT: Task Decoupled Latent Reasoning for Agent Safety
cs.LG · 2026-02 · unverdicted · novelty 5.0
DRAFT decouples agent safety judgment into latent extraction and reasoning stages, raising average benchmark accuracy from 63.27% to 91.18%.

Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
cs.CV · 2025-11 · unverdicted · novelty 5.0
SlotSPE is a slot-attention framework that decomposes multimodal cancer data into structural prognostic event slots to improve survival prediction and interpretability.

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
cs.LG · 2025-10 · unverdicted · novelty 5.0
TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.

Adversary-Free Counterfactual Prediction via Information-Regularized Representations
cs.LG · 2025-10 · unverdicted · novelty 5.0
Develops an adversary-free counterfactual prediction framework by deriving a variational objective that upper-bounds mutual information between stochastic representations and treatments.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 17 Pith papers · 9 internal anchors

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[2] Alessandro Achille and Stefano Soatto. Information dropout: Learning optimal representations through noisy computation. 2016. URL http://arxiv.org/abs/1611.01353
[3] David Barber and Felix Agakov. The IM algorithm: a variational approach to information maximization. In NIPS, volume 16, 2004.
[4] Shumeet Baluja, Michele Covell, and Rahul Sukthankar. The virtues of peer pressure: A simple method for discovering high-value mistakes. In Intl. Conf. Computer Analysis of Images and Patterns, 2015.
[5] Abhijit Bendale and Terrance Boult. Towards open world recognition. In CVPR, 2015.
[6] William Bialek, Ilya Nemenman, and Naftali Tishby. Predictability, complexity, and learning. Neural Computation, 13(11):2409-2463, 2001.
[7] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. In ICML, 2015.
[8] Ryan P. Browne and Paul D. McNicholas. Multivariate sharp quadratic bounds via -strong convexity and the Fenchel connection. Electronic Journal of Statistics, 9, 2015.
[9] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv, 2016.
[10] Matthew Chalk, Olivier Marre, and Gasper Tkacik. Relevant sparse codes with variational information bottleneck. In NIPS, 2016.
[11] G. Chechik, A. Globerson, N. Tishby, and Y. Weiss. Information bottleneck for Gaussian variables. J. of Machine Learning Research, 6:165-188, 2005.
[12] Paul Cuff and Lanqing Yu. Differential privacy as a mutual information constraint. In ACM Conference on Computer and Communications Security (CCS), 2016.
[13] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248-255. IEEE, 2009.
[14] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In NIPS, 2016.
[15] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AI/Statistics, volume 9, pp. 249-256, 2010.
[16] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[17] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017. URL https://openreview.net/pdf?id=Sy2fzU9gl
[18] Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. CoRR, abs/1511.03034, 2015.
[19] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[20] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In ICLR, 2014.
[21] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017. URL https://openreview.net/pdf?id=S1OufnIlx
[22] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. In ICLR, 2016. URL http://arxiv.org/abs/1511.00830
[23] David J.C. MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.
[24] Shakir Mohamed and Danilo Jimenez Rezende. Variational information maximisation for intrinsically motivated reinforcement learning. In NIPS, pp. 2125-2133, 2015.
[25] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. arXiv, 2016.
[26] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, 2016.
[27] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, 2015. URL http://arxiv.org/abs/1412.1897
[28] Stephanie E. Palmer, Olivier Marre, Michael J. Berry, and William Bialek. Predictive information in a sensory population. PNAS, 112(22):6908-6913, 2015.
[29] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, 2015.
[30] Gabriel Pereyra, George Tucker, Jan Chorowski, and Lukasz Kaiser. Regularizing neural networks by penalizing confident output predictions. In ICLR Workshop, 2017. URL https://openreview.net/pdf?id=HyhbYrGYe
[31] Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.
[32] Leigh Robinson and Benjamin Graham. Confusing deep convolution networks by relabelling. arXiv preprint 1510.06925, 2015.
[33] Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J. Fleet. Adversarial manipulation of deep representations. In ICLR, 2016.
[34] Ohad Shamir, Sivan Sabato, and Naftali Tishby. Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29-30):2696-2711, 2010.
[35] Noam Slonim, Gurinder Singh Atwal, Gašper Tkačik, and William Bialek. Information-based clustering. PNAS, 102(51):18297-18302, 2005.
[36] Susanne Still and William Bialek. How many clusters? An information-theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
[37] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014. URL http://arxiv.org/abs/1312.6199
[38] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
[39] N. Tishby and N. Zaslavsky. Deep learning and the information bottleneck principle. In IEEE Information Theory Workshop, pp. 1-5, April 2015a.
[40] N. Tishby, F.C. Pereira, and W. Bialek. The information bottleneck method. In The 37th Annual Allerton Conf. on Communication, Control, and Computing, pp. 368-377, 1999.
[41] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Information Theory Workshop (ITW), 2015 IEEE, pp. 1-5. IEEE, 2015b.
[42] Weina Wang, Lei Ying, and Junshan Zhang. On the relation between identifiability, differential privacy and mutual-information privacy. IEEE Trans. Inf. Theory, 62:5018-5029, 2016a.
[43] Weiran Wang, Honglak Lee, and Karen Livescu. Deep variational canonical correlation analysis. arXiv [cs.LG], 11 October 2016b. URL https://arxiv.org/abs/1610.03454