pith. machine review for the scientific record.

arxiv: 1612.00410 · v7 · submitted 2016-12-01 · cs.LG · cs.IT · math.IT

Deep Variational Information Bottleneck

Pith reviewed 2026-05-18 06:18 UTC · model grok-4.3

classification cs.LG · cs.IT · math.IT
keywords information bottleneck, variational inference, neural network regularization, representation learning, adversarial robustness, generalization, mutual information

The pith

A variational approximation to the information bottleneck lets neural networks learn compressed yet predictive representations that generalize better and resist adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a variational method to optimize the information bottleneck objective inside deep neural networks. This objective pushes the model to compress the input while keeping enough information to predict the target, controlled by a single trade-off parameter. By using variational bounds and the reparameterization trick, the approach becomes trainable end-to-end with standard gradient methods. A sympathetic reader would care because it turns an information-theoretic principle into a practical regularizer that demonstrably improves both accuracy on new examples and resistance to small input perturbations meant to fool the model.

Core claim

We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method Deep Variational Information Bottleneck, or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

What carries the argument

The Deep VIB objective: a variational bound on the information-bottleneck Lagrangian that replaces the intractable mutual-information terms with expectations under parameterized encoder and decoder distributions. Minimizing the resulting loss maximizes a lower bound on I(Z;Y) - β I(X;Z).
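
As a rough illustration of what such an objective computes, the following NumPy sketch evaluates a per-batch loss of the form E_{q(z|x)}[-log p(y|z)] + β KL(q(z|x) || r(z)). It assumes a diagonal-Gaussian encoder, a standard-normal r(z), and a linear-softmax decoder. The function names, the decoder parameters `W_dec` and `b_dec`, and the default `n_samples` are hypothetical choices made for this sketch; they are not taken from the paper's code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_diag_gaussian_to_std_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per example."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def vib_loss(mu, logvar, labels, W_dec, b_dec, beta, rng, n_samples=8):
    """Monte Carlo estimate of E_q[-log p(y|z)] + beta * KL(q(z|x) || r(z)).

    mu, logvar   : (batch, k) outputs of a hypothetical encoder (not shown)
    labels       : (batch,) integer class labels
    W_dec, b_dec : parameters of a hypothetical linear-softmax decoder p(y|z)
    Returns (total, mean_nll, mean_kl) as floats.
    """
    batch, k = mu.shape
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so z is a
    # deterministic, differentiable function of (mu, logvar) given eps.
    eps = rng.standard_normal((n_samples, batch, k))
    z = mu + np.exp(0.5 * logvar) * eps                      # (n_samples, batch, k)
    logp = log_softmax(z @ W_dec + b_dec)                    # (n_samples, batch, classes)
    nll = -logp[:, np.arange(batch), labels].mean(axis=0)    # (batch,)
    kl = kl_diag_gaussian_to_std_normal(mu, logvar)          # (batch,)
    return float(np.mean(nll + beta * kl)), float(nll.mean()), float(kl.mean())
```

In a real training loop the same expression would be written in an autodiff framework so that gradients flow through `z` to the encoder; NumPy is used here only to keep the arithmetic visible.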

If this is right

  • Networks trained with the VIB objective achieve higher accuracy on held-out test data than networks trained with dropout or weight decay.
  • The learned representations exhibit greater robustness to adversarial perturbations crafted to maximize prediction error.
  • A single scalar beta directly controls the amount of compression applied to the input representation.
  • The method supports fully end-to-end training of deep architectures without requiring separate pre-training stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variational bound could be adapted to sequential or graph-structured data where explicit compression of history or neighborhood information is desirable.
  • Success of VIB on adversarial robustness suggests that many existing regularizers may be implicitly performing a similar information-compression role.
  • Combining the objective with modern data-augmentation pipelines might further widen the robustness gap observed in the paper.

Load-bearing premise

The variational bounds on the mutual information terms stay tight enough during training that the learned representation actually realizes the intended compression-prediction trade-off.

What would settle it

Measure the true mutual informations I(X;Z) and I(Z;Y) after training and check whether they vary with the beta parameter as the information-bottleneck curve predicts, or run head-to-head comparisons on multiple datasets and see whether VIB fails to beat standard regularizers.
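
One hedged way to start on such a check is to compare the variational upper bound on I(X;Z) with a second estimate that replaces r(z) by the aggregate posterior. The sketch below assumes a diagonal-Gaussian encoder and r(z) = N(0, I), and it treats the N encoded inputs as the entire data distribution, so the mixture estimate can never exceed log N. Neither number is the true mutual information; the difference between them only indicates how much slack the choice of r(z) introduces. All function names are hypothetical.

```python
import numpy as np

def log_diag_gaussian(z, mu, logvar):
    """log N(z; mu, diag(exp(logvar))), summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar), axis=-1
    )

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def ixz_diagnostics(mu, logvar, rng, n_samples=64):
    """Return (variational_upper_bound, mixture_estimate) for I(X;Z) in nats.

    mu, logvar : (N, k) encoder outputs on N inputs, assumed here to stand in
                 for the whole data distribution.
    """
    n, k = mu.shape
    # Training-time bound: mean over inputs of KL(q(z|x) || N(0, I)).
    upper = np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1))
    # Draw z ~ q(z|x_i) for every input i.
    eps = rng.standard_normal((n_samples, n, k))
    z = mu + np.exp(0.5 * logvar) * eps                       # (S, N, k)
    log_q_cond = log_diag_gaussian(z, mu, logvar)             # (S, N)
    # Density of each sample under every encoder q(z|x_j), then the
    # aggregate posterior q(z) = (1/N) sum_j q(z|x_j).
    log_q_all = log_diag_gaussian(
        z[:, :, None, :], mu[None, None], logvar[None, None]
    )                                                         # (S, N, N)
    log_q_marg = logsumexp(log_q_all, axis=-1) - np.log(n)    # (S, N)
    estimate = np.mean(log_q_cond - log_q_marg)
    return float(upper), float(estimate)
```

Running this for models trained at several values of beta and plotting both numbers against beta would give a first, coarse picture of whether compression tracks the trade-off parameter.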

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a variational approximation to the Information Bottleneck (IB) principle of Tishby et al. (1999), called Deep Variational Information Bottleneck (Deep VIB). It parameterizes the IB model with neural networks for the encoder q(z|x) and decoder p(y|z), applies the reparameterization trick, and optimizes a variational surrogate to the IB Lagrangian max I(Y;Z) - β I(X;Z). The central claim is that VIB-trained models outperform those trained with other regularizers on generalization and adversarial robustness.

Significance. If the variational bounds remain sufficiently tight and the method truly implements the intended IB trade-off, this supplies a practical, scalable realization of information-theoretic regularization for deep networks. The approach could influence regularization techniques and robustness research by providing a principled alternative to ad-hoc penalties.

major comments (3)
  1. [§2, Eq. (3)] The derivation in §2 applies the standard variational lower bound to the IB objective, yielding the loss E_{q(z|x)}[-log p(y|z)] + β KL(q(z|x)||r(z)). However, no diagnostic is provided (e.g., estimated mutual information curves or bound-gap plots) to confirm that the upper bound on I(X;Z) and lower bound on I(Y;Z) stay tight throughout optimization; if loose, the reported gains may arise from the specific KL regularizer rather than IB compression.
  2. [Table 1, §4.1] Table 1 and §4.1 report superior MNIST generalization for VIB over dropout and weight decay, but the comparison does not control for hyper-parameter search budget across methods. Without this, it is unclear whether the advantage is attributable to the IB principle or to differences in tuning effort.
  3. [§4.3] §4.3 claims improved adversarial robustness, yet the evaluation uses a fixed attack strength without reporting sensitivity to stronger attacks or providing the exact attack parameters. This leaves open whether the robustness is a genuine consequence of the information bottleneck or an artifact of the chosen evaluation.
minor comments (2)
  1. [§2] The notation for the variational prior r(z) versus the marginal p(z) should be made consistent across equations to avoid reader confusion.
  2. [Figure 1] Figure 1 would benefit from axis labels that explicitly state the quantities plotted (e.g., estimated I(X;Z) versus β) and from reporting results over multiple random seeds.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below, indicating where revisions will be made.

  1. Referee: [§2, Eq. (3)] The derivation in §2 applies the standard variational lower bound to the IB objective, yielding the loss E_{q(z|x)}[-log p(y|z)] + β KL(q(z|x)||r(z)). However, no diagnostic is provided (e.g., estimated mutual information curves or bound-gap plots) to confirm that the upper bound on I(X;Z) and lower bound on I(Y;Z) stay tight throughout optimization; if loose, the reported gains may arise from the specific KL regularizer rather than IB compression.

    Authors: We agree that diagnostics on bound tightness would strengthen the paper. In the revision we will add plots of the estimated mutual information terms I(X;Z) and I(Y;Z) together with the variational gap throughout training. These will be computed using the same Monte-Carlo estimators already present in the code and will help confirm that the reported gains track the intended IB trade-off. revision: yes

  2. Referee: [Table 1, §4.1] Table 1 and §4.1 report superior MNIST generalization for VIB over dropout and weight decay, but the comparison does not control for hyper-parameter search budget across methods. Without this, it is unclear whether the advantage is attributable to the IB principle or to differences in tuning effort.

    Authors: This is a fair criticism. While we performed grid searches of comparable size for all methods, we did not explicitly equalize total wall-clock budget. In the revised manuscript we will report the exact hyper-parameter ranges explored for each baseline and add a short discussion of search effort. A fully re-tuned matched-budget experiment is beyond the scope of a minor revision but can be noted as future work if the referee requests it. revision: partial

  3. Referee: [§4.3] §4.3 claims improved adversarial robustness, yet the evaluation uses a fixed attack strength without reporting sensitivity to stronger attacks or providing the exact attack parameters. This leaves open whether the robustness is a genuine consequence of the information bottleneck or an artifact of the chosen evaluation.

    Authors: We accept the point. The original experiments used FGSM with ε = 0.3 (standard at the time) but omitted full parameter disclosure and sensitivity curves. The revision will state the precise attack parameters, include results for a range of ε values, and add a brief comparison with PGD to show that the robustness advantage persists under stronger attacks. revision: yes
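
The kind of ε sweep discussed in this exchange can be sketched in a few lines. The example below is a minimal NumPy version of the fast gradient sign method (Goodfellow et al., 2015) applied to a linear-softmax classifier, which stands in for the actual network purely for illustration. The function names, the [0, 1] input range, and the parameters `W` and `b` are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def softmax_xent_and_grad(x, y, W, b):
    """Per-example cross-entropy of a linear-softmax model and its gradient w.r.t. x."""
    logits = x @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), y])       # (n,)
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0              # d loss / d logits
    grad_x = dlogits @ W.T                       # (n, d)
    return loss, grad_x

def fgsm(x, y, W, b, eps, lo=0.0, hi=1.0):
    """One-step L-infinity attack: x + eps * sign(grad_x loss), clipped to [lo, hi]."""
    _, grad_x = softmax_xent_and_grad(x, y, W, b)
    return np.clip(x + eps * np.sign(grad_x), lo, hi)

def robustness_curve(x, y, W, b, eps_values):
    """Accuracy on FGSM-perturbed inputs for each epsilon in eps_values."""
    accs = []
    for eps in eps_values:
        x_adv = fgsm(x, y, W, b, eps)
        pred = np.argmax(x_adv @ W + b, axis=-1)
        accs.append(float(np.mean(pred == y)))
    return accs
```

Reporting `robustness_curve` over a grid of ε values, rather than accuracy at a single ε, is what would expose sensitivity to attack strength. A multi-step attack such as PGD would repeat the signed-gradient step with projection back into the ε-ball.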

Circularity Check

0 steps flagged

No significant circularity: standard variational approximation to external IB objective

The paper starts from the information bottleneck Lagrangian of Tishby et al. (1999), an external reference, and applies the standard variational upper bound on I(X;Z) via KL(q(z|x)||r(z)) together with a lower bound on I(Y;Z) via the decoder expectation. This produces a tractable objective that is then optimized with neural networks and the reparameterization trick. Neither the derivation nor the empirical performance claims reduce to a fitted parameter, self-definition, or self-citation chain; the bounds are explicit approximations whose tightness is an empirical question rather than a definitional identity. Experiments compare against other regularizers on held-out data, providing independent content.
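
The two bounds mentioned here can be written out explicitly, which also makes the slack terms visible. The derivation below uses this page's notation (encoder q(z|x), decoder p(y|z), variational marginal r(z)). The symbols p^*(x,y) for the data distribution, q(z) for the aggregate marginal, and p^*(y|z) for the true conditional induced by the encoder are introduced here for the sketch.

```latex
% Assumed notation: p^*(x,y) is the data distribution,
% q(z) = E_{p^*(x)}[q(z|x)] is the aggregate marginal of the encoder,
% p^*(y|z) is the true conditional of y given z under p^*(x,y) q(z|x).
\begin{align}
I(X;Z) &= \mathbb{E}_{p^*(x)}\!\left[\mathrm{KL}\big(q(z|x)\,\|\,r(z)\big)\right]
          - \mathrm{KL}\big(q(z)\,\|\,r(z)\big) \nonumber \\
       &\le \mathbb{E}_{p^*(x)}\!\left[\mathrm{KL}\big(q(z|x)\,\|\,r(z)\big)\right], \\
I(Z;Y) &= H(Y) + \mathbb{E}_{p^*(x,y)\,q(z|x)}\!\left[\log p(y|z)\right]
          + \mathbb{E}_{q(z)}\!\left[\mathrm{KL}\big(p^*(y|z)\,\|\,p(y|z)\big)\right] \nonumber \\
       &\ge H(Y) + \mathbb{E}_{p^*(x,y)\,q(z|x)}\!\left[\log p(y|z)\right].
\end{align}
```

Since H(Y) does not depend on the model, combining the two lines shows that minimizing E[-log p(y|z)] + β E[KL(q(z|x) || r(z))] maximizes a lower bound on I(Z;Y) - β I(X;Z). The bound is tight exactly when r(z) matches q(z) and p(y|z) matches p^*(y|z), which is why tightness is an empirical question.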

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of the variational approximation to mutual information and on the choice of a scalar trade-off parameter.

free parameters (1)
  • beta
    Scalar multiplier on the compression term in the IB Lagrangian; its value is chosen by the user or by validation.
axioms (1)
  • domain assumption: Pairing the encoder q(z|x) with a variational marginal r(z) yields a tractable upper bound on the mutual information I(X;Z).
    Invoked when replacing the exact IB objective with the variational surrogate.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...

  2. Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection

    cs.CV 2026-05 unverdicted novelty 7.0

    A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.

  3. From Observations to States: Latent Time Series Forecasting

    cs.LG 2026-01 conditional novelty 7.0

    LatentTSF improves time series forecasting accuracy and representation quality by shifting prediction from observation space to a learned latent state space via autoencoding.

  4. Information Filtering via Variational Regularization for Robot Manipulation

    cs.RO 2026-01 unverdicted novelty 7.0

    Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld whil...

  5. Dream to Control: Learning Behaviors by Latent Imagination

    cs.LG 2019-12 accept novelty 7.0

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  6. Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    A new RL method called MoCA with Perception Verification rewards perceptual fidelity independently to improve both seeing and thinking in VLMs.

  7. Hypergraph and Latent ODE Learning for Multimodal Root Cause Localization in Microservices

    cs.LG 2026-05 unverdicted novelty 6.0

    HyperODE RCA integrates hypergraph learning with latent ODEs and cross-modal attention to improve root cause localization in microservice architectures on the Tianchi AIOps benchmark.

  8. Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

    physics.data-an 2026-04 unverdicted novelty 6.0

    DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

  9. Variational Feature Compression for Model-Specific Representations

    cs.CV 2026-04 unverdicted novelty 6.0

    A variational latent bottleneck with KL regularization and a dynamic binary mask based on saliency produces model-specific features that keep high accuracy for one classifier but drop others below 2% on CIFAR-100 with...

  10. Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction

    cs.LG 2026-04 unverdicted novelty 6.0

    Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.

  11. TabTransformer: Tabular Data Modeling Using Contextual Embeddings

    cs.LG 2020-12 unverdicted novelty 6.0

    TabTransformer uses Transformer self-attention to generate contextual embeddings from categorical features in tabular data, outperforming prior deep learning methods by at least 1% mean AUC and matching tree-based ens...

  12. Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

    cs.LG 2026-04 unverdicted novelty 5.0

    The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...

  13. URMF: Uncertainty-aware Robust Multimodal Fusion for Multimodal Sarcasm Detection

    cs.CV 2026-04 unverdicted novelty 5.0

    URMF uses learnable Gaussian posteriors to estimate modality-specific uncertainty and adjust fusion weights for improved multimodal sarcasm detection on MSD and MMSD2 benchmarks.

  14. DRAFT: Task Decoupled Latent Reasoning for Agent Safety

    cs.LG 2026-02 unverdicted novelty 5.0

    DRAFT decouples agent safety judgment into latent extraction and reasoning stages, raising average benchmark accuracy from 63.27% to 91.18%.

  15. Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

    cs.CV 2025-11 unverdicted novelty 5.0

    SlotSPE is a slot-attention framework that decomposes multimodal cancer data into structural prognostic event slots to improve survival prediction and interpretability.

  16. TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

    cs.LG 2025-10 unverdicted novelty 5.0

    TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.

  17. Adversary-Free Counterfactual Prediction via Information-Regularized Representations

    cs.LG 2025-10 unverdicted novelty 5.0

    Develops an adversary-free counterfactual prediction framework by deriving a variational objective that upper-bounds mutual information between stochastic representations and treatments.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 17 Pith papers · 9 internal anchors

  1. [1]

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016

  2. [2]

    Information Dropout: Learning Optimal Representations Through Noisy Computation

    Alessandro Achille and Stefano Soatto. Information dropout: Learning optimal representations through noisy computation. 2016. URL http://arxiv.org/abs/1611.01353

  3. [3]

    The IM algorithm: a variational approach to information maximization

    David Barber and Felix Agakov. The IM algorithm: a variational approach to information maximization. In NIPS, volume 16, 2004

  4. [4]

    The virtues of peer pressure: A simple method for discovering high-value mistakes

    Shumeet Baluja, Michele Covell, and Rahul Sukthankar. The virtues of peer pressure: A simple method for discovering high-value mistakes. In Intl. Conf. Computer Analysis of Images and Patterns, 2015

  5. [5]

    Towards open world recognition

    Abhijit Bendale and Terrance Boult. Towards open world recognition. In CVPR, 2015

  6. [6]

    Predictability, complexity, and learning

    William Bialek, Ilya Nemenman, and Naftali Tishby. Predictability, complexity, and learning. Neural Computation, 13(11):2409–2463, 2001

  7. [7]

    Weight uncertainty in neural networks

    Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. In ICML, 2015

  8. [8]

    Multivariate sharp quadratic bounds via Σ-strong convexity and the Fenchel connection

    Ryan P. Browne and Paul D. McNicholas. Multivariate sharp quadratic bounds via Σ-strong convexity and the Fenchel connection. Electronic Journal of Statistics, 9, 2015

  9. [9]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. Arxiv, 2016

  10. [10]

    Relevant sparse codes with variational information bottleneck

    Matthew Chalk, Olivier Marre, and Gasper Tkacik. Relevant sparse codes with variational information bottleneck. In NIPS, 2016

  11. [11]

    Information bottleneck for Gaussian variables

    G. Chechik, A. Globerson, N. Tishby, and Y. Weiss. Information bottleneck for Gaussian variables. J. of Machine Learning Research, 6:165–188, 2005

  12. [12]

    Differential privacy as a mutual information constraint

    Paul Cuff and Lanqing Yu. Differential privacy as a mutual information constraint. In ACM Conference on Computer and Communications Security (CCS), 2016

  13. [13]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009

  14. [14]

    Robustness of classifiers: from adversarial to random noise

    Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In NIPS, 2016

  15. [15]

    Understanding the difficulty of training deep feedforward neural networks

    Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AI/Statistics, volume 9, pp. 249–256, 2010

  16. [16]

    Explaining and harnessing adversarial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015

  17. [17]

    beta-VAE: Learning basic visual concepts with a constrained variational framework

    Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017. URL https://openreview.net/pdf?id=Sy2fzU9gl

  18. [18]

    Learning with a Strong Adversary

    Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. CoRR, abs/1511.03034, 2015

  19. [19]

    Adam: A method for stochastic optimization

    Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015

  20. [20]

    Auto-encoding variational Bayes

    Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In ICLR, 2014

  21. [21]

    Adversarial examples in the physical world

    Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017. URL https://openreview.net/pdf?id=S1OufnIlx

  22. [22]

    The Variational Fair Autoencoder

    Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. In ICLR, 2016. URL http://arxiv.org/abs/1511.00830

  23. [23]

    Information theory, inference and learning algorithms

    David JC MacKay. Information theory, inference and learning algorithms. Cambridge university press, 2003

  24. [24]

    Variational information maximisation for intrinsically motivated reinforcement learning

    Shakir Mohamed and Danilo Jimenez Rezende. Variational information maximisation for intrinsically motivated reinforcement learning. In NIPS, pp. 2125–2133, 2015

  25. [25]

    Universal adversarial perturbations

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. Arxiv, 2016

  26. [26]

    Deepfool: a simple and accurate method to fool deep neural networks

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, 2016

  27. [27]

    Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

    Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, 2015. URL http://arxiv.org/abs/1412.1897

  28. [28]

    Predictive information in a sensory population

    Stephanie E Palmer, Olivier Marre, Michael J Berry, and William Bialek. Predictive information in a sensory population. PNAS, 112(22):6908–6913, 2015

  29. [29]

    The limitations of deep learning in adversarial settings

    Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, 2015

  30. [30]

    Regularizing neural networks by penalizing confident output predictions

    Gabriel Pereyra, George Tucker, Jan Chorowski, and Lukasz Kaiser. Regularizing neural networks by penalizing confident output predictions. In ICLR Workshop, 2017. URL https://openreview.net/pdf?id=HyhbYrGYe

  31. [31]

    Acceleration of stochastic approximation by averaging

    Boris T Polyak and Anatoli B Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992

  32. [32]

    Confusing Deep Convolution Networks by Relabelling

    Leigh Robinson and Benjamin Graham. Confusing deep convolution networks by relabelling. arXiv preprint 1510.06925, 2015

  33. [33]

    Adversarial manipulation of deep representations

    Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J Fleet. Adversarial manipulation of deep representations. In ICLR, 2016

  34. [34]

    Learning and generalization with the information bottleneck

    Ohad Shamir, Sivan Sabato, and Naftali Tishby. Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29-30):2696–2711, 2010

  35. [35]

    Information-based clustering

    Noam Slonim, Gurinder Singh Atwal, Gašper Tkačik, and William Bialek. Information-based clustering. PNAS, 102(51):18297–18302, 2005

  36. [36]

    How many clusters? an information-theoretic perspective

    Susanne Still and William Bialek. How many clusters? an information-theoretic perspective. Neural Computation, 16(12):2483–2506, 2004

  37. [37]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014. URL http://arxiv.org/abs/1312.6199

  38. [38]

    Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

    Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016

  39. [39]

    Deep learning and the information bottleneck principle

    N Tishby and N Zaslavsky. Deep learning and the information bottleneck principle. In IEEE Information Theory Workshop, pp. 1–5, April 2015a

  40. [40]

    The information bottleneck method

    N. Tishby, F.C. Pereira, and W. Bialek. The information bottleneck method. In The 37th annual Allerton Conf. on Communication, Control, and Computing, pp. 368–377, 1999

  41. [41]

    Deep learning and the information bottleneck principle

    Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Information Theory Workshop (ITW), 2015 IEEE, pp. 1–5. IEEE, 2015b

  42. [42]

    On the relation between identifiability, differential privacy and Mutual-Information privacy

    Weina Wang, Lei Ying, and Junshan Zhang. On the relation between identifiability, differential privacy and Mutual-Information privacy. IEEE Trans. Inf. Theory, 62:5018–5029, 2016a

  43. [43]

    Deep Variational Canonical Correlation Analysis

    Weiran Wang, Honglak Lee, and Karen Livescu. Deep variational canonical correlation analysis. arXiv [cs.LG], 11 October 2016b. URL https://arxiv.org/abs/1610.03454