Entropy-Based Characterisation of the Polarised Regime in Latent Variable Models

Lisa Bonheme; Marek Grzes; Peter Clapham

arxiv: 2605.15965 · v1 · pith:V3ZGJTG7new · submitted 2026-05-15 · 💻 cs.LG


Pith reviewed 2026-05-20 19:30 UTC · model grok-4.3


keywords variational autoencoders · latent variable models · entropy · polarised regime · KL divergence · active dimensions · information theory


The pith

The entropy of the mean representation classifies active dimensions in the polarised regime of latent variable models without relying on a Gaussian prior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that the entropy of the mean latent representation can classify which dimensions are active in variational models. This matters because it offers a prior-independent way to understand the polarised regime where some latents become active and others passive. The authors link this entropy to the KL minimisation objective using entropy-variance bounds and show it aligns with existing active-passive distinctions. They demonstrate that this works empirically on several VAE variants and related models, and that passive dimensions can still help in tasks after normalisation.

Core claim

The authors propose an information-theoretic classification of the polarised regime in latent variable models based on the entropy of the mean representation. They show theoretically that this entropy is coupled to KL minimisation via entropy-variance bounds and relate the criterion to Bonheme's active/passive conditions. The criterion consistently recovers the polarised regime, whenever such a regime is present, across beta-VAEs, identifiable VAEs, least-volume autoencoders and L2-regularised autoencoders. Entropy of the mean alone cannot distinguish active from mixed dimensions without variance signals. Separately, passive dimensions yield small but consistent improvements on downstream tasks when codes are normalised, suggesting that collapse is often a matter of scale rather than absolute information removal.

What carries the argument

The entropy of the mean representation, which classifies active dimensions through its coupling to the KL term via entropy-variance bounds.
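For orientation, the two standard results that an argument of this kind usually rests on can be written out. The block below is a sketch and is not taken from the paper: it states the Gaussian maximum-entropy bound and the closed-form per-dimension KL term for a Gaussian posterior against a standard normal prior, which is the β-VAE setting and an assumption here. The symbols μ_i(x) and σ_i²(x) denote the encoder's mean and variance outputs for dimension i.

```latex
% Maximum-entropy bound: for a fixed variance, the Gaussian has the largest
% differential entropy, so the entropy of the mean is capped by its variance.
h(\mu_i) \;\le\; \tfrac{1}{2}\log\bigl(2\pi e\,\operatorname{Var}(\mu_i)\bigr)

% Per-dimension KL term, assuming a Gaussian posterior and an N(0,1) prior.
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu_i(x),\sigma_i^2(x))\,\|\,\mathcal{N}(0,1)\bigr)
  \;=\; \tfrac{1}{2}\bigl(\mu_i(x)^2 + \sigma_i^2(x) - \log\sigma_i^2(x) - 1\bigr)
```

Averaged over inputs, the μ_i(x)² term equals Var(μ_i) plus the squared mean of μ_i, so driving the KL term down shrinks Var(μ_i), and the first inequality then pushes h(μ_i) down with it. Under other priors the second expression changes, which is why the validity of the bounds is listed as the load-bearing premise.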

If this is right

The proposed entropy criterion applies to variational models with various priors without requiring Gaussian assumptions.
Passive dimensions can improve downstream task performance when latent codes are appropriately normalised.
The entropy measure alone requires additional variance information to separate active from mixed dimensions.
The classification recovers polarised regimes whenever they appear in the tested model classes.
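The criterion in the points above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes a histogram plug-in estimator (one of the approximation techniques named in the Figure 5 caption) and a hand-picked threshold. The names `histogram_entropy`, `classify_dimensions`, `tau` and `bins` are hypothetical, and the data is synthetic.

```python
import math
import random


def histogram_entropy(values, bins=32):
    """Plug-in estimate of differential entropy (nats) from a 1-D histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all mass on one point; treat as fully collapsed.
        return float("-inf")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # Discrete entropy of the bin probabilities plus the log(bin width) correction.
    h = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return h + math.log(width)


def classify_dimensions(mu, tau, bins=32):
    """Label each latent dimension 'active' if H(mu_i) > tau, else 'passive'."""
    n_dims = len(mu[0])
    labels = []
    for i in range(n_dims):
        column = [row[i] for row in mu]
        h = histogram_entropy(column, bins)
        labels.append("active" if h > tau else "passive")
    return labels


# Synthetic stand-in for encoder means: one spread-out and one collapsed dimension.
rng = random.Random(0)
mu = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.01)] for _ in range(5000)]
labels = classify_dimensions(mu, tau=-1.0)
```

A suitable value of `tau` depends on the estimator and on the scale of the codes. With a differential-entropy estimate, collapsed dimensions come out strongly negative rather than near zero, so the threshold here is illustrative only.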

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This criterion could be used to monitor latent dimension usage during training in a wider range of generative models.
Appropriate scaling of all latent codes might allow models to retain more information without changing the training objective.
Testing the entropy criterion on models with non-Gaussian or discrete latent variables would check its broader applicability beyond the studied cases.
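The scaling idea in the second point can be illustrated with per-dimension standardisation. The source only says codes are "appropriately normalised", so the choice of zero-mean, unit-variance scaling is an assumption, and `standardise_columns` is a hypothetical helper run on synthetic codes.

```python
import math
import random


def standardise_columns(codes, eps=1e-12):
    """Rescale every latent dimension to zero mean and unit variance."""
    n = len(codes)
    n_dims = len(codes[0])
    means = [sum(row[i] for row in codes) / n for i in range(n_dims)]
    stds = [
        math.sqrt(sum((row[i] - means[i]) ** 2 for row in codes) / n)
        for i in range(n_dims)
    ]
    # eps guards against division by zero for exactly constant dimensions.
    return [
        [(row[i] - means[i]) / (stds[i] + eps) for i in range(n_dims)]
        for row in codes
    ]


rng = random.Random(1)
# Dimension 0 has unit scale; dimension 1 varies at a scale of 1e-3.
codes = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1e-3)] for _ in range(2000)]
scaled = standardise_columns(codes)
```

After scaling, both dimensions reach a downstream classifier at the same magnitude, so whatever signal a small-scale dimension carries is no longer swamped by its scale. Whether that signal is useful is an empirical question the sketch does not answer.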

Load-bearing premise

The variance bounds that tie mean entropy to KL minimisation must be valid for the model and prior being used.

What would settle it

Running the entropy classification on a variational model with a heavy-tailed prior and checking if it matches the dimensions that actually contribute to reducing the KL term would test the claim.
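One ingredient of such a test is a per-dimension KL estimate under a prior that has no closed form against a Gaussian posterior. The sketch below uses a standard Student-t prior as an illustrative heavy-tailed choice and a plain Monte Carlo estimator. The prior, the Gaussian posterior, and all function names are assumptions for illustration, not details from the paper.

```python
import math
import random


def log_normal_pdf(z, mean, std):
    """Log-density of N(mean, std^2) at z."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((z - mean) / std) ** 2


def log_student_t_pdf(z, dof):
    """Log-density of a standard Student-t distribution with `dof` degrees of freedom."""
    return (
        math.lgamma((dof + 1.0) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * math.log(dof * math.pi)
        - 0.5 * (dof + 1.0) * math.log1p(z * z / dof)
    )


def mc_kl_to_student_t(mean, std, dof=3.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of KL(N(mean, std^2) || t_dof) in nats."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mean, std)
        total += log_normal_pdf(z, mean, std) - log_student_t_pdf(z, dof)
    return total / n_samples


# Hypothetical per-dimension posterior statistics for a single input.
kl_near_prior = mc_kl_to_student_t(mean=0.0, std=1.0)   # posterior close to the prior
kl_informative = mc_kl_to_student_t(mean=2.0, std=0.1)  # narrow, shifted posterior
```

Averaging such estimates over inputs gives a per-dimension KL contribution that can be set beside the entropy-based labels. Agreement between the two rankings would support the claim, while systematic disagreement would count against it.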

Figures

Figures reproduced from arXiv: 2605.15965 by Lisa Bonheme, Marek Grzes, Peter Clapham.

**Figure 1.** Illustration of the architecture of a Variational Autoencoder (VAE).
Adjacent source text (2.2. Polarised Regime): Understanding the polarised regime is crucial for interpreting the behaviour and limitations of VAEs in representing data. As discussed in Section 1, in a polarised regime the latent space is split into three categories: active, passive and mixed. Active variables: These variables encode the information that cap…

**Figure 2.** The overlap between the marginal entropies H(𝐗) and H(𝐘) (Cover and Thomas, 2012).
Adjacent source text (2.4. Entropy Approximation): Throughout the theoretical analysis (Section 4), we distinguish between Shannon entropy H(⋅) for discrete variables and differential entropy h(⋅) for continuous variables. In empirical sections, however, we use the notation H(⋅) generically to denote whichever entropy functional is being approxim…

**Figure 3.** Representative mean (left) and variance (right) distributions for active (top), passive (middle), and mixed (bottom) latent variables.
Adjacent source text: We now clarify the relationship between Bonheme’s criteria and the entropy-based criterion introduced in Section 3. Our earlier analysis established that H(𝝁𝑖 ) is tightly linked to Var(𝝁𝑖 ) via entropy–variance inequalities. In particular, the entropy of the mean represen…

**Figure 4.** Per-dimension differential entropy/variance for smallNORB, 𝑛_𝑧 = 64, 𝛽 = 4. Variables are classified by Bonheme’s criteria and coloured accordingly and active variables are rather dispersed (along the x-axis asymptote). This indicates that using the variance of the mean representation may serve to separate between active (or mixed) variables strongly, while the entropy may serve to separate between passive …

**Figure 5.** Differential entropy of 𝐳 for a number of different approximation techniques: histogram, k-nearest neighbours, Gaussian Mixture Model Monte Carlo.
Adjacent source text: Taken together, these comparisons establish precise relationships between the three activity criteria. KL minimisation enforces Bonheme’s passive conditions exactly, recovering collapse when the mean and variance representations converge to the prior. Entropy…

**Figure 6.** Left: variance; middle: entropy; right: mutual information. Values for a typical active variable, 𝛽 = 4.0.
Adjacent source text: … increasing 𝛽 must reduce mutual information, this finding supports those found by Dai, Wang and Wipf (2020). Since the three measures behave consistently in all cases, we use the entropy H(𝝁) as the primary measure for identifying active and passive variables for the remainder of this paper. It does not…

**Figure 7.** Venn diagram of quantities from smallNORB, 𝛽 = 4.0. Note the exclusion of 𝐻(𝝁|𝑿).

**Figure 8.** Venn diagram represented as a stacked graph, varying 𝛽. H(𝑿) stays constant, while H(𝝁) decreases for larger 𝛽. Results are scaled by the joint entropy H(𝑿, 𝝁), adding to 1.
Adjacent source text (6.3. Polarised Regime): Using the threshold 𝐻(𝝁) > 𝜏, we observe a clear separation between active and passive variables throughout training. Representative examples from smallNORB with 𝛽 = 2.0 are shown in …

**Figure 10.** Alternative passive variable on smallNORB, 𝛽 = 2.0.
Adjacent source text: After convergence, the marginal entropy distribution shows a sharp division between a small set of high-entropy active dimensions and a cluster of near-zero passive dimensions (…

**Figure 11.** Marginal entropies of 𝝁 for smallNORB at 𝛽 = 2.0.
Adjacent source text (6.4. Generalisability): Applying the entropy criterion to LV-AEs reveals the same polarised structure (…

**Figure 13.** Typical variable distributions for trained iVAEs. This satisfies the arguments made by Wang and Cunningham (2021). While some dimensions appear to have very low entropy, as is common in selective collapse, it is clear the entropy is non-zero. This contrasts with Figures 11-12.
Adjacent source text (6.5. Downstream Tasks): To evaluate the practical utility of the learned variables, we train logistic regressors on the top 𝑛 vari…

**Figure 12.** LV-AEs at two points of collapse.
Adjacent source text: iVAEs also exhibit a polarised regime (…

**Figure 14.** VAE regression results using the top 𝑛 dimensions on smallNORB for various 𝛽.

**Figure 15.** iVAE regression performance on MNIST.

**Figure 16.** LV-AE regression results on MNIST.

**Figure 17.** L2-AE regression results on smallNORB.
Adjacent source text: In regimes where entropy distributions do not show a clear polarised regime, the separation between active and passive variables becomes ambiguous. This may occur in weakly regularised models, where all latent dimensions retain moderate entropy and no dimensions are passive. In such cases, a pattern much more like …

Original abstract

Variational Autoencoders (VAEs) often exhibit a polarised regime in which latent variables separate into active, passive, and mixed subsets. Existing criteria for identifying active dimensions depend on a Gaussian prior, limiting their applicability to variational models and specific priors. We propose a simple information-theoretic classification of the polarised regime based on the entropy of the mean representation. We show theoretically how this entropy couples to KL minimisation through entropy-variance bounds, and we relate the resulting criterion to Bonheme's active/passive conditions. We also clarify a key limitation: entropy of the mean alone cannot reliably distinguish active from mixed dimensions without additional signals from the variance representation. Empirically, we evaluate the entropy criterion on $\beta$-VAEs, identifiable VAEs, Least-Volume Autoencoders, and L2-regularised autoencoders, and find that it consistently recovers a polarised regime when such a regime is present across the model classes studied. Finally, we show that passive dimensions can yield small but consistent improvements on downstream tasks when latent codes are appropriately normalised, suggesting that collapse is often a matter of scale rather than absolute information removal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The entropy-of-the-mean criterion offers a practical prior-free diagnostic for the polarised regime, but the entropy-variance bounds that link it to KL minimisation look like the weakest link.


The paper's main contribution is a simple entropy measure on the mean latent representation that classifies active, passive, and mixed dimensions without needing a Gaussian prior. It ties this to existing active/passive conditions through entropy-variance bounds and shows the measure recovers polarisation consistently in experiments across beta-VAEs, identifiable VAEs, least-volume autoencoders, and L2-regularised models. The downstream-task note that passive dimensions can still add value after normalisation is a small but concrete observation worth keeping in mind for practitioners who treat collapse as total information loss rather than a scaling issue.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an entropy-based criterion for characterizing the polarised regime in latent variable models such as VAEs, using the entropy of the mean representation to classify active, passive, and mixed dimensions. It claims a theoretical coupling of this entropy to KL minimisation via entropy-variance bounds, relates the criterion to Bonheme's active/passive conditions, explicitly notes the limitation that mean entropy alone cannot separate active from mixed dimensions without variance signals, and reports empirical consistency in recovering the polarised regime across β-VAEs, identifiable VAEs, Least-Volume Autoencoders, and L2-regularised autoencoders. It further suggests that passive dimensions can yield small downstream improvements when latent codes are normalised.

Significance. If the entropy-variance bounds hold generally and the empirical consistency is robust, the work could provide a useful prior-independent information-theoretic tool for analysing latent polarisation in variational models, extending beyond Gaussian-specific criteria. The cross-architecture evaluation and the practical observation on normalised passive dimensions are constructive contributions that could aid representation learning research.

major comments (2)

[Theoretical analysis section] Theoretical derivation of entropy-variance bounds: the central claim that mean entropy couples to KL minimisation (and thereby classifies active/passive dimensions) rests on these bounds. The manuscript must explicitly state the assumptions under which the bounds are derived and verify their tightness for the non-Gaussian priors and regularised objectives used in the β-VAE, identifiable VAE, and L2-regularised experiments; if the bounds become loose outside standard Gaussian variational families, the claimed coupling and classification do not reliably follow from mean entropy alone.
[Section discussing limitations of the mean entropy criterion] Clarification of limitation and its impact on the criterion: the paper correctly notes that entropy of the mean cannot reliably distinguish active from mixed dimensions without variance signals. This limitation should be quantified (e.g., via an explicit statement of the additional variance information required) because it directly affects whether the proposed entropy criterion can stand alone as a classification method or still depends on signals similar to those in existing approaches.
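The tightness question in the first major comment can be made concrete with a textbook case that is not drawn from the paper. For a variable U uniform on [a, b], both the entropy and the Gaussian upper bound are available in closed form, so the slack in the bound can be read off directly.

```latex
% Exact entropy and variance of a uniform variable on [a, b].
h(U) = \log(b-a), \qquad \operatorname{Var}(U) = \frac{(b-a)^2}{12}

% Slack in the Gaussian maximum-entropy bound.
\tfrac{1}{2}\log\bigl(2\pi e\,\operatorname{Var}(U)\bigr) - h(U)
  \;=\; \tfrac{1}{2}\log\frac{\pi e}{6} \;\approx\; 0.176 \ \text{nats}
```

The slack does not depend on b − a, so within this family the bound tracks the entropy up to a constant offset. Whether the latent distributions in the experiments behave similarly, or whether the slack varies across dimensions, is what the referee asks the authors to verify.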

minor comments (2)

[Abstract and experimental setup] Ensure consistent terminology between the abstract (Least-Volume Autoencoders) and the experimental section for all four model classes.
[Methods or notation section] Provide an explicit mathematical definition of 'entropy of the mean representation' at the first point of use, including the precise expectation or summation involved.
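One plausible formalisation of the quantity requested in the second minor comment is given below. It is an assumption about what the authors intend, not a quotation. Here f_i is a hypothetical symbol for the encoder's i-th mean output, and Δ is the bin width of a quantised version of μ_i.

```latex
% Differential entropy of the i-th mean coordinate over the data distribution.
h(\mu_i) = -\int p_{\mu_i}(m)\,\log p_{\mu_i}(m)\,\mathrm{d}m,
\qquad \mu_i = f_i(x),\quad x \sim p_{\mathrm{data}}

% Relation to the Shannon entropy of mu_i quantised into bins of width Delta
% (valid as Delta -> 0 for well-behaved densities).
H\bigl(\mu_i^{\Delta}\bigr) + \log\Delta \;\longrightarrow\; h(\mu_i)
```

The second line is the standard link between discrete and differential entropy. It matters for reading the results, because a discrete estimate is bounded below by zero while a differential estimate of a collapsed dimension can be arbitrarily negative.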

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our theoretical and empirical contributions. We address each major comment below and outline the revisions we will make.


Referee: [Theoretical analysis section] Theoretical derivation of entropy-variance bounds: the central claim that mean entropy couples to KL minimisation (and thereby classifies active/passive dimensions) rests on these bounds. The manuscript must explicitly state the assumptions under which the bounds are derived and verify their tightness for the non-Gaussian priors and regularised objectives used in the β-VAE, identifiable VAE, and L2-regularised experiments; if the bounds become loose outside standard Gaussian variational families, the claimed coupling and classification do not reliably follow from mean entropy alone.

Authors: We agree that the assumptions must be stated explicitly. In the revised manuscript we will add a dedicated paragraph in the theoretical analysis section listing the assumptions (Gaussian variational posteriors, standard normal prior, and the specific entropy-variance inequality used). For tightness outside these assumptions, we acknowledge that the bounds are derived under Gaussian variational families and may loosen for non-Gaussian or heavily regularised objectives. Nevertheless, the empirical results across β-VAEs, identifiable VAEs, Least-Volume Autoencoders and L2-regularised autoencoders show consistent recovery of the polarised regime, indicating that the mean-entropy criterion remains practically useful even when the theoretical coupling is approximate. We will add a short discussion of this point. revision: partial
Referee: [Section discussing limitations of the mean entropy criterion] Clarification of limitation and its impact on the criterion: the paper correctly notes that entropy of the mean cannot reliably distinguish active from mixed dimensions without variance signals. This limitation should be quantified (e.g., via an explicit statement of the additional variance information required) because it directly affects whether the proposed entropy criterion can stand alone as a classification method or still depends on signals similar to those in existing approaches.

Authors: We will expand the limitations paragraph to quantify the requirement: distinguishing active from mixed dimensions requires at least one additional signal from the variance representation (e.g., the entropy of the per-dimension variances or a threshold on the average variance). We will state explicitly that mean entropy alone is therefore not a fully standalone classifier and is intended to be combined with variance information, consistent with the spirit of prior active/passive criteria. This clarification will be added without altering the core claim that mean entropy provides a prior-independent indicator of polarisation. revision: yes
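The combination described in this response could look roughly like the sketch below. It assumes, as the polarised-regime literature describes, that a mixed dimension behaves passively (encoder variance near 1) for some inputs and actively for others. The function name and every threshold (`tau`, `var_passive`, `mixed_share`) are hypothetical values chosen for illustration, not values from the paper.

```python
def classify_with_variance(mean_entropies, variances, tau, var_passive=0.9, mixed_share=0.1):
    """Three-way labelling from mean entropy plus encoder variances.

    mean_entropies: per-dimension entropy estimates of the mean representation.
    variances: per-input rows of encoder variances, one value per dimension.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(variances)
    labels = []
    for i, h in enumerate(mean_entropies):
        if h <= tau:
            labels.append("passive")
            continue
        # Share of inputs for which this dimension looks passive (variance near 1).
        share = sum(1 for row in variances if row[i] >= var_passive) / n
        labels.append("mixed" if share >= mixed_share else "active")
    return labels


# Toy inputs: dim 0 always has low variance, dim 1 looks passive on half the
# inputs, and dim 2 is collapsed (low mean entropy).
mean_entropies = [1.2, 0.9, -3.0]
variances = [[0.05, 1.0 if k % 2 == 0 else 0.05, 1.0] for k in range(100)]
labels = classify_with_variance(mean_entropies, variances, tau=-1.0)
```

The mean-entropy test alone would label dimensions 0 and 1 identically. The variance check is what separates them, which is the dependence the referee asks the authors to state explicitly.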

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper proposes an entropy-based criterion for the polarised regime and derives its coupling to KL minimisation via entropy-variance bounds under the stated variational and prior assumptions. The relation to Bonheme's active/passive conditions is presented as an additional connection rather than the foundation or definition of the new result. No equations or claims reduce by construction to fitted inputs, self-definitions, or unverified self-citations; the theoretical steps rely on modeling assumptions that are external to the target classification. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard information-theoretic inequalities (entropy-variance bounds) and the existence of a polarised regime in the tested models; no new free parameters or invented entities are introduced in the abstract. The main unstated premise is that the mean representation entropy is a sufficient statistic for regime detection once variance signals are added.

axioms (1)

Domain assumption: Entropy-variance bounds link the entropy of the mean latent representation to the KL divergence term under the variational family. Invoked to show theoretical coupling between the proposed entropy criterion and KL minimisation.



Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 4 internal anchors

[1]

2019 , volume=

Junxian He and Daniel Spokoyny and Graham Neubig and Taylor Berg-Kirkpatrick , journal=. 2019 , volume=

work page 2019
[2]

Jordan and Zoubin Ghahramani and T

Michael I. Jordan and Zoubin Ghahramani and T. Jaakkola and Lawrence K. Saul , journal=. 1999 , volume=

work page 1999
[3]

Uesaka and S

Yuhta Takida and Wei-Hsiang Liao and T. Uesaka and S. Takahashi and Yuki Mitsufuji , journal=. 2021 , volume=

work page 2021
[4]

Dai and Ziyu Wang and D

B. Dai and Ziyu Wang and D. Wipf , booktitle=

work page
[5]

Bowman and L

Samuel R. Bowman and L. Vilnis and Oriol Vinyals and Andrew M. Dai and R. J. CoNLL , year=

work page
[6]

International Conference on Learning Representations (ICLR) , year=

Auto-Encoding Variational Bayes , author=. International Conference on Learning Representations (ICLR) , year=

work page
[7]

ArXiv , year=

Irina Higgins and David Amos and David Pfau and S. ArXiv , year=

work page
[8]

International Conference on Machine Learning , year=

Fixing a Broken ELBO , author=. International Conference on Machine Learning , year=

work page
[9]

Higgins and Lo

I. Higgins and Lo. ICLR , year=

work page
[10]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =

Yoshua Bengio and Aaron Courville and Pascal Vincent , title =. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =

work page
[11]

Science , volume=

Reducing the dimensionality of data with neural networks , author=. Science , volume=. 2006 , publisher=

work page 2006
[12]

Proceedings of the 35th International Conference on Machine Learning , volume =

Optimizing the Latent Space of Generative Networks , author =. Proceedings of the 35th International Conference on Machine Learning , volume =. 2018 , publisher =

work page 2018
[13]

A Survey of Inductive Biases for Factorial Representation-Learning

A survey of inductive biases for factorial representation-learning , author=. arXiv preprint arXiv:1612.05299 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[14]

International Conference on Learning Representations (ICLR) , year=

A framework for the quantitative evaluation of disentangled representations , author=. International Conference on Learning Representations (ICLR) , year=

work page
[15]

, title =

Kruskal, Joseph B. , title =. Psychometrika , volume =. 1964 , doi =

work page 1964
[16]

The information bottleneck method

The information bottleneck method , author=. arXiv preprint physics/0004057 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Vision Research , volume=

Sparse coding with an overcomplete basis set: A strategy employed by V1? , author=. Vision Research , volume=. 1997 , publisher=

work page 1997
[18]

Science , volume=

Nonlinear dimensionality reduction by locally linear embedding , author=. Science , volume=. 2000 , publisher=

work page 2000
[19]

2022 , volume=

Zihao Wang and Liu Ziyin , journal=. 2022 , volume=

work page 2022
[20]

Bin Dai and Yu Wang and John A. D. Aston and Gang Hua and David Paul Wipf , journal=. 2018 , volume=

work page 2018
[21]

Willcocks , journal=

Sam Bond-Taylor and Chris G. Willcocks , journal=

work page
[22]

Hinton , journal=

Simon Kornblith and Mohammad Norouzi and Honglak Lee and Geoffrey E. Hinton , journal=. 2019 , volume=

work page 2019
[23]

Bauer and M

Francesco Locatello and S. Bauer and M. Lucic and S. Gelly and B. Sch. ArXiv , year=

work page
[24]

Eastwood and Christopher K

C. Eastwood and Christopher K. I. Williams , booktitle=

work page
[25]

Journal of Machine Learning Research , year =

Lisa Bonheme and Marek Grzes , title =. Journal of Machine Learning Research , year =

work page
[26]

2023 International Conference on Machine Learning and Applications (ICMLA) , year=

Posterior Collapse in Variational Gradient Origin Networks , author=. 2023 International Conference on Machine Learning and Applications (ICMLA) , year=

work page 2023
[27]

Shannon , title =

Claude E. Shannon , title =. Bell System Technical Journal , volume =. 1948 , publisher =

work page 1948
[28]

Neural Networks , year=

Neural networks and principal component analysis: Learning from examples without local minima , author=. Neural Networks , year=

work page
[29]

Multilayer feedforward networks are universal approximators

Multilayer feedforward networks are universal approximators , journal =. 1989 , issn =. doi:https://doi.org/10.1016/0893-6080(89)90020-8 , url =

work page doi:10.1016/0893-6080(89)90020-8 1989
[30]

Signal Processing , volume=

Independent component analysis, A new concept? , author=. Signal Processing , volume=. 1994 , publisher=

work page 1994
[31]

Neural Networks , volume=

Independent component analysis: algorithms and applications , author=. Neural Networks , volume=. 2000 , publisher=

work page 2000
[32]

Convergent Learning: Do different neural networks learn the same representations?

Convergent Learning: Do different neural networks learn the same representations? , author =. arXiv preprint arXiv:1511.07543 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[33]

Advances in Neural Information Processing Systems , year =

Revisiting Model Stitching to Compare Neural Representations , author =. Advances in Neural Information Processing Systems , year =

work page
[34]

Proceedings of the 36th International Conference on Machine Learning , series =

Similarity of Neural Network Representations Revisited , author =. Proceedings of the 36th International Conference on Machine Learning , series =. 2019 , editor =

work page 2019
[35]

arXiv preprint arXiv:2205.08399 , year=

How do Variational Autoencoders Learn? Insights from Representational Similarity , author=. arXiv preprint arXiv:2205.08399 , year=

work page arXiv
[36]

Advances in Neural Information Processing Systems , year=

Implicit Neural Representations with Periodic Activation Functions , author=. Advances in Neural Information Processing Systems , year=

work page
[37]

Diagnosing and Enhancing

Bin Dai and David Wipf , booktitle=. Diagnosing and Enhancing. 2019 , url=

work page 2019
[38]

2012 , publisher=

Elements of Information Theory , author=. 2012 , publisher=

work page 2012
[39]

Variational Autoencoders Pursue PCA Directions (by Accident) , year=

Rolínek, Michal and Zietlow, Dominik and Martius, Georg , booktitle=. Variational Autoencoders Pursue PCA Directions (by Accident) , year=

work page
[40]

ArXiv , year=

Deep Variational Information Bottleneck , author=. ArXiv , year=

work page
[41]

2006 , publisher=

Pattern Recognition and Machine Learning , author=. 2006 , publisher=

work page 2006
[42]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=

Probabilistic Principal Component Analysis , author=. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=. 1999 , publisher=

work page 1999
[43]

Yann LeCun and L. Proc. IEEE , year=

work page
[44]

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004

Yann LeCun and Fu Jie Huang and L. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. , year=

work page 2004
[45]

Loic Matthey and Irina Higgins and Demis Hassabis and Alexander Lerchner , title =. 2017

work page 2017
[46]

Information Flows of Diverse Autoencoders , volume=

Lee, Sungyeop and Jo, Junghyo , year=. Information Flows of Diverse Autoencoders , volume=. Entropy , publisher=. doi:10.3390/e23070862 , number=

work page doi:10.3390/e23070862
[47]

Neural networks : the official journal of the International Neural Network Society , year=

Understanding Autoencoders with Information Theoretic Concepts , author=. Neural networks : the official journal of the International Neural Network Society , year=

work page
[48]

Proceedings of the 37th annual Allerton conference on communication, control and computing , volume=

The information bottleneck method , author=. Proceedings of the 37th annual Allerton conference on communication, control and computing , volume=

work page
[49]

IBM Journal of Research and Development , volume=

Information theoretical analysis of multivariate correlation , author=. IBM Journal of Research and Development , volume=. 1960 , publisher=

work page 1960
[50]

Tucker and R

James Lucas and G. Tucker and R. Grosse and Mohammad Norouzi. DGS@ICLR. 2019

work page 2019
[51]

Annals of Human Genetics , year=

THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , author=. Annals of Human Genetics , year=

work page
[52]

patterned

Bruno A. Olshausen and David J. Field , keywords =. Sparse coding with an overcomplete basis set: A strategy employed by V1? , journal =. 1997 , issn =. doi:https://doi.org/10.1016/S0042-6989(97)00169-7 , url =

work page doi:10.1016/s0042-6989(97)00169-7 1997
[53]

2011 , howpublished =

Andrew Ng , title =. 2011 , howpublished =

work page 2011
[54]

and McClelland, James L

Rumelhart, David E. and McClelland, James L. , booktitle=. Learning Internal Representations by Error Propagation , year=

work page
[55]

The Polarised Regime of identifiable Variational Autoencoders , booktitle =

Lisa Bonheme and Marek Grzes , url =. The Polarised Regime of identifiable Variational Autoencoders , booktitle =. 2023 , month =

work page 2023
[56]

The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science , volume=

On lines and planes of closest fit to systems of points in space , author=. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science , volume=. 1901 , publisher=

work page 1901
[57]

Journal of Educational Psychology , volume=

Analysis of a complex of statistical variables into principal components , author=. Journal of Educational Psychology , volume=. 1933 , publisher=

work page 1933
[58]

Towards A Rigorous Science of Interpretable Machine Learning

Towards A Rigorous Science of Interpretable Machine Learning , author=. arXiv preprint arXiv:1702.08608 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[59]

Advances in Neural Information Processing Systems , pages=

Attention Is All You Need , author=. Advances in Neural Information Processing Systems , pages=

work page
[60]

Advances in Neural Information Processing Systems , pages=

Generative Adversarial Nets , author=. Advances in Neural Information Processing Systems , pages=

work page
[61]

2016 , publisher=

Deep Learning , author=. 2016 , publisher=

work page 2016
[62]

1904 , publisher=

Spearman, Charles , journal=. 1904 , publisher=

work page 1904
[63]

1957 , publisher=

Dynamic Programming , author=. 1957 , publisher=

work page 1957
[64]

2002 , publisher =

Principal Component Analysis , author =. 2002 , publisher =

work page 2002
[65]

Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics , year =

Alféd Rényi , title =. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics , year =

work page
[66]

Resonance , year=

Equation of State Calculations by Fast Computing Machines , author=. Resonance , year=

work page
[67]

Biometrika , year=

Monte Carlo Sampling Methods Using Markov Chains and Their Applications , author=. Biometrika , year=

work page
[68]

Beal , title =

Michael J. Beal , title =. 2003 , type =

work page 2003
[69]

ICLR , year=

Compressing Latent Space via Least Volume , author=. ICLR , year=

work page
[70]

International Conference on Artificial Intelligence and Statistics , year=

Variational Autoencoders and Nonlinear ICA: A Unifying Framework , author=. International Conference on Artificial Intelligence and Statistics , year=

work page
[71]

Chen, Ricky T. Q. and Li, Xuechen and Grosse, Roger B. and Duvenaud, David K. Isolating Sources of Disentanglement in Variational Autoencoders.

[72]

Posterior Collapse and Latent Variable Non-identifiability. Third Symposium on Advances in Approximate Bayesian Inference.

[73]

The Intrinsic Dimension of Images and Its Impact on Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

[74]

Spectral Intrinsic Dimensionality Estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

[75]

Robbins, Herbert E. An Empirical Bayes Approach to Statistics. Breakthroughs in Statistics: Foundations and Basic Theory. 1992. doi:10.1007/978-1-4612-0919-5_26

[76]

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 1983.

[77]

Horst, Paul. Social Science Research Council Bulletin.

[78]

Empirical Comparison between Autoencoders and Traditional Dimensionality Reduction Methods. 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 2019.

[79]

Yu, Jinyue and Sun, Zhiqiang and Yu, Chengcheng. Applied Sciences, 2025.

