
arxiv: 2605.15965 · v1 · pith:V3ZGJTG7new · submitted 2026-05-15 · 💻 cs.LG

Entropy-Based Characterisation of the Polarised Regime in Latent Variable Models

Pith reviewed 2026-05-20 19:30 UTC · model grok-4.3

classification 💻 cs.LG
keywords: variational autoencoders · latent variable models · entropy · polarised regime · KL divergence · active dimensions · information theory

The pith

The entropy of the mean representation classifies active dimensions in the polarised regime of latent variable models without relying on a Gaussian prior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that the entropy of the mean latent representation can classify which dimensions are active in variational models. This matters because it offers a prior-independent way to understand the polarised regime where some latents become active and others passive. The authors link this entropy to the KL minimisation objective using entropy-variance bounds and show it aligns with existing active-passive distinctions. They demonstrate that this works empirically on several VAE variants and related models, and that passive dimensions can still help in tasks after normalisation.

Core claim

The authors propose an information-theoretic classification of the polarised regime in latent variable models based on the entropy of the mean representation. They show theoretically that this entropy is coupled to KL minimisation via entropy-variance bounds, and they relate the criterion to Bonheme's active/passive conditions. Where a polarised regime is present, the criterion recovers it consistently across beta-VAEs, identifiable VAEs, least-volume autoencoders and L2-regularised autoencoders. Entropy of the mean alone cannot distinguish active from mixed dimensions without variance signals. Separately, passive dimensions yield small but consistent improvements on downstream tasks when codes are normalised, suggesting that collapse is often a matter of scale rather than absolute information removal.

What carries the argument

The entropy of the mean representation carries the argument: it classifies active dimensions through its coupling to the KL term via entropy-variance bounds.
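A minimal sketch of how such a classification could look in practice, assuming a fixed-width-bin Shannon entropy estimate and a threshold `tau`. The function names, the bin width and the value of `tau` are illustrative assumptions, not the authors' implementation, and the synthetic means stand in for a trained encoder's output.

```python
import math
import random
from collections import Counter


def binned_entropy(samples, width=0.1):
    """Shannon entropy (nats) of 1-D samples after rounding to fixed-width bins."""
    counts = Counter(round(x / width) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def classify_by_mean_entropy(mu, tau=0.5, width=0.1):
    """mu holds one mean vector per input; returns one label per latent dimension."""
    n_dims = len(mu[0])
    labels = []
    for i in range(n_dims):
        h = binned_entropy([row[i] for row in mu], width)
        # Hypothetical rule: entropy above tau counts as active, otherwise passive.
        labels.append("active" if h > tau else "passive")
    return labels


# Synthetic means: dimension 0 varies with the input, dimension 1 has collapsed.
rng = random.Random(0)
mu = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1e-3)] for _ in range(5000)]
labels = classify_by_mean_entropy(mu)
```

In this two-way rule the label "active" should be read as "active or mixed", since the paper states that mean entropy alone cannot separate those two cases.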

If this is right

  • The proposed entropy criterion applies to variational models with various priors without requiring Gaussian assumptions.
  • Passive dimensions can improve downstream task performance when latent codes are appropriately normalised.
  • The entropy measure needs additional variance information to separate active from mixed dimensions.
  • The classification recovers polarised regimes whenever they appear in the tested model classes.
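The second point above depends on how latent codes are normalised, which this page does not specify. As a sketch under the assumption that normalisation means per-dimension standardisation (zero mean, unit variance), the hypothetical helper below rescales a tiny-scale passive dimension so that a downstream classifier sees it on the same footing as an active one.

```python
import statistics


def standardise_columns(codes, eps=1e-12):
    """Rescale every latent dimension to zero mean and (near) unit variance."""
    n_dims = len(codes[0])
    cols = [[row[i] for row in codes] for i in range(n_dims)]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    # eps guards against division by zero for a fully constant dimension.
    return [
        [(row[i] - means[i]) / (stds[i] + eps) for i in range(n_dims)]
        for row in codes
    ]


# Dimension 1 carries the same pattern as dimension 0 at a 1000x smaller scale.
codes = [[1.0, 0.001], [3.0, 0.003], [5.0, 0.005]]
scaled = standardise_columns(codes)
```

After standardisation both columns hold the same values up to numerical error, which is one way to read the claim that collapse can be a matter of scale.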

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This criterion could be used to monitor latent dimension usage during training in a wider range of generative models.
  • Appropriate scaling of all latent codes might allow models to retain more information without changing the training objective.
  • Testing the entropy criterion on models with non-Gaussian or discrete latent variables would check its broader applicability beyond the studied cases.

Load-bearing premise

The variance bounds that tie mean entropy to KL minimisation must be valid for the model and prior being used.
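One standard inequality of this kind, given as an illustration rather than as the paper's exact bound, is the Gaussian maximum-entropy bound. It assumes a continuous scalar mean coordinate $\mu_i$ with finite variance; the notation follows the figure excerpts.

```latex
h(\mu_i) \;\le\; \tfrac{1}{2}\log\!\bigl(2\pi e\,\operatorname{Var}(\mu_i)\bigr),
\qquad \text{with equality iff } \mu_i \text{ is Gaussian.}
```

The bound runs in one direction only: a vanishing variance forces the differential entropy towards $-\infty$, whereas a low entropy does not by itself imply a low variance. Whether the paper's coupling argument relies on more than this is not stated in the material on this page.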

What would settle it

Running the entropy classification on a variational model with a heavy-tailed prior and checking if it matches the dimensions that actually contribute to reducing the KL term would test the claim.

Figures

Figures reproduced from arXiv: 2605.15965 by Lisa Bonheme, Marek Grzes, Peter Clapham.

Figure 1
Figure 1: Illustration of the architecture of a Variational Autoencoder (VAE). 2.2. Polarised Regime. Understanding the polarised regime is crucial for interpreting the behaviour and limitations of VAEs in representing data. As discussed in Section 1, in a polarised regime the latent space is split into three categories: active, passive and mixed. • Active variables: These variables encode the information that cap…
Figure 2
Figure 2: The overlap between the marginal entropies H(𝐗) and H(𝐘) (Cover and Thomas, 2012). 2.4. Entropy Approximation. Throughout the theoretical analysis (Section 4), we distinguish between Shannon entropy H(⋅) for discrete variables and differential entropy h(⋅) for continuous variables. In empirical sections, however, we use the notation H(⋅) generically to denote whichever entropy functional is being approxim…
Figure 3
Figure 3: Representative mean (left) and variance (right) distributions for active (top), passive (middle), and mixed (bottom) latent variables. We now clarify the relationship between Bonheme’s criteria and the entropy-based criterion introduced in Section 3. Our earlier analysis established that H(𝝁𝑖) is tightly linked to Var(𝝁𝑖) via entropy–variance inequalities. In particular, the entropy of the mean represen…
Figure 4
Figure 4: Per-dimension differential entropy/variance for smallNORB, 𝑛𝑧 = 64, 𝛽 = 4. Variables are classified by Bonheme’s criteria and coloured accordingly, and active variables are rather dispersed (along the x-axis asymptote). This indicates that using the variance of the mean representation may serve to separate between active (or mixed) variables strongly, while the entropy may serve to separate between passive …
Figure 5
Figure 5: Differential entropy of 𝐳 for a number of different approximation techniques: histogram, k-nearest neighbours, Gaussian Mixture Model Monte Carlo. Taken together, these comparisons establish precise relationships between the three activity criteria. KL minimisation enforces Bonheme’s passive conditions exactly, recovering collapse when the mean and variance representations converge to the prior. Entropy…
Figure 6
Figure 6: Left variance, middle entropy, right mutual information. Values for a typical active variable, 𝛽 = 4.0. … increasing 𝛽 must reduce mutual information; this finding supports those found by Dai, Wang and Wipf (2020). Since the three measures behave consistently in all cases, we use the entropy H(𝝁) as the primary measure for identifying active and passive variables for the remainder of this paper. It does not…
Figure 7
Figure 7: Venn diagram of quantities from smallNORB, 𝛽 = 4.0. Note the exclusion of 𝐻(𝝁|𝑿).
Figure 8
Figure 8: Venn diagram represented as a stacked graph, varying 𝛽. H(𝑿) stays constant, while H(𝝁) decreases for larger 𝛽. Results are scaled by the joint entropy H(𝑿, 𝝁), adding to 1. 6.3. Polarised Regime. Using the threshold 𝐻(𝝁) > 𝜏, we observe a clear separation between active and passive variables throughout training. Representative examples from smallNORB with 𝛽 = 2.0 are shown in …
Figure 10
Figure 10: Alternative passive variable on smallNORB, 𝛽 = 2.0. After convergence, the marginal entropy distribution shows a sharp division between a small set of high-entropy active dimensions and a cluster of near-zero passive dimensions (…
Figure 11
Figure 11: Marginal entropies of 𝝁 for smallNORB at 𝛽 = 2.0. 6.4. Generalisability. Applying the entropy criterion to LV-AEs reveals the same polarised structure (…
Figure 13
Figure 13: Typical variable distributions for trained iVAEs. This satisfies the arguments made by Wang and Cunningham (2021). While some dimensions appear to have very low entropy, as is common in selective collapse, it is clear the entropy is non-zero. This contrasts with Figures 11 and 12. 6.5. Downstream Tasks. To evaluate the practical utility of the learned variables, we train logistic regressors on the top 𝑛 vari…
Figure 12
Figure 12: LV-AEs at two points of collapse. iVAEs also exhibit a polarised regime (…
Figure 14
Figure 14: VAE regression results using the top 𝑛 dimensions on smallNORB for various 𝛽. (Axes: number of latent features against accuracy.)
Figure 15
Figure 15: iVAE regression performance on MNIST. (Axes: number of latent features against accuracy.)
Figure 16
Figure 16: LV-AE regression results on MNIST. (Axes: number of latent features against accuracy.)
Figure 17
Figure 17: L2-AE regression results on smallNORB. In regimes where entropy distributions do not show a clear polarised regime, the separation between active and passive variables becomes ambiguous. This may occur in weakly regularised models, where all latent dimensions retain moderate entropy and no dimensions are passive. In such cases, a pattern much more like …
read the original abstract

Variational Autoencoders (VAEs) often exhibit a polarised regime in which latent variables separate into active, passive, and mixed subsets. Existing criteria for identifying active dimensions depend on a Gaussian prior, limiting their applicability to variational models and specific priors. We propose a simple information-theoretic classification of the polarised regime based on the entropy of the mean representation. We show theoretically how this entropy couples to KL minimisation through entropy--variance bounds, and we relate the resulting criterion to Bonheme's active/passive conditions. We also clarify a key limitation: entropy of the mean alone cannot reliably distinguish active from mixed dimensions without additional signals from the variance representation. Empirically, we evaluate the entropy criterion on $\beta$-VAEs, identifiable VAEs, Least-Volume Autoencoders, and L2-regularised autoencoders, and find that it consistently recovers a polarised regime when such a regime is present across the model classes studied. Finally, we show that passive dimensions can yield small but consistent improvements on downstream tasks when latent codes are appropriately normalised, suggesting that collapse is often a matter of scale rather than absolute information removal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an entropy-based criterion for characterizing the polarised regime in latent variable models such as VAEs, using the entropy of the mean representation to classify active, passive, and mixed dimensions. It claims a theoretical coupling of this entropy to KL minimisation via entropy-variance bounds, relates the criterion to Bonheme's active/passive conditions, explicitly notes the limitation that mean entropy alone cannot separate active from mixed dimensions without variance signals, and reports empirical consistency in recovering the polarised regime across β-VAEs, identifiable VAEs, Least-Volume Autoencoders, and L2-regularised autoencoders. It further suggests that passive dimensions can yield small downstream improvements when latent codes are normalised.

Significance. If the entropy-variance bounds hold generally and the empirical consistency is robust, the work could provide a useful prior-independent information-theoretic tool for analysing latent polarisation in variational models, extending beyond Gaussian-specific criteria. The cross-architecture evaluation and the practical observation on normalised passive dimensions are constructive contributions that could aid representation learning research.

major comments (2)
  1. [Theoretical analysis section] Theoretical derivation of entropy-variance bounds: the central claim that mean entropy couples to KL minimisation (and thereby classifies active/passive dimensions) rests on these bounds. The manuscript must explicitly state the assumptions under which the bounds are derived and verify their tightness for the non-Gaussian priors and regularised objectives used in the β-VAE, identifiable VAE, and L2-regularised experiments; if the bounds become loose outside standard Gaussian variational families, the claimed coupling and classification do not reliably follow from mean entropy alone.
  2. [Section discussing limitations of the mean entropy criterion] Clarification of limitation and its impact on the criterion: the paper correctly notes that entropy of the mean cannot reliably distinguish active from mixed dimensions without variance signals. This limitation should be quantified (e.g., via an explicit statement of the additional variance information required) because it directly affects whether the proposed entropy criterion can stand alone as a classification method or still depends on signals similar to those in existing approaches.
minor comments (2)
  1. [Abstract and experimental setup] Ensure consistent terminology between the abstract (Least-Volume Autoencoders) and the experimental section for all four model classes.
  2. [Methods or notation section] Provide an explicit mathematical definition of 'entropy of the mean representation' at the first point of use, including the precise expectation or summation involved.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our theoretical and empirical contributions. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Theoretical analysis section] Theoretical derivation of entropy-variance bounds: the central claim that mean entropy couples to KL minimisation (and thereby classifies active/passive dimensions) rests on these bounds. The manuscript must explicitly state the assumptions under which the bounds are derived and verify their tightness for the non-Gaussian priors and regularised objectives used in the β-VAE, identifiable VAE, and L2-regularised experiments; if the bounds become loose outside standard Gaussian variational families, the claimed coupling and classification do not reliably follow from mean entropy alone.

    Authors: We agree that the assumptions must be stated explicitly. In the revised manuscript we will add a dedicated paragraph in the theoretical analysis section listing the assumptions (Gaussian variational posteriors, standard normal prior, and the specific entropy-variance inequality used). For tightness outside these assumptions, we acknowledge that the bounds are derived under Gaussian variational families and may loosen for non-Gaussian or heavily regularised objectives. Nevertheless, the empirical results across β-VAEs, identifiable VAEs, Least-Volume Autoencoders and L2-regularised autoencoders show consistent recovery of the polarised regime, indicating that the mean-entropy criterion remains practically useful even when the theoretical coupling is approximate. We will add a short discussion of this point. revision: partial

  2. Referee: [Section discussing limitations of the mean entropy criterion] Clarification of limitation and its impact on the criterion: the paper correctly notes that entropy of the mean cannot reliably distinguish active from mixed dimensions without variance signals. This limitation should be quantified (e.g., via an explicit statement of the additional variance information required) because it directly affects whether the proposed entropy criterion can stand alone as a classification method or still depends on signals similar to those in existing approaches.

    Authors: We will expand the limitations paragraph to quantify the requirement: distinguishing active from mixed dimensions requires at least one additional signal from the variance representation (e.g., the entropy of the per-dimension variances or a threshold on the average variance). We will state explicitly that mean entropy alone is therefore not a fully standalone classifier and is intended to be combined with variance information, consistent with the spirit of prior active/passive criteria. This clarification will be added without altering the core claim that mean entropy provides a prior-independent indicator of polarisation. revision: yes
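The combined rule the authors describe could be sketched as follows. This is a hypothetical illustration only: the thresholds `tau` and `var_cut`, the function name and the example values are assumptions, and the paper's actual decision rule is not given on this page.

```python
def classify_with_variance(mean_entropy, avg_variance, tau=0.5, var_cut=0.5):
    """Three-way labelling from per-dimension mean entropy and average posterior variance."""
    labels = []
    for h, v in zip(mean_entropy, avg_variance):
        if h <= tau:
            # Low mean entropy: the dimension carries little input-dependent signal.
            labels.append("passive")
        elif v < var_cut:
            # High mean entropy and consistently small posterior variance.
            labels.append("active")
        else:
            # High mean entropy but sizeable average variance: treated as mixed here.
            labels.append("mixed")
    return labels


# Made-up per-dimension statistics, for illustration only.
labels = classify_with_variance([3.7, 0.0, 3.1], [0.02, 1.0, 0.6])
```

The variance signal is consulted only after the entropy test, which mirrors the stated intent that mean entropy indicates polarisation while the variance representation resolves the active/mixed ambiguity.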

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper proposes an entropy-based criterion for the polarised regime and derives its coupling to KL minimisation via entropy-variance bounds under the stated variational and prior assumptions. The relation to Bonheme's active/passive conditions is presented as an additional connection rather than the foundation or definition of the new result. No equations or claims reduce by construction to fitted inputs, self-definitions, or unverified self-citations; the theoretical steps rely on modeling assumptions that are external to the target classification. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard information-theoretic inequalities (entropy-variance bounds) and the existence of a polarised regime in the tested models; no new free parameters or invented entities are introduced in the abstract. The main unstated premise is that the mean representation entropy is a sufficient statistic for regime detection once variance signals are added.

axioms (1)
  • Domain assumption: Entropy-variance bounds link the entropy of the mean latent representation to the KL divergence term under the variational family.
    Invoked to show theoretical coupling between the proposed entropy criterion and KL minimisation.

pith-pipeline@v0.9.0 · 5732 in / 1425 out tokens · 35431 ms · 2026-05-20T19:30:38.730519+00:00 · methodology


Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 4 internal anchors

  1. [1] Junxian He, Daniel Spokoyny, Graham Neubig and Taylor Berg-Kirkpatrick. 2019.

  2. [2] Michael I. Jordan, Zoubin Ghahramani, T. Jaakkola and Lawrence K. Saul. 1999.

  3. [3] Yuhta Takida, Wei-Hsiang Liao, T. Uesaka, S. Takahashi and Yuki Mitsufuji. 2021.

  4. [4] B. Dai, Ziyu Wang and D. Wipf.

  5. [5] Samuel R. Bowman, L. Vilnis, Oriol Vinyals, Andrew M. Dai, R. J. … CoNLL.

  6. [6] Auto-Encoding Variational Bayes. International Conference on Learning Representations (ICLR).

  7. [7] Irina Higgins, David Amos, David Pfau, S. … ArXiv.

  8. [8] Fixing a Broken ELBO. International Conference on Machine Learning.

  9. [9] I. Higgins, Lo… ICLR.

  10. [10] Yoshua Bengio, Aaron Courville and Pascal Vincent. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  11. [11] Reducing the dimensionality of data with neural networks. Science, 2006.

  12. [12] Optimizing the Latent Space of Generative Networks. Proceedings of the 35th International Conference on Machine Learning, 2018.

  13. [13] A Survey of Inductive Biases for Factorial Representation-Learning. arXiv preprint arXiv:1612.05299.

  14. [14] A framework for the quantitative evaluation of disentangled representations. International Conference on Learning Representations (ICLR).

  15. [15] Joseph B. Kruskal. Psychometrika, 1964.

  16. [16] The information bottleneck method. arXiv preprint physics/0004057.

  17. [17] Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 1997.

  18. [18] Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.

  19. [19] Zihao Wang and Liu Ziyin. 2022.

  20. [20] Bin Dai, Yu Wang, John A. D. Aston, Gang Hua and David Paul Wipf. 2018.

  21. [21] Sam Bond-Taylor and Chris G. Willcocks.

  22. [22] Simon Kornblith, Mohammad Norouzi, Honglak Lee and Geoffrey E. Hinton. 2019.

  23. [23] Francesco Locatello, S. Bauer, M. Lucic, S. Gelly, B. Sch… ArXiv.

  24. [24] C. Eastwood and Christopher K. I. Williams.

  25. [25] Lisa Bonheme and Marek Grzes. Journal of Machine Learning Research.

  26. [26] Posterior Collapse in Variational Gradient Origin Networks. 2023 International Conference on Machine Learning and Applications (ICMLA).

  27. [27] Claude E. Shannon. Bell System Technical Journal, 1948.

  28. [28] Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks.

  29. [29] Multilayer feedforward networks are universal approximators. 1989. doi:https://doi.org/10.1016/0893-6080(89)90020-8

  30. [30] Independent component analysis, A new concept? Signal Processing, 1994.

  31. [31] Independent component analysis: algorithms and applications. Neural Networks, 2000.

  32. [32] Convergent Learning: Do different neural networks learn the same representations? arXiv preprint arXiv:1511.07543.

  33. [33] Revisiting Model Stitching to Compare Neural Representations. Advances in Neural Information Processing Systems.

  34. [34] Similarity of Neural Network Representations Revisited. Proceedings of the 36th International Conference on Machine Learning, 2019.

  35. [35] How do Variational Autoencoders Learn? Insights from Representational Similarity. arXiv preprint arXiv:2205.08399.

  36. [36] Implicit Neural Representations with Periodic Activation Functions. Advances in Neural Information Processing Systems.

  37. [37] Bin Dai and David Wipf. Diagnosing and Enhancing. 2019.

  38. [38] Elements of Information Theory. 2012.

  39. [39] Michal Rolínek, Dominik Zietlow and Georg Martius. Variational Autoencoders Pursue PCA Directions (by Accident).

  40. [40] Deep Variational Information Bottleneck. ArXiv.

  41. [41] Pattern Recognition and Machine Learning. 2006.

  42. [42] Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1999.

  43. [43] Yann LeCun, L. … Proc. IEEE.

  44. [44] Yann LeCun, Fu Jie Huang, L. … Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004).

  45. [45] Loic Matthey, Irina Higgins, Demis Hassabis and Alexander Lerchner. 2017.

  46. [46] Sungyeop Lee and Junghyo Jo. Information Flows of Diverse Autoencoders. Entropy. doi:10.3390/e23070862

  47. [47] Understanding Autoencoders with Information Theoretic Concepts. Neural Networks: the official journal of the International Neural Network Society.

  48. [48] The information bottleneck method. Proceedings of the 37th annual Allerton conference on communication, control and computing.

  49. [49] Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 1960.

  50. [50] James Lucas, G. Tucker, R. Grosse and Mohammad Norouzi. DGS@ICLR, 2019.

  51. [51] The Use of Multiple Measurements in Taxonomic Problems. Annals of Human Genetics.

  52. [52] Bruno A. Olshausen and David J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? 1997. doi:https://doi.org/10.1016/S0042-6989(97)00169-7

  53. [53] Andrew Ng. 2011.

  54. [54] David E. Rumelhart and James L. McClelland. Learning Internal Representations by Error Propagation.

  55. [55] Lisa Bonheme and Marek Grzes. The Polarised Regime of identifiable Variational Autoencoders. 2023.

  56. [56] On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 1901.

  57. [57] Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 1933.

  58. [58] Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608.

  59. [59] Attention Is All You Need. Advances in Neural Information Processing Systems.

  60. [60] Generative Adversarial Nets. Advances in Neural Information Processing Systems.

  61. [61] Deep Learning. 2016.

  62. [62] Charles Spearman. 1904.

  63. [63] Dynamic Programming. 1957.

  64. [64] Principal Component Analysis. 2002.

  65. [65] Alfréd Rényi. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics.

  66. [66] Equation of State Calculations by Fast Computing Machines. Resonance.

  67. [67] Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika.

  68. [68] Michael J. Beal. 2003.

  69. [69] Compressing Latent Space via Least Volume. ICLR.

  70. [70] Variational Autoencoders and Nonlinear ICA: A Unifying Framework. International Conference on Artificial Intelligence and Statistics.

  71. [71] Ricky T. Q. Chen, Xuechen Li, Roger B Grosse and David K Duvenaud. Isolating Sources of Disentanglement in Variational Autoencoders.

  72. [72] Posterior Collapse and Latent Variable Non-identifiability. Third Symposium on Advances in Approximate Bayesian Inference.

  73. [73] The Intrinsic Dimension of Images and Its Impact on Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

  74. [74] Spectral Intrinsic Dimensionality Estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

  75. [75] Herbert E. Robbins. An Empirical Bayes Approach to Statistics. Breakthroughs in Statistics: Foundations and Basic Theory. 1992. doi:10.1007/978-1-4612-0919-5_26

  76. [76] Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 1983.

  77. [77] Paul Horst. Social Science Research Council Bulletin.

  78. [78] Empirical Comparison between Autoencoders and Traditional Dimensionality Reduction Methods. 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 2019.

  79. [79] Jinyue Yu, Zhiqiang Sun and Chengcheng Yu. Applied Sciences, 2025.