Generative random latent features models and statistics of natural images

Ilya Nemenman; Philipp Fleig

arXiv: 2212.02987 · v2 · submitted 2022-12-06 · cond-mat.dis-nn

Pith reviewed 2026-05-24 10:19 UTC · model grok-4.3

classification: cond-mat.dis-nn

keywords: latent features, natural images, sparse mixing, generative model, correlation statistics, data decomposition, linear mixing, eigenvalue distributions

The pith

Natural image correlations match the sparse mixing regime of a two-parameter generative latent feature model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a generative model where data arises from linear mixing of latent features, with the key step of allowing statistical dependence between the mixing coefficients. This dependence, controlled by only two parameters for dimensionality and correlation patterns, reproduces several characteristic structures including sparse mixing. Comparing the model's correlation and eigenvalue distributions to those measured in natural images yields a near-perfect match specifically in the sparse mixing regime. The result indicates that sparse coding is the natural decomposition for such data, replacing trial-and-error selection of latent features with a test based on observed correlations. A sympathetic reader cares because the same correlation signature can be used to diagnose the appropriate decomposition for many other complex natural datasets.

Core claim

We argue that sample-sample correlations carry information about the appropriate latent feature decomposition for a given dataset. Our generative random latent feature matrix model is built on linear mixing but allows statistical dependence between mixing coefficients; latent dimensionality and correlation patterns are set by two parameters. The model generates distinct correlation and eigenvalue distributions for different regimes, including overlapping clusters, sparse mixing, and constrained mixing. Fitting the model to correlation data from natural images produces a near-perfect match with the sparse mixing regime, consistent with the known sparse coding structure of natural scenes.

What carries the argument

The generative random latent feature matrix model of linear mixing with statistically dependent mixing coefficients, controlled by two parameters for dimensionality and correlations.
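To make the construction concrete, here is a minimal sketch of one way such a model could be simulated. It assumes, for illustration only, that each sample's mixing coefficients are drawn from a symmetric Dirichlet distribution (so they are dependent through the sum-to-one constraint) and that latent features are i.i.d. Gaussian. The function names, the parameter `beta`, and the sizes `T`, `N`, `m` are hypothetical choices for this sketch and are not taken from the paper's code.

```python
import numpy as np

def sample_latent_feature_data(T, N, m, beta, rng):
    """Draw a T x N data matrix X = U @ V (illustrative parameterization).

    U (T x m): mixing coefficients. Each row is a symmetric Dirichlet(beta)
        draw, so the m coefficients of one sample are statistically
        dependent (non-negative and summing to 1).
    V (m x N): i.i.d. standard Gaussian latent features.
    """
    U = rng.dirichlet(np.full(m, beta), size=T)
    V = rng.standard_normal((m, N))
    return U @ V, U, V

def sample_sample_correlations(X):
    """Off-diagonal Pearson correlations between rows (samples) of X."""
    C = np.corrcoef(X)                     # T x T, rows treated as variables
    upper = np.triu_indices_from(C, k=1)   # each unordered pair once
    return C[upper]

rng = np.random.default_rng(0)
X, U, V = sample_latent_feature_data(T=100, N=500, m=5, beta=1.0, rng=rng)
corr = sample_sample_correlations(X)
print(X.shape, corr.size, float(corr.mean()))
```

In this sketch `m` plays the role of the latent dimensionality and `beta` the role of the parameter governing correlation patterns; how closely this matches the paper's exact parameterization is an assumption.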

If this is right

Different mixing regimes produce distinguishable correlation and eigenvalue distributions.
Fitting the model to correlation data identifies the appropriate latent decomposition type for a dataset.
Natural images are identified as belonging to the sparse mixing regime, consistent with sparse coding.
The same procedure supplies information about suitable decompositions for diverse biological datasets.
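The first consequence above can be illustrated with a small simulation. The sketch below assumes a Dirichlet-Gaussian toy version of the model (an assumption of this example, not the paper's implementation) and compares a small concentration `beta`, where each sample is dominated by one latent feature and samples form clusters, with a large `beta`, where all samples mix the features almost uniformly. All names and parameter values are hypothetical.

```python
import numpy as np

def pair_correlations(T, N, m, beta, rng):
    """Sample-sample correlations for a toy Dirichlet-Gaussian model."""
    U = rng.dirichlet(np.full(m, beta), size=T)   # dependent mixing coefficients
    V = rng.standard_normal((m, N))               # Gaussian latent features
    C = np.corrcoef(U @ V)
    return C[np.triu_indices(T, k=1)]

def summary(c):
    """Two coarse descriptors of a correlation distribution."""
    return {"median": float(np.median(c)),
            "frac_above_0.9": float(np.mean(c > 0.9))}

rng = np.random.default_rng(1)
clusters = pair_correlations(T=200, N=2000, m=5, beta=0.05, rng=rng)
uniform = pair_correlations(T=200, N=2000, m=5, beta=50.0, rng=rng)

print("near-cluster regime :", summary(clusters))
print("near-uniform regime :", summary(uniform))
```

In the near-cluster regime only pairs sharing a dominant feature are strongly correlated, so the distribution is bimodal; in the near-uniform regime almost every pair is strongly correlated. The two descriptors are chosen for brevity and are not the statistics used in the paper.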

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The correlation-based test could be applied to neural population recordings or gene-expression matrices to diagnose their latent structure.
Models that force mixing coefficients to be independent may systematically miss the sparse or clustered patterns common in natural data.
Because only two parameters control the patterns, many seemingly different datasets may share the same correlation signatures once their mixing regime is identified.

Load-bearing premise

Allowing statistical dependence between the mixing coefficients is what enables the model to capture characteristic properties of natural data.

What would settle it

The claim would be undermined if the correlation distribution and eigenvalue spectrum measured from natural images failed to match the model's sparse-mixing predictions at the reported near-perfect level of agreement.

Figures

Figures reproduced from arXiv: 2212.02987 by Ilya Nemenman, Philipp Fleig.

**Figure 2.** FIG. 2. Qualita… (caption truncated in source). Panels: density vs. eigenvalues; sample-sample correlation vs. cluster distance.

**Figure 3.** FIG. 3. The SUV model admits a spar… (caption truncated in source). Panels: density vs. eigenvalues and density vs. sample-sample correlation, each with Dirichlet-Gaussian and Gaussian-Gaussian curves; sample-sample correlation vs. cluster distance.

**Figures 4 to 8.** FIG. 4 through FIG. 8 (images only; no captions were extracted).

Original abstract

Complex, multivariable systems are often analyzed by grouping their constituent units into components, sometimes referred to as latent features, which afford physical or biological interpretation. However, a priori many different types of latent features and data decompositions can be defined, and one typically uses a trial and error approach to determine a decomposition that is natural to the system and its data. It is highly desirable to develop principled understanding of which decomposition is appropriate for a given data set. In this work, we take a step in this direction and argue that sample-sample correlations in the data carry important information to this effect. For this we construct a generative random latent feature matrix model of large data based on linear mixing of latent features. Key ingredient of our model is that we allow for statistical dependence between the mixing coefficients and argue that the model captures characteristic properties found in many types of natural data. Latent dimensionality and correlation patterns of the data are controlled by only two model parameters. The model's data patterns include (overlapping) clusters, sparse mixing, and constrained (non-negative) mixing. We describe the characteristic correlation and eigenvalue distributions of each pattern. Finally, we fit the model on correlation data from natural images and find a near perfect match with the sparse mixing regime of our model. This finding is in line with the well-known sparse coding structure in natural scene images and provides information about the appropriate data decomposition, namely a sparse coding scheme. We believe that our work will deliver similar insights for diverse data of biological systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The model fits natural-image correlations by tuning its two parameters to the sparse-mixing regime, but the match is by construction and does not show sparse coding is uniquely required.

The paper's core contribution is a two-parameter generative model of linear mixing in which statistical dependence among the mixing coefficients is explicit. Different dependence structures produce distinct correlation patterns and eigenvalue spectra (clusters, sparse, non-negative), and the authors map natural-image sample-sample correlations onto the sparse case after fitting the parameters. That mapping is new in its compactness and in the direct link it draws from dependence to second-order statistics. The derivations of the characteristic distributions for each regime look clean and could be checked independently. The authors deserve credit for keeping the model minimal and for engaging the sparse-coding literature on images.

The fit itself is the main limitation. Parameters are adjusted directly to the observed correlations, so agreement is expected rather than tested. No held-out validation, no quantitative error metrics, and no comparison to alternative generative models appear in the abstract or the stress-test summary. Without those, the inference that the data decomposition must be sparse remains under-supported; other models could reproduce the same second-order statistics.

The work is aimed at researchers who build latent-feature models for high-dimensional natural data and want a correlation-based way to select among decompositions. It is coherent on its own terms and shows clear thinking about the literature, so it deserves referee time even if the central claim needs more independent checks.

Referee Report

2 major / 2 minor

Summary. The paper constructs a two-parameter generative model (latent dimensionality and mixing correlation parameter) for data matrices formed by linear mixing of latent features, where statistical dependence among mixing coefficients produces distinct regimes with characteristic sample-sample correlation and eigenvalue spectra (overlapping clusters, sparse mixing, non-negative mixing). Fitting these parameters to empirical correlations from natural images yields a near-perfect match to the sparse-mixing regime, which the authors interpret as evidence that sparse coding is the appropriate data decomposition for such images.

Significance. If the fitting procedure can be shown to be robust and the sparse regime can be shown to be uniquely required rather than merely sufficient, the framework would offer a principled, correlation-based method for selecting among possible latent decompositions in complex data. The explicit mapping from dependence structure to correlation patterns is a conceptual strength that could generalize beyond images.

major comments (2)

[Abstract] The claim of a 'near perfect match' with the sparse mixing regime is obtained by directly tuning the two free parameters (latent dimensionality and mixing correlation) to the observed natural-image correlations. Because the model is constructed so that different dependence structures generate qualitatively different correlation patterns, this agreement is achieved by construction and does not constitute an independent test that sparse mixing is required.
[Abstract, paragraph beginning 'Key ingredient of our model'] The assertion that allowing statistical dependence between mixing coefficients 'captures characteristic properties found in many types of natural data' is presented as a modeling premise rather than a derived or validated result; the manuscript demonstrates that the model can reproduce different patterns but does not test whether alternative generative mechanisms with different latent decompositions could match the same second-order statistics equally well.
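One way to probe the second objection is to simulate an alternative mechanism and measure how far its correlation distribution lies from the dependent-coefficient model. The sketch below is a toy comparison under stated assumptions: the "dependent" model uses Dirichlet mixing coefficients, the hypothetical alternative uses independent Gaussian coefficients, and the distance is a two-sample Kolmogorov-Smirnov statistic implemented directly in NumPy. None of these choices is claimed to be the paper's procedure.

```python
import numpy as np

def pair_corr(X):
    """Off-diagonal sample-sample Pearson correlations of the rows of X."""
    C = np.corrcoef(X)
    return C[np.triu_indices_from(C, k=1)]

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(4)
T, N, m = 200, 1000, 8
V = rng.standard_normal((m, N))                        # shared latent features
U_dirichlet = rng.dirichlet(np.full(m, 0.5), size=T)   # dependent, non-negative
U_gaussian = rng.standard_normal((T, m))               # independent coefficients

corr_dir = pair_corr(U_dirichlet @ V)
corr_gau = pair_corr(U_gaussian @ V)
print("mean correlation (Dirichlet):", float(corr_dir.mean()))
print("mean correlation (Gaussian) :", float(corr_gau.mean()))
print("KS distance                 :", ks_distance(corr_dir, corr_gau))
```

In this toy setting the two mechanisms are easy to tell apart: independent zero-mean coefficients give correlations centered near zero, while the non-negative Dirichlet coefficients shift them toward positive values. This shows only that one particular alternative is distinguishable; it does not rule out other mechanisms that could mimic the same second-order statistics.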

minor comments (2)

The manuscript should report the precise fitting procedure, the quantitative error metric used to declare a 'near perfect match,' and any cross-validation or held-out data checks.
Notation for the mixing coefficients and their correlation parameter should be introduced with explicit equations early in the model section to improve readability.
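The first minor comment asks for an explicit error metric and a held-out check. A minimal sketch of what such a report could look like follows, assuming a toy Dirichlet-Gaussian model, a grid search over a hypothetical concentration parameter `beta`, and a Kolmogorov-Smirnov distance between correlation distributions as the metric. The "data" here are synthetic stand-ins generated from known parameters, and the latent dimensionality `m` is held fixed for simplicity; the paper's actual fitting procedure is not specified in this summary.

```python
import numpy as np

def simulate(T, N, m, beta, rng):
    """Toy model: Dirichlet mixing coefficients times Gaussian latent features."""
    U = rng.dirichlet(np.full(m, beta), size=T)
    return U @ rng.standard_normal((m, N))

def pair_corr(X):
    C = np.corrcoef(X)
    return C[np.triu_indices_from(C, k=1)]

def ks_distance(a, b):
    """Max gap between the empirical CDFs of a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def fit_beta(train_corr, candidates, T, N, m, rng):
    """Grid search: return (best_beta, {beta: KS distance to training data})."""
    scores = {b: ks_distance(train_corr, pair_corr(simulate(T, N, m, b, rng)))
              for b in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(5)
m_true, beta_true = 8, 0.5
data = simulate(T=300, N=1000, m=m_true, beta=beta_true, rng=rng)  # synthetic stand-in
train, held_out = data[:150], data[150:]                            # split by sample

best_beta, scores = fit_beta(pair_corr(train), (0.05, 0.5, 5.0),
                             T=150, N=1000, m=m_true, rng=rng)
held_out_ks = ks_distance(pair_corr(held_out),
                          pair_corr(simulate(150, 1000, m_true, best_beta, rng)))
print("training KS per candidate:", scores)
print("selected beta:", best_beta, "| held-out KS:", held_out_ks)
```

Reporting the held-out distance alongside the training distance separates "the parameters can be tuned to match" from "the tuned model describes samples it was not fitted to", which is the distinction the referee asks for.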

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments correctly identify that our fitting procedure demonstrates consistency with the sparse-mixing regime rather than proving uniqueness against all possible alternative generative models. We address both points below and will revise the abstract and related discussion accordingly.

Referee: [Abstract] The claim of a 'near perfect match' with the sparse mixing regime is obtained by directly tuning the two free parameters (latent dimensionality and mixing correlation) to the observed natural-image correlations. Because the model is constructed so that different dependence structures generate qualitatively different correlation patterns, this agreement is achieved by construction and does not constitute an independent test that sparse mixing is required.

Authors: We agree that the match is obtained by tuning the two parameters and therefore shows consistency with the sparse regime rather than constituting an independent test of necessity. The model distinguishes qualitatively different correlation patterns across regimes, and the natural-image data fall into the sparse regime, which aligns with the established sparse-coding literature. We will revise the abstract to replace 'near perfect match' with language indicating that the observed correlations are consistent with the sparse-mixing regime of the model, and to clarify that this provides a correlation-based indication of the appropriate decomposition within the class of models considered.

Revision: yes

Referee: [Abstract, paragraph beginning 'Key ingredient of our model'] The assertion that allowing statistical dependence between mixing coefficients 'captures characteristic properties found in many types of natural data' is presented as a modeling premise rather than a derived or validated result; the manuscript demonstrates that the model can reproduce different patterns but does not test whether alternative generative mechanisms with different latent decompositions could match the same second-order statistics equally well.

Authors: We acknowledge that the statement is a modeling premise motivated by known statistical dependencies in natural data, rather than a result derived or validated within the paper. The work focuses on the consequences of this assumption for correlation and eigenvalue spectra. We will revise the abstract to present the dependence structure explicitly as an assumption that enables the model to generate the observed regimes, without claiming it has been shown to be the only mechanism capable of reproducing the second-order statistics.

Revision: yes

Circularity Check

1 step flagged

Near-perfect match to sparse regime achieved by fitting the two model parameters directly to the correlation data

specific steps

fitted input called prediction [Abstract]
"Finally, we fit the model on correlation data from natural images and find a near perfect match with the sparse mixing regime of our model. This finding is in line with the well-known sparse coding structure in natural scene images and provides information about the appropriate data decomposition, namely a sparse coding scheme."

The model is explicitly parameterized so that its two free parameters select among qualitatively distinct correlation patterns (including the sparse-mixing regime). Fitting those parameters to the observed sample-sample correlations and then declaring a near-perfect match to the sparse regime means the reported agreement is forced by the tuning step rather than arising from an independent derivation or out-of-sample test.

full rationale

The paper constructs a two-parameter generative model whose regimes (clusters, sparse mixing, non-negative) are defined to produce qualitatively different correlation patterns. It then tunes those parameters to empirical image correlations and reports a near-perfect match to the sparse regime as evidence that sparse coding is the appropriate decomposition. Because the match is obtained by construction through the fit, it does not constitute an independent test or prediction. No other load-bearing steps reduce to self-definition or self-citation chains.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the linear-mixing generative assumption plus the introduction of statistical dependence between mixing coefficients as the key modeling choice; two free parameters are adjusted to data to produce the reported match.

free parameters (2)

latent dimensionality: controls the number of latent features in the generative model.
mixing correlation parameter: controls the statistical dependence among mixing coefficients and selects among cluster, sparse, and constrained regimes.
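The effect of the latent dimensionality on the eigenvalue spectrum can be seen in a noise-free toy simulation. The sketch below assumes the same illustrative Dirichlet-Gaussian construction (an assumption of this example, with hypothetical names and sizes): because the data matrix is a product of a T x m and an m x N matrix, the sample-sample correlation matrix has at most `m` non-zero eigenvalues, and its eigenvalues sum to `T`.

```python
import numpy as np

def correlation_spectrum(T, N, m, beta, rng):
    """Eigenvalues (descending) of the T x T sample-sample correlation matrix."""
    U = rng.dirichlet(np.full(m, beta), size=T)   # mixing coefficients
    V = rng.standard_normal((m, N))               # latent features
    C = np.corrcoef(U @ V)
    return np.linalg.eigvalsh(C)[::-1]

rng = np.random.default_rng(2)
for m in (3, 10):
    eig = correlation_spectrum(T=120, N=600, m=m, beta=1.0, rng=rng)
    n_nonzero = int(np.sum(eig > 1e-8 * eig[0]))  # count eigenvalues above round-off
    print(f"m={m}: non-zero eigenvalues={n_nonzero}, trace={eig.sum():.3f}")
```

Real data carry noise, so the spectrum would show a bulk plus outliers rather than exact zeros; this sketch only illustrates why the number of latent features leaves a visible mark on the eigenvalue distribution.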

axioms (2)

Domain assumption: observed data are generated by linear mixing of latent features. Stated as the foundational construction of the model.
Ad hoc to paper: statistical dependence between mixing coefficients captures characteristic properties of natural data. Explicitly identified as the key ingredient that allows the model to reproduce observed patterns.

[1] [1]

Generative random latent features models and statistics of natural images

Generally, T ∼ N, though modern experiments of- ten push us to N ≫ T ≫ 1. Surprisingly, such large- dimensional data coming from natural systems are of- ten simpler than they could have been in that they re- veal an intrinsically lower-dimensional, hidden structure. The number of latent features (aka, collective degrees of freedom, which may be externally...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

2(c) re- flects the transition between the two extreme limits (from pure clusters to uniform mixing)

Correlations The family of correlation distributions in Fig. 2(c) re- flects the transition between the two extreme limits (from pure clusters to uniform mixing). The first observation is that the density is not symmetric around zero. In the clusters limit (black curve), the density of correlations has a nearly delta function peak at 1, corresponding to c...

work page

[3] [3]

From RMT, we know that the eigenvalue densities of random matrices often converge to their limiting form whenT , N, etc

Eigenvalues The shape of the eigenvalue density depends on the model parameters T , N, m and βU. From RMT, we know that the eigenvalue densities of random matrices often converge to their limiting form whenT , N, etc. tend to infinity, while ratios, such as N/T or the ratio of the number of latent features to the number of observables m/N remain fixed. Th...

work page

[4] [4]

Thus we check whether the data could also be fit by a simpler, low-rank model. For this, we fit the correla- tion distribution of the Gaussian-Gaussian model on the correlation data and find an ML value of mGG ML = 94 ± 1 closely matching the ML value determined for the SUV model. In Fig. 4 (b), we show the ML correlation curve of the Gaussian-Gaussian mo...

work page

[5] [5]

P. W. Anderson, More is different: broken symmetry and the nature of the hierarchical structure of science., Sci- ence 177, 393 (1972)

work page 1972

[6] [6]

Marre, D

O. Marre, D. Amodei, N. Deshmukh, K. Sadeghi, F. Soo, T. E. Holy, and M. J. Berry, Mapping a complete neu- ral population in the retina, Journal of Neuroscience 32, 14859 (2012)

work page 2012

[7] N. A. Steinmetz, C. Aydin, A. Lebedeva, M. Okun, M. Pachitariu, M. Bauza, M. Beau, J. Bhagat, C. Böhm, M. Broux, S. Chen, J. Colonell, R. J. Gardner, B. Karsh, F. Kloosterman, D. Kostadinov, C. Mora-Lopez, J. O’Callaghan, J. Park, J. Putzeys, B. Sauerbrei, R. J. J. van Daal, A. Z. Vollan, S. Wang, M. Welkenhuysen, Z. Ye, J. T. Dudman, B. Dutta, A. W....

[8] M. Nagy, H. Naik, F. Kano, N. V. Carlson, J. C. Koblitz, M. Wikelski, and I. D. Couzin, Smart-barn: Scalable multimodal arena for real-time tracking behavior of animals in large numbers, Science Advances 9, eadf8068 (2023)

[9] A. Cavagna, X. Feng, S. Melillo, L. Parisi, L. Postiglione, and P. Villegas, Como: A novel comoving 3d camera system, IEEE Transactions on Instrumentation and Measurement 70, 1 (2021)

[10] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population, Nature 440, 1007 (2006)

[11] L. Meshulam, J. L. Gauthier, C. D. Brody, D. W. Tank, and W. Bialek, Coarse graining, fixed points, and scaling in a large population of neurons, Physical Review Letters 123, 178103 (2019)

[12] D. L. Ruderman, Origins of scaling in natural images, Vision Research 37, 3385 (1997)

[13] N. Halabi, O. Rivoire, S. Leibler, and R. Ranganathan, Protein sectors: evolutionary units of three-dimensional structure, Cell 138, 774 (2009)

[14] D. S. Marks, L. J. Colwell, R. Sheridan, T. A. Hopf, A. Pagnani, R. Zecchina, and C. Sander, Protein 3d structure computed from evolutionary sequence variation, PLOS ONE 6, 1 (2011)

[15] D. L. Ruderman and W. Bialek, Statistics of natural images: Scaling in the woods, Phys. Rev. Lett. 73, 814 (1994)

[16] C. Qin and L. J. Colwell, Power law tails in phylogenetic systems, Proceedings of the National Academy of Sciences 115, 690 (2018)

[17] M. Nitzan and M. P. Brenner, Revealing lineage-related signals in single-cell gene expression using random matrix theory, Proceedings of the National Academy of Sciences 118, e1913931118 (2021)

[18] U. Tomasini and M. Wyart, How deep networks learn sparse and hierarchical data: the sparse random hierarchy model, arXiv preprint arXiv:2404.10727 (2024)

[19] http://www.rctn.org/bruno/sparsenet/ (version: 2023-04-19)

[20] B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381, 607 (1996)

[21] J. P. Cunningham and B. M. Yu, Dimensionality reduction for large-scale neural recordings, Nature Neuroscience 17, 1500 (2014)

[22] P. Gao, E. Trautmann, B. Yu, G. Santhanam, S. Ryu, K. Shenoy, and S. Ganguli, A theory of multineuronal dimensionality, dynamics and measurement, bioRxiv, 214262 (2017)

[23] J. A. Gallego, M. G. Perich, L. E. Miller, and S. A. Solla, Neural manifolds for the control of movement, Neuron 94, 978 (2017)

[24] S. M. Perkins, J. P. Cunningham, Q. Wang, and M. M. Churchland, Simple decoding of behavior from a complicated neural manifold, 10.1101/2023.04.05.535396 (2023)

[25] C. Pandarinath, D. J. O’Shea, J. Collins, R. Jozefowicz, S. D. Stavisky, J. C. Kao, E. M. Trautmann, M. T. Kaufman, S. I. Ryu, L. R. Hochberg, et al., Inferring single-trial neural population dynamics using sequential auto-encoders, Nature Methods 15, 805 (2018)

[26] M. C. Morrell, A. J. Sederberg, and I. Nemenman, Latent Dynamical Variables Produce Signatures of Spatiotemporal Criticality in Large Biological Systems, Physical Review Letters 126, 118302 (2021)

[27] E. H. Nieh, M. Schottdorf, N. W. Freeman, R. J. Low, S. Lewallen, S. A. Koay, L. Pinto, J. L. Gauthier, C. D. Brody, and D. W. Tank, Geometry of abstract learned knowledge in the hippocampus, Nature 595, 80 (2021)

[28] G. J. Stephens, B. Johnson-Kerner, W. Bialek, and W. S. Ryu, Dimensionality and dynamics in the behavior of C. elegans, PLoS Comput Biol 4, e1000028 (2008)

[29] J. Moran and M. Tikhonov, Defining coarse-grainability in a model of structured microbial ecosystems, Physical Review X 12, 021038 (2022)

[30] J. Moran and M. Tikhonov, Emergent predictability in microbial ecosystems, bioRxiv, 2024 (2024)

[31] D. Jordan, S. Kuehn, E. Katifori, and S. Leibler, Behavioral diversity in microbes and low-dimensional phenotypic spaces, Proceedings of the National Academy of Sciences 110, 14018 (2013)

[32] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences 95, 14863 (1998)

[33] S. Goldt, M. Mézard, F. Krzakala, and L. Zdeborová, Modeling the influence of data structure on learning in neural networks: The hidden manifold model, Physical Review X 10, 041044 (2020)

[34] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401, 788 (1999)

[35] M. Potters and J.-P. Bouchaud, A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists (Cambridge University Press, 2020)

[36] F. Mignacco, F. Krzakala, Y. Lu, P. Urbani, and L. Zdeborova, The role of regularization in classification of high-dimensional noisy gaussian mixture, in International Conference on Machine Learning (PMLR, 2020) pp. 6874–6883

[37] P. Fleig and I. Nemenman, Statistical properties of large data sets with linear latent features, Phys. Rev. E 106, 014102 (2022)

[38] R. B. Grosse, R. Salakhutdinov, W. T. Freeman, and J. B. Tenenbaum, Exploiting compositionality to explore a large space of model structures, in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI’12 (AUAI Press, Arlington, Virginia, USA, 2012) pp. 306–315

[39] B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research 37, 3311 (1997)

[40] I. Nemenman, F. Shafee, and W. Bialek, Entropy and inference, revisited, in Advances in Neural Information Processing Systems, Vol. 14, edited by T. Dietterich, S. Becker, and Z. Ghahramani (MIT Press, 2001)

[41] https://en.wikipedia.org/wiki/Dirichlet_distribution (version: 2022-09-11)

[42] https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html (SciPy v1.7.1)

[43] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Noise dressing of financial correlation matrices, Phys. Rev. Lett. 83, 1467 (1999)

[44] H. Hotelling, New light on the correlation coefficient and its transforms, Journal of the Royal Statistical Society. Series B (Methodological) 15, 193 (1953)

[45] V. A. Marčenko and L. A. Pastur, DISTRIBUTION OF EIGENVALUES FOR SOME SETS OF RANDOM MATRICES, Mathematics of the USSR-Sbornik 1, 457 (1967)

[46] J. H. van Hateren and D. L. Ruderman, Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex, Proceedings of the Royal Society of London. Series B: Biological Sciences 265, 2315 (1998)

[47] S. Saremi and T. J. Sejnowski, Hierarchical model of natural images and the origin of scale invariance, Proceedings of the National Academy of Sciences 110, 3071 (2013)

[48] W. Bialek, A. Cavagna, I. Giardina, T. Mora, E. Silvestri, M. Viale, and A. M. Walczak, Statistical mechanics for natural flocks of birds, Proceedings of the National Academy of Sciences 109, 4786 (2012)

[49] Y. Gao, E. W. Archer, L. Paninski, and J. P. Cunningham, Linear dynamical neural population models through nonlinear embeddings, Advances in Neural Information Processing Systems 29 (2016)

[50] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in International Conference on Machine Learning (PMLR, 2015) pp. 2256–2265

[51] A. J. Bell and T. J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation 7, 1129 (1995)

[52] A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13, 411 (2000)

[53] J. Shlens, A Tutorial on Principal Component Analysis, arXiv preprint arXiv:1404.1100 (2014)

[54] M. J. Berry and G. Tkačik, Clustering of neural activity: A design principle for population codes, Frontiers in Computational Neuroscience 14 (2020)

[55] W. Młynarski and G. Tkačik, Efficient coding theory of dynamic attentional modulation, PLoS Biology 20, e3001889 (2022)

Appendix A: Computation of data mean and variance

We compute the theoretical mean and variance of data from the Dirichlet-Gaussian model with a trivial modulation matrix (s_{tμ} = 1), and i.i.d. Gaussian latent features. Starting fr...