arxiv: 2501.18333 · v1 · submitted 2025-01-30 · 🌌 astro-ph.CO

Interpretability of deep-learning methods applied to large-scale structure surveys

Pith reviewed 2026-05-23 04:41 UTC · model grok-4.3

classification 🌌 astro-ph.CO
keywords interpretability · convolutional neural networks · large-scale structure · cosmological parameters · Gaussian information · non-Gaussian information · deep learning

The pith

A convolutional neural network for large-scale structure surveys draws its predictions from a mix of Gaussian and non-Gaussian information, with emphasis on scales near the linear-to-nonlinear transition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests what information drives a convolutional neural network's cosmological parameter estimates by training and evaluating the network on survey data from which specific features have been deliberately removed. This approach reveals whether the network depends on the same statistical properties that classical summary statistics use or on additional aspects of the maps. A reader would care because it offers a direct way to inspect the otherwise hidden reasoning inside a deep-learning model applied to cosmology data. The results show the network combines both Gaussian and non-Gaussian signals and weights most heavily the structures whose sizes sit at the boundary between linear and nonlinear regimes.

Core claim

Training the network on degraded large-scale structure data shows that its parameter predictions rely on a mix of both Gaussian and non-Gaussian information, and that the network places particular emphasis on structures whose scales lie at the limit between the linear and nonlinear regimes.

What carries the argument

The technique of training and predicting with input maps from which targeted information has been removed, then measuring the resulting change in constraining power.
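A minimal sketch of one such degradation, assuming Gaussian smoothing as the information-removal step (the figure captions mention smoothing scales, but the kernel implementation, the pixel-unit width `sigma_pix`, and the periodic boundaries here are illustrative assumptions, not the authors' code):

```python
import numpy as np

def gaussian_smooth(kappa, sigma_pix):
    """Smooth a 2D map with a Gaussian of width sigma_pix (in pixels).

    Applied in Fourier space, so the map is treated as periodic.
    """
    ny, nx = kappa.shape
    ky = 2.0 * np.pi * np.fft.fftfreq(ny)[:, None]   # radians per pixel
    kx = 2.0 * np.pi * np.fft.rfftfreq(nx)[None, :]
    kernel = np.exp(-0.5 * sigma_pix**2 * (kx**2 + ky**2))
    return np.fft.irfft2(np.fft.rfft2(kappa) * kernel, s=kappa.shape)

rng = np.random.default_rng(0)
kappa = rng.standard_normal((64, 64))    # toy stand-in for a convergence map
degraded = gaussian_smooth(kappa, sigma_pix=2.0)
```

In the approach described above, the network would then be trained and evaluated on maps like `degraded`, and its constraints compared with those of a network trained on the untouched maps.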

If this is right

  • The network accesses information beyond what is captured by Gaussian statistics alone.
  • The emphasis on transitional scales implies sensitivity to mildly nonlinear structures.
  • The combination of information types may allow the network to break parameter degeneracies that affect traditional analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the result holds, networks could be tested on mocks engineered to suppress non-Gaussian features to confirm the claimed dependence.
  • The same degradation method could be applied to other summary statistics to compare their information sources directly.
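One hypothetical way to engineer mocks with suppressed non-Gaussian features is phase randomization: keep each Fourier amplitude and replace the phases with those of a white-noise field. This is a sketch of that generic technique, not a procedure taken from the paper; the function name and the toy lognormal field are assumptions.

```python
import numpy as np

def randomize_phases(kappa, rng):
    """Return a map with the same Fourier amplitudes as kappa but random phases."""
    amplitude = np.abs(np.fft.fft2(kappa))
    # Phases of a real white-noise field are antisymmetric under k -> -k,
    # which keeps the inverse transform real up to rounding error.
    noise = rng.standard_normal(kappa.shape)
    phases = np.angle(np.fft.fft2(noise))
    return np.fft.ifft2(amplitude * np.exp(1j * phases)).real

rng = np.random.default_rng(1)
kappa = np.exp(rng.standard_normal((64, 64)))   # skewed toy field
kappa -= kappa.mean()                            # mean-subtracted, as for the maps
gaussianised = randomize_phases(kappa, rng)
```

The power spectrum is unchanged by construction, so any loss of constraining power on such mocks would point to information carried by the phases.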

Load-bearing premise

Removing specific information from the survey data isolates the network's dependence on those features without the removal process itself creating new training dynamics or compensatory behaviors.

What would settle it

A measurement in which removing the claimed non-Gaussian or transitional-scale information leaves the network's error bars and bias unchanged.
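The comparison could be sketched as follows, assuming posterior samples of (Ωm, σ8) are available for each network and adopting the common definition S8 = σ8 (Ωm / 0.3)^0.5. The helper names are hypothetical and the sample values are toy numbers.

```python
import numpy as np

def s8_summary(omega_m, sigma_8, s8_true):
    """Return (scatter, bias) of S8 computed from posterior samples."""
    s8 = sigma_8 * np.sqrt(omega_m / 0.3)   # assumed S8 definition
    return s8.std(ddof=1), s8.mean() - s8_true

def relative_change(reference, degraded):
    """Fractional change of a summary number after degradation."""
    return degraded / reference - 1.0

# Toy samples standing in for the reference and degraded posteriors.
rng = np.random.default_rng(2)
om = rng.normal(0.30, 0.02, 5000)
ref_sigma, ref_bias = s8_summary(om, rng.normal(0.80, 0.02, 5000), s8_true=0.8)
deg_sigma, deg_bias = s8_summary(om, rng.normal(0.80, 0.04, 5000), s8_true=0.8)
widening = relative_change(ref_sigma, deg_sigma)
```

A value of `widening` consistent with zero, together with an unchanged bias, would be the null result described above.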

Figures

Figures reproduced from arXiv: 2501.18333 by Alexandre Refregier, Gaspard Aymerich, Tomasz Kacprzak.

Figure 1
Figure 1: Redshift bins used for this work, chosen to be generally representative of a Stage III survey. Figure taken from Kacprzak & Fluri (2022).
The sum runs over the redshift shells (of thickness $\Delta z_b$) and the weight for each shell is defined as: $$W_b^{\mathrm{WL}} = \frac{3}{2}\,\Omega_m\,\frac{\int_{\Delta z_b} \frac{\mathrm{d}z}{E(z)} \int_{z}^{z_s} \mathrm{d}z'\, n(z')\, \frac{D(z)\,D(z,z')}{D(z')\,a(z)}}{\int_{\Delta z_b} \frac{\mathrm{d}z}{E(z)} \int_{z_0}^{z_s} \mathrm{d}z'\, n(z')} \tag{2}$$ The mean convergence of each map is subtracted: …
Figure 2
Figure 2: Example of a simulated 900 deg² survey with 4 redshift bins, obtained by creating a mosaic of 6 × 6 individual 5 × 5 degree maps, with Gaussian noise and Gaussian smoothing at scale R = 4 Mpc/h.
… emcee algorithm (Foreman-Mackey et al. 2013) with the distribution given by the MDN. 200 chains of 128k samples are run for each model (or a single 1.28m chain for plotting Fig. 5). 3. A novel approach to the inter…
Figure 3
Figure 3: Example of a map separated into 4 channels by a starlet transform. Only the first redshift bin is shown, but all four were included in the training. The top row is the initial map; the bottom row shows the 4 starlet transform channels and the corresponding scales.
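For readers unfamiliar with the starlet transform, here is a minimal sketch of the standard à trous algorithm with a B3-spline filter. Reading "4 channels" as 3 wavelet scales plus a coarse residual is an assumption, as are the periodic boundaries and function names; the paper's own implementation may differ.

```python
import numpy as np

B3 = np.array([1, 4, 6, 4, 1]) / 16.0   # B3-spline filter taps

def smooth_atrous(img, step):
    """Separable B3-spline smoothing with taps `step` pixels apart (periodic)."""
    out = img
    for axis in (0, 1):
        out = sum(w * np.roll(out, (i - 2) * step, axis=axis)
                  for i, w in enumerate(B3))
    return out

def starlet(img, n_scales):
    """Return an array of n_scales wavelet channels followed by the coarse map."""
    channels, current = [], img.astype(float)
    for j in range(n_scales):
        smoothed = smooth_atrous(current, 2**j)
        channels.append(current - smoothed)   # detail lost at this scale
        current = smoothed
    channels.append(current)                  # coarse residual
    return np.stack(channels)

rng = np.random.default_rng(3)
kappa = rng.standard_normal((32, 32))         # toy map
channels = starlet(kappa, n_scales=3)
band_limited = channels[1:3].sum(axis=0)      # keep only two intermediate scales
```

Because the channels sum back to the input map, dropping some of them before summing gives a map restricted to a chosen range of scales.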
Figure 4
Figure 4: Example of a map separated into three convergence regions. Only the first redshift bin is shown, but all four were included in the training. From left to right: the base map, the low convergence regions, the mid convergence regions and the high convergence regions. Pixels outside of the range appear in yellow for both high and low κ and in deep blue for mid κ.
… mation that is removed. There…
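A sketch of how such a split might be done, assuming quantile thresholds and a constant fill value for out-of-range pixels. Both choices are illustrative; the caption does not state the thresholds or the fill values actually used.

```python
import numpy as np

def split_by_convergence(kappa, low_q=1/3, high_q=2/3, fill=0.0):
    """Split a map into low/mid/high convergence regions.

    Pixels outside a region are replaced by `fill` (an assumed convention).
    """
    lo, hi = np.quantile(kappa, [low_q, high_q])
    masks = {
        "low": kappa < lo,
        "mid": (kappa >= lo) & (kappa <= hi),
        "high": kappa > hi,
    }
    return {name: np.where(mask, kappa, fill) for name, mask in masks.items()}

rng = np.random.default_rng(4)
kappa = rng.standard_normal((32, 32))   # toy map
regions = split_by_convergence(kappa)
```

With `fill=0.0` the three regions add back up to the original map, which is a quick check that every pixel lands in exactly one region.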
Figure 5
Figure 5: Constraints on [Ωm, σ8] obtained by the CNN. The black dots mark the true values of the parameters.
… degradations in σ_{S_8} and H is almost identical. The slight disagreements can be explained by one main difference between the two measurements: σ_{S_8} is not very affected by outliers when compared to the entropy, and probes mostly how tight the central part of the distribution is. To better visualise the difference…
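The entropy H referred to here can be estimated from posterior samples in several ways. The sketch below uses a simple 2D histogram estimator of the differential Shannon entropy; this is an assumption for illustration, since the excerpt does not say how the paper computes H.

```python
import numpy as np

def histogram_entropy(samples, bins=40):
    """Estimate the differential entropy (in nats) of 2D samples of shape (N, 2)."""
    counts, x_edges, y_edges = np.histogram2d(samples[:, 0], samples[:, 1], bins=bins)
    cell_area = (x_edges[1] - x_edges[0]) * (y_edges[1] - y_edges[0])
    p = counts[counts > 0] / counts.sum()        # occupied-cell probabilities
    return -np.sum(p * np.log(p / cell_area))    # -sum p ln(density)

rng = np.random.default_rng(5)
samples = rng.standard_normal((100_000, 2))      # toy posterior: unit 2D Gaussian
H = histogram_entropy(samples)
```

Unlike a standard deviation, this estimate responds to every occupied cell, including sparsely populated tails, which is consistent with the remark above that the entropy is more affected by outliers than σ_{S_8}.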
Figure 6
Figure 6: Network performance for various scale-related degradations. The left panel is σ_{S_8}, the constraining power on S_8; the right panel is H, the information entropy. The top four rows present the performance for various smoothing scales, for both the CNN and the PS-neural network. The lower rows present the performance of the CNN for various scale ranges, obtained by keeping only certain starlet transform channels.
Figure 7
Figure 7: CNN performance for various zero-loss transformations. The left panel is σ_{S_8}, the constraining power on S_8; the right panel is H, the information entropy. The results for a 3- or 5-channel starlet transform, as well as for a Fourier transform in the form of either real and imaginary parts or amplitude and phase, are presented.
Figure 8
Figure 8: CNN performance for various convergence region selections. The left panel is σ_{S_8}, the constraining power on S_8; the right panel is H, the information entropy. Low/mid/high κ denotes the low/mid/high convergence regions. The second row presents the performance of a network taking as input the high convergence regions in one channel and the Fourier transform amplitude in another, to mimic a widely used statistic…
Figure 9
Figure 9: CNN performance for redshift shuffling, redshift summing and shuffling all pixels. The left panel is σ_{S_8}, the constraining power on S_8; the right panel is H, the information entropy.
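The three degradations named in the Figure 9 caption are not specified in detail in this excerpt. The sketch below shows one plausible reading of each, for a stack of maps of shape (n_z, ny, nx); the function names and the exact shuffling conventions are assumptions.

```python
import numpy as np

def shuffle_redshift(maps, rng):
    """Randomly reorder the redshift bins, hiding which bin is which."""
    return maps[rng.permutation(maps.shape[0])]

def sum_redshift(maps):
    """Collapse all redshift bins into a single map."""
    return maps.sum(axis=0, keepdims=True)

def shuffle_pixels(maps, rng):
    """Apply one random pixel permutation to every bin, destroying spatial structure."""
    n_z, ny, nx = maps.shape
    perm = rng.permutation(ny * nx)
    return maps.reshape(n_z, -1)[:, perm].reshape(n_z, ny, nx)

rng = np.random.default_rng(6)
maps = rng.standard_normal((4, 16, 16))   # toy stack of 4 redshift bins
no_tomography = sum_redshift(maps)
no_structure = shuffle_pixels(maps, rng)
```

Pixel shuffling keeps the one-point distribution of each bin intact, so any constraining power that survives it cannot come from spatial correlations.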
Abstract

Deep learning and convolutional neural networks in particular are powerful and promising tools for cosmological analysis of large-scale structure surveys. They are already providing similar performance to classical analysis methods using fixed summary statistics, are showing potential to break key degeneracies by better probe combination and will likely improve rapidly in the coming years as progress is made in the physical modelling through both software and hardware improvement. One key issue remains: unlike classical analysis, a convolutional neural network's decision process is hidden from the user as the network optimises millions of parameters with no direct physical meaning. This prevents a clear understanding of the potential limitations and biases of the analysis, making it hard to rely on as a main analysis method. In this work, we explore the behaviour of such a convolutional neural network through a novel method. Instead of trying to analyse a network a posteriori, i.e. after training has been completed, we study the impact on the constraining power of training the network and predicting parameters with degraded data where we removed part of the information. This allows us to gain an understanding of which parts and features of a large-scale structure survey are most important in the network's prediction process. We find that the network's prediction process relies on a mix of both Gaussian and non-Gaussian information, and seems to put an emphasis on structures whose scales are at the limit between linear and non-linear regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a method to interpret convolutional neural networks for cosmological parameter inference from large-scale structure surveys. Rather than post-hoc analysis, the authors retrain networks on deliberately degraded data with targeted removal of Gaussian versus non-Gaussian information or specific scale ranges, then measure changes in constraining power to infer which features the network relies upon. The headline result is that the network draws on a combination of both Gaussian and non-Gaussian information while emphasizing scales near the linear-to-nonlinear transition.

Significance. If the degradation protocol can be shown to isolate feature dependence without confounding changes to training dynamics or data statistics, the approach would supply a practical, forward-modeling route to interpretability that is directly relevant to ongoing and future LSS analyses. The method is original in its emphasis on retraining rather than post-training attribution and could help address the black-box concern that currently limits adoption of DL methods as primary analysis tools.

major comments (2)
  1. [Methods (degradation protocol)] The central claim that performance degradation after targeted information removal directly reveals the network's learned reliance on Gaussian/non-Gaussian content or specific scales rests on the untested assumption that the degradation operator leaves the remaining data statistics and optimization landscape unchanged except for the excised component. No quantitative controls (power-spectrum matching, preservation of higher-order statistics, or ablation studies on the degradation operator itself) are described that would rule out compensatory training dynamics or induced artifacts.
  2. [Abstract and Results] The abstract and provided description supply no quantitative results, error bars, or implementation details on how data degradation is performed or on the magnitude of the reported performance changes. Without these, it is not possible to assess whether the evidence supports the stated conclusion that the network 'relies on a mix' or 'puts an emphasis' on particular scales.
minor comments (2)
  1. [Methods] Notation for the degradation operators and the precise definition of 'Gaussian' versus 'non-Gaussian' information should be introduced explicitly with equations or pseudocode.
  2. [Introduction / Data] The manuscript would benefit from a clear statement of the cosmological parameters being inferred and the survey specifications (volume, redshift range, noise model) used in the training sets.
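The power-spectrum matching control named in the first major comment could be sketched as below: measure an azimuthally averaged power spectrum before and after a degradation and inspect the ratio per bin. The binning, the pixel-unit wavenumbers, and the normalization are illustrative assumptions, and the rescaled map is only a toy stand-in for a degraded one.

```python
import numpy as np

def radial_power_spectrum(kappa, n_bins=16):
    """Azimuthally averaged power spectrum of a 2D map (k in cycles per pixel)."""
    ny, nx = kappa.shape
    power = np.abs(np.fft.fft2(kappa))**2 / (nx * ny)
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    k = np.sqrt(kx**2 + ky**2).ravel()
    edges = np.linspace(0.0, 0.5, n_bins + 1)    # up to the Nyquist frequency
    which = np.digitize(k, edges) - 1
    valid = (which >= 0) & (which < n_bins)       # drop modes beyond Nyquist
    sums = np.bincount(which[valid], weights=power.ravel()[valid], minlength=n_bins)
    counts = np.bincount(which[valid], minlength=n_bins)
    centres = 0.5 * (edges[1:] + edges[:-1])
    return centres, sums / np.maximum(counts, 1)

rng = np.random.default_rng(7)
kappa = rng.standard_normal((128, 128))           # toy white-noise map
k, pk = radial_power_spectrum(kappa)
_, pk_scaled = radial_power_spectrum(2.0 * kappa) # stand-in for a degraded map
ratio = pk_scaled / pk
```

For a degradation meant to leave Gaussian information untouched, `ratio` should stay close to one in every bin; departures would flag the induced artifacts the referee asks about.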

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive report. We address each major comment below and commit to revisions that strengthen the presentation of our degradation protocol and the quantitative support for our claims.

  1. Referee: [Methods (degradation protocol)] The central claim that performance degradation after targeted information removal directly reveals the network's learned reliance on Gaussian/non-Gaussian content or specific scales rests on the untested assumption that the degradation operator leaves the remaining data statistics and optimization landscape unchanged except for the excised component. No quantitative controls (power-spectrum matching, preservation of higher-order statistics, or ablation studies on the degradation operator itself) are described that would rule out compensatory training dynamics or induced artifacts.

    Authors: We agree that explicit validation of the degradation operator is necessary to support the interpretability conclusions. The manuscript describes the targeted removal procedures but does not present the requested quantitative controls. We will add these in the revised methods section, including direct comparisons of the power spectrum and selected higher-order statistics before and after degradation, as well as ablation tests on the degradation parameters to check for induced artifacts or changes in training dynamics. revision: yes

  2. Referee: [Abstract and Results] The abstract and provided description supply no quantitative results, error bars, or implementation details on how data degradation is performed or on the magnitude of the reported performance changes. Without these, it is not possible to assess whether the evidence supports the stated conclusion that the network 'relies on a mix' or 'puts an emphasis' on particular scales.

    Authors: We accept that the current abstract is qualitative and lacks the requested numerical support. We will revise the abstract to report the magnitude of performance changes (e.g., relative increases in parameter uncertainties) when Gaussian or non-Gaussian information is removed, together with error bars obtained from multiple independent realizations. Implementation details of the degradation steps will be summarized concisely in the abstract or moved to a prominent position in the results section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ablation method is independent of its inputs

The paper presents an interpretability study that trains CNNs on deliberately degraded survey data (removing Gaussian/non-Gaussian content or specific scales) and measures resulting changes in parameter constraints. This diagnostic is not derived from any fitted parameter that is then re-predicted, nor does it rely on self-definitional equations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via self-citation. No load-bearing step reduces the central claim (reliance on mixed Gaussian/non-Gaussian information at linear-to-nonlinear scales) to a tautology or to the degradation operator itself. The approach is self-contained against external benchmarks of network performance on held-out data, yielding a normal non-finding of circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the central claim rests on the unstated premise that the chosen degradation procedure cleanly isolates information usage.

pith-pipeline@v0.9.0 · 5776 in / 1055 out tokens · 21960 ms · 2026-05-23T04:41:42.927340+00:00 · methodology

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. [1]

    Abadi, M., Agarwal, A., Barham, P., et al. 2016, Tensorflow: Large-scale machine learning on heterogeneous distributed systems, arXiv.org

  2. [2]

    Abbott, T. M. C., Aguena, M., Alarcon, A., et al. 2022, Physical Review D, 105

  3. [3]

    Aiola, S., Calabrese, E., Maurin, L., et al. 2020, J. Cosmol. Astropart. Phys., 2020, 047

  4. [4]

    Amon, A., Gruen, D., Troxel, M. A., et al. 2022, Phys. Rev. D, 105, 023514

  5. [5]

    Balkenhol, L., Dutcher, D., Mancini, A. S., et al. 2023, Phys. Rev. D, 108, 023510

  6. [6]

    Bernardeau, F., Waerbeke, L. V., & Mellier, Y. 1997, Astronomy and Astrophysics, 322, 1

  7. [7]

    Bridle, S. & King, L. 2007, New J. Phys., 9, 444

  8. [8]

    Dietrich, J. P. & Hartlap, J. 2010, Monthly Notices of the Royal Astronomical Society, 402, 1049

  9. [9]

    Dvorkin, C., Mishra-Sharma, S., Nord, B., et al. 2022, Machine Learning and Cosmology, arXiv:2203.08056 [astro-ph, physics:hep-ph, stat]

  10. [10]

    Fluri, J., Kacprzak, T., Lucchi, A., et al. 2019, Physical Review D, 100

  11. [11]

    Fluri, J., Kacprzak, T., Lucchi, A., et al. 2022, A full wCDM analysis of KiDS-1000 weak lensing maps using Deep Learning, arXiv.org

  12. [12]

    Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, Publications of the Astronomical Society of the Pacific, 125, 306

  13. [13]

    Friedrich, O., Gruen, D., DeRose, J., et al. 2018, Phys. Rev. D, 98, 023508, publisher: American Physical Society

  14. [14]

    Gong, Z., Halder, A., Barreira, A., Seitz, S., & Friedrich, O. 2023, J. Cosmol. Astropart. Phys., 2023, 040

  15. [15]

    Gong, Z., Halder, A., Bohrdt, A., Seitz, S., & Gebauer, D. 2024, C3NN: Cosmological Correlator Convolutional Neural Network – an interpretable machine learning tool for cosmological analyses, arXiv:2402.09526 [astro-ph]

  16. [16]

    Hahn, O. & Abel, T. 2011, Monthly Notices of the Royal Astronomical Society, 415, 2101

  17. [17]

    He, K., Zhang, X., Ren, S., & Sun, J. 2015, Deep residual learning for image recognition, arXiv.org

  18. [18]

    Heydenreich, S., Linke, L., Burger, P., & Schneider, P. 2023, A&A, 672, A44

  19. [19]

    Heymans, C., Tröster, T., Asgari, M., et al. 2021, A&A, 646, A140

  20. [20]

    Hirata, C. M. & Seljak, U. 2004, Phys. Rev. D, 70, 063526, publisher: American Physical Society

  21. [21]

    Joachimi, B., Mandelbaum, R., Abdalla, F. B., & Bridle, S. L. 2011, A&A, 527, A26

  22. [22]

    Kacprzak, T. & Fluri, J. 2022, Phys. Rev. X, 12, 031029

  23. [23]

    Kingma, D. P. & Ba, J. 2017, Adam: A Method for Stochastic Optimization, arXiv:1412.6980 [cs]

  24. [24]

    Kirk, D., Rassat, A., Host, O., & Bridle, S. 2012, Monthly Notices of the Royal Astronomical Society, 424, 1647

  25. [25]

    LeCun, Y., Boser, B., Denker, J. S., et al. 1989, Neural Computation, 1, 541

  26. [26]

    LeCun, Y., Bottou, L., Bengio, Y., & Ha, P. 1998

  27. [27]

    Lu, T., Haiman, Z., & Li, X. 2023, Monthly Notices of the Royal Astronomical Society, 521, 2050

  28. [28]

    Lucie-Smith, L., Peiris, H. V., Pontzen, A., Nord, B., & Thiyagalingam, J. 2024, Phys. Rev. D, 109, 063524

  29. [29]

    Matilla, J. M. Z., Sharma, M., Hsu, D., & Haiman, Z. 2020, Phys. Rev. D, 102, 123506, arXiv:2007.06529 [astro-ph]

  30. [30]

    Pan, S., Liu, M., Forero-Romero, J., et al. 2020, Sci. China Phys. Mech. Astron., 63, 110412

  31. [31]

    Piras, D. & Lombriser, L. 2024, Phys. Rev. D, 110, 023514, arXiv:2310.10717 [astro-ph]

    Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2014, A&A, 571, A16

    Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A6

  32. [32]

    Porredon, A., Crocce, M., Elvin-Poole, J., et al. 2022, Phys. Rev. D, 106, 103530

  33. [33]

    Potter, D., Stadel, J., & Teyssier, R. 2017, Comput. Astrophys., 4, 2

  34. [34]

    Ravanbakhsh, S., Oliva, J., Fromenteau, S., et al. 2016, in Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16 (New York, NY, USA: JMLR.org), 2407–2416

  35. [35]

    Refregier, A. 2003, Annu. Rev. Astron. Astrophys., 41, 645

  36. [36]

  37. [37]

    Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Müller, K.-R. 2021, Proceedings of the IEEE, 109, 247

  38. [38]

    Seetharaman, P., Wichern, G., Pardo, B., & Roux, J. L. 2020, AutoClip: Adaptive gradient clipping for Source Separation Networks, arXiv.org

  39. [39]

    Sgier, R., Réfrégier, A., Amara, A., & Nicola, A. 2019, J. Cosmol. Astropart. Phys., 2019, 044

  40. [40]

    Shannon, C. E. 1948, Bell System Technical Journal, 27, 379

  41. [41]

    Starck, J.-L., Fadili, J., & Murtagh, F. 2007, IEEE Transactions on Image Processing, 16, 297

  42. [42]

    Villaescusa-Navarro, F., Wandelt, B. D., Anglés-Alcázar, D., et al. 2022, ApJ, 928, 44

  43. [43]

    Villanueva-Domingo, P. & Villaescusa-Navarro, F. 2021, ApJ, 907, 44

    Zürcher, D., Fluri, J., Sgier, R., et al. 2022, Monthly Notices of the Royal Astronomical Society, 511, 2075

Appendix A: Results including intrinsic alignment

In this appendix, we present the results obtained when modelling...