Constraining dark matter halo profiles with symbolic regression
Pith reviewed 2026-05-17 04:26 UTC · model grok-4.3
The pith
Exhaustive symbolic regression recovers the NFW profile from weak lensing data when fractional errors are around 5 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Exhaustive symbolic regression applied to mock excess surface density data drawn from NFW halos selects the NFW function for fractional uncertainties of approximately 5 percent, even when the sample contains only 20 clusters. At the higher uncertainty levels that characterize present-day weak lensing surveys, simpler analytic functions are preferred although NFW remains competitive; this occurs because the measurement errors are smallest in the cluster outskirts, so the fit is driven by the outer density slope.
What carries the argument
Exhaustive Symbolic Regression (ESR), which searches over a space of possible analytic functions to identify the expression that fits the data while remaining as simple as possible.
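The trade-off between fit quality and simplicity can be illustrated with a toy ranking. This is a minimal sketch, not the ESR algorithm: the three candidate expressions are hypothetical, they are restricted to forms that are linear in their parameters so that plain least squares suffices, and a BIC-style score (chi-squared plus k ln n) is used purely as a stand-in for a description-length criterion. The data are noise-free and carry a 5 percent fractional uncertainty so that the outcome is deterministic.

```python
import math

def solve(A, b):
    """Solve a small linear system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(basis, xs, ys, sigmas):
    """Weighted linear least squares; returns (coefficients, chi-squared)."""
    rows = [[f(x) / s for f in basis] for x, s in zip(xs, sigmas)]
    target = [y / s for y, s in zip(ys, sigmas)]
    k = len(basis)
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coeffs = solve(AtA, Atb)
    chi2 = sum((t - sum(c * v for c, v in zip(coeffs, r))) ** 2
               for r, t in zip(rows, target))
    return coeffs, chi2

# Hypothetical candidate expressions (assumption: not ESR's actual search space).
CANDIDATES = {
    "a/x": [lambda x: 1 / x],
    "a/x + b/x^2": [lambda x: 1 / x, lambda x: 1 / x ** 2],
    "a + b/x + c/x^2": [lambda x: 1.0, lambda x: 1 / x, lambda x: 1 / x ** 2],
}

def rank(xs, ys, sigmas):
    """Return (score, name, chi2) tuples sorted from best (lowest score) to worst."""
    n = len(xs)
    results = []
    for name, basis in CANDIDATES.items():
        _, chi2 = fit(basis, xs, ys, sigmas)
        score = chi2 + len(basis) * math.log(n)  # BIC up to an additive constant
        results.append((score, name, chi2))
    return sorted(results)

# Noise-free data drawn from the two-parameter expression, 5% fractional errors.
xs = [0.5 * i for i in range(1, 13)]
ys = [2 / x + 0.5 / x ** 2 for x in xs]
sigmas = [0.05 * y for y in ys]
ranking = rank(xs, ys, sigmas)
for score, name, chi2 in ranking:
    print(f"{name:18s} chi2={chi2:10.3f} score={score:8.3f}")
```

The one-parameter form fits poorly, while the three-parameter form fits no better than the generating expression and pays a larger complexity penalty, so the two-parameter form ranks first.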
If this is right
- For fractional errors near 5 percent, the NFW profile is recovered from samples as small as 20 clusters.
- At uncertainties typical of current surveys, simpler functions are selected over NFW, though NFW stays competitive.
- The selection of simpler functions is driven by weak lensing errors being smallest at large radii.
- The approach supplies a simulation-independent framework for testing mass models and determining which profile features data actually constrain.
Where Pith is reading between the lines
- Applying the same regression to actual observed cluster lensing data could show whether real measurements support NFW or require even simpler descriptions.
- Comparing results across different cluster samples might reveal if the preferred profile depends on mass or redshift in ways not captured by fixed forms.
- Using mock data generated from non-NFW profiles would test how well the method identifies the true underlying shape under realistic noise.
- The technique could be extended to joint fits with other probes such as X-ray or Sunyaev-Zel'dovich observations to tighten constraints on the density profile.
Load-bearing premise
Constant fractional uncertainty assigned to each excess surface density point together with mock data generated exactly from NFW profiles is enough to represent the error properties and selection biases present in real weak lensing observations of galaxy clusters.
What would settle it
Running exhaustive symbolic regression on real weak lensing excess surface density measurements from a large sample of galaxy clusters and checking if the best-fit simple functions match the NFW form or deviate toward power laws or other simpler expressions.
Original abstract
Dark matter haloes are typically characterised by radial density profiles with fixed forms motivated by simulations (e.g. NFW). However, simulation predictions depend on uncertain dark matter physics and baryonic modelling. Here, we present a method to constrain halo density profiles directly from observations using Exhaustive Symbolic Regression (ESR), a technique that searches the space of analytic expressions for the function that best balances accuracy and simplicity for a given dataset. We test the approach on mock weak lensing excess surface density (ESD) data of synthetic clusters with NFW profiles. Motivated by real data, we assign each ESD data point a constant fractional uncertainty and vary this uncertainty and the number of clusters to probe how data precision and sample size affect model selection. For fractional errors around 5%, ESR recovers the NFW profile even from samples as small as 20 clusters. At higher uncertainties representative of current surveys, simpler functions are favoured over NFW, though it remains competitive. This preference arises because weak lensing errors are smallest in the outskirts, causing the fits to be dominated by the outer profile. ESR therefore provides a robust, simulation-independent framework both for testing mass models and determining which features of a halo's density profile are genuinely constrained by the data.
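The mock-data step described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's pipeline: it uses the standard analytic projection of the NFW profile in dimensionless form (x = R / r_s, ESD in units of rho_s * r_s), and the function names, radial grid, random seed, and the 5 percent value are chosen here for illustration only.

```python
import math
import random

def nfw_esd(x):
    """Dimensionless NFW excess surface density at x = R / r_s,
    in units of rho_s * r_s (analytic projection of the NFW profile)."""
    if abs(x - 1.0) < 1e-6:
        h = 1.0              # limiting value of h at x = 1
        sigma = 2.0 / 3.0    # limiting surface density at x = 1
    else:
        if x < 1:
            h = 2 / math.sqrt(1 - x * x) * math.atanh(math.sqrt((1 - x) / (1 + x)))
        else:
            h = 2 / math.sqrt(x * x - 1) * math.atan(math.sqrt((x - 1) / (x + 1)))
        sigma = 2 * (1 - h) / (x * x - 1)
    mean_sigma = 4 * (h + math.log(x / 2)) / (x * x)  # mean surface density inside x
    return mean_sigma - sigma

def mock_esd(xs, frac_err, rng):
    """Return (truth, sigma, data) for one mock cluster with constant fractional error."""
    truth = [nfw_esd(x) for x in xs]
    sigma = [frac_err * t for t in truth]            # absolute error scales with the signal
    data = [rng.gauss(t, s) for t, s in zip(truth, sigma)]
    return truth, sigma, data

# Hypothetical radial grid: nine log-spaced points from 0.1 to 10 scale radii.
xs = [10 ** (-1 + 2 * i / 8) for i in range(9)]
rng = random.Random(42)  # fixed seed so the mock is reproducible
truth, sigma, data = mock_esd(xs, 0.05, rng)
for x, t, s, d in zip(xs, truth, sigma, data):
    print(f"x={x:7.3f}  ESD={t:8.4f}  sigma={s:8.5f}  mock={d:8.4f}")
```

Because the ESD signal falls with radius, the absolute uncertainty in this sketch is smaller at the outermost point than at the innermost one, even though the fractional uncertainty is the same everywhere.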
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Exhaustive Symbolic Regression (ESR) as a method to derive analytic expressions for dark matter halo density profiles directly from weak-lensing excess surface density (ESD) data, without assuming simulation-motivated forms such as NFW. On mock ESD datasets generated from synthetic NFW clusters, with each point assigned a constant fractional uncertainty, the approach recovers the input NFW profile for fractional errors around 5% even with samples as small as 20 clusters. At higher uncertainties representative of current surveys, simpler functional forms are selected over NFW, which the authors attribute to the radial dependence of weak-lensing errors.
Significance. If the central results hold, the work offers a simulation-independent framework for testing halo-profile assumptions and identifying which radial features are actually constrained by data. The mock tests provide a clear demonstration of recovery behavior under controlled error levels and sample sizes, which strengthens the methodological contribution. The approach could be valuable for future surveys if the model-selection outcomes can be shown to reflect genuine observational constraints rather than specifics of the mock error model.
major comments (1)
- [Abstract] The statement that 'simpler functions are favoured over NFW... because weak lensing errors are smallest in the outskirts, causing the fits to be dominated by the outer profile' is not supported by the described mock-data procedure, which assigns a constant fractional uncertainty to every ESD data point. Constant fractional errors lack the radial variation (smaller uncertainties at large radii) invoked in the explanation; therefore the observed preference for simpler expressions may be an artifact of the uniform-error assumption rather than a reflection of physical weak-lensing error distributions. This directly affects the interpretation of the higher-uncertainty results and the claim that the method reveals genuine data constraints on halo-profile features.
minor comments (2)
- [Abstract / Methods] The abstract and methods section would benefit from explicit statements of the ESR search space, the functional form of the complexity penalty, and the quantitative model-comparison statistics (e.g., exact AIC or BIC thresholds) used to declare one expression 'favoured'.
- [Figures / Results] Figure captions and text should clarify whether the reported 'recovery' is based on exact functional match, parameter recovery within uncertainties, or a statistical model-selection criterion.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the work and for the constructive comment on the abstract. We address the major comment below.
- Referee: [Abstract] The statement that 'simpler functions are favoured over NFW... because weak lensing errors are smallest in the outskirts, causing the fits to be dominated by the outer profile' is not supported by the described mock-data procedure, which assigns a constant fractional uncertainty to every ESD data point. Constant fractional errors lack the radial variation (smaller uncertainties at large radii) invoked in the explanation; therefore the observed preference for simpler expressions may be an artifact of the uniform-error assumption rather than a reflection of physical weak-lensing error distributions. This directly affects the interpretation of the higher-uncertainty results and the claim that the method reveals genuine data constraints on halo-profile features.
  Authors: We appreciate the referee drawing attention to this detail. Assigning a constant fractional uncertainty to each ESD point does produce radially varying absolute uncertainties, since the ESD signal itself declines with radius. The absolute errors (fractional uncertainty multiplied by the local ESD value) are therefore smallest in the outskirts. When performing the fits, these outer points carry greater weight in the likelihood, causing the model selection to be dominated by the outer profile. This directly supports the explanation given in the abstract. To prevent any ambiguity, we will revise the abstract and the relevant methods paragraph to state explicitly that the constant fractional error model implies smaller absolute uncertainties at large radii.
  Revision: yes
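The quantities in this exchange can be written out explicitly. The notation below is an assumption introduced here for illustration (f for the constant fractional uncertainty, \Delta\Sigma for the ESD, R_i for the radial bins), and a Gaussian likelihood with independent data points is assumed; none of it is quoted from the paper.

```latex
% Absolute uncertainty implied by a constant fractional uncertainty f
\sigma_i = f \, \Delta\Sigma(R_i)

% Gaussian chi-squared for a candidate model \Delta\Sigma_{\rm mod}
\chi^2 = \sum_i \frac{\left[ \Delta\Sigma_{\rm obs}(R_i) - \Delta\Sigma_{\rm mod}(R_i) \right]^2}{f^2 \, \Delta\Sigma(R_i)^2}

% Inverse-variance weight on absolute residuals
w_i = \frac{1}{\sigma_i^2} = \frac{1}{f^2 \, \Delta\Sigma(R_i)^2}
```

Since \Delta\Sigma declines with radius, w_i grows towards the outskirts when residuals are measured in absolute units. Equivalently, each term of the sum is the squared fractional residual divided by f^2.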
Circularity Check
No significant circularity detected
The paper's central results are obtained by generating mock ESD data from known NFW profiles, assigning constant fractional uncertainties to each point, and then running ESR to identify the best-fitting analytic expressions as a function of error level and sample size. These outcomes follow directly from the simulation protocol and the ESR search procedure without any step reducing by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation. The abstract's explanatory remark about radial error variation is an interpretive comment on real data rather than a premise that the mock results depend upon; the reported recovery rates and model preferences are independent of that remark. No equations or derivation chain in the described workflow collapses to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Mock ESD data generated from NFW profiles with constant fractional uncertainty per point is representative for testing profile recovery and model selection.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We test the approach on mock weak lensing excess surface density (ESD) data of synthetic clusters with NFW profiles... assign each ESD data point a constant fractional uncertainty... rank... using the Minimum Description Length (MDL) principle
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: NFW profile... recovered... at fractional errors around 5%
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- Exhaustive Symbolic Integration: Integration by Differentiation and the Landscape of Symbolic Integrability
  Exhaustive enumeration of functions up to complexity k across operator bases shows the integrability fraction declines with k but rises sharply with logarithms, and the method discovers three integrals that resist Sym...
- The functional form of galaxy and halo luminosity and mass functions
  Exhaustive symbolic regression identifies low-complexity functional forms for luminosity and mass functions that outperform Schechter and Press-Schechter parametrizations while satisfying physical extrapolation and in...
- Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression
  Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.