arxiv: 2605.23114 · v1 · pith:3CWOAXDOnew · submitted 2026-05-22 · 🌌 astro-ph.CO

Increasing the Precision of Surrogate Models for Weak Lensing Mass Maps with Flow Matching

Pith reviewed 2026-05-25 03:55 UTC · model grok-4.3

classification 🌌 astro-ph.CO
keywords weak gravitational lensing · convergence maps · flow matching · generative models · surrogate models · cosmological parameters · Omega_m · sigma_8

The pith

A residual flow matching network generates weak lensing convergence maps whose basic and higher-order statistics typically agree with N-body simulations to within 1% and 5%, respectively.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that a continuous normalizing flow trained via flow matching can serve as a high-fidelity surrogate for weak lensing mass maps conditioned on Omega_m and sigma_8. This would matter if true because N-body simulations are too expensive to produce the large map ensembles needed for flexible map-level cosmological inference. The model works in residual space, learning a probability flow from label-specific noise to convergence maps for a fixed source redshift distribution. Evaluation on pixel, peak, power-spectrum, bispectrum and correlation-matrix statistics shows an order-of-magnitude gain over prior GAN emulators.

Core claim

We present a residual label-conditional flow matching generative network that conditions explicitly on the matter density Omega_m and clustering amplitude sigma_8 for a fixed source redshift distribution n(z). The model learns a continuous probability flow in a residual space from label-specific noise distributions to convergence maps. Compared with the previous GAN benchmark, the proposed method improves the typical fidelity of generated maps from below 10% and below 20% to below 1% and below 5% for basic and higher-order statistics, respectively. The agreement at the level of map distributions is also very good.

What carries the argument

Residual label-conditional flow matching generative network that conditions on Omega_m and sigma_8 and learns a continuous probability flow from noise to convergence maps.
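
To make the mechanism concrete, here is a minimal NumPy sketch of one label-conditional flow matching training target. It assumes a straight-line interpolation path between a noise sample and a data sample and a Gaussian noise whose scale depends on the label. The path choice, the noise form, the helper names (`sample_label_noise`, `flow_matching_targets`, `fm_loss`), and the toy shapes are illustrative assumptions; this summary does not specify the paper's residual construction or network.

```python
import numpy as np

def sample_label_noise(labels, shape, rng):
    """Draw noise whose scale depends on the label (hypothetical choice).

    labels: array (batch, 2) holding (Omega_m, sigma_8).
    Assumption: the noise std scales with sigma_8; the paper's actual
    label-specific noise distribution is not given in this summary.
    """
    std = labels[:, 1].reshape(-1, 1, 1)
    return std * rng.standard_normal((labels.shape[0], *shape))

def flow_matching_targets(x0, x1, t):
    """Straight-line path x_t = (1 - t) x0 + t x1 and its velocity x1 - x0."""
    t = t.reshape(-1, 1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def fm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))

rng = np.random.default_rng(0)
labels = np.array([[0.30, 0.80], [0.25, 0.90]])  # toy (Omega_m, sigma_8) pairs
x1 = rng.standard_normal((2, 8, 8))              # stand-in for convergence maps
x0 = sample_label_noise(labels, (8, 8), rng)     # label-specific noise samples
t = rng.uniform(size=2)                          # one time per sample
x_t, u_t = flow_matching_targets(x0, x1, t)
loss_perfect = fm_loss(u_t, u_t)                 # a perfect velocity model has zero loss
```

In training, a network would take `x_t`, `t`, and `labels` as input and be fitted so that its output minimizes `fm_loss` against `u_t`.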

If this is right

  • Maps generated from random noise match the distribution of maps generated with N-body simulations from random initial conditions.
  • The emulator accelerates map-level theory prediction while capturing the cosmological signal.
  • The approach supports multiple forms of data analysis at the map level.
  • Fidelity gains hold for both basic statistics and higher-order statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same fidelity extends to varying source redshift distributions, the model could support tomographic analyses without retraining.
  • The flow-matching architecture may transfer to surrogate generation for other cosmological observables such as galaxy clustering fields.
  • Downstream inference pipelines could directly test whether the reported statistical improvements translate into tighter parameter constraints.

Load-bearing premise

Agreement on pixel, peak, power-spectrum, bispectrum and correlation-matrix metrics is sufficient to guarantee that the generated maps preserve all cosmological information needed for downstream inference at a fixed source redshift distribution.
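
As an illustration of what agreement on one of these metrics means in practice, the sketch below computes an azimuthally averaged power spectrum for square maps and the fractional difference of the ensemble means. It is a flat-sky NumPy sketch in pixel units; the function names, the binning, and the white-noise stand-in maps are assumptions, not the paper's evaluation pipeline.

```python
import numpy as np

def binned_power_spectrum(kappa, n_bins=8):
    """Azimuthally averaged power spectrum of a square map (pixel units)."""
    n = kappa.shape[0]
    power = np.abs(np.fft.fft2(kappa)) ** 2 / n**2
    freq = np.fft.fftfreq(n)
    k = np.hypot(*np.meshgrid(freq, freq, indexing="ij"))
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    idx = np.digitize(k.ravel(), edges) - 1
    # Drop the zero mode and modes beyond the last bin edge.
    keep = (idx >= 0) & (idx < n_bins) & (k.ravel() > 0)
    sums = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    return sums / counts

def fractional_difference(maps_a, maps_b, n_bins=8):
    """Ratio of ensemble-mean power spectra minus one, per bin."""
    pa = np.mean([binned_power_spectrum(m, n_bins) for m in maps_a], axis=0)
    pb = np.mean([binned_power_spectrum(m, n_bins) for m in maps_b], axis=0)
    return pa / pb - 1.0

rng = np.random.default_rng(1)
sim_maps = rng.standard_normal((16, 32, 32))  # stand-in for N-body maps
gen_maps = 1.01 * sim_maps                    # stand-in with a 1% amplitude offset
frac = fractional_difference(gen_maps, sim_maps)
```

Because power scales with the square of the amplitude, the 1% amplitude offset in this toy case shows up as roughly a 2% offset in every power-spectrum bin.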

What would settle it

A demonstration that cosmological parameter constraints derived from flow-matching maps differ materially from those derived from N-body maps, even when the reported statistics agree.

Original abstract

Weak gravitational lensing maps compactly encode the evolution of cosmic large-scale structure and are a key tool for cosmological analyses. Performing inference directly at the map level allows flexible choices of statistics and can increase constraining power. Conventional methods rely solely on N-body simulations and are computationally expensive. Generative machine-learning emulators can accelerate map-level theory prediction. However, existing GAN-based map-level surrogates still have limited statistical fidelity. They can produce over-smoothed maps, may fail to capture the full distribution of generated map sets and can be difficult to train. Continuous normalizing flows trained with flow matching have recently emerged as a powerful class of generative models. We present a residual label-conditional flow matching generative network that conditions explicitly on the matter density Omega_m and clustering amplitude sigma_8 for a fixed source redshift distribution n(z). The model learns a continuous probability flow in a residual space from label-specific noise distributions to convergence maps. We evaluate it using pixel and peak statistics, the power spectrum, bispectrum, power-spectrum correlation matrices, and other validation metrics. Compared with the previous GAN benchmark, the proposed method improves the typical fidelity of generated maps from below 10% and below 20% to below 1% and below 5% for basic and higher-order statistics, respectively. The agreement at the level of map distributions is also very good: maps generated from random noise match well the distribution of maps generated with N-body simulations from random initial conditions. This work brings us closer to a practical mass-map emulator that captures the cosmological signal while supporting multiple forms of data analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a residual label-conditional flow-matching generative network for weak-lensing convergence maps, conditioned explicitly on Ω_m and σ_8 at fixed source redshift distribution n(z). The model is trained on N-body simulations and evaluated against a prior GAN benchmark using pixel/peak statistics, power spectrum, bispectrum, power-spectrum correlation matrices, and distribution-level agreement metrics. It reports fidelity improvements from below 10% and 20% to below 1% and 5% on basic and higher-order statistics, respectively, and states that generated maps match the distribution of N-body maps from random initial conditions.

Significance. If the reported fidelity gains hold under the stated conditions, the work advances surrogate modeling for map-level weak-lensing analyses by demonstrating that flow matching can achieve substantially higher statistical agreement than GANs on the tested summary statistics. Explicit credit is due for the quantitative, multi-statistic comparison to an established benchmark and for the use of a continuous normalizing-flow approach that avoids some documented GAN training difficulties.

major comments (2)
  1. [Evaluation / validation metrics] Evaluation section (metrics listed in abstract and validation): agreement on pixel, peak, power-spectrum, bispectrum, and correlation-matrix statistics is shown, but these summaries do not automatically certify that the residual flow-matching distribution matches the N-body distribution in all directions relevant to map-level likelihood-free inference. A mismatch in non-local or higher-than-bispectrum features could remain invisible while still biasing downstream parameter constraints; the manuscript should add at least one direct test (e.g., parameter recovery from generated vs. simulated maps) to support the claim that the model “captures the cosmological signal.”
  2. [Abstract and methods description] Conditioning and scope: the model is trained and evaluated only for a single fixed n(z). Because the abstract positions the emulator as a step toward practical map-level inference, the lack of any test or discussion of performance under varied source-redshift distributions is a load-bearing limitation for the broader utility claim.
minor comments (2)
  1. [Methods] Notation for the residual space and label conditioning should be defined explicitly in the methods section rather than left implicit from the abstract description.
  2. [Figures] Figure captions for the comparison plots should state the exact number of realizations used for each statistic and whether error bars reflect sample variance or bootstrap estimates.
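
The distinction raised in the last minor comment can be made concrete. The sketch below estimates the uncertainty on an ensemble-mean statistic in two ways: from the sample variance across realizations and from bootstrap resampling of those realizations. The function names and the synthetic per-realization values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sample_variance_error(values):
    """Standard error of the mean from the scatter across realizations."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / np.sqrt(values.size)

def bootstrap_error(values, n_boot=2000, seed=0):
    """Standard deviation of the mean over bootstrap resamples."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row indexes one resample of the realizations, drawn with replacement.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    return values[idx].mean(axis=1).std(ddof=1)

rng = np.random.default_rng(42)
peak_counts = rng.normal(loc=100.0, scale=5.0, size=500)  # synthetic statistic
err_sv = sample_variance_error(peak_counts)
err_bs = bootstrap_error(peak_counts)
```

For independent realizations the two estimates should be close, so a caption mainly needs to state which one was used and how many realizations entered it.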

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and for the constructive major comments. We address each point below.

Point-by-point responses
  1. Referee: [Evaluation / validation metrics] Evaluation section (metrics listed in abstract and validation): agreement on pixel, peak, power-spectrum, bispectrum, and correlation-matrix statistics is shown, but these summaries do not automatically certify that the residual flow-matching distribution matches the N-body distribution in all directions relevant to map-level likelihood-free inference. A mismatch in non-local or higher-than-bispectrum features could remain invisible while still biasing downstream parameter constraints; the manuscript should add at least one direct test (e.g., parameter recovery from generated vs. simulated maps) to support the claim that the model “captures the cosmological signal.”

    Authors: We agree that agreement on the listed summary statistics, while extensive, does not by itself rule out mismatches in untested higher-order or non-local features that could affect downstream inference. The manuscript already reports distribution-level agreement between generated and N-body map ensembles (via the quoted random-initial-condition comparison), but we accept that an explicit parameter-recovery test would provide stronger support for the claim that the cosmological signal is captured. We will add this test in the revised manuscript. revision: yes

  2. Referee: [Abstract and methods description] Conditioning and scope: the model is trained and evaluated only for a single fixed n(z). Because the abstract positions the emulator as a step toward practical map-level inference, the lack of any test or discussion of performance under varied source-redshift distributions is a load-bearing limitation for the broader utility claim.

    Authors: The work is explicitly scoped to a single fixed n(z), as stated in the abstract and methods. We acknowledge that the abstract frames the result as progress toward practical map-level inference and that the absence of any test or discussion for varied n(z) is a genuine limitation on the broader claim. In revision we will add an explicit discussion of this scope limitation together with an outline of how the conditioning could be extended to include n(z) parameters in future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; model trained on external simulations with independent metric evaluation

full rationale

The paper trains a flow-matching generative model on N-body simulations and evaluates generated maps against separate summary statistics (pixel/peak counts, power spectrum, bispectrum, correlation matrices) computed from the same external simulations. No equations or claims in the abstract or described method reduce the reported fidelity improvements to quantities defined by the fit itself. The central result is a comparison of two independently generated map ensembles, with no self-citation load-bearing on the core claim and no renaming or ansatz smuggling. This is the normal case of a self-contained empirical comparison.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the flow-matching objective learns the correct probability flow and that the chosen validation statistics adequately capture cosmological information; the model weights are fitted during training but are not free parameters in the scientific sense.

free parameters (1)
  • network architecture and training hyperparameters
    Standard ML choices (layers, learning rate, batch size) that are fitted or selected during training; not enumerated in the abstract.
axioms (1)
  • domain assumption: Flow matching correctly transports noise distributions to the target data distribution when conditioned on labels.
    Invoked implicitly by the choice of training objective for the generative network.
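
The transport assumption can be checked in a toy setting where the exact velocity field is known. The sketch below is a one-dimensional illustration, not the paper's model: for a standard normal source and a Gaussian target N(m, s²) joined by straight-line paths with independent endpoints, the marginal velocity has a closed form, and forward Euler integration of dx/dt = v(x, t) should carry source samples to the target. All names and parameter values are assumptions chosen for the example.

```python
import numpy as np

def gaussian_velocity(x, t, m, s):
    """Exact marginal velocity for N(0, 1) -> N(m, s^2) with straight paths."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2       # variance of x_t
    slope = (t * s**2 - (1.0 - t)) / var_t       # Cov(x1 - x0, x_t) / Var(x_t)
    return m + slope * (x - t * m)

def integrate_flow(x0, m, s, n_steps=500):
    """Forward Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * gaussian_velocity(x, i * dt, m, s)
    return x

rng = np.random.default_rng(0)
source = rng.standard_normal(20000)               # samples from the noise prior
generated = integrate_flow(source, m=2.0, s=0.5)  # should approximate N(2, 0.5^2)
```

In the actual setting the closed-form velocity is replaced by a trained network, so any error in the learned field propagates through the integration; that is where the axiom carries weight.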

pith-pipeline@v0.9.0 · 5816 in / 1270 out tokens · 26351 ms · 2026-05-25T03:55:17.025637+00:00 · methodology

