pith. machine review for the scientific record.

arxiv: 2605.01004 · v1 · submitted 2026-05-01 · 🌌 astro-ph.HE · astro-ph.IM

Recognition: unknown

Neural Posterior Estimation for UHECR source inference from 3D propagation simulations


Pith reviewed 2026-05-09 18:23 UTC · model grok-4.3

classification 🌌 astro-ph.HE astro-ph.IM
keywords ultra high energy cosmic rays · source inference · simulation based inference · neural posterior estimation · CRPropa · cosmic ray propagation · deep learning for astrophysics · normalizing flows

The pith

A neural model trained on 3D propagation simulations infers source energy, distance, direction, and composition for individual ultra-high energy cosmic ray events with calibrated posteriors and no systematic bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Ultra-high energy cosmic rays lose energy and change direction while traveling through space, which makes tracing them back to their sources difficult. The paper trains a neural network on roughly 5 million three-dimensional simulations of this propagation process so that, for each detected event, the model outputs full probability distributions over the possible source properties. The network handles the variable number of detected secondary particles associated with each event. On held-out simulations it recovers all source parameters without systematic bias (direction is constrained best, distance least) and identifies the primary particle type with at least 98.2 percent accuracy for every mass group. This creates a practical, scalable link between detailed physics simulations and Bayesian analysis of real cosmic ray observations.

Core claim

The authors train a model combining a Deep Set encoder and a normalizing flow on roughly 5 million events simulated with CRPropa 3, covering a broad range of extragalactic magnetic field configurations. For each individual ultra-high energy cosmic ray event, the model produces calibrated posterior distributions for the source's energy, distance, direction, and primary composition. On held-out simulations, the parameters are recovered without systematic bias, with direction best constrained and distance least certain, while composition classification accuracy is at least 98.2 percent for all mass groups.
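Neural posterior estimation (NPE) fits a conditional density q(θ | x) to simulated pairs (θ, x) by minimizing the average of -log q(θ | x). When q is flexible enough, the minimizer is the true posterior, and the trained q can then be evaluated on any new event without rerunning the simulator. The sketch below is a deliberately tiny stand-in, not the authors' code. A one-dimensional toy simulator (θ ~ N(0, 1), x = θ plus Gaussian noise) replaces CRPropa 3, and a linear-Gaussian q(θ | x) = N(a·x + b, s²) replaces the Deep Set plus normalizing flow. For this density family the loss has a closed-form minimizer (least squares), so no gradient steps are needed.

```python
import math
import random
import statistics


def simulate(n, noise_sd=0.5, seed=0):
    """Toy simulator (assumption): theta ~ N(0, 1), x = theta + N(0, noise_sd^2)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [t + rng.gauss(0.0, noise_sd) for t in thetas]
    return thetas, xs


def fit_gaussian_npe(thetas, xs):
    """Minimize mean -log q(theta | x) for q = N(a*x + b, s2).

    For this family the minimizer is least squares for (a, b) and the
    mean squared residual for s2.
    """
    mx, mt = statistics.fmean(xs), statistics.fmean(thetas)
    cov = statistics.fmean([(x - mx) * (t - mt) for x, t in zip(xs, thetas)])
    var = statistics.fmean([(x - mx) ** 2 for x in xs])
    a = cov / var
    b = mt - a * mx
    s2 = statistics.fmean([(t - a * x - b) ** 2 for x, t in zip(xs, thetas)])
    return a, b, s2


def nll(thetas, xs, a, b, s2):
    """Average -log q(theta | x): the NPE training loss for this toy density."""
    return statistics.fmean(
        [0.5 * math.log(2 * math.pi * s2) + (t - a * x - b) ** 2 / (2 * s2)
         for x, t in zip(xs, thetas)]
    )
```

For noise_sd = 0.5 the exact posterior is N(0.8·x, 0.2), so the fitted (a, b, s²) should land close to (0.8, 0, 0.2). In the paper, the far richer Deep Set plus flow plays the same role for the 3D propagation simulations.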

What carries the argument

Deep Set encoder that processes variable numbers of detected secondary particles together with a normalizing flow for density estimation, trained end-to-end on three-dimensional CRPropa 3 propagation simulations.
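The review does not say which flow family the authors use (the referee asks for it), so the following only illustrates the general idea. It assumes a Real NVP-style affine coupling design: an invertible map z = f(θ; h) turns the source parameters θ into a standard-normal latent z, conditioned on a context vector h (in the paper's setup, the summary produced by the Deep Set encoder). The density then follows from the change-of-variables formula log q(θ | h) = log N(z; 0, I) + log|det ∂z/∂θ|. The two-dimensional θ, the context h, and the fixed toy scale and shift functions standing in for trained networks are all hypothetical.

```python
import math
import random

# Hypothetical fixed parameters (ws, wt, us, ut) for two coupling layers; a trained
# flow would produce the scale and shift with neural networks instead.
LAYERS = [
    (0.5, -0.7, (0.3, -0.2), (0.8, 0.1)),
    (-0.4, 0.6, (0.1, 0.5), (-0.3, 0.9)),
]


def scale_shift(cond, h, params):
    """Scale s (bounded by tanh) and shift t from the untouched coordinate and context h."""
    ws, wt, us, ut = params
    s = math.tanh(ws * cond + sum(u * hi for u, hi in zip(us, h)))
    t = wt * cond + sum(u * hi for u, hi in zip(ut, h))
    return s, t


def to_latent(theta, h):
    """theta -> z, plus log|det dz/dtheta| accumulated over the layers."""
    a, b = theta
    logdet = 0.0
    for k, params in enumerate(LAYERS):
        if k % 2 == 0:  # even layers transform b, conditioned on a
            s, t = scale_shift(a, h, params)
            b = (b - t) * math.exp(-s)
        else:  # odd layers transform a, conditioned on b
            s, t = scale_shift(b, h, params)
            a = (a - t) * math.exp(-s)
        logdet -= s  # each coupling is triangular with determinant exp(-s)
    return (a, b), logdet


def from_latent(z, h):
    """Inverse map z -> theta: undo the layers in reverse order."""
    a, b = z
    for k in reversed(range(len(LAYERS))):
        if k % 2 == 0:
            s, t = scale_shift(a, h, LAYERS[k])
            b = b * math.exp(s) + t
        else:
            s, t = scale_shift(b, h, LAYERS[k])
            a = a * math.exp(s) + t
    return a, b


def log_prob(theta, h):
    """Change of variables: log N(z; 0, I) + log|det dz/dtheta|."""
    (z1, z2), logdet = to_latent(theta, h)
    return -0.5 * (z1 * z1 + z2 * z2) - math.log(2.0 * math.pi) + logdet


def sample(h, n, seed=0):
    """Draw n samples by pushing standard-normal noise through the inverse map."""
    rng = random.Random(seed)
    return [from_latent((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), h) for _ in range(n)]
```

Because every coupling is exactly invertible with a triangular Jacobian, both sampling and density evaluation are cheap. A production flow stacks many such layers (or spline couplings) over all source parameters rather than two.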

If this is right

  • Source parameters for single events can be inferred directly instead of through statistical population studies.
  • The calibrated posteriors allow reliable quantification of uncertainties in source properties.
  • Primary composition can be determined per event with high accuracy across all mass groups.
  • The framework scales to the large event samples expected from current and future observatories.
  • It serves as an interface that makes detailed propagation physics usable in Bayesian source inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If applied to real data, this could allow matching individual events to specific candidate sources like nearby galaxies or active galactic nuclei.
  • The method might be extended to infer properties of the extragalactic magnetic fields in addition to source parameters.
  • Combining the inferred posteriors with multi-messenger data from neutrinos or gamma rays could improve source identification.
  • Robustness can be tested by retraining or validating on simulations with alternative models of cosmic ray interactions.

Load-bearing premise

The CRPropa 3 simulations with the selected extragalactic magnetic field configurations and particle interaction models are representative enough of real physics that the learned posteriors will be accurate for actual observed events.

What would settle it

Apply the trained model to a set of real ultra-high energy cosmic ray events and check whether the resulting source posterior distributions align with independent catalogs of candidate sources, or test whether the posteriors remain calibrated on new simulations that use different magnetic field strengths or interaction models.
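The second test proposed above, checking calibration on simulations generated with different physics settings, can be illustrated with a toy in which the exact posterior is known. Everything here is an assumption made for illustration: a conjugate Gaussian model stands in for the propagation simulation, and the observation-noise level stands in for a setting such as the magnetic-field strength. The posterior is derived for one noise level and then evaluated on data generated with another. The empirical coverage of its central 90% interval shows how such a mismatch would surface.

```python
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645: half-width of a central 90% interval in sd units


def posterior(x, assumed_noise_sd):
    """Exact posterior (mean, sd) for theta ~ N(0, 1), x = theta + N(0, noise^2),
    computed under the assumed noise level."""
    v = assumed_noise_sd ** 2
    return x / (1.0 + v), (v / (1.0 + v)) ** 0.5


def coverage_90(true_noise_sd, assumed_noise_sd, n=5000, seed=1):
    """Fraction of simulated events whose true theta lies in the central 90% interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        # Data come from the test simulator, whose noise may differ from the assumed one.
        x = theta + rng.gauss(0.0, true_noise_sd)
        mean, sd = posterior(x, assumed_noise_sd)
        hits += abs(theta - mean) <= Z90 * sd
    return hits / n
```

With matched settings the coverage should sit near 0.90. Data noisier than assumed make the intervals too narrow (coverage well below 0.90), and cleaner data make them too wide. A real check would replace posterior() with the trained network and the toy simulator with CRPropa 3 runs at, for example, other B_rms or L_c values.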

Figures

Figures reproduced from arXiv: 2605.01004 by Francesca Capel, Nadine Bourriche, Nicole Hartmann.

Figure 1. Method for UHECR source estimation. We process the secondaries {x_sec,i}, i = 1 to n, with a Deep Set to flexibly handle their variable cardinality. Inputs x_sec,i are processed through per-particle networks F_sec. The latent space h_sec is concatenated onto the event-level inputs x_ev, and correlations are modelled with a per-event neural network, F_ev. The latent space output from F_ev is then passed t…
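The encoder in Figure 1 follows the Deep Set pattern: a shared per-particle network F_sec is applied to every secondary, the outputs are pooled with a symmetric operation into h_sec, and h_sec is concatenated with the event-level inputs x_ev before the per-event network F_ev. The following is a minimal pure-Python sketch. It assumes sum pooling (the pooling operation is not stated in the review), single tanh layers, random untrained weights, and hypothetical feature sizes.

```python
import math
import random


def dense(x, layer, act=math.tanh):
    """Fully connected layer: act(W x + b)."""
    W, b = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def init_layer(n_out, n_in, rng):
    """Random (untrained) weights and zero biases."""
    return ([[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)


class DeepSetEncoder:
    """Hypothetical sizes: d_sec features per secondary, d_ev event-level features."""

    def __init__(self, d_sec, d_ev, d_latent=8, d_out=4, seed=0):
        rng = random.Random(seed)
        self.d_latent = d_latent
        self.f_sec = init_layer(d_latent, d_sec, rng)       # per-particle network, F_sec
        self.f_ev = init_layer(d_out, d_latent + d_ev, rng)  # per-event network, F_ev

    def __call__(self, secondaries, x_ev):
        # Sum pooling: symmetric in the secondaries and defined for any count, including zero.
        h_sec = [0.0] * self.d_latent
        for x in secondaries:
            h_sec = [acc + v for acc, v in zip(h_sec, dense(x, self.f_sec))]
        # Concatenate the pooled summary with the event-level inputs, then apply F_ev.
        return dense(h_sec + list(x_ev), self.f_ev)
```

Reordering the secondaries leaves the output unchanged up to floating-point rounding, and any number of secondaries maps to a fixed-size vector. Sum pooling also lets the summary depend on the multiplicity N_sec, which mean pooling would largely discard.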
Figure 2. True versus recovered parameter values from the prior recovery test. Each panel compares the ground truth value to the recovered prior: (a) energy at the source, (b) distance of the source, (c) Galactic longitude of the source, and (d) Galactic latitude of the source. event in panel b the true value falls within the 90% contour. By contrast, the posterior for D_src is consistently the broadest, which again…
Figure 3. Normalized confusion matrix for primary cosmic-ray composition classification on the full validation set. Rows correspond to true compositions and columns to model predictions, with each entry expressing the fraction of true-class events assigned to the predicted class. The lightest compositions, ¹H and ⁴He, are classified with 100% accuracy. The heavier nuclei ¹⁴N, ²⁸Si, and ⁵⁶Fe are correctly identified…
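Figure 3 and the claim of at least 98.2 percent accuracy for all mass groups both come from a row-normalized confusion matrix, whose diagonal gives the per-class accuracy. A short sketch of that bookkeeping, assuming string class labels (the toy labels in the test are made up):

```python
from collections import Counter

# Mass groups as named in Figure 3, written in ASCII.
CLASSES = ["1H", "4He", "14N", "28Si", "56Fe"]


def normalized_confusion(y_true, y_pred, classes=CLASSES):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    true-class-i events predicted as class j (rows with no events are all zero)."""
    counts = Counter(zip(y_true, y_pred))
    matrix = []
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        matrix.append([counts[(t, p)] / row_total if row_total else 0.0 for p in classes])
    return matrix


def worst_class_accuracy(matrix):
    """Smallest diagonal entry among classes that have events: the hardest mass group."""
    return min(row[i] for i, row in enumerate(matrix) if sum(row) > 0)
```

Reporting the minimum over the diagonal, rather than overall accuracy, is what makes a statement like "for all mass groups" hold even when the classes are imbalanced.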
Figure 4. Scatter plots of the true values of 50 randomly picked validation events compared with the predicted values. The upper panel shows the results for the source longitude and latitude; the lower panel shows those for the source energy and source distance. Dark red, green, blue and purple show the 1σ error bars; light red, green, blue and purple show the 2σ error bars. and enable conditioning on different GMF r…
Figure 5. Posterior uncertainty σ for all four source parameters as a function of B_rms √L_c, with points colored by the true source distance D_src. Each panel corresponds to one of the four inferred source parameters: Galactic longitude σ_glon, Galactic latitude σ_glat, source energy σ_E, and source distance σ_D. direction. For these reasons, we haven’t explored neural likelihood estimation here, and leave these studies…
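Figures 5 and 8 use B_rms √L_c as an axis. A standard random-walk estimate (a textbook argument, not taken from the paper) shows why this combination is natural. A nucleus of charge Z and energy E crossing a turbulent field with coherence length L_c over a distance D receives roughly D/L_c uncorrelated kicks of size L_c/r_L, where r_L = E/(Z e c B_rms) is its Larmor radius. The rms deflection therefore scales as Z B_rms √(L_c D)/E. The sketch drops all order-unity geometric factors and is only meaningful for small angles and L_c much smaller than D.

```python
import math

C = 299_792_458.0     # speed of light, m/s
MPC_M = 3.0857e22     # one megaparsec in metres
NANOGAUSS_T = 1e-13   # 1 nG in tesla


def larmor_radius_mpc(energy_ev, charge_z, b_ng):
    """Gyroradius r_L = E / (Z e c B) of an ultra-relativistic nucleus, in Mpc."""
    # With E in eV, E/e is a potential in volts, so the elementary charge cancels.
    return energy_ev / (charge_z * C * b_ng * NANOGAUSS_T) / MPC_M


def rms_deflection_rad(energy_ev, charge_z, b_rms_ng, l_c_mpc, d_mpc):
    """Order-of-magnitude random-walk deflection in radians.

    sqrt(D / L_c) kicks of size L_c / r_L give sqrt(D * L_c) / r_L,
    i.e. proportional to Z * B_rms * sqrt(L_c) * sqrt(D) / E.
    """
    r_l = larmor_radius_mpc(energy_ev, charge_z, b_rms_ng)
    return math.sqrt(d_mpc * l_c_mpc) / r_l
```

In this estimate, doubling B_rms or quadrupling L_c doubles the deflection, and a ⁵⁶Fe nucleus (Z = 26) is deflected 26 times more than a proton of the same energy.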
Figure 6. Posterior uncertainty σ for all four source parameters as a function of secondary multiplicity N_sec, with points colored by true primary cosmic-ray composition. Each panel corresponds to one of the four inferred source parameters: Galactic longitude σ_glon, Galactic latitude σ_glat, source energy σ_E, and source distance σ_D. The model’s conditional design treating the EGMF parameters B_rms and L_c as inputs r…
Figure 7. Example posterior corner plots for three representative events. Diagonal panels show marginal posterior densities for E_src, D_src, glon and glat. Off-diagonal panels show two-dimensional joint density contours. Red crosses mark the true parameter values; black crosses indicate posterior means.
Figure 8. B_rms √L_c for all four source parameters as a function of secondary multiplicity N_sec, with points colored by posterior uncertainty σ. Each panel corresponds to one of the four inferred source parameters: Galactic longitude σ_glon, Galactic latitude σ_glat, source energy σ_E, and source distance σ_D. We thank L. Heinrich and A. Kofler for their valuable input during our discussions. N. Bourriche and N. Hart…
Original abstract

The identification of ultra-high energy cosmic ray sources is one of the open challenges of high-energy astrophysics. As charged particles travel through the Universe, they are deflected by extragalactic magnetic fields and lose energy through interactions with background radiation, making source inference highly non-trivial. Existing approaches either rely on simplified propagation models or on computationally prohibitive Monte Carlo methods. Here we present a simulation-based inference framework trained on three-dimensional CRPropa 3 propagation simulations that produces calibrated posterior distributions over source energy, distance, direction, and primary composition for individual UHECR events. The model combines a Deep Set encoder, handling the variable number of detected secondary particles, with a normalizing flow, and is trained on approximately 5 million simulated events covering a broad range of extragalactic magnetic field configurations. Validated on held-out simulations, all source parameters are recovered without systematic bias, with directional parameters best constrained and source distance most uncertain, consistent with the underlying propagation physics. Primary composition classification achieves ≥ 98.2% accuracy across all mass groups. This framework provides a scalable and physically interpretable interface between detailed propagation simulations and Bayesian source inference relevant for current UHECR data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a simulation-based inference framework that uses a Deep Set encoder combined with a normalizing flow, trained on approximately 5 million CRPropa 3 three-dimensional propagation simulations, to produce posterior distributions over source energy, distance, direction, and primary composition for individual ultra-high energy cosmic ray events. It reports unbiased parameter recovery and primary composition classification accuracy of at least 98.2% on held-out simulations drawn from the same distribution.

Significance. If the learned posteriors remain calibrated when applied outside the training distribution, the method would offer a computationally scalable route to Bayesian source inference that incorporates detailed 3D propagation physics, addressing a key limitation of both simplified analytic models and per-event Monte Carlo approaches in UHECR astrophysics.

major comments (2)
  1. [Abstract / validation] Abstract and validation section: all reported performance (unbiased recovery of source parameters and ≥98.2% composition accuracy) is demonstrated exclusively on held-out draws from the identical CRPropa 3 simulation ensemble; no sensitivity tests to changes in extragalactic magnetic field turbulence spectra, source evolution assumptions, or hadronic interaction models are presented, which directly bears on whether the posteriors will remain calibrated for real detector data.
  2. [Methods] Methods section: quantitative details on the Deep Set architecture (number of layers, pooling operation, embedding dimension), the normalizing flow implementation, training procedure (optimizer, learning rate schedule, batch size), and the exact sampling ranges for EGMF parameters are not provided, preventing assessment of model capacity, reproducibility, and potential sensitivity to hyperparameter choices.
minor comments (2)
  1. [Abstract] The abstract states 'approximately 5 million simulated events' but the main text should give the precise count and the joint distribution over source parameters and EGMF configurations used for training.
  2. [Results / figures] Figure captions and text should explicitly state the metrics used to quantify 'no systematic bias' (e.g., posterior mean offset, coverage probability) and how directional constraints are measured (e.g., 68% credible interval solid angle).
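The metrics the referee asks for are straightforward to compute from posterior samples. The sketch below assumes samples given as (glon, glat) pairs in degrees and summarizes the directional constraint as the cap around the posterior mean direction that contains 68% of the samples, with solid angle Ω = 2π(1 - cos ψ). It also computes the average posterior-mean offset as a bias measure. These are illustrative choices, not the paper's definitions; a cap overestimates the area of elongated posteriors, for which a highest-density region would be tighter.

```python
import math


def unit_vector(lon_deg, lat_deg):
    """Cartesian unit vector for a (Galactic longitude, latitude) pair in degrees."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def angle_between(u, v):
    """Angular separation in radians between two unit vectors (dot product clamped)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def mean_direction(samples):
    """Normalized vector mean of (lon, lat) samples, returned in degrees."""
    sx, sy, sz = (sum(c) for c in zip(*(unit_vector(lon, lat) for lon, lat in samples)))
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    lat = math.degrees(math.asin(max(-1.0, min(1.0, sz / norm))))
    return math.degrees(math.atan2(sy, sx)) % 360.0, lat


def credible_cap(samples, level=0.68):
    """Angular radius psi (rad) and solid angle (sr) of the cap centred on the
    mean direction that contains `level` of the posterior samples."""
    center = unit_vector(*mean_direction(samples))
    dists = sorted(angle_between(center, unit_vector(lon, lat)) for lon, lat in samples)
    psi = dists[max(0, math.ceil(level * len(dists)) - 1)]
    return psi, 2.0 * math.pi * (1.0 - math.cos(psi))


def mean_offset(posterior_means, truths):
    """Average (posterior mean - true value) over events; near zero means no systematic bias."""
    return sum(m - t for m, t in zip(posterior_means, truths)) / len(truths)
```

For longitude, offsets should be wrapped into (-180°, 180°] before averaging, otherwise events near 0°/360° produce spurious bias. Coverage probability can be added by counting how often the true direction falls inside each event's cap.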

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and indicate the revisions planned for the manuscript.

  1. Referee: [Abstract / validation] Abstract and validation section: all reported performance (unbiased recovery of source parameters and ≥98.2% composition accuracy) is demonstrated exclusively on held-out draws from the identical CRPropa 3 simulation ensemble; no sensitivity tests to changes in extragalactic magnetic field turbulence spectra, source evolution assumptions, or hadronic interaction models are presented, which directly bears on whether the posteriors will remain calibrated for real detector data.

    Authors: We agree that all quantitative validation results are obtained on held-out events drawn from the same CRPropa 3 ensemble used for training, and that no explicit sensitivity tests to variations in EGMF turbulence spectra, source evolution, or hadronic interaction models are included. This is a genuine limitation when considering application to real detector data, where the true underlying physics may differ. The current work focuses on establishing the feasibility and calibration properties of the inference framework within a fixed, well-specified simulation model. In the revised manuscript we will add a new subsection to the Discussion that explicitly states this scope limitation, discusses the implications for posterior calibration on real events, and outlines the additional robustness studies required before deployment on observational data. revision: partial

  2. Referee: [Methods] Methods section: quantitative details on the Deep Set architecture (number of layers, pooling operation, embedding dimension), the normalizing flow implementation, training procedure (optimizer, learning rate schedule, batch size), and the exact sampling ranges for EGMF parameters are not provided, preventing assessment of model capacity, reproducibility, and potential sensitivity to hyperparameter choices.

    Authors: We acknowledge the omission of these implementation specifics. In the revised Methods section we will supply the missing quantitative information, including the exact Deep Set architecture (number of layers, pooling operation, embedding dimension), the normalizing flow configuration, the complete training protocol (optimizer, learning-rate schedule, batch size), and the precise sampling ranges employed for the EGMF parameters. These additions will enable readers to assess model capacity, reproduce the results, and evaluate sensitivity to hyperparameter choices. revision: yes

Circularity Check

0 steps flagged

No circularity: posteriors learned and validated on independent simulation draws

full rationale

The paper trains a Deep Set + normalizing flow model on ~5M CRPropa 3 events and evaluates recovery, bias, and composition accuracy exclusively on held-out draws from the identical simulation distribution. This is standard supervised validation within the training measure; the reported metrics (unbiased parameter recovery, ≥98.2% composition accuracy) are empirical test-set statistics, not quantities that reduce by construction to the training inputs or to any self-citation. No load-bearing step equates a claimed result to a fitted parameter or prior-work ansatz. The framework is self-contained against its stated simulation benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework depends on the fidelity of the CRPropa 3 propagation model and on the training distribution covering the relevant range of extragalactic magnetic fields and source parameters.

axioms (1)
  • domain assumption CRPropa 3 simulations with the chosen magnetic-field realizations accurately represent the dominant propagation effects for UHECRs.
    All training and validation data are generated from these simulations; any mismatch with reality directly limits the reliability of the learned posteriors.


