pith.

arxiv: 2508.00049 · v2 · submitted 2025-07-31 · 🌌 astro-ph.CO · astro-ph.IM · cs.CV

Segmenting proto-halos with vision transformers

Pith reviewed 2026-05-19 01:41 UTC · model grok-4.3

classification 🌌 astro-ph.CO · astro-ph.IM · cs.CV
keywords proto-halo segmentation · vision transformers · semantic segmentation · dark matter halos · initial density field · N-body simulations · halo mass classification · cosmological structure formation

The pith

Vision transformers segment proto-halos in the initial density field with sub-percent accuracy in total mass per class.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether deep learning can classify patches of the early universe's density field by the mass of the dark-matter halos that will form from them by the present day. It trains a U-Net style transformer and a convolutional network on outputs from N-body simulations, then measures how well each recovers the final halo masses and spatial boundaries. The transformer recovers the total mass in each mass bin to sub-percent accuracy and traces boundaries more faithfully than the convolutional network or the PINOCCHIO code based on perturbation theory. This approach matters because it offers a direct, data-driven route from linear initial conditions to the nonlinear end products of structure formation without running a full simulation for every new volume or cosmology.

Core claim

The transformer-based network significantly outperforms the CNN across all metrics, achieving sub-percent error in the total segmented mass per halo class. Both networks achieve much higher accuracy than the perturbation-theory-based model PINOCCHIO, especially at low halo masses and in the detailed reconstruction of proto-halo boundaries.
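The headline metric, the error in total segmented mass per halo class, can be made concrete with a short sketch. It assumes (this is not stated above) that every voxel of the initial grid corresponds to one Lagrangian particle of equal mass, so the mass of a class is proportional to its voxel count; the function name and the integer class labels are illustrative.

```python
import numpy as np


def per_class_mass_error(true_labels, pred_labels, n_classes=None):
    """Relative error in total segmented mass for each halo-mass class.

    Assumes every voxel stands for one Lagrangian particle of equal mass, so the
    mass in a class is proportional to the number of voxels carrying its label.
    Returns (M_pred - M_true) / M_true per class; NaN or inf where M_true == 0.
    """
    true_labels = np.asarray(true_labels).ravel()
    pred_labels = np.asarray(pred_labels).ravel()
    if n_classes is None:
        n_classes = int(max(true_labels.max(), pred_labels.max())) + 1
    # Voxel counts per class stand in for the total mass per class.
    m_true = np.bincount(true_labels, minlength=n_classes).astype(float)
    m_pred = np.bincount(pred_labels, minlength=n_classes).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (m_pred - m_true) / m_true
```

Because only totals are compared, misassigned voxels can cancel: swapping the labels of two voxels from different classes leaves every class total, and hence this error, unchanged. Boundary accuracy therefore needs a separate, voxel-wise measure.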

What carries the argument

U-Net transformer that performs semantic segmentation on the initial density field (and optionally the tidal shear) to assign each region a final halo-mass label at z=0.
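The abstract calls the network a U-Net transformer. One widely used design of that kind, UNETR (Hatamizadeh et al., 2022), feeds its transformer encoder by cutting the 3D input into non-overlapping cubic patches, flattening each patch into a token and projecting it linearly. The NumPy sketch below shows only that tokenization step. Whether the paper tokenizes exactly this way is an assumption, and the patch size, channel count and embedding width are hypothetical values chosen for illustration.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (C, X, Y, Z) volume into non-overlapping cubic patches.

    Returns shape (n_patches, C * patch**3): one flattened token per patch,
    ordered by patch position (x slowest, z fastest).
    """
    c, nx, ny, nz = volume.shape
    if nx % patch or ny % patch or nz % patch:
        raise ValueError("each grid dimension must be divisible by the patch size")
    gx, gy, gz = nx // patch, ny // patch, nz // patch
    v = volume.reshape(c, gx, patch, gy, patch, gz, patch)
    # Move the patch-grid axes first; keep channel and in-patch axes together.
    v = v.transpose(1, 3, 5, 0, 2, 4, 6)
    return v.reshape(gx * gy * gz, c * patch**3)


# Illustrative use: two input channels (e.g. density plus one extra field)
# on a 32^3 grid, 8^3 patches, and a random stand-in for the learned projection.
rng = np.random.default_rng(0)
fields = rng.standard_normal((2, 32, 32, 32))
tokens = patchify_3d(fields, patch=8)                       # shape (64, 1024)
W = rng.standard_normal((tokens.shape[1], 96)) / np.sqrt(tokens.shape[1])
embeddings = tokens @ W                                     # shape (64, 96)
```

Self-attention over these tokens lets every patch exchange information with every other patch in one layer, which is the usual argument for transformers capturing the large-scale environment that a purely local convolution sees only after many layers.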

If this is right

  • Sub-percent accuracy is reached in the total mass assigned to each halo-mass bin.
  • Low-mass proto-halos and their boundaries are recovered more accurately than with PINOCCHIO.
  • Combining density and tidal-shear inputs improves performance over density alone.
  • Grad-CAM heatmaps for the CNN give preliminary insight into which features of the input fields the network uses to make its class assignments.
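The tidal-shear input is conventionally derived from the linear density contrast δ through a potential φ with ∇²φ = δ (constants absorbed) and T_ij = ∂_i ∂_j φ. On a periodic grid this takes one FFT: T_ij(k) = k_i k_j δ(k) / k², whose trace returns δ, and the shear proper is T_ij minus one third of its trace times the identity. The sketch assumes a cubic periodic grid and ignores any smoothing the paper may apply; the function names are illustrative.

```python
import numpy as np


def tidal_tensor(delta):
    """Tidal tensor T_ij = d_i d_j phi, with nabla^2 phi = delta, on a periodic cubic grid.

    Computed in Fourier space as T_ij(k) = k_i k_j / k^2 * delta(k). The k = 0 mode
    is dropped, so the trace of T equals delta minus its mean. Only the ratio
    k_i k_j / k^2 appears, so frequencies can be left in grid units.
    """
    n = delta.shape[0]
    k = np.fft.fftfreq(n)
    kr = np.fft.rfftfreq(n)  # last axis is halved by the real FFT
    kx, ky, kz = np.meshgrid(k, k, kr, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # placeholder to avoid 0/0; the mode itself is zeroed below
    delta_k = np.fft.rfftn(delta)
    delta_k[0, 0, 0] = 0.0
    kvec = (kx, ky, kz)
    T = np.empty((3, 3) + delta.shape)
    for i in range(3):
        for j in range(i, 3):
            T[i, j] = np.fft.irfftn(kvec[i] * kvec[j] / k2 * delta_k, s=delta.shape)
            T[j, i] = T[i, j]  # the tensor is symmetric
    return T


def traceless_shear(T):
    """Remove one third of the trace from the diagonal: S_ij = T_ij - (Tr T / 3) delta_ij."""
    trace = np.trace(T, axis1=0, axis2=1)
    return T - np.eye(3)[:, :, None, None, None] * trace / 3.0
```

Since the trace of T is just δ, the traceless part carries exactly the anisotropic information that the density field alone does not, which is why the two inputs can be complementary.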

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could generate approximate halo catalogs for volumes too large for direct simulation.
  • Retraining or fine-tuning on a wider range of resolutions would test how far the learned mapping generalizes.
  • The same segmentation logic might be applied to predict other final properties such as halo concentration or spin.

Load-bearing premise

The mapping from the initial density field to final z=0 halo masses is consistent enough across simulations that a network trained on a finite set can be applied to new cosmologies or resolutions without large errors.

What would settle it

Running the trained transformer on an independent N-body simulation with different resolution or cosmological parameters and finding that the error in segmented mass per class rises well above one percent.

Original abstract

The formation of dark-matter halos from small cosmological perturbations generated in the early universe is a highly non-linear process typically modeled through N-body simulations. In this work, we explore the use of deep learning to segment and classify proto-halo regions in the initial density field according to their final halo mass at redshift z=0. We compare two architectures: a fully convolutional neural network (CNN) based on the V-Net design and a U-Net transformer. We find that the transformer-based network significantly outperforms the CNN across all metrics, achieving sub-percent error in the total segmented mass per halo class. Both networks deliver much higher accuracy than the perturbation-theory-based model PINOCCHIO, especially at low halo masses and in the detailed reconstruction of proto-halo boundaries. We also investigate the impact of different input features by training models on the density field, the tidal shear, and their combination. Finally, we use Grad-CAM to generate class-activation heatmaps for the CNN, providing preliminary yet suggestive insights into how the network exploits the input fields.
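Grad-CAM, which the abstract applies to the CNN, weights each feature map of a chosen convolutional layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps and keeps only the positive part. A NumPy sketch for a 3D layer follows. It takes the activations and gradients as given (in practice they come from the framework's automatic differentiation); which layer and which class score the paper uses is not specified here. The final rescaling to [0, 1] is a common convention rather than part of the definition, and upsampling the map to input resolution is omitted.

```python
import numpy as np


def grad_cam_3d(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer of a 3D network.

    activations, gradients: arrays of shape (K, X, Y, Z), the layer's K feature
    maps and the gradient of the chosen class score with respect to them.
    """
    # One weight per feature map: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum over feature maps, keeping only positive evidence for the class.
    cam = np.maximum(np.tensordot(weights, activations, axes=(0, 0)), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The ReLU discards regions whose features lower the class score, so the heatmap shows where the network finds support for a class, not where it finds evidence against it.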

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper explores deep learning for segmenting proto-halo regions in the initial cosmological density field according to final z=0 halo mass. It compares a V-Net CNN and a U-Net transformer, finding that the transformer outperforms the CNN with sub-percent error in total segmented mass per halo class and higher accuracy than the PINOCCHIO model, especially at low masses and for boundary reconstruction. The study also tests input features (density field, tidal shear, and their combination) and applies Grad-CAM for interpretability on the CNN.

Significance. If the empirical gains hold under broader testing, the work could support more efficient halo property prediction from initial conditions, complementing N-body simulations and perturbation-theory codes like PINOCCHIO. Strengths include the direct architecture comparison, feature ablation on density versus shear, and the attempt at model interpretability via Grad-CAM. The transformer results on boundary detail and low-mass performance are noteworthy if reproducible.

major comments (2)
  1. Methods: The abstract and methods description provide no information on training/validation splits, hyperparameter search, or overfitting checks. Without these, it is difficult to assess whether the reported sub-percent mass errors and outperformance metrics are robust or potentially inflated by in-distribution memorization.
  2. Results: All quantitative claims (transformer vs. CNN and vs. PINOCCHIO) are shown only on held-out volumes from the same N-body simulation suite. No cross-cosmology, cross-resolution, or cross-power-spectrum validation is presented, which is load-bearing for the practical claim that the learned mapping generalizes beyond the training distribution.
minor comments (2)
  1. Clarify the exact binning or definition of halo mass classes used for segmentation, as this choice directly affects the reported per-class mass errors.
  2. The Grad-CAM analysis is performed only on the CNN; extending similar interpretability to the transformer would strengthen the comparison.
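On the first minor comment: one natural way to build segmentation targets is to label each Lagrangian voxel by the mass bin of the z=0 halo its particle ends up in, with a background class for particles in no halo. The sketch below does this with np.digitize. The bin edges, the -1 convention for particles not assigned to any halo, and folding halos above the top edge into the highest class are hypothetical choices, not the paper's.

```python
import numpy as np


def mass_class_labels(halo_id, halo_mass, edges):
    """Label particles (voxels of the initial grid) by the mass bin of their z=0 halo.

    halo_id:   int per particle, index into halo_mass, or -1 if the particle is in no halo.
    halo_mass: mass of each halo, indexed by halo id.
    edges:     increasing bin edges (hypothetical); K+1 edges define classes 1..K.
    Class 0 collects particles in no halo or in halos lighter than edges[0].
    """
    halo_id = np.asarray(halo_id)
    labels = np.zeros(halo_id.shape, dtype=int)
    in_halo = halo_id >= 0
    masses = np.asarray(halo_mass)[halo_id[in_halo]]
    bins = np.digitize(masses, edges)  # 0 below edges[0], len(edges) at or above edges[-1]
    # Fold halos heavier than the last edge into the top class.
    labels[in_halo] = np.minimum(bins, len(edges) - 1)
    return labels
```

Every reported per-class error depends on these edges: moving an edge shifts halos between neighbouring classes, which is why the referee asks for the binning to be stated explicitly.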

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comments, which have helped us improve the clarity and robustness of the presentation. We respond to each major comment below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: Methods: The abstract and methods description provide no information on training/validation splits, hyperparameter search, or overfitting checks. Without these, it is difficult to assess whether the reported sub-percent mass errors and outperformance metrics are robust or potentially inflated by in-distribution memorization.

    Authors: We agree that these details are necessary for a full assessment of the results. The original manuscript was insufficiently explicit on this point. In the revised version we have added a dedicated subsection to the Methods section that specifies the training/validation/test partitioning (distinct simulation volumes with no spatial overlap, allocated in an 80/10/10 ratio), the hyperparameter search (grid search over learning rate, batch size, and network depth with 5-fold cross-validation on the training set), and the overfitting safeguards (dropout, weight decay, and early stopping based on validation loss). We have also included training and validation loss curves in the supplementary material to demonstrate that the models converged without signs of memorization. revision: yes

  2. Referee: Results: All quantitative claims (transformer vs. CNN and vs. PINOCCHIO) are shown only on held-out volumes from the same N-body simulation suite. No cross-cosmology, cross-resolution, or cross-power-spectrum validation is presented, which is load-bearing for the practical claim that the learned mapping generalizes beyond the training distribution.

    Authors: The referee is correct that all reported metrics are obtained on held-out volumes drawn from the same simulation suite. This constitutes an in-distribution test rather than a demonstration of generalization across cosmologies or resolutions. We have therefore revised the abstract, the final paragraph of the Results section, and the Discussion to state the scope of the claims more precisely and to explicitly note the absence of cross-simulation validation as a limitation. We have also added a short paragraph outlining planned future work on this topic. Because performing new suites of simulations with varied cosmologies lies outside the scope of the present study, we have not added new numerical results of that kind. revision: partial
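The partitioning described in the simulated response (whole simulation volumes allocated 80/10/10, with 5-fold cross-validation inside the training set) can be sketched as follows. These numbers come from the simulated rebuttal, not from the paper, and the volume identifiers are hypothetical. Splitting by whole volumes rather than by patches keeps overlapping patches from leaking between training and test data.

```python
import numpy as np


def split_volumes(volume_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle whole simulation volumes into train/val/test so that no volume
    (and hence no region of space) appears in more than one split."""
    ids = np.array(volume_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def k_folds(train_ids, k=5):
    """Yield (fit, held_out) pairs for k-fold cross-validation over training volumes."""
    folds = np.array_split(np.asarray(train_ids), k)
    for i in range(k):
        fit = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield fit, folds[i]
```

Note that this only guarantees an in-distribution test: all splits still come from the same suite, which is the limitation the second major comment raises.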

Circularity Check

0 steps flagged

No circularity: empirical ML segmentation results on held-out simulations


The paper reports empirical performance metrics from training and testing vision transformer and CNN models on N-body simulation volumes for proto-halo segmentation. Claims of sub-percent mass error and outperformance versus PINOCCHIO are direct measurements on held-out test patches, not reductions of any derived quantity to its own inputs by construction. No equations, self-citations, or ansatzes are invoked to force results; the mapping is learned from data without load-bearing self-referential steps. This is the standard non-circular outcome for a data-driven ML application paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the assumption that N-body simulations provide ground-truth labels and that the initial density field contains sufficient information for the network to learn the mapping. No new physical entities are postulated.

free parameters (1)
  • network hyperparameters and training schedule
    Chosen to optimize segmentation metrics on the training simulations; exact values not reported in abstract.
axioms (1)
  • domain assumption: final halo mass at z=0 is a deterministic function of the initial density and tidal fields within the resolution of the simulation.
    Implicit in the choice to train segmentation models on N-body outputs.

pith-pipeline@v0.9.0 · 5712 in / 1381 out tokens · 45005 ms · 2026-05-19T01:41:09.810841+00:00 · methodology


