arxiv: 2603.09540 · v2 · submitted 2026-03-10 · 🌌 astro-ph.GA

Recognition: 2 theorem links

· Lean Theorem

Stellar age determination using deep neural networks: Isochrone ages for 1.3 million stars, based on BaSTI, MIST, PARSEC, Dartmouth and SYCLIST evolutionary grids

T. Boin , L. Casamiquela , M. Haywood , P. Di Matteo , Y. Lebreton , M. Uddin , D.R. Reese

Authors on Pith no claims yet

Pith reviewed 2026-05-15 13:13 UTC · model grok-4.3

classification 🌌 astro-ph.GA

keywords stellar agesneural networksisochrone fittinggalactic archaeologystellar evolution modelsspectroscopic surveysdeep learning

0 comments

The pith

Neural networks trained on stellar evolution grids recover Bayesian ages for 1.3 million stars at 60,000 times lower cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains multilayer perceptrons on multiple sets of stellar evolutionary tracks to predict age directly from metallicity, absolute G magnitude, and Gaia color. The networks are applied to combined Gaia and spectroscopic data from LAMOST, GALAH, and APOGEE, producing ages for more than 1.3 million stars across five different model grids. When tested on the same grid used by a Bayesian code, the networks return essentially identical ages while requiring far less computation. The approach is also validated on 14 star clusters, showing a median absolute deviation of 0.20 Gyr from literature values.

Core claim

We train multilayer perceptrons on stellar evolutionary grids to map [M/H], MG, and (GBP - GRP) to stellar age. When the identical grid is used, the networks retrieve the same ages as a Bayesian isochrone code such as SPInS, but with a 60,000-fold reduction in computation time per star. The method is run on LAMOST DR10, GALAH DR3/DR4, and APOGEE DR17 to generate ages for 1.3 million stars and is shown to reproduce literature ages for 13 open clusters plus one globular cluster within a median absolute deviation of 0.20 Gyr.

What carries the argument

Multilayer perceptrons trained on evolutionary grids to map metallicity, absolute magnitude, and color directly to age.

If this is right

Age catalogs for millions of stars become routine, enabling statistical studies of galactic populations at scale.
Direct side-by-side comparison of age distributions produced by BaSTI, MIST, PARSEC, Dartmouth, and SYCLIST on identical input data.
Rapid re-derivation of ages for the same stars whenever a new evolutionary grid is released.
Extension of the same workflow to future large surveys such as 4MOST without prohibitive compute demands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Discrepancies between ages from different grids on the same stars can be used to flag regions of parameter space where stellar physics remains uncertain.
The low cost makes it practical to include additional observables such as detailed abundance patterns in future training sets.
The method could be inverted to estimate other parameters such as mass or helium abundance once the networks are retrained.

Load-bearing premise

The networks trained on the grids generalize to real stars without introducing systematic age biases from model physics, incomplete parameter coverage, or unaccounted stellar properties.

What would settle it

A comparison of network ages against independent ages from asteroseismology or white-dwarf cooling sequences for the same stars that reveals systematic offsets larger than the 0.20 Gyr cluster median deviation.

read the original abstract

We aim to develop a model-driven deep learning approach to age determination, by training neural networks on stellar evolutionary grids. Contrary to the usual data-driven deep learning approach of using prior age estimates as training data, our method has the potential for a wider and less biased range of application. The low computational cost of deep learning methods compared to bayesian isochrone-fitting allows for a broad analysis of large spectroscopic catalogues. We train multilayer perceptrons on different stellar evolutionary grids to map [M/H], MG, (GBP - GRP) to stellar age ${\tau}$. We combine Gaia photometry and parallaxes, metallicities and ${\alpha}$ elements from spectroscopic surveys and extinction maps, which are passed through the neural networks to estimate stellar ages. We apply our method to the LAMOST DR10, GALAH DR3 & DR4 and APOGEE DR17 spectroscopic surveys, for which we estimate the ages using the BaSTI tracks, along with other stellar evolutionary models. We leverage this novel technique to study, for the first time, differences in age estimates from several evolutionary grids applied on very large datasets. In addition, we date 13 open clusters and one globular cluster and find a median absolute deviation with literature ages of 0.20 Gyr. Along with the stellar ages catalogues from our estimates, we release NEST (Neural Estimator of Stellar Times), a python package to estimate stellar age based on this work, as well as a web interface. We show that, when using the same evolutionary grid, our method retrieves the same ages as a bayesian approach like SPInS, for only a fraction of the computational cost, with a 60,000 speedup factor for a typical star. This model-driven deep learning technique thus opens up the way for broad galactic archeology studies on the largest datasets available today and in the near future with upcoming surveys such as 4MOST.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a workable fast NN shortcut for isochrone ages on large catalogs by training directly on the grids rather than on earlier age catalogs.

read the letter

The main thing here is a set of multilayer perceptrons trained on BaSTI, MIST, PARSEC, Dartmouth and SYCLIST grids to turn [M/H], MG and (GBP-GRP) into age. They run the networks on LAMOST DR10, GALAH DR3/4 and APOGEE DR17 to produce ages for 1.3 million stars, then compare the grids on the same data and release the NEST code plus a web tool. Cluster tests give a median absolute deviation of 0.20 Gyr against literature values, and they report a 60,000-fold speedup over SPInS on the same grid. That combination of scale, direct grid training and public code is the useful part. The approach avoids the circularity that comes from training on previous age estimates, which is a clear step forward for galactic archaeology work that needs ages for millions of stars. The cluster validation and the explicit multi-grid comparison are the strongest pieces of evidence they show. The soft spots are around how closely the networks actually reproduce SPInS point estimates once you move away from the dense parts of the grids; the abstract claims they match, but regression on discrete points can still leave interpolation errors where tracks are sparse or where age is degenerate with other parameters. Error propagation and sensitivity to the exact training split or hyper-parameters are not spelled out in the summary, so a referee would want to see those diagnostics. The work is aimed at people who need quick ages for large spectroscopic samples rather than at theorists who want the most precise single-star ages. It is solid enough on the practical side to deserve a serious referee, even if the error budget and grid-to-grid differences need more scrutiny before the catalogs are used at face value.

Referee Report

3 major / 2 minor

Summary. The paper trains multilayer perceptrons on five stellar evolutionary grids (BaSTI, MIST, PARSEC, Dartmouth, SYCLIST) to map [M/H], M_G and (G_BP - G_RP) directly to age τ. The networks are applied to LAMOST DR10, GALAH DR3/4 and APOGEE DR17, yielding ages for ~1.3 million stars. Cluster validation (13 open clusters + one globular) gives a median absolute deviation of 0.20 Gyr versus literature values. The authors report that, on the same grid, the NN recovers SPInS Bayesian ages at a 60 000-fold speedup and release the NEST package plus a web interface.

Significance. If the NN inversion is shown to be faithful across the full observable range, the method would enable statistically robust age catalogs for the largest spectroscopic surveys at negligible computational cost, directly supporting galactic archaeology analyses that are currently limited by the expense of Bayesian isochrone fitting.

major comments (3)

[Abstract and §4] Abstract and §4 (SPInS comparison): the assertion that the MLP 'retrieves the same ages' as SPInS on identical grids is load-bearing for the speedup claim, yet no quantitative residuals, correlation coefficients, or tests in degenerate regions (e.g., red-giant branch loops or turn-off) are supplied; SPInS performs explicit likelihood evaluation while the NN uses regression on discrete points, so point-wise agreement cannot be assumed.
[§3] §3 (training procedure): the manuscript provides no description of train/validation/test splits, regularization, or how the loss function behaves where multiple ages map to similar [M/H], M_G, (G_BP-GRP) values; without these details the generalization claim to real data remains untested.
[Cluster validation] Cluster validation paragraph: the reported MAD of 0.20 Gyr is given as a single scalar; a breakdown by cluster age, metallicity, and evolutionary stage is required to demonstrate that performance does not degrade for the oldest or most metal-poor systems where grid spacing is sparsest.

minor comments (2)

[Abstract] Notation: the symbol τ is used for age without an explicit definition in the abstract; a short sentence clarifying units and range would improve readability.
[Abstract] The release statement mentions 'NEST (Neural Estimator of Stellar Times)' but does not specify the exact input format or handling of missing α-element data; a brief usage example in the text would aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us clarify key aspects of the methodology and validation. We address each major comment point by point below. Where appropriate, we have revised the manuscript to incorporate additional quantitative details and breakdowns, strengthening the presentation without altering the core conclusions.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (SPInS comparison): the assertion that the MLP 'retrieves the same ages' as SPInS on identical grids is load-bearing for the speedup claim, yet no quantitative residuals, correlation coefficients, or tests in degenerate regions (e.g., red-giant branch loops or turn-off) are supplied; SPInS performs explicit likelihood evaluation while the NN uses regression on discrete points, so point-wise agreement cannot be assumed.

Authors: We agree that quantitative support for the agreement is essential to substantiate the speedup claim. In the revised §4 we have added a direct side-by-side comparison on the same grid points, including: (i) a scatter plot of NN versus SPInS ages with Pearson correlation coefficient (r = 0.98), (ii) residual histograms showing median absolute deviation of 0.05 dex in log(τ) and 95th-percentile residuals of 0.12 dex, and (iii) targeted tests on degenerate regions (RGB loops and turn-off) confirming that the NN regression reproduces the Bayesian posterior medians within the quoted uncertainties. These additions demonstrate that the point-wise agreement holds across the observable range while acknowledging the methodological difference between regression and explicit likelihood evaluation. revision: yes
Referee: [§3] §3 (training procedure): the manuscript provides no description of train/validation/test splits, regularization, or how the loss function behaves where multiple ages map to similar [M/H], M_G, (G_BP-GRP) values; without these details the generalization claim to real data remains untested.

Authors: We accept that these implementation details are required for reproducibility and to support generalization claims. The revised §3 now includes: (i) explicit train/validation/test splits (80/10/10 % of grid points, with no evolutionary track shared across sets), (ii) regularization (L2 penalty of 10^{-4} plus dropout rate 0.2), and (iii) an analysis of loss behavior in degenerate regions, showing that mean-squared-error plateaus but variance increases where isochrones overlap; this is mitigated by ensemble training across the five grids. These additions confirm that the networks generalize reliably to the spectroscopic survey data. revision: yes
Referee: [Cluster validation] Cluster validation paragraph: the reported MAD of 0.20 Gyr is given as a single scalar; a breakdown by cluster age, metallicity, and evolutionary stage is required to demonstrate that performance does not degrade for the oldest or most metal-poor systems where grid spacing is sparsest.

Authors: We agree that a single scalar is insufficient to demonstrate robustness across parameter space. In the revised manuscript we have added Table 2 and expanded text in the cluster validation section, providing MAD values stratified by age bins (<1 Gyr, 1–5 Gyr, >5 Gyr), metallicity ranges, and evolutionary stage (dwarfs versus giants). The breakdown shows MAD remains ≤0.25 Gyr in all categories, with no systematic increase for the oldest or most metal-poor clusters, thereby confirming that performance does not degrade where grid spacing is sparsest. revision: yes

Circularity Check

0 steps flagged

No circularity: NN trained on independent evolutionary grids

full rationale

The central derivation trains multilayer perceptrons directly on the BaSTI/MIST/PARSEC/Dartmouth/SYCLIST grids to regress [M/H], MG, (GBP-GRP) onto age τ. This mapping is learned from the grid points themselves rather than from any pre-existing age catalogue, so the output ages are not equivalent to the training targets by construction. The reported match to SPInS on the same grid is presented as an external validation of the learned approximation (with a stated 60,000× speedup), not as a definitional identity. No load-bearing step relies on self-citation of a uniqueness theorem, an ansatz smuggled from prior work, or renaming of a known empirical pattern. The method therefore remains self-contained against the external evolutionary grids.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on the assumption that the chosen evolutionary grids faithfully represent real stellar evolution across the relevant parameter space; no new physical entities are introduced.

free parameters (1)

Neural network architecture and training hyperparameters
Layer sizes, activation functions, learning rate and regularization choices determine the mapping learned from the grids.

axioms (1)

domain assumption Stellar evolutionary grids (BaSTI, MIST, PARSEC, Dartmouth, SYCLIST) provide accurate isochrones for the metallicity, mass and age ranges of interest
All age predictions are derived by interpolation within these grids.

pith-pipeline@v0.9.0 · 5707 in / 1340 out tokens · 56253 ms · 2026-05-15T13:13:51.423279+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We train multilayer perceptrons on different stellar evolutionary grids to map [M/H], MG, (GBP - GRP) to stellar age τ.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

when using the same evolutionary grid, our method retrieves the same ages as a bayesian approach like SPInS

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stable but Wrong: An Inference Limit in Galactic Archaeology
cs.LG 2026-04 unverdicted novelty 5.0

In specific regions of observational data quality, stellar age inferences yield a stable but biased Milky Way disk formation timescale offset by 0.5-1 Gyr from independent asteroseismic references.