Replacing Gaussian Processes with Neural Networks in Pulsar Timing Array Inference of the Gravitational-Wave Background

Chris Gordon; Shreyas Tiruvaskar

arxiv: 2604.04340 · v3 · pith:VEROC7JLnew · submitted 2026-04-06 · 🌌 astro-ph.CO · physics.data-an

Replacing Gaussian Processes with Neural Networks in Pulsar Timing Array Inference of the Gravitational-Wave Background

Shreyas Tiruvaskar , Chris Gordon This is my paper

Pith reviewed 2026-05-10 20:24 UTC · model grok-4.3

classification 🌌 astro-ph.CO physics.data-an

keywords pulsar timing arraysgravitational wave backgroundneural networksGaussian processesBayesian inferencenanohertz gravitational wavesstrain spectrum interpolation

0 comments

The pith

Probabilistic neural networks can replace Gaussian process interpolators in pulsar timing array analyses of nanohertz gravitational wave backgrounds, producing matching posteriors at lower computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether neural networks can stand in for Gaussian processes when interpolating expensive strain-spectrum calculations inside Bayesian inference of the nanohertz gravitational-wave background. In two concrete models—one for self-interacting dark matter and one for phenomenological environmental effects—the networks are trained once on a finite set of spectra and then used during Markov chain Monte Carlo sampling. The resulting posteriors stay consistent with those obtained from Gaussian processes. Both the initial training step and the subsequent sampling step run faster, with the largest speed-ups appearing in the more demanding model. The substitution matters because current Gaussian-process training already limits how large or detailed the models can be made before the computation becomes prohibitive.

Core claim

Bayesian inference of nanohertz gravitational-wave background models in pulsar timing array analyses often relies on Gaussian-process interpolators to avoid repeated, computationally expensive strain-spectrum calculations. However, Gaussian-process training becomes a bottleneck for large training sets. We test whether probabilistic neural networks can replace Gaussian processes in this role for both a self-interacting dark matter model and a phenomenological environmental model. We find that neural networks recover consistent posteriors while significantly reducing both training and Markov chain Monte Carlo runtime, with the largest gains for the more computationally demanding model.

What carries the argument

probabilistic neural networks trained on strain-spectrum evaluations to serve as fast surrogates during posterior sampling

If this is right

Posteriors for the self-interacting dark matter model remain consistent between the two interpolators.
Posteriors for the phenomenological environmental model remain consistent between the two interpolators.
Training time is reduced compared with Gaussian-process methods.
Markov chain Monte Carlo runtime is reduced, with larger savings in the more computationally intensive model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same substitution could be tried in other astrophysical inference problems that currently use Gaussian processes for expensive forward-model evaluations.
Larger training sets or more complex models might become feasible once the per-sample cost drops.
The approach opens a path to embedding the surrogate inside nested sampling or other sampling algorithms that are even more expensive than standard Markov chain Monte Carlo.

Load-bearing premise

Once trained on a finite set of strain-spectrum points, the neural networks generalize accurately across the full prior volume of the target models without adding systematic biases to the recovered posteriors.

What would settle it

A side-by-side run in which the posterior distributions or credible intervals obtained from the neural-network interpolator differ from those of the Gaussian-process interpolator by more than sampling noise on an independent validation set.

Figures

Figures reproduced from arXiv: 2604.04340 by Chris Gordon, Shreyas Tiruvaskar.

**Figure 2.** Figure 2: FIG. 2. Corner plot of the posterior distributions for the SIDM model parameters from MCMC runs using GP interpolators [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3. Gravitational strain spectra predicted by NNs (red) [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. Corner plot of the posterior distributions for the SIDM model parameters from MCMC runs using NN interpolators [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5. Corner plot comparing the posterior distributions of the SIDM model parameters obtained from MCMC analyses using [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6. Gravitational strain spectra predicted by NNs (red) [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: FIG. 7. Posterior distributions for the phenomenological model parameters obtained using GP and NN interpolators. Contours [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: FIG. 8. GP and NN predictive errors for the SIDM model. [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: FIG. 9. Same as Fig [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

read the original abstract

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Neural nets replace GPs in PTA surrogates with reported speedups and matching posteriors, but the write-up supplies almost no validation numbers or architecture details to back the generalization claim.

read the letter

The paper shows that probabilistic neural networks can serve as drop-in surrogates for Gaussian processes when evaluating strain spectra inside PTA likelihoods for a self-interacting dark matter model and a phenomenological environmental model. Training and MCMC times drop, with bigger gains on the heavier model, and the recovered posteriors look consistent with the GP baseline on the runs they performed. That is the concrete result: a targeted empirical swap that addresses a real bottleneck in current nanohertz analyses. The comparison is head-to-head on models already in use, which keeps the test grounded. The practical payoff is clear if the speed-up holds without introducing bias. The main weakness is the thin evidence for the central assumption. The abstract claims consistency and runtime cuts but gives no posterior-overlap numbers, no held-out recovery errors, no coverage checks, and no description of network depth, training-set size, or how the prior volume was sampled. Without those diagnostics it is difficult to judge whether the NN approximation stays accurate near prior boundaries or in regions of high curvature. The generalization step is doing the heavy lifting, and it is not yet shown to be sub-dominant to statistical uncertainty. This work is aimed at PTA analysts who already run GP-based pipelines and are looking for faster surrogates. A reader who needs to implement or extend the method would get the basic idea but would still have to fill in the missing validation steps themselves. I would send it to referees. The empirical test is narrow enough and the motivation strong enough that a careful review can sort out whether the consistency claim survives scrutiny.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes replacing Gaussian-process interpolators with probabilistic neural networks for strain-spectrum calculations in Bayesian pulsar timing array inference of the nanohertz gravitational-wave background. It tests the substitution on a self-interacting dark matter model and a phenomenological environmental model, reporting that the networks recover consistent posteriors while reducing both training time and MCMC runtime, with larger gains for the more demanding model.

Significance. If the networks generalize without systematic bias across the prior volume, the approach could remove a key computational bottleneck in PTA analyses, enabling faster exploration of complex models or larger datasets. The work supplies an empirical head-to-head comparison rather than new theoretical machinery.

major comments (2)

[Abstract / Results] Abstract and results: the central claim that neural networks recover 'consistent posteriors' is stated without quantitative diagnostics (e.g., posterior overlap metrics, held-out parameter recovery error, coverage probabilities, or credible-interval calibration tests). This leaves the load-bearing assertion that approximation error remains sub-dominant to statistical uncertainty unverified.
[Methods] Methods: the manuscript supplies no details on network architecture, training-set construction, hyperparameter choices, or validation strategy against the full prior support. Without these, it is impossible to evaluate whether under-sampling near prior boundaries or high-curvature regions could induce systematic shifts in recovered parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments identify key areas where additional quantitative support and methodological transparency will strengthen the manuscript. We address each major comment below and describe the revisions we plan to implement.

read point-by-point responses

Referee: [Abstract / Results] Abstract and results: the central claim that neural networks recover 'consistent posteriors' is stated without quantitative diagnostics (e.g., posterior overlap metrics, held-out parameter recovery error, coverage probabilities, or credible-interval calibration tests). This leaves the load-bearing assertion that approximation error remains sub-dominant to statistical uncertainty unverified.

Authors: We agree that the central claim requires stronger quantitative backing. In the revised manuscript we will add explicit diagnostics in the Results section, including Jensen-Shannon divergence between the GP- and NN-derived posteriors, parameter recovery bias and variance on held-out simulations drawn from the prior, and coverage probability checks for the 68% and 95% credible intervals. These metrics will be reported for both the self-interacting dark matter and environmental models. The abstract will be updated to reference these diagnostics rather than stating consistency in purely qualitative terms. revision: yes
Referee: [Methods] Methods: the manuscript supplies no details on network architecture, training-set construction, hyperparameter choices, or validation strategy against the full prior support. Without these, it is impossible to evaluate whether under-sampling near prior boundaries or high-curvature regions could induce systematic shifts in recovered parameters.

Authors: We acknowledge that the current Methods section lacks the necessary detail for reproducibility and bias assessment. The revised version will expand this section to specify the neural-network architecture (number of layers, hidden units, activation functions, and probabilistic output parameterization), the training-set generation procedure (prior sampling density, total number of points, and stratification near boundaries), hyperparameter selection via cross-validation, and validation tests that explicitly probe performance in high-curvature and prior-edge regions. These additions will allow readers to judge the risk of systematic shifts. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark of NN vs GP interpolators

full rationale

The paper conducts a direct numerical comparison of two interpolation methods (Gaussian processes versus probabilistic neural networks) for strain-spectrum evaluations inside PTA Bayesian inference pipelines. All reported results—posterior consistency, training time, and MCMC runtime—are obtained from independent held-out simulations and full inference runs rather than from any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation. No derivation chain exists that reduces an output quantity to the same fitted inputs used to define it; the work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard Bayesian inference machinery and established machine-learning surrogate techniques without introducing new free parameters, axioms beyond domain conventions, or invented physical entities.

axioms (1)

domain assumption Bayesian posterior sampling in PTA analyses requires repeated, accurate evaluations of the strain-spectrum likelihood.
Standard assumption in the pulsar-timing-array literature.

pith-pipeline@v0.9.0 · 5384 in / 1203 out tokens · 59410 ms · 2026-05-10T20:24:52.576017+00:00 · methodology

Replacing Gaussian Processes with Neural Networks in Pulsar Timing Array Inference of the Gravitational-Wave Background

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)