arxiv: 2604.03372 · v1 · submitted 2026-04-03 · 🌌 astro-ph.IM · astro-ph.HE

GOPREAUX I: Open-source Code and Data to Model Multi-wavelength Emission of Extragalactic Transients using Gaussian Processes

Pith reviewed 2026-05-13 18:12 UTC · model grok-4.3

classification 🌌 astro-ph.IM astro-ph.HE
keywords Gaussian process regression · extragalactic transients · multi-wavelength photometry · light curve modeling · open-source code · supernovae · tidal disruption events · photometric classification

The pith

Code interpolates transient light curves across phase and wavelength

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GOPREAUX, an open-source Python package that applies Gaussian process regression to model the photometric emission of extragalactic transients across both time and wavelength. By training on a large sample of nearly 1,300 events with UV, optical, and IR data, the models generate data-driven interpolations without assuming specific shapes for the light curves. This enables predictions of how these transients would appear at higher redshifts, where their ultraviolet emission is shifted into observer-frame optical or infrared bands. Such capabilities support population studies, photometric classification, and physical inference from the sparse light curves that future surveys will provide.

Core claim

By aggregating multi-wavelength observations of almost 1,300 transients including Type II supernovae, stripped-envelope supernovae, superluminous supernovae, and tidal disruption events, Gaussian process regression can be used to create non-parametric models that interpolate emission across phase and wavelength, producing light curve and spectral predictions for events at higher redshifts.

What carries the argument

Gaussian process regression jointly over phase and wavelength dimensions applied to aggregated transient photometry.
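
As a rough illustration of what a joint regression over phase and wavelength involves, here is a minimal NumPy sketch. It is not the GOPREAUX API: the function names, the squared-exponential kernel, the fixed length scales, and the toy data are all assumptions chosen for illustration, and a real fit would optimize the hyperparameters rather than fix them.

```python
import numpy as np


def rbf_kernel(a, b, length_scales, amplitude):
    """Anisotropic squared-exponential kernel over (phase, log10 wavelength)."""
    diff = (a[:, None, :] - b[None, :, :]) / length_scales
    return amplitude**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def gp_predict(x_train, y_train, y_err, x_test, length_scales, amplitude):
    """Return the GP posterior mean and standard deviation at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scales, amplitude)
    k_tt += np.diag(y_err**2)  # per-point photometric noise on the diagonal
    k_st = rbf_kernel(x_test, x_train, length_scales, amplitude)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = amplitude**2 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Toy photometry: columns are (phase in days, log10 of wavelength in Angstrom).
rng = np.random.default_rng(0)
phase = rng.uniform(-20.0, 50.0, size=60)
log_wave = rng.uniform(np.log10(2000.0), np.log10(9000.0), size=60)
x_train = np.column_stack([phase, log_wave])

# Hypothetical smooth surface standing in for real magnitudes, plus noise.
y_true = np.exp(-0.5 * (phase / 15.0) ** 2) * (log_wave - 3.0)
y_err = np.full(60, 0.05)
y_train = y_true + rng.normal(0.0, 0.05, size=60)

# Predict at two (phase, wavelength) points that were never observed.
x_test = np.array([[0.0, np.log10(3000.0)], [30.0, np.log10(6000.0)]])
mean, std = gp_predict(
    x_train, y_train, y_err, x_test,
    length_scales=np.array([10.0, 0.2]),  # assumed: 10 days, 0.2 dex
    amplitude=1.0,
)
```

Using separate length scales for the two input dimensions lets the model be smooth over different scales in time and in wavelength, which is the main reason to treat the inputs jointly rather than fitting each filter independently.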

If this is right

  • Predictions of light curves and spectra become available at higher redshifts.
  • Photometric classification of transients is enabled from relatively sparse individual light curves.
  • Physical parameter inference from photometry is supported for population-level analysis.
  • Multi-wavelength light curves and spectral templates are generated as open data products.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The interpolations could help identify selection biases by testing consistency across different survey depths.
  • Application to real-time survey data streams could improve early classification of new discoveries.
  • Similar Gaussian process approaches might extend to other time-domain astrophysics problems like variable stars.

Load-bearing premise

The aggregated sample of nearly 1,300 transients is sufficiently representative and free of selection biases to produce reliable GP interpolations for unseen events and redshifts.

What would settle it

Training the models on a random subset of the transients and checking whether the predicted light curves for the held-out events match the actual multi-wavelength observations within the reported uncertainties.
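
One way such a check could be organized is sketched below, in plain Python. The helper names, the 20% test fraction, and the toy magnitudes are hypothetical; the predictions would come from whatever model was trained on the retained events.

```python
import random


def split_events(event_ids, test_fraction=0.2, seed=0):
    """Randomly split event identifiers into training and held-out sets."""
    ids = sorted(event_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    return ids[n_test:], ids[:n_test]


def interval_coverage(observed, predicted, sigma, n_sigma=1.96):
    """Fraction of observations inside the predicted +/- n_sigma band."""
    inside = [
        abs(o - p) <= n_sigma * s
        for o, p, s in zip(observed, predicted, sigma)
    ]
    return sum(inside) / len(inside)


train_ids, test_ids = split_events([f"event_{i}" for i in range(10)])

# Hypothetical held-out magnitudes versus model predictions.
observed = [17.2, 17.9, 18.4, 19.1]
predicted = [17.1, 18.0, 18.9, 19.0]
sigma = [0.1, 0.1, 0.1, 0.1]
coverage = interval_coverage(observed, predicted, sigma)  # 3 of 4 inside
```

If the reported uncertainties are well calibrated, the coverage of a 95% interval on held-out events should sit close to 0.95; a much lower value would indicate overconfident interpolation.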

Figures

Figures reproduced from arXiv: 2604.03372 by A. Crawford, C. Pellegrino, F. Bianco, M. Modjaz, S. Khakpash, T. A. Pritchard.

Figure 1.
Figure 2. The number of individual detections as a function of phase for the transients in each major spectroscopic class in our sample. The size of each point scales with the number of detections in that phase bin. Each bin covers 5 days. There are a total of 146,074 detections across all transient classes in our sample.
Figure 3. Violin plot showing the distribution of detections per filter as a function of phase for the entirety of our sample. The number of detections in each filter is shown to the left of each distribution. While each filter has a greater number of detections around peak brightness, we still maintain excellent coverage from the UV to the IR at very early and very late phases.
Figure 4. Left: Forced photometry from difference images retrieved from the ZTF forced photometry server. Flux values in each filter-camera combination are used to establish a flux baseline. Right: The final magnitude-calibrated ZTF photometry for the same object. 5σ upper limits are shown in lightly-shaded triangles while detections are shown in colored circles.
Figure 5. A flow chart demonstrating the GOPREAUX workflow, including pre-processing, data preparation, and modeling routines. The legend at the top details the meaning of each shape in the chart. In particular, Python class objects are shown as parallelograms with names in bold, and functions (methods) on those classes are shown as rectangles.
Figure 6. The final “data cube” used as input to the GPR fitting routines. Each point represents a photometry measurement in a given filter. During pre-processing, the effective wavelengths of each measurement are shifted to the transient rest frame and then “warped” iteratively until the measured photometry matches the filter functions convolved with the interpolated SED at each epoch.
Figure 7. Polynomial templates (dashed black lines) calculated from all stripped-envelope SNe detections (solid points) and nondetections (shaded triangles) in six representative filters. Filter names are given in the top right of each plot. The polynomial template is subtracted from the photometry of each SN in the fitting sample to calculate photometric residuals, which are then fit as part of our GPR routine. Thi…
Figure 8. Plots demonstrating different aspects of our GPR modeling routine, from computing residuals to fitting light curve evolution and forecasting SED evolution. Left: Simultaneous GPR model fits (solid lines) to the observed UV-optical photometry of the Type IIb SN 2016gkg (colored points). The shaded region represents the 95% confidence interval for each filter. The inset focuses on the early-time evolution du…
Figure 9. The final template model surface for the Type IIb supernovae, fit between -20 and 50 days relative to peak brightness across the UV and optical filters. Three different viewing angles are shown relative to the yz-plane normal: -37.5 degrees (top left), -127.5 degrees (top right), and 240 degrees (bottom). The final model is constructed by sampling and median combining the individual fits from each object i…
Figure 10. The observed V-band photometry of the Type IIb SN 2016gkg (green points) compared to the final GPR model template surface for SNe IIb at that wavelength (solid blue line). All values are plotted relative to the estimated peak brightness of the transient. 95% confidence intervals are given by the faint blue lines. The model encompasses the observed photometry at almost all phases; the sharp initial peak is…
Figure 11. Predicted light curves (solid lines) generated from fits to synthetic photometry (colored points) for the final SN IIb template model. Each line represents one sample drawn from the GPR probability distribution at that filter’s effective wavelength. The variable early-time light curves reflect the range of behavior for the SNe IIb at these phases; some show double-peaked light curves and others only show …
Figure 12. Heatmap showing the median residuals between five train-test splits of the SN IIb sample. The magnitude of the residuals is largest in the bluest wavelengths, particularly at the phase boundaries, owing to a scarcity of data in this region of parameter space.
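
The template-subtraction step described in the Figure 7 caption can be sketched as follows. This is only an illustration under stated assumptions: the function names, the polynomial degree, and the synthetic magnitudes are hypothetical, and the paper's actual fitting choices are not given in the captions reproduced here.

```python
import numpy as np


def fit_template(pooled_phase, pooled_mag, degree=2):
    """Fit a polynomial template to pooled single-filter photometry of a class."""
    return np.polyfit(pooled_phase, pooled_mag, deg=degree)


def residuals(coeffs, phase, mag):
    """Subtract the template from one object's photometry in the same filter."""
    return mag - np.polyval(coeffs, phase)


# Synthetic pooled photometry (magnitudes relative to peak) in one filter.
pooled_phase = np.linspace(-20.0, 50.0, 15)
pooled_mag = 0.002 * pooled_phase**2 - 0.01 * pooled_phase

coeffs = fit_template(pooled_phase, pooled_mag, degree=2)

# One hypothetical object that is 0.3 mag fainter than the template everywhere.
obj_phase = np.array([-5.0, 0.0, 10.0, 25.0])
obj_mag = 0.002 * obj_phase**2 - 0.01 * obj_phase + 0.3
obj_resid = residuals(coeffs, obj_phase, obj_mag)
```

Fitting residuals rather than raw magnitudes means the regression only has to model departures from the class-average evolution, which tends to be an easier target for a zero-mean Gaussian process.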
Original abstract

Contemporary all-sky surveys have observed thousands of extragalactic transients in the nearby universe, and upcoming surveys will discover exponentially more at higher redshifts. With these large samples, population-level analysis of the photometric behavior of different transient classes is now possible, allowing for photometric classification and physical parameter inference from relatively sparse individual light curves. To enable such studies, we introduce Gaussian process Optimized Photometric Regression of Extragalactic Archival Ultraviolet-infrared eXplosions, a.k.a GOPREAUX--a Python package for Gaussian Process Regression of multi-wavelength transient photometry. Our modeling is unique in that it interpolates transient emission across phase and wavelength in a non-parametric, data-driven way. This allows for predictions of light curves and spectra at higher redshifts, where the rest-frame ultraviolet (UV) emission is redshifted into the observer-frame optical or infrared. To this end, we aggregate a sample of almost 1,300 transients observed in the UV and optical with the Neil Gehrels Swift Telescope, complemented with additional optical and infrared coverage from surveys such as ZTF and open-source data releases. Our sample includes 275 Type II SNe, 172 stripped-envelope SNe, 72 superluminous SNe, and 58 tidal disruption events, among other classes. Our code and reduced photometry--comprising over 146,000 photometric observations--are available as open-source software and data products. Here we discuss our sample criteria, data reduction and modeling methodologies, the multi-wavelength light curves and spectral templates produced by our models, and the future directions in photometric classification and physical parameter inference this code and data repository enables.
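
The redshifting argument in the abstract comes down to a single stretch factor of (1 + z) applied to both wavelength and time. The sketch below shows the arithmetic; the function name and the example values are illustrative assumptions, and it deliberately ignores flux dimming and bandpass effects.

```python
def to_observer_frame(rest_wavelength, rest_phase_days, redshift):
    """Map a rest-frame wavelength and phase to the observer frame.

    Both quantities are stretched by (1 + z): wavelengths by cosmological
    redshift, phases by time dilation.
    """
    stretch = 1.0 + redshift
    return rest_wavelength * stretch, rest_phase_days * stretch


# Rest-frame UV emission at 2000 Angstrom, 10 days after peak, seen at z = 1.5.
obs_wave, obs_phase = to_observer_frame(2000.0, 10.0, 1.5)
# obs_wave is 5000.0 Angstrom (optical); obs_phase is 25.0 days.
```

This is why rest-frame UV coverage matters for higher-redshift predictions: an optical survey observing such an event samples the part of the spectrum that only UV instruments reach for nearby transients.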

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces GOPREAUX, an open-source Python package for Gaussian process regression modeling of multi-wavelength photometry from a compiled sample of nearly 1,300 extragalactic transients (including 275 Type II SNe, 172 stripped-envelope SNe, 72 superluminous SNe, and 58 TDEs). The central contribution is a non-parametric, data-driven interpolation across phase and wavelength to generate light curves and spectral templates, enabling predictions at higher redshifts where rest-frame UV shifts into observer-frame optical/IR. The reduced photometry (>146,000 observations) and code are released publicly.

Significance. If validated, this resource would support population-level transient studies, photometric classification, and physical parameter inference from sparse data. The open release of code and data products is a clear strength for reproducibility and community use in astro-ph.IM.

major comments (2)
  1. [modeling methodologies] Modeling methodologies section: No quantitative validation metrics (e.g., cross-validation scores, held-out test performance, or kernel-specific results) are reported for the GP regressions. This directly affects assessment of the non-parametric interpolation reliability claimed in the abstract.
  2. [sample criteria] Sample criteria section: The ~1,300-transient sample aggregates Swift UV (favoring strong UV emitters) with ZTF/optical/IR data, but no analysis of selection biases, completeness, or coverage gaps in phase-wavelength space is provided. This is load-bearing for the central claim that the models enable reliable predictions at higher redshifts.
minor comments (1)
  1. [Abstract] Abstract: A summary table of transient classes and counts would improve clarity over the inline list.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting areas where the manuscript can be strengthened. We have revised the paper to address both major comments by adding quantitative validation metrics for the GP models and an analysis of selection biases and coverage in the sample criteria section.

Point-by-point responses
  1. Referee: [modeling methodologies] Modeling methodologies section: No quantitative validation metrics (e.g., cross-validation scores, held-out test performance, or kernel-specific results) are reported for the GP regressions. This directly affects assessment of the non-parametric interpolation reliability claimed in the abstract.

    Authors: We agree that quantitative validation metrics are essential for assessing the reliability of the non-parametric interpolation. In the revised manuscript, we have added a dedicated subsection to the modeling methodologies section that reports cross-validation scores, performance on held-out test data, and kernel-specific results for the GP regressions. These additions directly support the claims made in the abstract regarding the models' predictive capabilities. revision: yes

  2. Referee: [sample criteria] Sample criteria section: The ~1,300-transient sample aggregates Swift UV (favoring strong UV emitters) with ZTF/optical/IR data, but no analysis of selection biases, completeness, or coverage gaps in phase-wavelength space is provided. This is load-bearing for the central claim that the models enable reliable predictions at higher redshifts.

    Authors: We acknowledge that the aggregation of Swift UV data, which preferentially includes strong UV emitters, may introduce selection biases, and that an explicit analysis of completeness and phase-wavelength coverage is needed to support higher-redshift applications. In the revised version, we have expanded the sample criteria section with a new analysis of selection biases, class-specific completeness estimates, and quantitative mapping of coverage gaps in phase-wavelength space. This provides a clearer basis for evaluating the models' reliability at higher redshifts while noting the associated limitations. revision: yes
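
A quantitative map of coverage gaps of the kind discussed in this exchange could start from simple bin counts. The sketch below is a hypothetical illustration, not the authors' analysis: the function names, the wavelength bin edges, and the toy detections are assumptions, while the 5-day phase bin matches the binning stated in the Figure 2 caption.

```python
from collections import Counter


def coverage_counts(detections, phase_bin=5.0,
                    wave_edges=(1500.0, 3000.0, 5500.0, 10000.0)):
    """Count detections per (phase bin, wavelength bin).

    detections: iterable of (phase in days, rest wavelength in Angstrom).
    """
    counts = Counter()
    for phase, wavelength in detections:
        for j in range(len(wave_edges) - 1):
            if wave_edges[j] <= wavelength < wave_edges[j + 1]:
                counts[(int(phase // phase_bin), j)] += 1
                break
    return counts


def empty_bins(counts, phase_bins, n_wave_bins):
    """List the (phase bin, wavelength bin) cells with no detections."""
    return [
        (p, w)
        for p in phase_bins
        for w in range(n_wave_bins)
        if counts[(p, w)] == 0
    ]


# Toy detections: (phase, wavelength).
detections = [(-3.0, 2000.0), (1.0, 2200.0), (2.5, 4500.0), (12.0, 6500.0)]
counts = coverage_counts(detections)
gaps = empty_bins(counts, phase_bins=range(-1, 3), n_wave_bins=3)
```

Cells that appear in `gaps` are regions of phase-wavelength space where any interpolation is driven by the kernel rather than by data, so predictions there deserve wider uncertainties.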

Circularity Check

0 steps flagged

No circularity; data product and GP interpolation are self-contained

Full rationale

The paper introduces GOPREAUX as open-source code and an aggregated dataset of ~1300 transients for non-parametric Gaussian process regression across phase and wavelength. The central claim—that the modeling interpolates emission in a data-driven way to enable higher-redshift predictions—follows directly from applying standard GP methods to the released photometry without any equations that reduce outputs back to fitted inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked; the work is a software/data release whose validity rests on the external sample and GP formalism rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard Gaussian process regression assumptions and the quality of the compiled observational dataset; no new physical axioms or invented entities are introduced.

pith-pipeline@v0.9.0 · 5627 in / 1099 out tokens · 43060 ms · 2026-05-13T18:12:35.173303+00:00 · methodology
