pith. machine review for the scientific record.

arxiv: 2601.02462 · v2 · submitted 2026-01-05 · 🌌 astro-ph.IM · astro-ph.EP · astro-ph.SR

Exposure-averaged Gaussian Processes for Combining Overlapping Datasets

Pith reviewed 2026-05-16 17:23 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.EP · astro-ph.SR
keywords Gaussian processes · exposure averaging · stellar variability · radial velocity · solar observations · data combination · asteroseismology · instrumental drifts
The pith

Exposure-averaged Gaussian processes enable combining overlapping stellar datasets with different exposure times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Gaussian process method to handle the fact that observed signals are averages over the instrument's exposure time rather than instantaneous samples. Standard kernels model instantaneous covariance, but when exposures are long relative to variability timescales, integrated versions are needed to match what instruments actually record. This matters for combining data from different EPRV instruments observing the Sun, which have varying exposure times and some overlapping coverage. The framework predicts both the underlying true signal and each instrument's binned version, while modeling relative drifts as separable terms. A sympathetic reader would care because it removes a systematic mismatch that could otherwise bias joint analyses of stellar variability.
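
The prediction step described above can be illustrated with a discretized sketch. This is not the paper's implementation: it approximates exposure averaging with a matrix `A` that averages a fine time grid over each exposure window, so that the binned covariance is `A K A^T` and the latent-to-binned covariance is `K A^T`. The SHO kernel uses the standard damped-oscillator form for Q > 1/2; the parameter values, the grid, the two hypothetical instruments, and the function names `averaging_matrix` and `predict` are all assumptions made for illustration.

```python
import numpy as np

def sho_kernel(tau, s0=1.0, w0=2 * np.pi / 5.0, q=4.0):
    """Instantaneous SHO covariance (valid for Q > 1/2); times in minutes."""
    tau = np.abs(tau)
    eta = np.sqrt(1.0 - 1.0 / (4.0 * q**2))
    damp = np.exp(-w0 * tau / (2.0 * q))
    return s0 * w0 * q * damp * (np.cos(eta * w0 * tau)
                                 + np.sin(eta * w0 * tau) / (2.0 * eta * q))

def averaging_matrix(grid, mid_times, exp_times):
    """Row i averages the fine-grid samples that fall inside exposure i."""
    inside = np.abs(grid[None, :] - mid_times[:, None]) <= 0.5 * exp_times[:, None]
    return inside / inside.sum(axis=1, keepdims=True)

def predict(grid, mid_times, exp_times, y, noise_var, kernel=sho_kernel):
    """Posterior mean of the latent signal on `grid` and of its binned version."""
    K = kernel(grid[:, None] - grid[None, :])      # latent-latent covariance
    A = averaging_matrix(grid, mid_times, exp_times)
    K_fF = K @ A.T                                 # latent-binned covariance
    K_FF = A @ K_fF                                # binned-binned covariance
    alpha = np.linalg.solve(K_FF + noise_var * np.eye(y.size), y)
    latent_mean = K_fF @ alpha
    return latent_mean, A @ latent_mean

# Two hypothetical instruments observing the same hour (assumed values).
grid = np.linspace(0.0, 60.0, 601)                 # fine latent grid
mid_fast = np.arange(1.0, 60.0, 1.0)               # 0.5-min exposures
mid_slow = np.arange(3.0, 58.0, 5.5)               # 5-min exposures
mid_times = np.concatenate([mid_fast, mid_slow])
exp_times = np.concatenate([np.full(mid_fast.size, 0.5),
                            np.full(mid_slow.size, 5.0)])

# Toy latent signal, binned by each instrument's exposure window.
truth = np.sin(2 * np.pi * grid / 5.0)
y = averaging_matrix(grid, mid_times, exp_times) @ truth

latent_mean, binned_mean = predict(grid, mid_times, exp_times, y, noise_var=1e-4)
print("max |binned prediction - data|:", np.max(np.abs(binned_mean - y)))
```

The same `alpha` vector yields both outputs: multiplying by the latent-to-binned covariance gives the latent signal, and re-applying `A` gives what each instrument would record. A grid-based operator is a convenient stand-in here; the paper instead works with integrated kernel functions.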

Core claim

We present a GP framework that accounts for exposure times by computing integrated forms of the instantaneous kernels typically used. These functions allow one to predict the true latent oscillation signals and the exposure-binned version expected by each instrument. We extend the framework to work for instruments with significant time overlap (i.e., similar longitude) by including relative instrumental drift components that can be predicted and separated from the stellar variability components.
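
For orientation, the "integrated forms" can be written generically as follows. The notation is an assumption loosely modeled on the k_FF and δ symbols in the Figure 1 caption: k is the instantaneous stationary kernel, t_i is taken to be the exposure midpoint, δ_i is the exposure time, and t_* is a time at which the latent signal is predicted. The paper's own definitions may differ in convention.

```latex
k_{FF}(t_i, t_j) = \frac{1}{\delta_i \delta_j}
  \int_{t_i - \delta_i/2}^{t_i + \delta_i/2}
  \int_{t_j - \delta_j/2}^{t_j + \delta_j/2}
  k(t - t') \, dt' \, dt ,
\qquad
k_{Ff}(t_i, t_*) = \frac{1}{\delta_i}
  \int_{t_i - \delta_i/2}^{t_i + \delta_i/2}
  k(t - t_*) \, dt .
```

The first expression is the covariance between two exposure-averaged observations; the second couples an exposure-averaged observation to the instantaneous latent signal, which is what allows the latent signal to be predicted from binned data.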

What carries the argument

Exposure-integrated forms of instantaneous GP kernels that compute the covariance between exposure-averaged observations.
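
As a rough numerical illustration of such an integrated kernel, the sketch below averages an instantaneous SHO covariance over two exposure windows with Gauss-Legendre quadrature. It is not the closed-form expression derived in the paper. The function names, the parameter values (a 5-minute period, Q = 4), and the convention that timestamps mark exposure midpoints are assumptions for illustration.

```python
import numpy as np

def sho_kernel(tau, s0=1.0, w0=2 * np.pi / 5.0, q=4.0):
    """Instantaneous SHO covariance (valid for Q > 1/2); times in minutes."""
    tau = np.abs(tau)
    eta = np.sqrt(1.0 - 1.0 / (4.0 * q**2))
    damp = np.exp(-w0 * tau / (2.0 * q))
    return s0 * w0 * q * damp * (np.cos(eta * w0 * tau)
                                 + np.sin(eta * w0 * tau) / (2.0 * eta * q))

def averaged_cov(t1, d1, t2, d2, kernel=sho_kernel, n=64):
    """Covariance between two exposure-averaged observations.

    t1, t2 are exposure midpoints and d1, d2 the exposure times. The double
    integral over both windows is approximated by tensor-product
    Gauss-Legendre quadrature with n nodes per exposure.
    """
    x, w = np.polynomial.legendre.leggauss(n)   # nodes on [-1, 1], weights sum to 2
    ta = t1 + 0.5 * d1 * x                      # nodes inside exposure 1
    tb = t2 + 0.5 * d2 * x                      # nodes inside exposure 2
    wn = 0.5 * w                                # normalized: average, not integral
    return float(wn @ kernel(ta[:, None] - tb[None, :]) @ wn)

# Averaging over one oscillation period suppresses the variance.
print("instantaneous variance:", sho_kernel(0.0))
print("5-min exposure variance:", averaged_cov(0.0, 5.0, 0.0, 5.0))
```

Because the kernel depends on |τ| and is not smooth at zero lag, plain tensor quadrature converges slowly when the two windows overlap, so results for overlapping exposures should be treated as approximate.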

Load-bearing premise

The stellar variability must be adequately described by the chosen instantaneous GP kernels, and instrumental drifts must be separable additive components without leaking into the stellar signal.

What would settle it

A direct test comparing the model's predicted exposure-binned signals against simultaneous short-exposure and long-exposure observations of the same target, checking whether residuals match expected noise levels.
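
One simple way to quantify whether residuals "match expected noise levels" is a reduced chi-square. The source does not name a statistic, so the choice, the function name, and the signature below are hypothetical; the sketch also assumes independent Gaussian errors with known per-point uncertainties.

```python
import numpy as np

def reduced_chi2(observed, predicted, sigma, n_params=0):
    """Sum of squared, noise-normalized residuals per degree of freedom."""
    resid = (np.asarray(observed, float) - np.asarray(predicted, float)) / np.asarray(sigma, float)
    dof = resid.size - n_params
    return float(np.sum(resid**2) / dof)

# Residuals exactly one sigma in size give a value close to 1.
print(reduced_chi2([1.1, 1.9], [1.0, 2.0], [0.1, 0.1]))
```

Values near 1 would indicate residuals consistent with the quoted noise, values well above 1 would point to unmodeled structure such as an exposure-time mismatch, and values well below 1 would suggest overestimated uncertainties.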

Figures

Figures reproduced from arXiv: 2601.02462 by Jacob K. Luhn, Lily L. Zhao, Ryan A. Rubenzahl, Samuel Halverson.

Figure 1: When computing the double integral k_FF, any two exposures can be decomposed into a sum of perfectly overlapping and separate sub-exposures; each sub-integral is evaluated with newly defined exposure times (δs) and time separations (Δs). Note that to compute the sum of the three sub-integrals, one must first multiply each sub-integral by the product of the two sub-exposure times before summing, and final…
Figure 2: Left: PSD for the solar variability kernels used in this work. We use a sum of a granulation kernel and an oscillation kernel. Right: Sample draws for each of the stellar variability kernels (arbitrary offsets added for clarity). In summary, Equation 29 is the full general GP model we use, with N_S as the number of stellar components used in the model, and N_I as the number of instruments in the data. We note…
Figure 3: Simulated time series for our 4 representative instruments, assuming no instrumental noise. The colored horizontal lines indicate each observation’s exposure time and lighter points display the observed RV centered within. The “true” stellar variability signal is shown in black in each panel (the sum of the components shown in…
Figure 4: An example of computing the predictive mean on a combined dataset, combining the synthetic time series from Instruments B, C, and D. The top two panels show the predictive mean without exposure-time accounting (teal) as compared to the true signal (black), and the bottom two panels show the same predictive mean after properly accounting for the exposure times. For each case, the lower panel shows the resid…
Figure 5: Recovery of instrument drifts from combining noise-free observations from 4 instruments with a model that includes both stellar signals and instrument drifts. Top panel: Recovered, or “predicted” instrument drifts (solid lines with 1-σ confidence regions) compared to the true synthetic drifts for each instrument in the drift model case. Recovered drifts are overall well correlated with the true drifts, how…
Figure 6: Combined GP model fitting between overlapping NEID and KPF observations taken on 2024 April 19. The top panel shows the raw RVs from KPF (green diamonds) and NEID (orange circles) observations after mean subtraction. The second panel shows the result of our GP model, where the black line shows the stellar components (sum of granulation and oscillations), and the data have been corrected using the predicte…
Figure 7: GP predictive means for each stellar component after combining HARPS-N (yellow), HARPS (purple), EXPRES (blue), and NEID (orange) solar data from the ESSP dataset (L. L. Zhao et al. 2023); successive panels zoom in from the full dataset (28 days) to a single portion of one day (3 hours). We have subtracted the instrument drift component from each dataset. In each panel, the GP prediction in black shows the…
Original abstract

Physically motivated Gaussian process (GP) kernels for stellar variability, like the commonly used damped, driven simple harmonic oscillators that model stellar granulation and p-mode oscillations, quantify the instantaneous covariance between any two points. For kernels whose timescales are significantly longer than the typical exposure times, such GP kernels are sufficient. For time series where the exposure time is comparable to the kernel timescale, the observed signal represents an exposure-averaged version of the true underlying signal. This distinction is important in the context of recent data streams from Extreme Precision Radial Velocity (EPRV) spectrographs like fast readout stellar data of asteroseismology targets and solar data to monitor the Sun's variability during daytime observations. Current solar EPRV facilities have significantly different exposure times per-site, owing to the different design choices made. Consequently, each instrument traces different binned versions of the same "latent" signal. Here we present a GP framework that accounts for exposure times by computing integrated forms of the instantaneous kernels typically used. These functions allow one to predict the true latent oscillation signals and the exposure-binned version expected by each instrument. We extend the framework to work for instruments with significant time overlap (i.e., similar longitude) by including relative instrumental drift components that can be predicted and separated from the stellar variability components. We use Sun-as-a-star EPRV datasets as our primary example, but present these approaches in a generalized way for application to any dataset where exposure times are a relevant factor or combining instruments with significant overlap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a Gaussian process framework for modeling stellar variability in observations with finite exposure times by deriving integrated forms of standard instantaneous kernels (e.g., damped driven SHO). These integrated kernels enable prediction of both the underlying latent signal and the exposure-binned signals recorded by different instruments. The approach is extended to multi-site overlapping datasets by adding separable relative instrumental drift terms, allowing the stellar and drift components to be jointly modeled and separated. The methods are illustrated with Sun-as-a-star EPRV data but presented generally for any time-series application where exposure averaging or instrument overlap matters.

Significance. If the kernel integrations are correctly derived and the component separation is shown to be robust, the framework would be a useful methodological advance for EPRV and asteroseismology pipelines that must combine heterogeneous datasets with differing exposure times and site overlaps. It directly addresses a practical modeling gap when kernel timescales are comparable to exposure durations.

major comments (2)
  1. [Framework extension to overlapping datasets] The extension to overlapping instruments claims that relative drift terms can be predicted and separated from the stellar GP signal. However, no diagnostic is provided to establish identifiability, such as the condition number of the joint covariance matrix or posterior correlations between stellar hyperparameters and drift parameters. This assumption is load-bearing for the multi-instrument combination claim.
  2. [Kernel integration section] The central construction relies on integrated kernels (e.g., for the damped SHO) being derived from instantaneous forms without reducing to quantities fixed by the fitted parameters. The manuscript should include the explicit integrated expressions and a verification that they remain non-degenerate for the relevant timescale ratios.
minor comments (1)
  1. [Abstract] The abstract mentions application to 'fast readout stellar data of asteroseismology targets' but does not name the specific datasets or report quantitative performance metrics from the Sun-as-a-star example.
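
The condition-number diagnostic requested in major comment 1 could look roughly like the following. This is a sketch under assumptions not taken from the paper: a squared-exponential drift kernel that only correlates observations from the same instrument, an instantaneous (not exposure-averaged) stellar covariance for brevity, and hypothetical names, cadences, and parameter values.

```python
import numpy as np

def sho_kernel(tau, s0=1.0, w0=2 * np.pi / 5.0, q=4.0):
    """Instantaneous SHO covariance (valid for Q > 1/2); times in minutes."""
    tau = np.abs(tau)
    eta = np.sqrt(1.0 - 1.0 / (4.0 * q**2))
    damp = np.exp(-w0 * tau / (2.0 * q))
    return s0 * w0 * q * damp * (np.cos(eta * w0 * tau)
                                 + np.sin(eta * w0 * tau) / (2.0 * eta * q))

def drift_kernel(tau, amp=1.0, ell=120.0):
    """Assumed smooth drift covariance (squared exponential); ell in minutes."""
    return amp**2 * np.exp(-0.5 * (tau / ell) ** 2)

def joint_covariance(times, inst_ids, noise_var):
    """Shared stellar term + per-instrument drift term + white noise."""
    tau = times[:, None] - times[None, :]
    same_inst = inst_ids[:, None] == inst_ids[None, :]   # drift is instrument-specific
    return (sho_kernel(tau)
            + drift_kernel(tau) * same_inst
            + noise_var * np.eye(times.size))

# Two hypothetical instruments with fully overlapping two-hour coverage.
t_a = np.arange(0.0, 120.0, 1.0)
t_b = np.arange(0.25, 120.0, 5.5)
times = np.concatenate([t_a, t_b])
inst_ids = np.concatenate([np.zeros(t_a.size, dtype=int),
                           np.ones(t_b.size, dtype=int)])

for noise_var in (1e-4, 1e-2):
    cond = np.linalg.cond(joint_covariance(times, inst_ids, noise_var))
    print(f"noise variance {noise_var:g}: condition number {cond:.3e}")
```

A large condition number on its own would not prove that the drift and stellar terms are confounded, but tracking how it changes with overlap fraction and drift length scale is one inexpensive signal to report alongside the posterior correlations the referee asks for.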

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential utility of the exposure-integrated GP framework for combining heterogeneous EPRV datasets. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: The extension to overlapping instruments claims that relative drift terms can be predicted and separated from the stellar GP signal. However, no diagnostic is provided to establish identifiability, such as the condition number of the joint covariance matrix or posterior correlations between stellar hyperparameters and drift parameters. This assumption is load-bearing for the multi-instrument combination claim.

    Authors: We agree that explicit diagnostics would strengthen the identifiability claim for the joint stellar-plus-drift model. In the revised manuscript we will add a dedicated subsection that computes the condition number of the joint covariance matrix for representative overlapping configurations and reports the posterior correlations (via MCMC sampling) between the stellar GP hyperparameters and the relative instrumental drift amplitudes. These diagnostics will be shown for the Sun-as-a-star example and for simulated cases spanning the relevant overlap fractions. revision: yes

  2. Referee: The central construction relies on integrated kernels (e.g., for the damped SHO) being derived from instantaneous forms without reducing to quantities fixed by the fitted parameters. The manuscript should include the explicit integrated expressions and a verification that they remain non-degenerate for the relevant timescale ratios.

    Authors: We agree that the explicit integrated kernel expressions and a non-degeneracy check are necessary for clarity and reproducibility. The revised manuscript will present the full analytical forms of the exposure-integrated damped-driven SHO kernel (and the other kernels used) derived from the instantaneous versions. We will also add a verification subsection that analytically and numerically confirms the integrated kernels remain non-degenerate across the timescale ratios encountered in EPRV data (exposure times of 1–10 min versus kernel timescales from minutes to hours). revision: yes
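
One way the promised numerical check might be set up is sketched below. It treats a single exposure (two perfectly overlapping windows), for which the double integral reduces to a one-dimensional integral with a triangular weight, (2/δ²) ∫₀^δ (δ − τ) k(τ) dτ. The quadrature is first validated against the textbook closed form for an exponential kernel, which is a stand-in for validation only and is not claimed to be one of the paper's kernels, and then the SHO kernel is scanned over exposure-to-period ratios. All names and parameter values are assumptions.

```python
import numpy as np

def sho_kernel(tau, s0=1.0, w0=2 * np.pi / 5.0, q=4.0):
    """Instantaneous SHO covariance (valid for Q > 1/2); times in minutes."""
    tau = np.abs(tau)
    eta = np.sqrt(1.0 - 1.0 / (4.0 * q**2))
    damp = np.exp(-w0 * tau / (2.0 * q))
    return s0 * w0 * q * damp * (np.cos(eta * w0 * tau)
                                 + np.sin(eta * w0 * tau) / (2.0 * eta * q))

def exp_kernel(tau, var=1.0, ell=1.0):
    """Exponential kernel, used here only to validate the quadrature."""
    return var * np.exp(-np.abs(tau) / ell)

def averaged_variance(kernel, delta, n=128):
    """Variance of the signal averaged over one exposure of length delta."""
    x, w = np.polynomial.legendre.leggauss(n)
    tau = 0.5 * delta * (x + 1.0)               # map nodes from [-1, 1] to [0, delta]
    integral = np.sum(0.5 * delta * w * (delta - tau) * kernel(tau))
    return float(2.0 * integral / delta**2)

def exp_closed_form(delta, var=1.0, ell=1.0):
    """Closed-form averaged variance for the exponential kernel."""
    x = delta / ell
    return 2.0 * var * (x - 1.0 + np.exp(-x)) / x**2

period = 5.0  # assumed oscillation period in minutes
for ratio in (0.01, 0.1, 1.0, 10.0):
    v = averaged_variance(sho_kernel, ratio * period)
    print(f"exposure/period = {ratio:g}: averaged/instantaneous variance = "
          f"{v / sho_kernel(0.0):.4f}")
```

A strictly positive averaged variance at every ratio is the single-observation version of non-degeneracy; the full check would extend this to the smallest eigenvalue of the integrated covariance matrix for realistic sampling patterns.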

Circularity Check

0 steps flagged

No circularity: integrated kernels derived mathematically from standard instantaneous forms

Full rationale

The paper derives exposure-integrated GP kernels directly from standard instantaneous kernels (e.g., damped SHO) via explicit integration over exposure time. This is a standard mathematical operation on known covariance functions and does not reduce any prediction to a fitted parameter by construction, nor does it rely on self-citation chains or imported uniqueness theorems. The framework for separating stellar variability from relative instrumental drifts in overlapping data is presented as an extension using additive components with distinct kernels; no load-bearing step collapses to a self-referential definition or renaming of an empirical pattern. The derivation chain remains self-contained against external benchmarks for kernel integration.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard GP stationarity and kernel assumptions plus the choice of specific kernel families (SHO etc.) whose hyperparameters are fitted to data; no new entities are postulated.

free parameters (1)
  • GP kernel hyperparameters (amplitude, damping timescale, etc.)
    Standard parameters optimized to match observed stellar variability; their values are not fixed a priori.
axioms (1)
  • domain assumption: The underlying stellar signal is a stationary Gaussian process whose covariance is given by a known instantaneous kernel form.
    Invoked to justify the integration step that produces the exposure-averaged kernel.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

  1. Aigrain, S., & Foreman-Mackey, D. 2023, ARA&A, 61, 329, doi: 10.1146/annurev-astro-052920-103508
  2. Aigrain, S., Pont, F., & Zucker, S. 2012, MNRAS, 419, 3147, doi: 10.1111/j.1365-2966.2011.19960.x
  3. Banerjee, S., Gelfand, A. E., Finley, A. O., & Sang, H. 2008, Journal of the Royal Statistical Society Series B, 70, 825, doi: 10.1111/j.1467-9868.2008.00663.x
  4. Barclay, T., Endl, M., Huber, D., et al. 2015, ApJ, 800, 46, doi: 10.1088/0004-637X/800/1/46
  5. Blackman, R. T., Fischer, D. A., Jurgenson, C. A., et al. 2020, AJ, 159, 238, doi: 10.3847/1538-3881/ab811d
  6. Carter, A. L., May, E. M., Espinoza, N., et al. 2024, Nature Astronomy, 8, 1008, doi: 10.1038/s41550-024-02292-x
  7. Carter, J. A., & Winn, J. N. 2009, ApJ, 704, 51, doi: 10.1088/0004-637X/704/1/51
  8. Charbonneau, D., Brown, T. M., Noyes, R. W., & Gilliland, R. L. 2002, ApJ, 568, 377, doi: 10.1086/338770
  9. Crass, J., Gaudi, B. S., Leifer, S., et al. 2021, arXiv e-prints, arXiv:2107.14291. https://arxiv.org/abs/2107.14291
  10. Delisle, J.-B., Hara, N., & Ségransan, D. 2020, A&A, 638, A95, doi: 10.1051/0004-6361/201936906
  11. Delisle, J.-B., Unger, N., Hara, N. C., & Ségransan, D. 2022, A&A, 659, A182, doi: 10.1051/0004-6361/202141949
  12. Dumusque, X., Glenday, A., Phillips, D. F., et al. 2015, ApJL, 814, L21, doi: 10.1088/2041-8205/814/2/L21
  13. Duvenaud, D. 2014, PhD thesis, Computational and Biological Learning Laboratory, University of Cambridge
  14. Feroz, F., & Hobson, M. P. 2008, Monthly Notices of the Royal Astronomical Society, 384, 449, doi: 10.1111/j.1365-2966.2007.12353.x
  15. Feroz, F., Hobson, M. P., & Bridges, M. 2009, Monthly Notices of the Royal Astronomical Society, 398, 1601, doi: 10.1111/j.1365-2966.2009.14548.x
  16. Finley, A. O., Sang, H., Banerjee, S., & Gelfand, A. E. 2009, Computational Statistics & Data Analysis, 53, 2873
  17. Ford, E. B., Bender, C. F., Blake, C. H., et al. 2024, arXiv e-prints, arXiv:2408.13318, doi: 10.48550/arXiv.2408.13318
  18. Foreman-Mackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220, doi: 10.3847/1538-3881/aa9332
  19. Gibson, N. P., Aigrain, S., Roberts, S., et al. 2012, MNRAS, 419, 2683, doi: 10.1111/j.1365-2966.2011.19915.x
  20. Gibson, S. R., Howard, A. W., Marcy, G. W., et al. 2016, in SPIE Proceedings, Vol. 9908, 990870, doi: 10.1117/12.2233334
  21. Gilbertson, C., Ford, E. B., Jones, D. E., & Stenning, D. C. 2020, ApJ, 905, 155, doi: 10.3847/1538-4357/abc627
  22. Guo, Z., Ford, E. B., Stello, D., et al. 2022, arXiv e-prints, arXiv:2202.06094, doi: 10.48550/arXiv.2202.06094
  23. Gupta, A. F., & Bedell, M. 2024, AJ, 168, 29, doi: 10.3847/1538-3881/ad4ce6
  24. Hara, N. C., & Delisle, J.-B. 2025, A&A, 696, A141, doi: 10.1051/0004-6361/202346391
  25. Harvey, J. 1985, in ESA Special Publication, Vol. 235, Future Missions in Solar, Heliospheric & Space Plasma Physics, ed. E. Rolfe & B. Battrick, 199
  26. Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517, doi: 10.1093/mnras/stu1320
  27. Haywood, R. D., Milbourne, T. W., Saar, S. H., et al. 2022, ApJ, 935, 6, doi: 10.3847/1538-4357/ac7c12
  28. Jones, D. E., Stenning, D. C., Ford, E. B., et al. 2022, The Annals of Applied Statistics, 16, 652, doi: 10.1214/21-AOAS1471
  29. Lin, A. S. J., Monson, A., Mahadevan, S., et al. 2022, AJ, 163, 184, doi: 10.3847/1538-3881/ac5622
  30. Llama, J., Zhao, L. L., Brewer, J. M., et al. 2024, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 13094, Ground-based and Airborne Telescopes X, ed. H. K. Marshall, J. Spyromilio, & T. Usuda, 130942O, doi: 10.1117/12.3020494
  31. Lucas, M., Kaur, S., Fjelde, T. E., et al. 2023, v0.8.3, Zenodo, doi: 10.5281/zenodo.8009875
  32. Luhn, J. K., Wright, J. T., Howard, A. W., & Isaacson, H. 2020, AJ, 159, 235, doi: 10.3847/1538-3881/ab855a
  33. Luhn, J. K., Ford, E. B., Guo, Z., et al. 2023, AJ, 165, 98, doi: 10.3847/1538-3881/acad08
  34. Milbourne, T. W., Haywood, R. D., Phillips, D. F., et al. 2019, ApJ, 874, 107, doi: 10.3847/1538-4357/ab064a
  35. Miller, A. C., Anderson, L., Leistedt, B., et al. 2022, arXiv e-prints, arXiv:2202.06797, doi: 10.48550/arXiv.2202.06797
  36. Mukherjee, P., Parkinson, D., & Liddle, A. R. 2006, The Astrophysical Journal, 638, L51, doi: 10.1086/501068
  37. Newman, P. D., Plavchan, P., Burt, J. A., et al. 2023, AJ, 165, 151, doi: 10.3847/1538-3881/acad07
  38. Noack, M. M., Krishnan, H., Risser, M. D., & Reyes, K. G. 2022, arXiv e-prints, arXiv:2205.09070, doi: 10.48550/arXiv.2205.09070
      O’Sullivan, N. K., & Aigrain, S. 2024, MNRAS, 531, 4181, doi: 10.1093/mnras/stae1059
      O’Sullivan, N. K., Aigrain, S., Cretignier, M., et al. 2025, MNRAS, 541, 3942, doi: 10.1093/mnras/staf1168
  39. Pepe, F., Cristiani, S., Rebolo, R., et al. 2021, A&A, 645, A96, doi: 10.1051/0004-6361/202038306
  40. Pereira, F., Campante, T. L., Cunha, M. S., et al. 2019, MNRAS, 489, 5764, doi: 10.1093/mnras/stz2405
  41. Phillips, D. F., Glenday, A. G., Dumusque, X., et al. 2016, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9912, Advances in Optical and Mechanical Technologies for Telescopes and Instrumentation II, 99126Z, doi: 10.1117/12.2232452
  42. Roberts, S. 2015, MNRAS, 452, 2269, doi: 10.1093/mnras/stv1428
  43. Rasmussen, C. E., & Williams, C. K. I. 2006, Gaussian Processes for Machine Learning
  44. Rubenzahl, R. A., Halverson, S., Walawender, J., et al. 2023, PASP, 135, 125002, doi: 10.1088/1538-3873/ad0b30
  45. Schwab, C., Rakich, A., Gong, Q., et al. 2016, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9908, Ground-based and Airborne Instrumentation for Astronomy VI, ed. C. J. Evans, L. Simard, & H. Takami, 99087H, doi: 10.1117/12.2234411
  46. Sing, D. K., Pont, F., Aigrain, S., et al. 2011, MNRAS, 416, 1443, doi: 10.1111/j.1365-2966.2011.19142.x
  47. Smith, M. T., Alvarez, M. A., & Lawrence, N. D. 2018, Gaussian Process Regression for Binned Data, arXiv e-prints, arXiv:1809.02010, doi: 10.48550/arXiv.1809.02010
  48. Wilson, A. G., & Nickisch, H. 2015, Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP), arXiv e-prints, arXiv:1503.01057, doi: 10.48550/arXiv.1503.01057
  49. Zhao, L. L., Fischer, D. A., Ford, E. B., et al. 2022, AJ, 163, 171, doi: 10.3847/1538-3881/ac5176
  50. Zhao, L. L., Dumusque, X., Ford, E. B., et al. 2023, AJ, 166, 173, doi: 10.3847/1538-3881/acf83e