Nested Sampling for ARIMA Model Selection in Astronomical Time-Series Analysis

Ajinkya Naik; Will Handley

arxiv: 2512.01929 · v3 · pith:UVQLOHJAnew · submitted 2025-12-01 · 🌌 astro-ph.IM · astro-ph.EP · astro-ph.SR


Pith reviewed 2026-05-17 02:18 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.EP · astro-ph.SR

keywords ARIMA · nested sampling · Bayesian evidence · time-series analysis · model selection · astronomical surveys · stochastic variability


The pith

Nested sampling computes Bayesian evidence to select optimal ARIMA orders for astronomical time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method that pairs ARIMA models with nested sampling to calculate Bayesian evidences across different autoregressive and moving average orders. This evidence includes a natural penalty against unnecessary complexity, addressing the overfitting risk in time-series modeling. The implementation is vectorized with GPU acceleration to handle grids of models efficiently. Validation on simulated data recovers known orders and parameters, while applications to real data such as sunspot records, Kepler stellar light curves, and TESS quasar light curves show that the selected models capture the observed stochastic variability.

Core claim

Integrating ARIMA models with nested sampling produces Bayesian evidences for model comparison across AR and MA orders while automatically incorporating an Occam's penalty for extra complexity, and the resulting framework enables both reliable order selection and parameter inference for astronomical time series.

What carries the argument

Nested sampling algorithm used to evaluate Bayesian evidence for ARIMA likelihoods over grids of model orders.
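Once a log-evidence is available for every (p, q) cell of the grid, turning the grid into posterior model probabilities is a normalisation step. The sketch below assumes equal prior probability for every model, so that log P_i = log Z_i − log Σ_j Z_j. The function name and the numbers are hypothetical and are not taken from the paper.

```python
import math

def log_posterior_model_probs(log_evidences):
    """Return log P_i = log Z_i - log(sum_j Z_j), assuming equal prior model probabilities."""
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_evidences.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_evidences.values()))
    return {order: v - log_norm for order, v in log_evidences.items()}

# Hypothetical log-evidences keyed by (p, q); illustrative numbers only.
log_Z = {(1, 0): -460.0, (2, 0): -445.0, (2, 1): -446.5, (3, 0): -447.0}
log_P = log_posterior_model_probs(log_Z)
best_order = max(log_P, key=log_P.get)
```

Because only differences of log-evidences matter, the gap between two cells of such a heatmap is the log Bayes factor between the two models.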

Load-bearing premise

Astronomical time series are adequately described by linear ARIMA processes whose orders can be reliably distinguished by Bayesian evidence from nested sampling without significant model misspecification or sampling failures.

What would settle it

Generating simulated time series from a known ARIMA order and showing that the method repeatedly selects a different order or fails to recover the parameters would falsify the recovery claim.
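A recovery test of this kind starts from a series whose generating order is known. The following is a minimal sketch of such a generator, using the AR(2) values quoted in the Figure 1 caption (𝜙1 = 0.6, 𝜙2 = 0.3, 𝑐 = 1.5, 𝜎 = 1.0, 300 points). The function name, the burn-in length, and the seed are assumptions made for illustration; the paper's own simulation code is not shown on this page.

```python
import numpy as np

def simulate_arma(phi, theta, c, sigma, n, burn_in=200, seed=0):
    """Simulate y_t = c + sum_i phi_i y_{t-i} + sum_j theta_j eps_{t-j} + eps_t."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, size=total)
    y = np.zeros(total)
    for t in range(total):
        # Terms that would reach before the start of the series are dropped.
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = c + ar + ma + eps[t]
    # Discard the burn-in so the zero start-up values do not bias the sample.
    return y[burn_in:]

# AR(2) series with the parameters from the Figure 1 caption.
y = simulate_arma(phi=[0.6, 0.3], theta=[], c=1.5, sigma=1.0, n=300)
```

Repeating this over many seeds and fitting each realisation is what would turn a single successful recovery into a recovery rate.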

Figures

Figures reproduced from arXiv: 2512.01929 by Ajinkya Naik, Will Handley.

**Figure 1.** Figure 1: Artificially generated AR(2) time-series of 300 data points with 𝜙1 = 0.6 and 𝜙2 = 0.3. A constant intercept term of 𝑐 = 1.5 and a standard deviation of 𝜎 = 1.0 associated to 𝜖𝑡 was used. [Plot residue: heatmap with axes AR (p) and MA (q), each running 0 to 5; cell values truncated in the source.] view at source ↗

**Figure 2.** Figure 2: Heatmap of the model log posterior probabilities 𝑃𝑖 for simulated AR(2) time-series ( [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 5.** Figure 5: Yearly Sunspots Number Data from 1700 to 2008 [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 3.** Figure 3: Posterior distributions of the AR(2) model parameters inferred from the simulated AR(2) time-series ( [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Artificially generated ARMA(1, 1) process of 490 data points with a linear trend. The ARMA coefficients are chosen to be 𝜙1 = 0.6 and 𝜃 = −0.4. The constant intercept term and standard deviation are 𝑐 = 2 and 𝜎 = 1, respectively [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 6.** Figure 6: Heatmap of the model log posterior probabilities obtained from the nested sampling runs on yearly sunspots number data ( [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Heatmap of the Bayesian Information Criterion (BIC) values for ARIMA model fit on yearly sunspots number data. The lowest value of BIC is observed for ARIMA(3, 0, 3). There is no clear preference seen for higher ARIMA orders suggesting that this method of model selection has penalized complex models more strongly than nested sampling. view at source ↗

**Figure 8.** Figure 8: Posterior corner plot of ARIMA(9, 0, 1) model parameters obtained from the nested sampling run on the yearly sunspots number data ( [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 11.** Figure 11: Autocorrelation (top) and Partial Autocorrelation (bottom) function plots of the mean residuals from the ARIMA(9, 0, 1) fit to the yearly sunspots number data ( [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

**Figure 12.** Figure 12: Posterior predictive forecasts of the yearly sunspots number for the years from 1954 to 2008. The forecasts are obtained using 5000 weighted posterior samples from the ARIMA(9, 0, 1) fit ( [PITH_FULL_IMAGE:figures/full_fig_p009_12.png] view at source ↗

**Figure 13.** Figure 13: Kepler long-cadence light curve of KIC 12008916, normalized and shown over a 10-day segment. The star exhibits the characteristic stochastic, solar-like oscillations of a low-luminosity red giant. [Plot residue: heatmap with axes AR (p) and MA (q), each running 0 to 4; cell values truncated in the source.] view at source ↗

**Figure 14.** Figure 14: Heatmap of the ARIMA models’ log posterior probabilities obtained from the nested sampling runs on KIC 12008916 data ( [PITH_FULL_IMAGE:figures/full_fig_p010_14.png] view at source ↗

**Figure 15.** Figure 15: Autocorrelation and Partial autocorrelation plots for the lightcurve of KIC 12008916 (top) and the residuals (bottom) obtained after subtracting the best fit ARIMA(0, 0, 1) model from the lightcurve data. The model fit has been able to capture most of the autocorrelation in the lightcurve data. [Plot residue: light curve with axes Days (BTJD 2457000) and Normalized Flux, segments labelled (1) to (5).] view at source ↗

**Figure 16.** Figure 16: Normalized TESS photometric lightcurve of Ross 176 from Sector-83, processed using the MIT Quick Look Pipeline (QLP). Dashed red lines indicate gaps in the data where we perform the partition into the corresponding training datasets and label them numerically. view at source ↗

**Figure 17.** Figure 17: Autocorrelation function (ACF) and Partial Autocorrelation function (PACF) plots for the training datasets (1), (4), and (5) (from top to bottom) of Ross 176 (see [PITH_FULL_IMAGE:figures/full_fig_p012_17.png] view at source ↗

Original abstract

The era of large-scale, high-cadence astronomical surveys demands efficient and robust methods for time-series analysis. ARIMA models provide a versatile parametric description of stochastic variability in this context. However, their practical use is limited by the challenge of selecting optimal model orders while avoiding overfitting. We present a novel solution to this problem by combining Autoregressive Integrated Moving Average (ARIMA) models with the Nested Sampling algorithm. Our method yields Bayesian evidences for model comparison and also incorporates an intrinsic Occam's penalty for unnecessary model complexity. Using JAX and Blackjax, a vectorized ARIMA-Nested Sampling framework with GPU-acceleration support is implemented, allowing us to perform model selection across grids of Autoregressive (AR) and Moving Average (MA) orders, with efficient inference of selected model parameters. We validate the approach using simulated time series with known ground-truth parameters and demonstrate accurate recovery of both model order and parameters. We then apply the method to several astronomical datasets, including the historical sunspot number record, stellar light curves of KIC 12008916 and Kepler 17 from the Kepler mission, and quasar light curves of 3C 273 and S4 0954+65 from the TESS mission. For all cases, except Kepler 17, the ARIMA models selected by this method were able to accurately model the stochastic variability in the time series data as well as produce accurate multi-step ahead forecasts for the sunspot number time series. Our results demonstrate that nested sampling offers a rigorous and computationally tractable alternative to autoregressive model selection in astronomical time-series analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Nested sampling for ARIMA order selection works on the tested astro datasets but skips independent checks on the evidence values themselves.


The main point is a working vectorized nested sampling implementation for ARIMA order selection on astronomical time series, with GPU support and tests on both simulations and real data such as sunspots, Kepler stars, and TESS quasars. It recovers known orders and parameters in the simulations and produces usable fits on the actual series, which is the practical test that matters for this kind of tool. The Bayesian evidence supplies a built-in penalty for extra parameters, avoiding some of the manual tuning that comes with AIC or BIC. The vectorized code that scans grids of AR and MA orders efficiently is the concrete advance here, and applying it across several different astronomical sources shows the authors thought about real use cases.

The soft spot is the lack of any cross-check on the evidence numbers. For higher-order models the likelihood surfaces become narrow and correlated, and nested sampling can miss mass or converge slowly without enough live points or shrinkage control. The paper reports successful order recovery but does not compare the log-evidence values against bridge sampling, thermodynamic integration, or reversible-jump MCMC on the same series. That leaves open the possibility that the selected orders rest partly on sampler behavior rather than on fully accurate marginal likelihoods.

This is aimed at astronomers who already fit ARIMA models to light curves and want a Bayesian route that scales to survey volumes. A reader who needs to process many time series could extract value from the framework and any released code. The paper is coherent and grounded enough in actual data to deserve a serious referee, though reviewers will probably ask for the evidence validation step.
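For readers who want the two penalties side by side, the quantities being compared can be written as follows. The notation is an assumption introduced here rather than the paper's: θ denotes the ARIMA parameters for a given (p, q), π the prior, k the number of free parameters, n the number of data points, and a hat marks the maximised likelihood.

```latex
\mathcal{Z}_{p,q} = \int \mathcal{L}(\theta \mid p, q)\, \pi(\theta \mid p, q)\, \mathrm{d}\theta,
\qquad
\mathrm{BIC}_{p,q} = k \ln n - 2 \ln \hat{\mathcal{L}}_{p,q},
\qquad
\ln \mathcal{Z}_{p,q} \approx -\tfrac{1}{2}\, \mathrm{BIC}_{p,q} \quad (n \to \infty).
```

The last relation is the standard large-sample approximation and holds only up to terms that do not grow with n, under regularity conditions. The evidence integral depends on the prior width and on parameter correlations, which the BIC penalty k ln n ignores, so the two criteria can rank models differently on a finite series.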

Referee Report

2 major / 2 minor

Summary. The paper proposes combining ARIMA models with nested sampling to compute Bayesian evidences for selecting optimal orders (p, d, q) in astronomical time-series analysis. It implements a vectorized, GPU-accelerated framework that yields model evidences incorporating an Occam's penalty, validates the approach on simulated data with known ground-truth parameters, and applies it to real datasets including the sunspot record, Kepler light curves (KIC 12008916, Kepler 17), and TESS quasar light curves (3C 273, S4 0954+65), claiming accurate recovery of orders/parameters in simulations and plausible fits on observations.

Significance. If the central claim holds, the work would provide a computationally tractable Bayesian tool for ARIMA order selection in high-cadence surveys, unifying evidence-based model comparison with parameter inference. The GPU support and application to diverse real datasets are strengths; however, the absence of quantitative validation metrics and independent evidence cross-checks reduces the assessed significance.

major comments (2)

[§4] §4 (simulation validation): the claim of 'accurate recovery of both model order and parameters' supplies no quantitative metrics (e.g., recovery fraction, RMSE on parameters, or evidence error bars across realizations), leaving the support for the central claim of reliable order distinction unquantified.
[§3] §3 (nested sampling implementation): the reported Bayesian evidences for higher-order ARMA models are not cross-validated against an independent estimator such as bridge sampling or thermodynamic integration on the same series; given the high-dimensional, correlated parameter spaces of ARIMA models, this verification is load-bearing for trusting the evidence-based order selection.
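The kind of cross-check requested above can be illustrated on a toy problem where the evidence is known in closed form. The sketch below is a deliberately simple nested sampler for a one-dimensional standard normal likelihood with a uniform prior on [-10, 10]; it is not the paper's Blackjax implementation and not an ARIMA likelihood. All names and settings (`n_live`, `n_iter`, the prior width, the seed) are assumptions chosen for illustration. The constrained draw is exact here only because the likelihood is symmetric and unimodal in one dimension.

```python
import math
import numpy as np

HALF_WIDTH = 10.0  # uniform prior on [-10, 10]; an illustrative choice

def log_like(theta):
    # Standard normal likelihood in theta (toy problem, not an ARIMA likelihood).
    return -0.5 * theta**2 - 0.5 * math.log(2.0 * math.pi)

def toy_nested_sampling(n_live=200, n_iter=2000, seed=1):
    rng = np.random.default_rng(seed)
    live = rng.uniform(-HALF_WIDTH, HALF_WIDTH, size=n_live)
    live_logl = log_like(live)
    log_z = -np.inf
    log_x = 0.0  # log of the remaining prior volume, starting from X = 1
    # Each iteration shrinks the volume by a factor exp(-1 / n_live) on average.
    log_shell = math.log1p(-math.exp(-1.0 / n_live))
    for _ in range(n_iter):
        worst = int(np.argmin(live_logl))
        # Weight of the dead point: L_i * (X_{i-1} - X_i).
        log_z = np.logaddexp(log_z, live_logl[worst] + log_x + log_shell)
        log_x -= 1.0 / n_live
        # Exact constrained draw: L(theta) > L* is equivalent to |theta| < |theta*|.
        radius = abs(live[worst])
        live[worst] = rng.uniform(-radius, radius)
        live_logl[worst] = log_like(live[worst])
    # Add the mass still held by the live points.
    m = live_logl.max()
    log_mean_l = m + math.log(np.mean(np.exp(live_logl - m)))
    return float(np.logaddexp(log_z, log_mean_l + log_x))

log_z_ns = toy_nested_sampling()
# Analytic evidence: (1 / 20) times the integral of N(0, 1) over [-10, 10].
log_z_exact = math.log(math.erf(HALF_WIDTH / math.sqrt(2.0)) / (2.0 * HALF_WIDTH))
```

Comparing `log_z_ns` with `log_z_exact` is the toy analogue of checking a nested-sampling evidence against bridge sampling or thermodynamic integration: the two numbers should agree to within the sampler's own uncertainty estimate.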

minor comments (2)

[§2] The abstract and method sections would benefit from explicit reference to standard ARIMA likelihood formulations (e.g., the recursive residual or Toeplitz covariance approaches) to clarify how the nested sampling likelihood is constructed.
[§5] Figure captions for the real-data fits should include the selected (p,d,q) orders and the corresponding log-evidence values for direct comparison.
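As a concrete instance of the recursive-residual formulation mentioned in the first minor comment, the sketch below evaluates a conditional (sum-of-squares) Gaussian log-likelihood for an ARMA(p, q) model. It conditions on the first p observations and sets pre-sample innovations to zero, so it approximates the exact likelihood rather than reproducing it. Whether the paper uses this form is not stated on this page; the function name and signature are hypothetical.

```python
import math
import numpy as np

def arma_css_loglike(y, c, phi, theta, sigma):
    """Conditional Gaussian log-likelihood of an ARMA(p, q) model via recursive residuals."""
    y = np.asarray(y, dtype=float)
    p, q, n = len(phi), len(theta), len(y)
    eps = np.zeros(n)  # innovations before t = p are set to zero
    for t in range(p, n):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        # One-step-ahead prediction error at time t.
        eps[t] = y[t] - c - ar - ma
    resid = eps[p:]
    return (-0.5 * len(resid) * math.log(2.0 * math.pi * sigma**2)
            - 0.5 * float(np.sum(resid**2)) / sigma**2)

# Small hand-checkable cases.
ll_ar1 = arma_css_loglike([1.0, 2.0, 4.0], c=0.0, phi=[0.5], theta=[], sigma=1.0)
ll_ma1 = arma_css_loglike([1.0, 1.0], c=0.0, phi=[], theta=[0.5], sigma=1.0)
```

For a differenced series (d > 0) the same function would be applied to the differenced data. A Toeplitz-covariance formulation would instead build the full n × n covariance matrix of the stationary process, which is exact but more expensive.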

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have carefully considered each point and revised the paper accordingly to strengthen the presentation of our results.


Referee: [§4] §4 (simulation validation): the claim of 'accurate recovery of both model order and parameters' supplies no quantitative metrics (e.g., recovery fraction, RMSE on parameters, or evidence error bars across realizations), leaving the support for the central claim of reliable order distinction unquantified.

Authors: We agree that quantitative metrics are necessary to rigorously support the claim of accurate recovery. In the revised manuscript we have expanded §4 to include the fraction of simulations in which the ground-truth order (p, d, q) is correctly recovered, the root-mean-square error on the inferred AR and MA coefficients across realizations, and the standard deviation of the log-evidence values computed from repeated nested-sampling runs. These additions provide a clearer, numerical demonstration of the method’s reliability. revision: yes
Referee: [§3] §3 (nested sampling implementation): the reported Bayesian evidences for higher-order ARMA models are not cross-validated against an independent estimator such as bridge sampling or thermodynamic integration on the same series; given the high-dimensional, correlated parameter spaces of ARIMA models, this verification is load-bearing for trusting the evidence-based order selection.

Authors: We acknowledge that independent cross-validation of the evidence estimates would increase confidence, especially in the correlated, high-dimensional parameter spaces of ARIMA models. Performing bridge sampling or thermodynamic integration on the same series would require substantial additional implementation and compute time that lies outside the scope of the present work. We have therefore added an explicit discussion of this limitation in the revised text, while retaining the simulation-based validation (recovery of known ground-truth parameters and orders) as the primary empirical support for the reliability of the nested-sampling evidences. We believe this approach is sufficient for the current contribution but agree that future studies could usefully include such cross-checks. revision: partial
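The three metrics named in the first response (recovery fraction, coefficient RMSE, and scatter of the log-evidence over repeated runs) are simple to compute once the per-realisation outputs are collected. The helper names and the input values below are hypothetical and serve only to show the arithmetic; they are not results from the paper.

```python
import numpy as np

def recovery_fraction(selected_orders, true_order):
    """Fraction of realisations in which the selected (p, d, q) equals the truth."""
    return float(np.mean([tuple(o) == tuple(true_order) for o in selected_orders]))

def coefficient_rmse(estimates, truth):
    """Per-coefficient RMSE across realisations (rows are realisations)."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return np.sqrt(np.mean(err**2, axis=0))

def log_evidence_scatter(log_z_runs):
    """Sample standard deviation of log Z over repeated nested-sampling runs."""
    return float(np.std(log_z_runs, ddof=1))

# Hypothetical outputs from four realisations and three repeated runs.
selected = [(2, 0, 0), (2, 0, 0), (1, 0, 1), (2, 0, 0)]
frac = recovery_fraction(selected, (2, 0, 0))
rmse = coefficient_rmse([[0.5, 0.3], [0.7, 0.3]], [0.6, 0.3])
scatter = log_evidence_scatter([-1.0, -1.2, -0.8])
```

The run-to-run scatter is worth comparing with the error bar the sampler reports for a single run; a scatter much larger than the reported error would point to the sampling problems raised in the second major comment.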

Circularity Check

0 steps flagged

No circularity: standard nested sampling applied to standard ARIMA likelihoods with external validation on simulations

full rationale

The paper combines established ARIMA likelihoods with nested sampling to compute Bayesian evidences for model-order selection. Validation proceeds by generating simulated time series with known ground-truth orders and parameters, then recovering both via the method; this constitutes an independent check against external benchmarks rather than any self-referential reduction. No equations are presented that define the evidence or selected orders in terms of the fitted parameters themselves, nor does any load-bearing step rely on a self-citation chain that is itself unverified. The derivation is therefore self-contained: the inputs are the standard ARIMA model class and the standard nested-sampling algorithm, while the outputs (evidences and order selections) are computed quantities that can be falsified by the simulation recovery tests.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the domain assumption that astronomical variability can be represented by ARIMA processes and on the standard mathematical properties of nested sampling for evidence calculation; no new entities are postulated and no free parameters are fitted beyond the usual ARIMA coefficients and orders that are selected by the evidence.

axioms (2)

domain assumption: Astronomical time series can be adequately modeled as ARIMA processes after appropriate differencing.
Invoked when applying the method to sunspot numbers, Kepler light curves, and TESS quasar data.
standard math: Nested sampling correctly computes the Bayesian evidence for ARIMA models of varying orders.
Relies on the established properties of nested sampling as implemented in the vectorized framework.
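The "appropriate differencing" in the first axiom is the integration order d of ARIMA(p, d, q): the series is differenced d times before an ARMA(p, q) model is fitted. A minimal sketch with NumPy follows, using a made-up linear trend to show that one difference removes it. The helper names are hypothetical.

```python
import numpy as np

def difference(y, d=1):
    """Apply d-th order differencing; the result is d samples shorter."""
    return np.diff(np.asarray(y, dtype=float), n=d)

def undifference_once(dy, y0):
    """Invert a single difference given the first value of the original series."""
    return np.concatenate(([y0], y0 + np.cumsum(dy)))

# Made-up linear trend: one difference leaves a constant series.
t = np.arange(10)
y = 2.0 + 0.5 * t
dy = difference(y, d=1)
y_back = undifference_once(dy, y[0])
```

Forecasts made on the differenced scale have to be passed through the inverse step to return to the original units.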



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

Our method yields Bayesian evidences for model comparison and also incorporates an intrinsic Occam’s penalty for unnecessary model complexity.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.