
arxiv: 2605.04211 · v1 · submitted 2026-05-05 · 📊 stat.ME

A multivariate Birnbaum-Saunders autoregressive moving average model with application to air pollution concentration data

Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3

classification 📊 stat.ME
keywords multivariate time series · Birnbaum-Saunders distribution · ARMA models · EM algorithm · air pollution · PM2.5 · positive asymmetric data · environmental monitoring

The pith

The MBSARMA model combines the multivariate Birnbaum-Saunders distribution with ARMA dynamics on the conditional location parameter to jointly model correlated positive asymmetric time series such as PM2.5 concentrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multivariate Birnbaum-Saunders autoregressive moving average model for the joint analysis of positive, right-skewed time series that exhibit correlations across multiple responses. It incorporates ARMA components directly into the conditional location parameter of each series within the multivariate log-linear BS framework and uses the EM algorithm for parameter estimation. Monte Carlo simulations evaluate the estimators under different sample sizes and correlation levels, while an application to weekly PM2.5 data from three monitoring stations in Santiago demonstrates practical performance on real environmental series.

Core claim

The proposed MBSARMA model combines the multivariate log-linear BS framework with dynamic autoregressive moving average components on the conditional location parameter of each response and shows good performance in Monte Carlo simulations and real PM2.5 data.

What carries the argument

Multivariate Birnbaum-Saunders distribution with ARMA dynamics applied to the conditional location parameters, enabling joint modeling of temporal dependence and cross-response correlations in positive asymmetric series.
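
As a rough sketch of the structure described above: the first line below is the standard stochastic representation of a log-Birnbaum-Saunders (sinh-normal) response with correlated latent normals, and the second is a generic ARMA recursion with exogenous terms on the location. The exact recursion used in the paper is not reproduced on this page, so the second line and the symbols $\boldsymbol{x}_{jt}$, $\boldsymbol{\beta}_j$, $\phi_{jk}$, $\theta_{jl}$, $r_{jt}$, $p_j$, $q_j$ are assumptions made for illustration.

```latex
\begin{align*}
% Multivariate log-BS (sinh-normal) representation: T_{jt} is the positive
% response of series j at time t, Y_{jt} its logarithm.
T_{jt} &= \exp(Y_{jt}), \qquad
Y_{jt} = \mu_{jt} + 2\,\operatorname{arcsinh}\!\left(\frac{\alpha_j Z_{jt}}{2}\right), \qquad
\boldsymbol{Z}_t = (Z_{1t},\dots,Z_{dt})^\top \sim \mathrm{N}_d(\boldsymbol{0}, \boldsymbol{\Gamma}), \\
% Assumed ARMA(p_j, q_j) recursion with exogenous terms on the conditional location.
\mu_{jt} &= \boldsymbol{x}_{jt}^\top \boldsymbol{\beta}_j
  + \sum_{k=1}^{p_j} \phi_{jk}\,\bigl(y_{j,t-k} - \boldsymbol{x}_{j,t-k}^\top \boldsymbol{\beta}_j\bigr)
  + \sum_{l=1}^{q_j} \theta_{jl}\, r_{j,t-l}, \qquad
r_{jt} = y_{jt} - \mu_{jt}.
\end{align*}
```

Under this representation each $T_{jt}$ is Birnbaum-Saunders with shape $\alpha_j$ and scale $\exp(\mu_{jt})$, while the correlation matrix $\boldsymbol{\Gamma}$ of the latent normals carries the cross-series dependence.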

If this is right

  • Joint forecasting of pollution levels across monitoring stations becomes possible while preserving the skewed marginal distributions.
  • Exogenous terms can be included to account for external factors such as weather variables in the multivariate setting.
  • The EM estimation procedure maintains accuracy for moderate sample sizes and varying correlation strengths according to the simulation results.
  • The model supports environmental applications by handling the positive asymmetric nature of concentration data without transformation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structure could be applied to other positive skewed multivariate series in fields such as reliability engineering or insurance claim modeling.
  • Adding nonlinear or higher-order dependence might further improve fit for series with complex seasonal patterns.
  • Distribution-specific multivariate time series models may reduce misspecification errors compared with standard approaches that assume normality after transformation.

Load-bearing premise

The observed series follow the multivariate Birnbaum-Saunders distribution with the specified ARMA structure on the location parameters, and the EM algorithm recovers the parameters reliably under the correlation structures in the data.

What would settle it

Generate synthetic multivariate series from a different distribution such as multivariate lognormal with ARMA dependence and fit the MBSARMA model to check whether parameter estimates show large bias or whether out-of-sample predictions degrade sharply.
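
The data-generating half of this check could be sketched as below, using only the Python standard library. It produces positive series whose logarithms follow a Gaussian ARMA(1,1) process with correlated innovations, which is one simple way to obtain "multivariate lognormal with ARMA dependence". The function names, parameter values, and the ARMA(1,1) order are all hypothetical choices; fitting MBSARMA to the output would require the paper's own estimation code, which is not shown here.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor low with low @ low^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_lognormal_arma(n, const, phi, theta, sigma, corr, seed=0, burn=200):
    """Return n rows of positive values whose logs follow a Gaussian ARMA(1,1).

    Assumed log-scale recursion for each series j:
        y_t = const + phi * (y_{t-1} - const) + e_t + theta * e_{t-1},
    where the innovation vector e_t is normal with standard deviations sigma
    and correlation matrix corr. The first `burn` draws are discarded.
    """
    rng = random.Random(seed)
    d = len(const)
    low = cholesky(corr)
    y_prev = list(const)
    e_prev = [0.0] * d
    rows = []
    for t in range(n + burn):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals through the Cholesky factor.
        e = [sigma[j] * sum(low[j][k] * g[k] for k in range(j + 1)) for j in range(d)]
        y = [
            const[j] + phi[j] * (y_prev[j] - const[j]) + e[j] + theta[j] * e_prev[j]
            for j in range(d)
        ]
        if t >= burn:
            rows.append([math.exp(v) for v in y])
        y_prev, e_prev = y, e
    return rows


# Hypothetical three-station setup with fairly strong cross-correlation.
corr = [[1.0, 0.7, 0.6], [0.7, 1.0, 0.65], [0.6, 0.65, 1.0]]
series = simulate_lognormal_arma(
    300,
    const=[3.0, 3.2, 3.1],
    phi=[0.5, 0.4, 0.6],
    theta=[0.2, 0.1, 0.3],
    sigma=[0.3, 0.3, 0.3],
    corr=corr,
)
```

Each row of `series` is one time point across the three simulated stations. Comparing parameter bias and out-of-sample error on such data against results on data simulated from the model itself would give the contrast the check asks for.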

Figures

Figures reproduced from arXiv: 2605.04211 by Helton Saulo.

Figure 1. Histograms of the weekly PM$_{2.5}$ series (top row, raw scale) and their log-transforms (bottom row) for Las Condes (St1, left), El Bosque (St2, centre), and Quilicura (St3, right). In order to assess the goodness of fit of the selected model, we compute the Mahalanobis distance residuals $D_t^2$ defined in (36), which under correct specification follow a $\chi^2_d$ distribution with $d = 3$.
Figure 2. Chi-squared ($\chi^2_3$) QQ plot of the Mahalanobis distance residuals $D_t^2$ with 95% simulated envelopes.
Figure 3. Sample ACF (top row) and PACF (bottom row) of the compo…
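
The goodness-of-fit check mentioned in the captions of Figures 1 and 2 can be illustrated with a short standard-library sketch. The paper's equation (36) is not reproduced on this page, so the residual used here is an assumption: the latent vector is recovered as z_j = (2 / alpha_j) * sinh((y_j - mu_j) / 2) on the log scale and the distance is z' Γ⁻¹ z. The envelope is a plain Monte Carlo band for chi-squared order statistics, which may differ from the envelope construction used in the paper. All names and numeric values are hypothetical.

```python
import math
import random


def mahalanobis_d2(y, mu, alpha, corr_inv):
    """Squared Mahalanobis distance of the latent residual vector at one time point.

    Assumed form: z_j = (2 / alpha_j) * sinh((y_j - mu_j) / 2), D^2 = z' corr_inv z,
    with y and mu on the log scale and corr_inv the inverse correlation matrix.
    """
    z = [(2.0 / a) * math.sinh((yj - mj) / 2.0) for yj, mj, a in zip(y, mu, alpha)]
    d = len(z)
    return sum(z[i] * corr_inv[i][j] * z[j] for i in range(d) for j in range(d))


def chi2_envelope(n, df, reps=200, level=0.95, seed=0):
    """Pointwise simulated band for the sorted values of n chi-squared(df) draws."""
    rng = random.Random(seed)
    # Each replicate: n chi-squared draws (sums of df squared normals), sorted.
    sims = [
        sorted(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n))
        for _ in range(reps)
    ]
    lo_idx = math.floor((1.0 - level) / 2.0 * (reps - 1))
    hi_idx = math.ceil((1.0 + level) / 2.0 * (reps - 1))
    lower, upper = [], []
    for i in range(n):
        column = sorted(sim[i] for sim in sims)  # i-th order statistic across replicates
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


# Hypothetical bivariate example with latent correlation 0.5.
alpha = [0.4, 0.6]
mu = [3.0, 3.2]
z_true = [1.0, 1.0]
y = [m + 2.0 * math.asinh(a * z / 2.0) for m, a, z in zip(mu, alpha, z_true)]
corr_inv = [[4.0 / 3.0, -2.0 / 3.0], [-2.0 / 3.0, 4.0 / 3.0]]
d2 = mahalanobis_d2(y, mu, alpha, corr_inv)
lower, upper = chi2_envelope(n=50, df=2)
```

In a QQ plot, the sorted residuals from a fitted model would be drawn against chi-squared quantiles, and points falling outside the band between `lower` and `upper` would hint at misspecification.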
Original abstract

Fine particulate matter (PM$_{2.5}$) concentration data are positive, right-skewed series that arise naturally in environmental monitoring and are well described by the Birnbaum-Saunders (BS) distribution. In this paper, we propose a multivariate BS autoregressive moving average (MBSARMA) model with exogenous terms for the joint analysis of correlated positive asymmetric time series. The proposed model combines the multivariate log-linear BS framework with dynamic autoregressive moving average components on the conditional location parameter of each response. We estimate the model parameters by means of the Expectation-Maximisation (EM) algorithm. The performance of the proposed conditional likelihood estimators is evaluated by means of a Monte Carlo simulation study under several correlation levels and sample sizes. An application to weekly PM$_{2.5}$ pollution concentration data recorded at three monitoring stations in Santiago, Chile, obtained from the National Air Quality Information System of Chile (SINCA), is presented. The results show the good performance of the proposed methodology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a multivariate Birnbaum-Saunders autoregressive moving average (MBSARMA) model for joint analysis of correlated positive asymmetric time series such as PM2.5 concentrations. It combines the multivariate log-linear BS framework with ARMA dynamics on the conditional location parameter of each response, estimates parameters via the EM algorithm, evaluates the conditional likelihood estimators through Monte Carlo simulations under several correlation levels and sample sizes, and applies the model to weekly PM2.5 data from three Santiago monitoring stations.

Significance. If the central claims hold, the MBSARMA model supplies a flexible parametric framework for multivariate skewed positive series with temporal dependence, which is relevant for environmental statistics. The EM estimation approach and the real-data illustration are standard strengths; the Monte Carlo design under varying correlations is also a positive feature when fully documented.

major comments (3)
  1. [§3] §3 (EM algorithm): The E-step requires the conditional expectation of the latent variables from the multivariate BS representation given the full observed vector and history. The manuscript must specify whether this expectation is obtained exactly from the joint distribution or via an approximation or marginalization; under the high cross-correlations typical of nearby PM2.5 stations, any approximation risks biasing the ARMA coefficient updates in the M-step.
  2. [§4] §4 (Monte Carlo study): The study reports 'good performance' under several correlation levels, yet provides neither the explicit correlation matrices tested nor a comparison of those levels to the empirical cross-correlations in the Santiago data. This omission prevents verification that the EM estimators remain reliable at the dependence strengths encountered in the application.
  3. [§5] §5 (Application): The fitted model is presented without the estimated correlation matrix or the selected ARMA orders (p, q); these quantities are needed to assess whether the dynamics and dependence structure are adequately captured and to judge the practical utility of the results.
minor comments (2)
  1. [Abstract] The abstract refers to 'conditional likelihood estimators' while the body uses EM; a brief clarification of the relationship would improve readability.
  2. [Notation] Notation for the BS shape parameters and the ARMA orders should be checked for consistency between the model definition and the simulation/application sections.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below and will incorporate the suggested clarifications and additions in the revised version.

point-by-point responses
  1. Referee: [§3] §3 (EM algorithm): The E-step requires the conditional expectation of the latent variables from the multivariate BS representation given the full observed vector and history. The manuscript must specify whether this expectation is obtained exactly from the joint distribution or via an approximation or marginalization; under the high cross-correlations typical of nearby PM2.5 stations, any approximation risks biasing the ARMA coefficient updates in the M-step.

    Authors: We appreciate the referee drawing attention to this point. In our EM algorithm the conditional expectations of the latent variables are obtained exactly from the joint multivariate Birnbaum-Saunders distribution; no approximation or marginalization is employed. We will add an explicit statement and the relevant conditional-expectation formulas to Section 3 of the revised manuscript to make this clear. The simulation results already indicate that the M-step updates remain stable at the correlation levels examined, including those comparable to the application. revision: yes

  2. Referee: [§4] §4 (Monte Carlo study): The study reports 'good performance' under several correlation levels, yet provides neither the explicit correlation matrices tested nor a comparison of those levels to the empirical cross-correlations in the Santiago data. This omission prevents verification that the EM estimators remain reliable at the dependence strengths encountered in the application.

    Authors: We agree that the simulation design would be more transparent with this information. In the revised manuscript we will report the explicit correlation matrices used for the low-, moderate-, and high-correlation scenarios and will add a direct comparison with the sample cross-correlation matrix computed from the three Santiago PM2.5 series. This will allow readers to confirm that the simulated dependence structures bracket the empirical dependence observed in the data. revision: yes

  3. Referee: [§5] §5 (Application): The fitted model is presented without the estimated correlation matrix or the selected ARMA orders (p, q); these quantities are needed to assess whether the dynamics and dependence structure are adequately captured and to judge the practical utility of the results.

    Authors: We thank the referee for noting this omission. The revised version will include the estimated correlation matrix obtained from the fitted MBSARMA model, the selected ARMA orders (p, q) for each of the three series, and a brief description of the model-selection criterion employed. These additions will enable readers to evaluate the fitted dynamics and dependence structure directly. revision: yes
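
The comparison promised in the second response, checking that the simulated correlation levels bracket the empirical dependence, could be sketched as follows. This is a minimal standard-library illustration: the toy values stand in for the Santiago series, the bounds 0.3 and 0.9 stand in for the lowest and highest simulated correlation levels, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlation matrix for a list of series (one list per station)."""
    d = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(d)] for i in range(d)]


def is_bracketed(corr, low, high):
    """True if every off-diagonal correlation lies within [low, high]."""
    d = len(corr)
    return all(low <= corr[i][j] <= high for i in range(d) for j in range(i + 1, d))


# Toy positive series standing in for three stations (not real data).
toy = [
    [10.0, 14.0, 9.0, 20.0, 16.0],
    [12.0, 15.0, 11.0, 22.0, 15.0],
    [8.0, 13.0, 10.0, 18.0, 17.0],
]
log_toy = [[math.log(v) for v in column] for column in toy]
sample_corr = correlation_matrix(log_toy)
covered = is_bracketed(sample_corr, low=0.3, high=0.9)
```

If `covered` were false for the real series, the simulation study would not speak to the dependence strength seen in the application, which is the concern raised in the referee's second major comment.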

Circularity Check

0 steps flagged

No circularity: model specification, EM estimation, and validation are independent of fitted outputs

full rationale

The MBSARMA model is constructed by extending the multivariate log-linear Birnbaum-Saunders distribution with exogenous ARMA components on the conditional location parameters; parameters are recovered via a standard EM algorithm whose E-step and M-step follow from the joint distribution and the ARMA recursion. Monte Carlo evaluation is performed on simulated data generated from the model under controlled correlation levels, and the real-data application compares fitted values against held-out observations without re-using the same quantities as both input and output. No equation equates a derived quantity to a fitted parameter by definition, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled through prior work. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the data-generating process is exactly multivariate log-linear Birnbaum-Saunders with ARMA dynamics on location; several free parameters (ARMA orders, correlation matrix, shape parameters) must be chosen or estimated.

free parameters (3)
  • ARMA orders p and q
    Orders of autoregressive and moving-average terms on the conditional location parameter; selected or fitted per series.
  • Correlation matrix parameters
    Off-diagonal entries of the multivariate dependence structure between the three series.
  • BS shape parameters
    Shape parameters of the Birnbaum-Saunders marginals, estimated jointly.
axioms (1)
  • domain assumption: The joint conditional distribution of the responses given past values and covariates is multivariate Birnbaum-Saunders.
    Invoked to justify the model for positive right-skewed series.


