Estimation of Long-Range Dependent Models with Missing Data: to Impute or not to Impute?
Pith reviewed 2026-05-24 09:38 UTC · model grok-4.3
The pith
A Monte Carlo study of 35 setups compares imputation to direct methods for estimating the long-memory parameter d in ARFIMA models with missing data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper conducts a Monte Carlo simulation study that compares 35 different setups for estimating d under numerous scenarios with 10% to 70% missing data and several levels of dependence, contrasting imputation-based estimation with methods tailored for missing observations in ARFIMA(p,d,q) models.
What carries the argument
Monte Carlo simulation comparing 35 estimation setups for the long-memory parameter d with missing data
If this is right
- Imputation before estimation remains competitive with specialized missing-data estimators across a wide range of missing percentages.
- Performance differences between the two approaches depend on both the fraction of missing observations and the strength of dependence.
- Review of available methods shows practical options exist under both the imputation route and the direct-missing-data route.
Where Pith is reading between the lines
- The simulation results could let analysts avoid unnecessarily complex missing-data algorithms when simpler imputation suffices.
- Similar large-scale comparisons could be repeated for other long-memory models or for non-Gaussian series.
- The study framework supplies a template that later work can apply to real empirical series whose true d is known from complete subsamples.
Load-bearing premise
The Monte Carlo simulation designs and missing-data mechanisms used are representative of real-world performance for ARFIMA estimation with missing observations.
What would settle it
Take a long series with known d, delete observations according to the mechanisms studied, apply the simulation-identified best method, and check whether it recovers the true d more accurately than the alternatives.
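The check described above can be sketched on simulated data. This is only a minimal illustration under stated assumptions, not the paper's procedure: the series is a simulated ARFIMA(0,d,0) (so that d is known), observations are deleted independently at random, and the GPH log-periodogram estimator with bandwidth n^0.5, the two imputation rules, and all function names (`simulate_arfima0d0`, `gph`, `run_check`) are illustrative choices.

```python
import numpy as np

def simulate_arfima0d0(n, d, rng, trunc=2000):
    """Simulate ARFIMA(0,d,0) from a truncated MA(infinity) representation."""
    k = np.arange(1, trunc + 1)
    # MA weights: psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.standard_normal(n + trunc)
    # "valid" mode returns exactly n values, each using trunc + 1 innovations
    return np.convolve(eps, psi, mode="valid")

def gph(x, power=0.5):
    """GPH log-periodogram regression estimate of d (bandwidth m = n**power)."""
    n = len(x)
    m = int(n ** power)
    freqs = 2 * np.pi * np.arange(1, m + 1) / n
    periodogram = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    regressor = -2 * np.log(2 * np.sin(freqs / 2))
    slope, _ = np.polyfit(regressor, np.log(periodogram), 1)
    return slope

def impute_mean(x_missing):
    """Replace missing values by the mean of the observed values."""
    return np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)

def impute_linear(x_missing):
    """Replace missing values by linear interpolation between observed neighbours."""
    idx = np.arange(len(x_missing))
    obs = ~np.isnan(x_missing)
    return np.interp(idx, idx[obs], x_missing[obs])

def run_check(d_true=0.3, n=1000, missing_fraction=0.3, reps=20, seed=0):
    """Average GPH estimate of d on complete and on imputed series."""
    rng = np.random.default_rng(seed)
    estimates = {"complete": [], "mean": [], "linear": []}
    for _ in range(reps):
        x = simulate_arfima0d0(n, d_true, rng)
        # Assumed deletion mechanism: independent of the series values
        observed = rng.random(n) >= missing_fraction
        observed[[0, -1]] = True  # keep endpoints so interpolation never extrapolates
        x_missing = np.where(observed, x, np.nan)
        estimates["complete"].append(gph(x))
        estimates["mean"].append(gph(impute_mean(x_missing)))
        estimates["linear"].append(gph(impute_linear(x_missing)))
    return {name: float(np.mean(vals)) for name, vals in estimates.items()}

results = run_check()
for name, value in results.items():
    print(f"{name:>8}: mean estimate of d = {value:.3f} (true d = 0.3)")
```

Comparing the `mean` and `linear` rows against `complete` shows how much of the estimation error is attributable to the imputation step rather than to the estimator itself; swapping in a different estimator or deletion mechanism only requires replacing `gph` or the `observed` line.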
Original abstract
Among the most important models for long-range dependent time series is the class of ARFIMA$(p,d,q)$ (Autoregressive Fractionally Integrated Moving Average) models. Estimating the long-range dependence parameter $d$ in ARFIMA models is a well-studied problem, but the literature regarding the estimation of $d$ in the presence of missing data is very sparse. There are two basic approaches to dealing with the problem: missing data can be imputed using some plausible method, and then the estimation can proceed as if no data were missing, or we can use a specially tailored methodology to estimate $d$ in the presence of missing data. In this work, we review some of the methods available for both approaches and compare them through a Monte Carlo simulation study. We present a comparison among 35 different setups to estimate $d$, under tens of different scenarios, considering percentages of missing data ranging from as little as 10\% up to 70\% and several levels of dependence.
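The abstract does not spell out the model, so the standard textbook definition of ARFIMA$(p,d,q)$ is given here for reference. The notation ($B$ for the backshift operator, $\phi$ and $\theta$ for the AR and MA polynomials, $\varepsilon_t$ for white noise) is conventional and assumed; it is not taken from the paper.

```latex
\phi(B)\,(1-B)^{d}\,X_t = \theta(B)\,\varepsilon_t,
\qquad
(1-B)^{d} = \sum_{k=0}^{\infty} \binom{d}{k}\,(-B)^{k},
\qquad
\gamma(h) \sim C\,h^{2d-1} \quad (h \to \infty,\ 0 < d < 1/2).
```

For $0 < d < 1/2$ the autocovariances $\gamma(h)$ decay so slowly that they are not summable, which is the long-range dependence that $d$ measures and that missing observations make harder to estimate.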
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews imputation-based methods and specially tailored missing-data methods for estimating the long-memory parameter d in ARFIMA(p,d,q) models with missing observations. It conducts a Monte Carlo study comparing 35 estimation setups across missing-data fractions from 10% to 70% and multiple dependence strengths, with the goal of determining whether imputation or tailored methods perform better under these conditions.
Significance. The literature on ARFIMA estimation with missing data is sparse, so a systematic comparison of imputation versus direct methods could offer practical guidance. The value of the comparison hinges on whether the simulated missingness mechanisms are representative of realistic dependence structures in long-memory series; if they are, the results would help practitioners choose between the two approaches.
major comments (2)
- [Simulation study] Simulation study section: the manuscript does not specify whether missingness is generated under MCAR, MAR, or MNAR, nor whether the observation indicator is allowed to depend on lagged values of the ARFIMA process. In long-memory data this dependence can materially alter the bias of imputed estimators relative to exact or approximate likelihood methods that account for the missing pattern; without this information the reported performance rankings cannot be generalized beyond the particular simulation design.
- [Simulation study] Simulation study section: the 35 setups are described only at a high level in the abstract; the precise combination of imputation techniques, likelihood approximations, and software implementations used for each setup is not enumerated, making it impossible to reproduce or assess whether the comparison is exhaustive or contains redundant variants.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our Monte Carlo study. We address each major comment below and will revise the manuscript to improve clarity and reproducibility.
- Referee: [Simulation study] Simulation study section: the manuscript does not specify whether missingness is generated under MCAR, MAR, or MNAR, nor whether the observation indicator is allowed to depend on lagged values of the ARFIMA process. In long-memory data this dependence can materially alter the bias of imputed estimators relative to exact or approximate likelihood methods that account for the missing pattern; without this information the reported performance rankings cannot be generalized beyond the particular simulation design.
  Authors: We agree that the missingness mechanism must be clearly stated for proper interpretation. Our simulations used an MCAR mechanism in which the observation indicator is generated independently of the ARFIMA process values and their lags. We will add an explicit description of the data-generation process, including the MCAR assumption and independence from lagged values, to the Simulation study section.
  Revision: yes
- Referee: [Simulation study] Simulation study section: the 35 setups are described only at a high level in the abstract; the precise combination of imputation techniques, likelihood approximations, and software implementations used for each setup is not enumerated, making it impossible to reproduce or assess whether the comparison is exhaustive or contains redundant variants.
  Authors: We accept that a high-level description limits reproducibility. The revised manuscript will contain a detailed table enumerating all 35 setups, specifying the exact imputation technique, likelihood method or approximation, software package and version, and any tuning parameters for each combination.
  Revision: yes
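The distinction raised in the first exchange, between an observation indicator drawn independently of the series (MCAR) and one that depends on lagged values, can be made concrete with a short sketch. Both functions below are hypothetical illustrations, not the paper's data-generation code; in particular, the logistic weighting in `lagged_value_mask` is just one assumed way to let the drop probability depend on the previous value.

```python
import numpy as np

def mcar_mask(n, missing_fraction, rng):
    """True where the value is observed; drops are independent of the series."""
    return rng.random(n) >= missing_fraction

def lagged_value_mask(x, missing_fraction, rng):
    """Hypothetical non-MCAR contrast: x[t] is dropped more often when x[t-1] is large."""
    lagged = np.concatenate(([0.0], x[:-1]))
    # Logistic weight in (0, 2), equal to 1 at lagged = 0, so the average
    # drop rate stays near missing_fraction for a series symmetric about zero
    weight = 2.0 / (1.0 + np.exp(-lagged))
    drop_prob = np.clip(missing_fraction * weight, 0.0, 1.0)
    return rng.random(len(x)) >= drop_prob

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)  # placeholder series; any simulated process could be used

observed_mcar = mcar_mask(len(x), 0.3, rng)
observed_lagged = lagged_value_mask(x, 0.3, rng)

# Missing entries are marked with NaN, ready for imputation or a direct method
x_mcar = np.where(observed_mcar, x, np.nan)
x_lagged = np.where(observed_lagged, x, np.nan)
```

Under the second mechanism the gaps cluster after large values, so the pattern of missingness itself carries information about the process; that is the situation in which rankings obtained under MCAR may not carry over.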
Circularity Check
No circularity: empirical Monte Carlo comparison with no derivation chain
This is a pure simulation study that generates performance metrics for 35 estimation setups across missing-data percentages and dependence levels. No mathematical derivation, parameter fitting to target quantities, or self-citation load-bearing premise is present. The central output (relative performance rankings) is produced by running the estimators on simulated series; it does not reduce to any input by construction. External benchmarks (real data or alternative missingness mechanisms) are not required for the internal consistency of the reported Monte Carlo results.