pith. sign in

arxiv: 2604.18863 · v1 · submitted 2026-04-20 · 📊 stat.ME

Overstuffed sandwiches and separation anxiety: finite-sample variance estimation for penalized GEE with near-separated binary data

Pith reviewed 2026-05-10 03:27 UTC · model grok-4.3

classification 📊 stat.ME
keywords penalized GEEfinite-sample variancenear-separationbinary longitudinal datasandwich estimatorleverage correctiontype I errorsmall sample inference
0
0 comments X

The pith

Penalized GEE variance estimates overcorrect in small samples with near-separated binary data, but a targeted upward adjustment restores reliable type I error rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Penalized generalized estimating equations stabilize point estimates for longitudinal binary responses when events are rare or data are nearly separated. Standard sandwich corrections for the variance of these estimates, however, can overadjust in high-leverage directions or fail entirely in small, unbalanced designs. The paper first derives first-order asymptotics for PGEE along convergent interior-root sequences and gives a matrix description of the exact overcorrection produced by full leverage adjustment. It then introduces V_AR, which retains the score-level leverage correction while adding a finite-sample upward translation whose leading term is the finite-population correction. Simulations confirm that V_AR keeps type I error at or above nominal levels down to N=10, including settings where other corrections are anti-conservative and pooling methods cannot be applied.

Core claim

We establish first-order asymptotics for PGEE along convergent interior-root sequences and derive a matrix characterization of the parameter-specific overcorrection induced by full leverage adjustment. Finite-sample calibration is limited by both mean bias and the variability of leverage-corrected variance estimates. We propose V_AR, which keeps the score-level leverage correction and adds a finite-sample upward translation dominated at first order by the finite-population factor, with a smaller centering term. In simulations, V_AR gives conservative or near-nominal type I error in low-event, small-N settings, including N=10, where several standard corrections remain anti-conservative and 3+

What carries the argument

V_AR estimator, which retains score-level leverage correction and augments it with a finite-sample upward translation whose dominant term is the finite-population factor

If this is right

  • V_AR supplies usable standard errors for PGEE in the small-N, low-event regime where existing corrections are anti-conservative.
  • The method remains applicable to unbalanced longitudinal designs where pooling-based variance estimators cannot be formed.
  • Type I error control extends at least to N=10, allowing inference in data sets previously considered too small for reliable PGEE analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same finite-population adjustment idea could be tested in other penalized estimating-equation settings such as penalized GLMs for independent data.
  • If the overcorrection matrix characterization is accurate, it supplies a diagnostic for deciding when full leverage adjustment should be replaced by the partial adjustment used in V_AR.
  • Further calibration of the centering term might reduce the conservatism of V_AR while preserving validity in the smallest samples.

Load-bearing premise

First-order asymptotics hold for PGEE along convergent interior-root sequences and the matrix form of overcorrection from full leverage adjustment correctly identifies the finite-sample bias.

What would settle it

A simulation with N=10, low event counts, and unbalanced clusters in which V_AR produces type I error rates materially below the nominal level would show that the finite-sample translation does not achieve the claimed conservative or near-nominal performance.

Figures

Figures reproduced from arXiv: 2604.18863 by Awan Afiaz, M. Shafiqur Rahman.

Figure 1
Figure 1. Figure 1: Type I error for β1 (treatment effect) under H0, by number of subjects N and event rate. Rows fix N ∈ {10, 20, 30, 50}; columns group estimators into baseline (Vˆ LZ, VˆDF ), leverage corrections (VˆKC, VˆMD, Vˆ F G), additive and non-pooling hybrids (VˆMBN , Vˆ FW , Vˆ F Z), and pooling / pooling-based hybrids (Vˆ Pan, VˆGST , VˆW L, VˆW B, Vˆ RS). Vˆ AR (orange) appears in every panel for direct comparis… view at source ↗
Figure 2
Figure 2. Figure 2: Median standard error divided by simulation standard error (SE/SimSE) for [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Type I error for β1 under unbalanced repeated-measures designs. Rows fix the pattern of repeated observations per subject; columns group estimators as in [PITH_FULL_IMAGE:figures/full_fig_p023_3.png] view at source ↗
read the original abstract

Penalized generalized estimating equations (PGEE) stabilize point estimation for longitudinal binary data under near-separation, but inference still depends on how the sandwich variance is corrected. Existing corrections for PGEE can overadjust in high-leverage directions, require restrictive pooling assumptions, or add global regularization without explaining the bias. We establish first-order asymptotics for PGEE along convergent interior-root sequences and derive a matrix characterization of the parameter-specific overcorrection induced by full leverage adjustment. Finite-sample calibration is limited by both mean bias and the variability of leverage-corrected variance estimates. We propose $\hat{V}_{AR}$, which keeps the score-level leverage correction and adds a finite-sample upward translation dominated at first order by the finite-population factor, with a smaller centering term. In simulations, $\hat{V}_{AR}$ gives conservative or near-nominal type I error in low-event, small-$N$ settings, including $N = 10$, where several standard corrections remain anti-conservative and pooling estimators are unavailable for unbalanced designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper addresses variance estimation for penalized generalized estimating equations (PGEE) applied to longitudinal binary data under near-separation. It establishes first-order asymptotics for PGEE along convergent interior-root sequences, derives a matrix characterization of parameter-specific overcorrection from full leverage adjustment, and proposes a new estimator V_AR that retains the score-level leverage correction while adding a finite-sample upward translation dominated at first order by the finite-population factor (with a smaller centering term). Simulations are reported to show that V_AR achieves conservative or near-nominal type I error rates in low-event, small-N regimes (including N=10), outperforming several standard corrections that remain anti-conservative.

Significance. If the central claims hold, the work offers a targeted improvement to sandwich variance estimation for PGEE in finite-sample settings with rare events, where existing methods often produce anti-conservative inference or are inapplicable. The explicit asymptotic derivation combined with simulation evidence for small-N performance constitutes a concrete methodological advance in longitudinal data analysis, particularly for applications in biostatistics or social sciences involving binary outcomes with separation issues. The proposal avoids global regularization or restrictive pooling while providing a transparent finite-population adjustment.

major comments (2)
  1. [first-order asymptotics derivation] The first-order asymptotics section: the derivation is restricted to convergent interior-root sequences, yet the motivating regime of near-separation (low events, small N) systematically inflates coefficient magnitudes and can render the effective information matrix ill-conditioned. This places the target application outside the assumed sequences, so the claimed first-order dominance of the finite-population upward translation in V_AR lacks theoretical guarantee and the conservative type-I-error behavior remains an empirical observation.
  2. [simulation study] Simulation results section: the reported type I error control for V_AR at N=10 is presented without sufficient detail on the exact data-generating process, rules for excluding or handling separated clusters, number of replications, or verification that the first-order terms match the derived expressions. This weakens the ability to assess whether the observed conservatism is robust or an artifact of the design.
minor comments (2)
  1. [Abstract] The abstract refers to 'pooling estimators' being unavailable for unbalanced designs without defining or citing them, which may confuse readers unfamiliar with the specific literature on GEE variance pooling.
  2. [throughout] Notation for the proposed estimator alternates between V_AR and hat{V}_AR; consistent use throughout would improve readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment point by point below, indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [first-order asymptotics derivation] The first-order asymptotics section: the derivation is restricted to convergent interior-root sequences, yet the motivating regime of near-separation (low events, small N) systematically inflates coefficient magnitudes and can render the effective information matrix ill-conditioned. This places the target application outside the assumed sequences, so the claimed first-order dominance of the finite-population upward translation in V_AR lacks theoretical guarantee and the conservative type-I-error behavior remains an empirical observation.

    Authors: We agree that the first-order asymptotics for PGEE are derived under convergent interior-root sequences, which do not encompass the most extreme near-separation cases where the effective information matrix may become ill-conditioned and the estimator may not converge to a finite interior point. Consequently, the first-order dominance result for the finite-population adjustment in V_AR is guaranteed only within the stated asymptotic framework. In the small-N, low-event regime that motivates the paper, the conservative or near-nominal type I error performance of V_AR is an empirical finding supported by the simulations rather than a direct consequence of the asymptotic dominance. We will revise the manuscript to explicitly clarify the scope of the theoretical results and to emphasize that the small-sample behavior is demonstrated empirically. revision: partial

  2. Referee: [simulation study] Simulation results section: the reported type I error control for V_AR at N=10 is presented without sufficient detail on the exact data-generating process, rules for excluding or handling separated clusters, number of replications, or verification that the first-order terms match the derived expressions. This weakens the ability to assess whether the observed conservatism is robust or an artifact of the design.

    Authors: We thank the referee for highlighting the need for greater transparency in the simulation design. In the revised manuscript we will expand the simulation results section to provide complete details on the data-generating process (including covariate distributions, correlation structures, and event probabilities), explicit rules for detecting and handling separated clusters, the exact number of Monte Carlo replications, and supplementary checks confirming that the simulated first-order terms align with the derived asymptotic expressions. revision: yes

standing simulated objections not resolved
  • A theoretical guarantee for the first-order dominance of the finite-population adjustment in V_AR under non-convergent or boundary sequences that arise in extreme near-separation cannot be provided within the current analytic framework and would require a separate, substantially different asymptotic development.

Circularity Check

0 steps flagged

No significant circularity: V_AR constructed from explicit finite-population adjustment on existing score corrections

full rationale

The paper first establishes first-order asymptotics for PGEE along convergent interior-root sequences and derives a matrix characterization of parameter-specific overcorrection from full leverage adjustment. It then proposes V_AR by retaining the score-level leverage correction and adding an explicit finite-sample upward translation term dominated at first order by the finite-population factor plus a smaller centering term. This construction does not reduce the estimator to a fitted quantity from the same data by definition, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. The central proposal remains an independent mathematical adjustment whose validity rests on the stated asymptotic regime rather than tautological re-expression of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard regularity conditions for GEE asymptotics and the existence of convergent interior-root sequences; no new free parameters or invented entities are introduced beyond the proposed correction term itself.

axioms (1)
  • domain assumption First-order asymptotics hold for PGEE along convergent interior-root sequences
    Invoked to derive the matrix characterization of overcorrection and the form of the finite-sample adjustment.

pith-pipeline@v0.9.0 · 5484 in / 1216 out tokens · 34200 ms · 2026-05-10T03:27:40.597916+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    John Wiley & Sons, 2012

    Garrett M Fitzmaurice, Nan M Laird, and James H Ware.Applied longitudinal analysis. John Wiley & Sons, 2012

  2. [2]

    Longitudinal data analysis using generalized linear models.Biometrika, 73(1):13–22, 1986

    Kung-Yee Liang and Scott L Zeger. Longitudinal data analysis using generalized linear models.Biometrika, 73(1):13–22, 1986

  3. [3]

    A covariance estimator for gee with improved small-sample properties.Biometrics, 57(1):126–134, 2001

    Lloyd A Mancl and Timothy A DeRouen. A covariance estimator for gee with improved small-sample properties.Biometrics, 57(1):126–134, 2001

  4. [4]

    On the existence of maximum likelihood estimates in logistic regression models.Biometrika, 71(1):1–10, 1984

    Adelin Albert and John A Anderson. On the existence of maximum likelihood estimates in logistic regression models.Biometrika, 71(1):1–10, 1984

  5. [5]

    Bias-reduced and separation-proof gee with small or sparse longitudinal binary data.Statistics in Medicine, 38(14):2544– 2560, 2019

    Momenul Haque Mondol and M Shafiqur Rahman. Bias-reduced and separation-proof gee with small or sparse longitudinal binary data.Statistics in Medicine, 38(14):2544– 2560, 2019

  6. [6]

    Sparse data bias: a problem hiding in plain sight.bmj, 352, 2016

    Sander Greenland, Mohammad Ali Mansournia, and Douglas G Altman. Sparse data bias: a problem hiding in plain sight.bmj, 352, 2016

  7. [7]

    Bias reduction of maximum likelihood estimates.Biometrika, 80(1):27–38, 1993

    David Firth. Bias reduction of maximum likelihood estimates.Biometrika, 80(1):27–38, 1993

  8. [8]

    A note on the efficiency of sandwich co- variance matrix estimation.Journal of the American Statistical Association, 96(456): 1387–1396, 2001

    G¨ oran Kauermann and Raymond J Carroll. A note on the efficiency of sandwich co- variance matrix estimation.Journal of the American Statistical Association, 96(456): 1387–1396, 2001

  9. [9]

    Small sample correction for the variance of gee estimators.Biometrical Journal: journal of mathematical methods in biosciences, 45(4):395–409, 2003

    Jorge G Morel, MC Bokossa, and Nagaraj K Neerchal. Small sample correction for the variance of gee estimators.Biometrical Journal: journal of mathematical methods in biosciences, 45(4):395–409, 2003

  10. [10]

    Paul Rogers and Julie Stoner. Modification of the sandwich estimator in generalized estimating equations with correlated binary outcomes in rare event and small sample settings.American journal of applied mathematics and statistics, 3(6):243, 2015

  11. [11]

    On the robust variance estimator in generalised estimating equations

    Wei Pan. On the robust variance estimator in generalised estimating equations. Biometrika, 88(3):901–906, 2001

  12. [12]

    Robust covariance estimator for small-sample adjustment in the generalized estimating equations: a simulation study

    Masahiko Gosho, Yasunori Sato, and Hisao Takeuchi. Robust covariance estimator for small-sample adjustment in the generalized estimating equations: a simulation study. Science Journal of Applied Mathematics and Statistics, 2(1):20, 2014. 31

  13. [13]

    Modified robust variance estimator for generalized estimating equations with improved small-sample performance.Statistics in medicine, 30(11):1278– 1291, 2011

    Ming Wang and Qi Long. Modified robust variance estimator for generalized estimating equations with improved small-sample performance.Statistics in medicine, 30(11):1278– 1291, 2011

  14. [14]

    Whitney P Ford and Philip M Westgate. A comparison of bias-corrected empirical co- variance estimators with generalized estimating equations in small-sample longitudinal study settings.Statistics in Medicine, 37(28):4318–4334, 2018

  15. [15]

    Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties.Journal of econometrics, 29 (3):305–325, 1985

    James G MacKinnon and Halbert White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties.Journal of econometrics, 29 (3):305–325, 1985

  16. [16]

    Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes.Statistics in medicine, 34(2):281–296, 2015

    Peng Li and David T Redden. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes.Statistics in medicine, 34(2):281–296, 2015

  17. [17]

    Covariance estimators for gener- alized estimating equations (gee) in longitudinal analysis with small samples.Statistics in medicine, 35(10):1706–1721, 2016

    Ming Wang, Lan Kong, Zheng Li, and Lijun Zhang. Covariance estimators for gener- alized estimating equations (gee) in longitudinal analysis with small samples.Statistics in medicine, 35(10):1706–1721, 2016

  18. [18]

    Angelika Geroldinger, Rok Blagus, Helen Ogden, and Georg Heinze. An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes.BMC Medical Research Methodology, 22(1): 168, 2022

  19. [19]

    A comparison of bias-adjusted generalized estimating equations for sparse binary data in small-sample longitudinal studies.Statistics in Medicine, 2023

    Masahiko Gosho, Ryota Ishii, Hisashi Noma, and Kazushi Maruo. A comparison of bias-adjusted generalized estimating equations for sparse binary data in small-sample longitudinal studies.Statistics in Medicine, 2023

  20. [20]

    geessbin: GEE solver for small sample binary data.R package version 0.1.1, CRAN, 2024

    Ryota Ishii, Masahiko Gosho, Hisashi Noma, and Kazushi Maruo. geessbin: GEE solver for small sample binary data.R package version 0.1.1, CRAN, 2024

  21. [21]

    Asymptotics for generalized estimating equations with large cluster sizes.The Annals of Statistics, 31(1):310–347, 2003

    Minge Xie and Yaning Yang. Asymptotics for generalized estimating equations with large cluster sizes.The Annals of Statistics, 31(1):310–347, 2003

  22. [22]

    Asymptotic results with generalized esti- mating equations for longitudinal data.Journal of Multivariate Analysis, 95(2):285–300, 2005

    Radu M Balan and Isabela Schiopu-Kratina. Asymptotic results with generalized esti- mating equations for longitudinal data.Journal of Multivariate Analysis, 95(2):285–300, 2005

  23. [23]

    Bias reduction in exponential family nonlinear models.Biometrika, 96(4):793–804, 2009

    Ioannis Kosmidis and David Firth. Bias reduction in exponential family nonlinear models.Biometrika, 96(4):793–804, 2009. 32

  24. [24]

    Cambridge University Press, 1998

    Aad W van der Vaart.Asymptotic Statistics. Cambridge University Press, 1998

  25. [25]

    Small-sample adjustments for wald-type tests using sandwich estimators.Biometrics, 57(4):1198–1206, 2001

    Michael P Fay and Barry I Graubard. Small-sample adjustments for wald-type tests using sandwich estimators.Biometrics, 57(4):1198–1206, 2001

  26. [26]

    A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations.Biometrika, 90(2):455– 463, 2003

    Bahjat F Qaqish. A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations.Biometrika, 90(2):455– 463, 2003

  27. [27]

    A 12-week treat- ment for dermatophyte toe onychomycosis terbinafine 250mg/day vs

    M De Backer, P De Keyser, C De Vroey, and Emmanuel Lesaffre. A 12-week treat- ment for dermatophyte toe onychomycosis terbinafine 250mg/day vs. itraconazole 200mg/day—a double-blind comparative trial.British Journal of Dermatology, 134: 16–17, 1996

  28. [28]

    Geesmv: Modified variance estimators for generalized estimating equations, r package version 1.3, 2015

    M Wang. Geesmv: Modified variance estimators for generalized estimating equations, r package version 1.3, 2015

  29. [29]

    Within-herd spread of contagious bovine pleuropneumonia in Ethiopian highlands.Pre- ventive Veterinary Medicine, 64(1):27–40, 2004

    Matthieu Lesnoff, Gerard Laval, Pascal Bonnet, Sintayehu Abdicho, Asseged Workalemahu, Daniel Kifle, Andr´ e Peyraud, Renaud Lancelot, and Fran¸ cois Thiaucourt. Within-herd spread of contagious bovine pleuropneumonia in Ethiopian highlands.Pre- ventive Veterinary Medicine, 64(1):27–40, 2004

  30. [30]

    Fitting linear mixed- effects models using lme4.Journal of Statistical Software, 67(1):1–48, 2015

    Douglas Bates, Martin M¨ achler, Ben Bolker, and Steve Walker. Fitting linear mixed- effects models using lme4.Journal of Statistical Software, 67(1):1–48, 2015. 33 A Existing variance estimators Complete formulas for the thirteen literature estimators referenced in Section 2.5. Each entry starts from the GEE sandwich (2) and states the estimator-specific...

  31. [31]

    Hence κ(ϕ) = max 1, ϕ−1 C p

    =ϕ −1C. Hence κ(ϕ) = max 1, ϕ−1 C p . Multiplying by ∆(ϕ) =ϕ e∆ gives κ(ϕ)∆(ϕ) = max 1, ϕ−1 C p ϕe∆ = max ϕ, C p e∆. 47 Now e∆ =O(N −1) because eI0 =O(N), while eI c 1 =O(N) as a sum ofNcentered subject contributions, so C= trace( e∆eI c

  32. [32]

    Sinceδ n =O(N −1), we obtain δn κ(ϕ)∆(ϕ) =O(N −2) forϕin any fixed neighborhood ofϕ 0 >0

    =O(1) and thereforeC/p=O(1). Sinceδ n =O(N −1), we obtain δn κ(ϕ)∆(ϕ) =O(N −2) forϕin any fixed neighborhood ofϕ 0 >0. Thus the multiplicative Morel term is exactly invariant to a plug-in scalar working scale, and the additive term changes only at higher order. The regression-parameter count in Morel’s correction therefore remainsp= dim(β). C Full applied...

  33. [33]

    Table S1 reports the standard errors andp-values for ˆβ1 (treatment) across all fourteen estimators for the toenail data atN= 13 (Case II, near separation) andN= 60 (Case III)

    involves quasi-complete separation, the results depend on which patients are sampled; different seeds can produce different cell counts and different degrees of separation. Table S1 reports the standard errors andp-values for ˆβ1 (treatment) across all fourteen estimators for the toenail data atN= 13 (Case II, near separation) andN= 60 (Case III). As the ...