Overstuffed sandwiches and separation anxiety: finite-sample variance estimation for penalized GEE with near-separated binary data
Pith reviewed 2026-05-10 03:27 UTC · model grok-4.3
The pith
Penalized GEE variance estimates overcorrect in small samples with near-separated binary data, but a targeted upward adjustment restores reliable type I error rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish first-order asymptotics for PGEE along convergent interior-root sequences and derive a matrix characterization of the parameter-specific overcorrection induced by full leverage adjustment. Finite-sample calibration is limited by both mean bias and the variability of leverage-corrected variance estimates. We propose V_AR, which keeps the score-level leverage correction and adds a finite-sample upward translation dominated at first order by the finite-population factor, with a smaller centering term. In simulations, V_AR gives conservative or near-nominal type I error in low-event, small-N settings, including N=10, where several standard corrections remain anti-conservative and 3+
What carries the argument
V_AR estimator, which retains score-level leverage correction and augments it with a finite-sample upward translation whose dominant term is the finite-population factor
If this is right
- V_AR supplies usable standard errors for PGEE in the small-N, low-event regime where existing corrections are anti-conservative.
- The method remains applicable to unbalanced longitudinal designs where pooling-based variance estimators cannot be formed.
- Type I error control extends at least to N=10, allowing inference in data sets previously considered too small for reliable PGEE analysis.
Where Pith is reading between the lines
- The same finite-population adjustment idea could be tested in other penalized estimating-equation settings such as penalized GLMs for independent data.
- If the overcorrection matrix characterization is accurate, it supplies a diagnostic for deciding when full leverage adjustment should be replaced by the partial adjustment used in V_AR.
- Further calibration of the centering term might reduce the conservatism of V_AR while preserving validity in the smallest samples.
Load-bearing premise
First-order asymptotics hold for PGEE along convergent interior-root sequences and the matrix form of overcorrection from full leverage adjustment correctly identifies the finite-sample bias.
What would settle it
A simulation with N=10, low event counts, and unbalanced clusters in which V_AR produces type I error rates materially below the nominal level would show that the finite-sample translation does not achieve the claimed conservative or near-nominal performance.
Figures
read the original abstract
Penalized generalized estimating equations (PGEE) stabilize point estimation for longitudinal binary data under near-separation, but inference still depends on how the sandwich variance is corrected. Existing corrections for PGEE can overadjust in high-leverage directions, require restrictive pooling assumptions, or add global regularization without explaining the bias. We establish first-order asymptotics for PGEE along convergent interior-root sequences and derive a matrix characterization of the parameter-specific overcorrection induced by full leverage adjustment. Finite-sample calibration is limited by both mean bias and the variability of leverage-corrected variance estimates. We propose $\hat{V}_{AR}$, which keeps the score-level leverage correction and adds a finite-sample upward translation dominated at first order by the finite-population factor, with a smaller centering term. In simulations, $\hat{V}_{AR}$ gives conservative or near-nominal type I error in low-event, small-$N$ settings, including $N = 10$, where several standard corrections remain anti-conservative and pooling estimators are unavailable for unbalanced designs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses variance estimation for penalized generalized estimating equations (PGEE) applied to longitudinal binary data under near-separation. It establishes first-order asymptotics for PGEE along convergent interior-root sequences, derives a matrix characterization of parameter-specific overcorrection from full leverage adjustment, and proposes a new estimator V_AR that retains the score-level leverage correction while adding a finite-sample upward translation dominated at first order by the finite-population factor (with a smaller centering term). Simulations are reported to show that V_AR achieves conservative or near-nominal type I error rates in low-event, small-N regimes (including N=10), outperforming several standard corrections that remain anti-conservative.
Significance. If the central claims hold, the work offers a targeted improvement to sandwich variance estimation for PGEE in finite-sample settings with rare events, where existing methods often produce anti-conservative inference or are inapplicable. The explicit asymptotic derivation combined with simulation evidence for small-N performance constitutes a concrete methodological advance in longitudinal data analysis, particularly for applications in biostatistics or social sciences involving binary outcomes with separation issues. The proposal avoids global regularization or restrictive pooling while providing a transparent finite-population adjustment.
major comments (2)
- [first-order asymptotics derivation] The first-order asymptotics section: the derivation is restricted to convergent interior-root sequences, yet the motivating regime of near-separation (low events, small N) systematically inflates coefficient magnitudes and can render the effective information matrix ill-conditioned. This places the target application outside the assumed sequences, so the claimed first-order dominance of the finite-population upward translation in V_AR lacks theoretical guarantee and the conservative type-I-error behavior remains an empirical observation.
- [simulation study] Simulation results section: the reported type I error control for V_AR at N=10 is presented without sufficient detail on the exact data-generating process, rules for excluding or handling separated clusters, number of replications, or verification that the first-order terms match the derived expressions. This weakens the ability to assess whether the observed conservatism is robust or an artifact of the design.
minor comments (2)
- [Abstract] The abstract refers to 'pooling estimators' being unavailable for unbalanced designs without defining or citing them, which may confuse readers unfamiliar with the specific literature on GEE variance pooling.
- [throughout] Notation for the proposed estimator alternates between V_AR and hat{V}_AR; consistent use throughout would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment point by point below, indicating planned revisions where appropriate.
read point-by-point responses
-
Referee: [first-order asymptotics derivation] The first-order asymptotics section: the derivation is restricted to convergent interior-root sequences, yet the motivating regime of near-separation (low events, small N) systematically inflates coefficient magnitudes and can render the effective information matrix ill-conditioned. This places the target application outside the assumed sequences, so the claimed first-order dominance of the finite-population upward translation in V_AR lacks theoretical guarantee and the conservative type-I-error behavior remains an empirical observation.
Authors: We agree that the first-order asymptotics for PGEE are derived under convergent interior-root sequences, which do not encompass the most extreme near-separation cases where the effective information matrix may become ill-conditioned and the estimator may not converge to a finite interior point. Consequently, the first-order dominance result for the finite-population adjustment in V_AR is guaranteed only within the stated asymptotic framework. In the small-N, low-event regime that motivates the paper, the conservative or near-nominal type I error performance of V_AR is an empirical finding supported by the simulations rather than a direct consequence of the asymptotic dominance. We will revise the manuscript to explicitly clarify the scope of the theoretical results and to emphasize that the small-sample behavior is demonstrated empirically. revision: partial
-
Referee: [simulation study] Simulation results section: the reported type I error control for V_AR at N=10 is presented without sufficient detail on the exact data-generating process, rules for excluding or handling separated clusters, number of replications, or verification that the first-order terms match the derived expressions. This weakens the ability to assess whether the observed conservatism is robust or an artifact of the design.
Authors: We thank the referee for highlighting the need for greater transparency in the simulation design. In the revised manuscript we will expand the simulation results section to provide complete details on the data-generating process (including covariate distributions, correlation structures, and event probabilities), explicit rules for detecting and handling separated clusters, the exact number of Monte Carlo replications, and supplementary checks confirming that the simulated first-order terms align with the derived asymptotic expressions. revision: yes
- A theoretical guarantee for the first-order dominance of the finite-population adjustment in V_AR under non-convergent or boundary sequences that arise in extreme near-separation cannot be provided within the current analytic framework and would require a separate, substantially different asymptotic development.
Circularity Check
No significant circularity: V_AR constructed from explicit finite-population adjustment on existing score corrections
full rationale
The paper first establishes first-order asymptotics for PGEE along convergent interior-root sequences and derives a matrix characterization of parameter-specific overcorrection from full leverage adjustment. It then proposes V_AR by retaining the score-level leverage correction and adding an explicit finite-sample upward translation term dominated at first order by the finite-population factor plus a smaller centering term. This construction does not reduce the estimator to a fitted quantity from the same data by definition, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. The central proposal remains an independent mathematical adjustment whose validity rests on the stated asymptotic regime rather than tautological re-expression of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption First-order asymptotics hold for PGEE along convergent interior-root sequences
Reference graph
Works this paper leans on
-
[1]
Garrett M Fitzmaurice, Nan M Laird, and James H Ware.Applied longitudinal analysis. John Wiley & Sons, 2012
work page 2012
-
[2]
Longitudinal data analysis using generalized linear models.Biometrika, 73(1):13–22, 1986
Kung-Yee Liang and Scott L Zeger. Longitudinal data analysis using generalized linear models.Biometrika, 73(1):13–22, 1986
work page 1986
-
[3]
A covariance estimator for gee with improved small-sample properties.Biometrics, 57(1):126–134, 2001
Lloyd A Mancl and Timothy A DeRouen. A covariance estimator for gee with improved small-sample properties.Biometrics, 57(1):126–134, 2001
work page 2001
-
[4]
Adelin Albert and John A Anderson. On the existence of maximum likelihood estimates in logistic regression models.Biometrika, 71(1):1–10, 1984
work page 1984
-
[5]
Momenul Haque Mondol and M Shafiqur Rahman. Bias-reduced and separation-proof gee with small or sparse longitudinal binary data.Statistics in Medicine, 38(14):2544– 2560, 2019
work page 2019
-
[6]
Sparse data bias: a problem hiding in plain sight.bmj, 352, 2016
Sander Greenland, Mohammad Ali Mansournia, and Douglas G Altman. Sparse data bias: a problem hiding in plain sight.bmj, 352, 2016
work page 2016
-
[7]
Bias reduction of maximum likelihood estimates.Biometrika, 80(1):27–38, 1993
David Firth. Bias reduction of maximum likelihood estimates.Biometrika, 80(1):27–38, 1993
work page 1993
-
[8]
G¨ oran Kauermann and Raymond J Carroll. A note on the efficiency of sandwich co- variance matrix estimation.Journal of the American Statistical Association, 96(456): 1387–1396, 2001
work page 2001
-
[9]
Jorge G Morel, MC Bokossa, and Nagaraj K Neerchal. Small sample correction for the variance of gee estimators.Biometrical Journal: journal of mathematical methods in biosciences, 45(4):395–409, 2003
work page 2003
-
[10]
Paul Rogers and Julie Stoner. Modification of the sandwich estimator in generalized estimating equations with correlated binary outcomes in rare event and small sample settings.American journal of applied mathematics and statistics, 3(6):243, 2015
work page 2015
-
[11]
On the robust variance estimator in generalised estimating equations
Wei Pan. On the robust variance estimator in generalised estimating equations. Biometrika, 88(3):901–906, 2001
work page 2001
-
[12]
Masahiko Gosho, Yasunori Sato, and Hisao Takeuchi. Robust covariance estimator for small-sample adjustment in the generalized estimating equations: a simulation study. Science Journal of Applied Mathematics and Statistics, 2(1):20, 2014. 31
work page 2014
-
[13]
Ming Wang and Qi Long. Modified robust variance estimator for generalized estimating equations with improved small-sample performance.Statistics in medicine, 30(11):1278– 1291, 2011
work page 2011
-
[14]
Whitney P Ford and Philip M Westgate. A comparison of bias-corrected empirical co- variance estimators with generalized estimating equations in small-sample longitudinal study settings.Statistics in Medicine, 37(28):4318–4334, 2018
work page 2018
-
[15]
James G MacKinnon and Halbert White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties.Journal of econometrics, 29 (3):305–325, 1985
work page 1985
-
[16]
Peng Li and David T Redden. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes.Statistics in medicine, 34(2):281–296, 2015
work page 2015
-
[17]
Ming Wang, Lan Kong, Zheng Li, and Lijun Zhang. Covariance estimators for gener- alized estimating equations (gee) in longitudinal analysis with small samples.Statistics in medicine, 35(10):1706–1721, 2016
work page 2016
-
[18]
Angelika Geroldinger, Rok Blagus, Helen Ogden, and Georg Heinze. An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes.BMC Medical Research Methodology, 22(1): 168, 2022
work page 2022
-
[19]
Masahiko Gosho, Ryota Ishii, Hisashi Noma, and Kazushi Maruo. A comparison of bias-adjusted generalized estimating equations for sparse binary data in small-sample longitudinal studies.Statistics in Medicine, 2023
work page 2023
-
[20]
geessbin: GEE solver for small sample binary data.R package version 0.1.1, CRAN, 2024
Ryota Ishii, Masahiko Gosho, Hisashi Noma, and Kazushi Maruo. geessbin: GEE solver for small sample binary data.R package version 0.1.1, CRAN, 2024
work page 2024
-
[21]
Minge Xie and Yaning Yang. Asymptotics for generalized estimating equations with large cluster sizes.The Annals of Statistics, 31(1):310–347, 2003
work page 2003
-
[22]
Radu M Balan and Isabela Schiopu-Kratina. Asymptotic results with generalized esti- mating equations for longitudinal data.Journal of Multivariate Analysis, 95(2):285–300, 2005
work page 2005
-
[23]
Bias reduction in exponential family nonlinear models.Biometrika, 96(4):793–804, 2009
Ioannis Kosmidis and David Firth. Bias reduction in exponential family nonlinear models.Biometrika, 96(4):793–804, 2009. 32
work page 2009
-
[24]
Cambridge University Press, 1998
Aad W van der Vaart.Asymptotic Statistics. Cambridge University Press, 1998
work page 1998
-
[25]
Michael P Fay and Barry I Graubard. Small-sample adjustments for wald-type tests using sandwich estimators.Biometrics, 57(4):1198–1206, 2001
work page 2001
-
[26]
Bahjat F Qaqish. A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations.Biometrika, 90(2):455– 463, 2003
work page 2003
-
[27]
A 12-week treat- ment for dermatophyte toe onychomycosis terbinafine 250mg/day vs
M De Backer, P De Keyser, C De Vroey, and Emmanuel Lesaffre. A 12-week treat- ment for dermatophyte toe onychomycosis terbinafine 250mg/day vs. itraconazole 200mg/day—a double-blind comparative trial.British Journal of Dermatology, 134: 16–17, 1996
work page 1996
-
[28]
M Wang. Geesmv: Modified variance estimators for generalized estimating equations, r package version 1.3, 2015
work page 2015
-
[29]
Matthieu Lesnoff, Gerard Laval, Pascal Bonnet, Sintayehu Abdicho, Asseged Workalemahu, Daniel Kifle, Andr´ e Peyraud, Renaud Lancelot, and Fran¸ cois Thiaucourt. Within-herd spread of contagious bovine pleuropneumonia in Ethiopian highlands.Pre- ventive Veterinary Medicine, 64(1):27–40, 2004
work page 2004
-
[30]
Fitting linear mixed- effects models using lme4.Journal of Statistical Software, 67(1):1–48, 2015
Douglas Bates, Martin M¨ achler, Ben Bolker, and Steve Walker. Fitting linear mixed- effects models using lme4.Journal of Statistical Software, 67(1):1–48, 2015. 33 A Existing variance estimators Complete formulas for the thirteen literature estimators referenced in Section 2.5. Each entry starts from the GEE sandwich (2) and states the estimator-specific...
work page 2015
-
[31]
=ϕ −1C. Hence κ(ϕ) = max 1, ϕ−1 C p . Multiplying by ∆(ϕ) =ϕ e∆ gives κ(ϕ)∆(ϕ) = max 1, ϕ−1 C p ϕe∆ = max ϕ, C p e∆. 47 Now e∆ =O(N −1) because eI0 =O(N), while eI c 1 =O(N) as a sum ofNcentered subject contributions, so C= trace( e∆eI c
-
[32]
Sinceδ n =O(N −1), we obtain δn κ(ϕ)∆(ϕ) =O(N −2) forϕin any fixed neighborhood ofϕ 0 >0
=O(1) and thereforeC/p=O(1). Sinceδ n =O(N −1), we obtain δn κ(ϕ)∆(ϕ) =O(N −2) forϕin any fixed neighborhood ofϕ 0 >0. Thus the multiplicative Morel term is exactly invariant to a plug-in scalar working scale, and the additive term changes only at higher order. The regression-parameter count in Morel’s correction therefore remainsp= dim(β). C Full applied...
-
[33]
involves quasi-complete separation, the results depend on which patients are sampled; different seeds can produce different cell counts and different degrees of separation. Table S1 reports the standard errors andp-values for ˆβ1 (treatment) across all fourteen estimators for the toenail data atN= 13 (Case II, near separation) andN= 60 (Case III). As the ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.