Scalable Subset Selection in Linear Mixed Models
Pith reviewed 2026-05-19 08:07 UTC · model grok-4.3
The pith
A new ℓ0-regularized method scales subset selection in linear mixed models to thousands of predictors in seconds to minutes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents a new ℓ0 regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, a coordinate descent algorithm serves as the main workhorse with a provided convergence guarantee, while a local search algorithm helps traverse the nonconvex optimization surface. Both algorithms extend readily to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, a finite-sample bound on the Kullback-Leibler divergence of the new method is derived, and experiments demonstrate excellent performance on synthetic and real datasets.
What carries the argument
ℓ0-regularized penalized quasi-likelihood objective for linear mixed models, solved by coordinate descent with convergence guarantee plus local search to handle non-convexity.
If this is right
- Subset selection in LMMs becomes practical for wide datasets that previously required dropping random effects or using only convex relaxations.
- The coordinate descent procedure converges for the given non-convex objective.
- The same computational and statistical framework applies directly to generalized linear mixed models.
- A finite-sample KL divergence bound quantifies how well the fitted sparse model approximates the true data-generating process.
Where Pith is reading between the lines
- Practitioners analyzing clinical or genomic data with individual-level heterogeneity could now perform high-dimensional feature selection without sacrificing the random-effects structure.
- The local-search component suggests that direct attack on non-convex ℓ0 problems can be more effective than convex surrogates when paired with a good initialization strategy.
- Screening rules or warm-starting from convex solutions might further reduce the already modest runtimes for even larger predictor counts.
Load-bearing premise
Coordinate descent combined with local search reliably finds good solutions on the non-convex ℓ0-regularized objective, and the penalized quasi-likelihood approximation stays accurate when extending to generalized linear mixed models.
What would settle it
Running the algorithm on a dataset with 2000 predictors and measuring whether runtime stays under a few minutes while the selected model outperforms a dense baseline in cross-validated prediction error would directly test the central scalability and performance claims.
read the original abstract
Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in experiments involving synthetic and real datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a scalable ℓ0-regularized method for subset selection in linear mixed models (LMMs) that handles thousands of predictors. It introduces a coordinate descent algorithm with a stated convergence guarantee, augments it with local search to address non-convexity, extends the approach to generalized LMMs via penalized quasi-likelihood, derives a finite-sample bound on the Kullback-Leibler divergence, and reports strong empirical performance on synthetic and real data.
Significance. If the algorithmic reliability and statistical bound hold, the work would meaningfully close the scalability gap between sparse linear regression and LMMs, enabling practical sparse modeling for high-dimensional heterogeneous data in areas such as personalized medicine. The explicit convergence guarantee and finite-sample KL bound are positive theoretical contributions that go beyond purely empirical claims.
major comments (2)
- [Algorithm and convergence guarantee (abstract and main algorithmic section)] The convergence guarantee for coordinate descent is stated to apply to the non-convex ℓ0-penalized objective, yet on such surfaces it only reaches coordinate-wise stationary points. Because the central scalability and performance claims (thousands of predictors in seconds to minutes with excellent recovery) rest on the combination of coordinate descent plus local search reliably locating high-quality supports, the manuscript should provide either a stronger guarantee or targeted experiments quantifying how often local search escapes poor stationary points as p grows (e.g., in the synthetic experiments section).
- [Statistical bound section] The finite-sample KL bound is presented as an independent statistical result, but the practical performance claims rely on data-driven choice of the regularization strength λ and other hyperparameters. The manuscript should clarify whether the bound remains valid or informative under such tuning, or whether an additional oracle inequality or stability argument is needed to connect the bound to the reported empirical results.
minor comments (2)
- [Generalized LMM extension] The extension to generalized LMMs via penalized quasi-likelihood is mentioned but would benefit from an explicit statement of the approximation error or conditions under which the quasi-likelihood remains accurate for the subset selection task.
- [Experiments] Figure captions and experimental tables should include the exact values of p, n, and number of random effects used in each synthetic setting to allow direct comparison with prior LMM sparse methods.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope of our theoretical results and the role of the local search procedure. We address each major comment below, indicating the revisions we plan to incorporate.
read point-by-point responses
-
Referee: The convergence guarantee for coordinate descent is stated to apply to the non-convex ℓ0-penalized objective, yet on such surfaces it only reaches coordinate-wise stationary points. Because the central scalability and performance claims (thousands of predictors in seconds to minutes with excellent recovery) rest on the combination of coordinate descent plus local search reliably locating high-quality supports, the manuscript should provide either a stronger guarantee or targeted experiments quantifying how often local search escapes poor stationary points as p grows (e.g., in the synthetic experiments section).
Authors: We agree that the stated convergence result for coordinate descent establishes convergence to a coordinate-wise stationary point of the non-convex objective, consistent with standard analysis for block coordinate methods on non-convex problems. The local search step is explicitly introduced to move beyond such points by evaluating neighboring supports. To directly address the concern, we will add a targeted set of synthetic experiments that report, for increasing p, the fraction of instances in which local search improves the objective value or support recovery relative to coordinate descent alone. These results will be placed in the synthetic experiments section. revision: yes
-
Referee: The finite-sample KL bound is presented as an independent statistical result, but the practical performance claims rely on data-driven choice of the regularization strength λ and other hyperparameters. The manuscript should clarify whether the bound remains valid or informative under such tuning, or whether an additional oracle inequality or stability argument is needed to connect the bound to the reported empirical results.
Authors: The finite-sample KL bound is derived for the estimator minimizing the penalized objective at a fixed λ. In the reported experiments λ is selected by cross-validation. The bound therefore applies directly only to fixed regularization; its informativeness for the tuned estimator would indeed require an additional stability or oracle argument. We will revise the statistical bound section to state this distinction explicitly and to note that a full oracle inequality connecting the bound to cross-validated λ is left for future work, while preserving the bound as a guarantee for any fixed λ. revision: partial
Circularity Check
No significant circularity; derivations and bounds presented as independent contributions
full rationale
The paper develops a coordinate descent algorithm with an explicit convergence guarantee, augments it with local search for the non-convex ℓ0 objective, extends the approach to GLMMs via penalized quasi-likelihood, and derives a finite-sample KL divergence bound. None of these steps reduce by construction to fitted inputs, self-citations, or renamed empirical patterns within the provided sections. The algorithmic guarantees and statistical bound are framed as new results rather than tautological restatements of the method's own tuning or prior self-referential claims. Performance demonstrations rely on experiments but do not substitute for the core derivations.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization strength lambda
axioms (2)
- domain assumption Data are generated from a linear mixed model with fixed and random effects
- ad hoc to paper Coordinate descent converges for the non-convex ℓ0-penalized objective
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
finite-sample bound on the Kullback-Leibler divergence
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.