Combined shrinkage of fixed and random effects in linear mixed models using empirical Bayes

Mark A. van de Wiel; Matteo Amestoy; R. Vermeulen; Wessel N. van Wieringen

arxiv: 2604.24430 · v1 · submitted 2026-04-27 · 📊 stat.ME

Pith reviewed 2026-05-08 02:16 UTC · model grok-4.3

keywords empirical Bayes · linear mixed models · shrinkage estimation · random effects · fixed effects · Laplace approximation · marginal likelihood

The pith

Empirical Bayes selects prior parameters jointly for fixed and random effects in linear mixed models by maximizing the marginal likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a data-driven procedure that chooses the hyperparameters of the priors on both fixed effects and random effects at the same time inside linear mixed models. It does so by maximizing an approximation to the marginal likelihood of the observed data, obtained through the Laplace method. This automation matters because informative prior values are seldom available in advance, especially for the covariance matrices that govern random effects. The resulting estimator applies shrinkage to both effect types without requiring the user to guess tuning constants. Simulations reported in the paper indicate gains in the accuracy of recovered parameters and in the quality of predictions on new observations, while the air-pollution application shows that more elaborate random-effect structures become feasible.

Core claim

By treating the prior parameters for fixed and random effects as unknown quantities estimated from the data, the method maximizes the Laplace-approximated marginal likelihood and thereby produces a combined shrinkage estimator for linear mixed models that handles complex random-effect structures and high-dimensional settings without pre-specified priors.
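As a rough illustration of the idea, the sketch below replaces the paper's linear mixed model with a much simpler stand-in: a Gaussian linear regression with known noise variance `sigma2` and a single prior variance `tau2` on the coefficients. All names, the simulated data, and the grid search are assumptions made for this example, not the authors' implementation. In this toy case the Laplace approximation is exact, so it can be checked against the closed-form marginal likelihood.

```python
import numpy as np


def laplace_log_marginal(X, y, sigma2, tau2):
    """Laplace value of log p(y | tau2) for y ~ N(X b, sigma2 I), b ~ N(0, tau2 I)."""
    n, p = X.shape
    # Negative Hessian of the log joint in b (constant because the model is Gaussian).
    H = X.T @ X / sigma2 + np.eye(p) / tau2
    b_map = np.linalg.solve(H, X.T @ y / sigma2)  # posterior mode
    resid = y - X @ b_map
    log_lik = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2
    log_prior = -0.5 * p * np.log(2 * np.pi * tau2) - 0.5 * b_map @ b_map / tau2
    _, logdet_H = np.linalg.slogdet(H)
    return log_lik + log_prior + 0.5 * p * np.log(2 * np.pi) - 0.5 * logdet_H


def exact_log_marginal(X, y, sigma2, tau2):
    """Closed form: marginally y ~ N(0, sigma2 I + tau2 X X^T)."""
    n = X.shape[0]
    C = sigma2 * np.eye(n) + tau2 * X @ X.T
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet_C + y @ np.linalg.solve(C, y))


# Hypothetical simulated data (not the paper's design).
rng = np.random.default_rng(0)
n, p, sigma2 = 40, 10, 1.0
X = rng.normal(size=(n, p))
beta_true = rng.normal(scale=0.5, size=p)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Empirical Bayes step: pick the prior variance that maximizes the marginal likelihood.
tau2_grid = np.logspace(-3, 2, 60)
scores = np.array([laplace_log_marginal(X, y, sigma2, t) for t in tau2_grid])
tau2_hat = tau2_grid[np.argmax(scores)]

# Shrinkage step: the MAP estimate under the selected prior is a ridge estimate.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_eb = np.linalg.solve(X.T @ X + (sigma2 / tau2_hat) * np.eye(p), X.T @ y)

print("selected tau2:", tau2_hat)
print("norm OLS:", np.linalg.norm(beta_ols), "norm EB:", np.linalg.norm(beta_eb))
```

Ridge-type shrinkage never increases the norm of the estimate, so `beta_eb` is at most as long as `beta_ols`; how strongly it shrinks is decided by the data through `tau2_hat` rather than by a user-chosen constant.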

What carries the argument

Laplace approximation to the marginal likelihood, jointly maximized over the prior hyperparameters of the fixed effects and the random-effect covariance structure.
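For orientation, the generic form of a Laplace approximation to a marginal likelihood is written out below. The notation is an assumption of this note: θ collects the model parameters, Θ the prior hyperparameters, d is the dimension of θ, and the paper's own parameterization may differ.

```latex
\log p(Y \mid \Theta)
  = \log \int p(Y \mid \theta)\, \pi(\theta \mid \Theta)\, d\theta
  \approx \log p(Y \mid \hat\theta) + \log \pi(\hat\theta \mid \Theta)
    + \frac{d}{2} \log(2\pi) - \frac{1}{2} \log \bigl| H(\hat\theta) \bigr|,
\qquad
\hat\theta = \arg\max_{\theta} \bigl[ \log p(Y \mid \theta) + \log \pi(\theta \mid \Theta) \bigr],
\quad
H(\hat\theta) = - \nabla_{\theta}^{2} \bigl[ \log p(Y \mid \theta) + \log \pi(\theta \mid \Theta) \bigr] \Big|_{\theta = \hat\theta}.
```

Empirical Bayes then maximizes the right-hand side over Θ. The approximation is exact when the log joint is quadratic in θ and degrades as the integrand departs from a Gaussian shape.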

If this is right

Parameter estimates for both fixed effects and variance components become more accurate when random-effect structures grow elaborate.
Out-of-sample predictions improve because the data-driven priors reduce overfitting.
Models with richer and statistically more appropriate random-effect covariances can be fitted routinely.
Subjective or arbitrary choices of shrinkage intensities are replaced by an automatic, likelihood-based procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-maximization idea could be carried to generalized linear mixed models or other hierarchical settings where prior specification is equally difficult.
Performance might degrade if the Laplace approximation itself becomes unreliable in extremely high-dimensional random-effect spaces, suggesting a natural boundary for the method.
Direct comparison with fully Bayesian MCMC implementations on the same data sets would quantify the accuracy loss, if any, incurred by the Laplace shortcut.

Load-bearing premise

The Laplace approximation to the integral over the random effects remains accurate enough that its maximization yields reliable values for the prior parameters even when the random-effect structure is complex or high-dimensional.
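One way to probe this kind of premise is to test the approximation on an integral with a known answer. The sketch below does so for a stand-in problem, not the paper's model: observations drawn from N(0, s) with an inverse-gamma prior IG(a, b) on the variance s, for which the marginal likelihood has a closed form. The function names, the prior values, and the sample sizes are assumptions of this example.

```python
import math
import random


def exact_log_marginal(y, a, b):
    """Closed-form log marginal for y_i ~ N(0, s), s ~ Inverse-Gamma(a, b)."""
    n = len(y)
    S = sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi) + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + 0.5 * n) - (a + 0.5 * n) * math.log(b + 0.5 * S))


def laplace_log_marginal(y, a, b):
    """Laplace approximation of the same integral, expanded around the mode in s."""
    n = len(y)
    S = sum(v * v for v in y)
    c = 0.5 * n + a + 1.0  # the log joint is const - c*log(s) - d/s
    d = 0.5 * S + b
    s_hat = d / c          # mode of the log joint
    log_joint = (-0.5 * n * math.log(2 * math.pi) + a * math.log(b) - math.lgamma(a)
                 - c * math.log(s_hat) - d / s_hat)
    neg_hess = c / s_hat ** 2  # minus the second derivative at the mode
    return log_joint + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hess)


def abs_error(n, a=2.0, b=2.0, seed=1):
    """Absolute error of the Laplace log marginal for a simulated sample of size n."""
    gen = random.Random(seed)
    y = [gen.gauss(0.0, 1.5) for _ in range(n)]
    return abs(laplace_log_marginal(y, a, b) - exact_log_marginal(y, a, b))


err_small = abs_error(5)    # few observations: skewed integrand
err_large = abs_error(200)  # many observations: integrand close to Gaussian
print("error with n=5:", err_small)
print("error with n=200:", err_large)
```

In this toy problem the error shrinks as the number of observations per parameter grows, which is the regime where the premise is most comfortable. The approximation also depends on the parameterization chosen for s, so a different scale (for example log s) would give a different error.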

What would settle it

A new simulation study with known true parameters in which the proposed estimator produces larger estimation error or worse predictive mean squared error than standard separate-shrinkage approaches would show that the claimed accuracy gains do not hold.
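The shape of such a check can be sketched on the classical normal-means problem, used here purely as a stand-in for the paper's simulation design. The setting (unit noise variance, 50 effects, prior standard deviation 0.5), the 150 replicates, and the comparison against plain maximum likelihood rather than separate-shrinkage estimators are all assumptions of this example.

```python
import numpy as np


def eb_normal_means(y):
    """Empirical Bayes shrinkage for y_j ~ N(theta_j, 1), theta_j ~ N(0, tau2).

    Marginally y_j ~ N(0, 1 + tau2), so the marginal-likelihood estimate of
    tau2 is mean(y^2) - 1, truncated at zero.
    """
    tau2_hat = max(0.0, float(np.mean(y ** 2)) - 1.0)
    shrink = tau2_hat / (1.0 + tau2_hat)  # posterior-mean shrinkage factor
    return shrink * y, tau2_hat


rng = np.random.default_rng(42)
n_rep, p, tau = 150, 50, 0.5
rmse_ml = np.empty(n_rep)
rmse_eb = np.empty(n_rep)

for r in range(n_rep):
    theta = rng.normal(scale=tau, size=p)  # known truth for this replicate
    y = theta + rng.normal(size=p)         # one noisy observation per effect
    theta_eb, _ = eb_normal_means(y)
    rmse_ml[r] = np.sqrt(np.mean((y - theta) ** 2))         # ML estimate is y itself
    rmse_eb[r] = np.sqrt(np.mean((theta_eb - theta) ** 2))

diff = rmse_ml - rmse_eb  # positive values favour the shrinkage estimator
se = diff.std(ddof=1) / np.sqrt(n_rep)
print("mean RMSE difference:", diff.mean(), "+/-", 1.96 * se)
```

A mean difference that is negative, or indistinguishable from zero given its Monte Carlo standard error, would be the kind of evidence against the claimed gains that this section describes.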

Figures

Figures reproduced from arXiv: 2604.24430 by Mark A. van de Wiel, Matteo Amestoy, R. Vermeulen, Wessel N. van Wieringen.

**Figure 1.** Violinplot of the estimated hyperparameters of the IW-prior of the covariance of the random …

**Figure 2.** Violinplot of the MAP (in red) and maximum likelihood (in blue) estimates of the fixed and …

**Figure 3.** Violinplot of the estimation error of the MAP (in red) and maximum likelihood (in blue) …

**Figure 4.** Violinplot of the KL divergence difference KL[p …

**Figure 5.** Violin plots of estimated random-effect parameters and hyperparameters under high …

**Figure 6.** Right: Violin plots of the RMSE for the fixed-effect estimates. Estimates under fixed-effect …

**Figure 7.** Boxplot of the RMSE of the prediction for the regularised estimators (red) and their non …

**Figure 8.** Violinplot of the RMSE difference RMSE(θ̂_ML) − RMSE(θ̂_EB) over the 150 experiments.

**Figure 9.** Computational time (in seconds) for marginal likelihood estimation of a LMM with a two …

**Figure 10.** Violinplot of the estimated fixed effect hyperparameters when the fixed effects are respec…

Original abstract

A novel data-driven methodology is presented for the joint selection of prior parameters for both fixed and random effects in Linear Mixed Models (LMMs). This approach facilitates the estimation of complex random-effects structures, as well as potentially high-dimensional data. Although Bayesian frameworks require the specification of informative prior parameters, such values are often unavailable a priori - especially for random-effect covariances. The proposed method automates this selection through an Empirical Bayes framework, which maximizes the marginal likelihood using an efficient Laplace approximation. Numerical simulations demonstrate that this methodology significantly enhances parameter estimation accuracy and predictive performance. Finally, an application to a real-world air pollution and health dataset illustrates how the method enables the use of more sophisticated and statistically appropriate models to improve predictive outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel empirical Bayes procedure for jointly selecting prior parameters on both fixed and random effects in linear mixed models. Prior parameters are estimated by maximizing the marginal likelihood via a Laplace approximation; the method is claimed to enable more complex random-effect structures and to deliver improved estimation accuracy and predictive performance, as demonstrated in numerical simulations and a real-data application to air-pollution and health outcomes.

Significance. If the Laplace approximation proves reliable across the tested regimes, the approach would automate a currently manual and often unavailable step in Bayesian LMM analysis, thereby lowering the barrier to using richer random-effect covariance structures in moderate-to-high-dimensional settings. The combination of simulation evidence and a concrete applied example is a positive feature.

major comments (2)

[Numerical Simulations] Numerical Simulations section: the central claim that the procedure 'significantly enhances parameter estimation accuracy and predictive performance' rests entirely on simulation results that themselves employ the Laplace approximation to maximize the marginal likelihood. No diagnostic or sensitivity check is reported on the accuracy of that approximation when the random-effect covariance is high-dimensional or non-diagonal; if the approximation distorts the location or curvature of the marginal likelihood surface, the reported gains in shrinkage parameters may be artifacts.
[Methods] Methods / Simulation design: quantitative details required to evaluate the headline claim are missing—number of Monte Carlo replicates, standard errors or confidence bands on the reported accuracy and prediction metrics, exact data-generating processes for the complex random-effect structures, and the precise baseline estimators against which improvement is measured.

minor comments (2)

The abstract states that the method 'facilitates the estimation of complex random-effects structures' but does not specify the dimension or structure of the covariances actually tested in the simulations; adding this information would strengthen the link between method and claim.
Notation for the joint prior parameters and the Laplace-approximated marginal likelihood could be introduced earlier and used consistently to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify important areas where additional clarity and validation will strengthen the manuscript. We address each major comment below and describe the revisions we will implement.

Referee: [Numerical Simulations] Numerical Simulations section: the central claim that the procedure 'significantly enhances parameter estimation accuracy and predictive performance' rests entirely on simulation results that themselves employ the Laplace approximation to maximize the marginal likelihood. No diagnostic or sensitivity check is reported on the accuracy of that approximation when the random-effect covariance is high-dimensional or non-diagonal; if the approximation distorts the location or curvature of the marginal likelihood surface, the reported gains in shrinkage parameters may be artifacts.

Authors: We agree that explicit validation of the Laplace approximation is necessary to support the central claims, especially as random-effect dimension increases. Our original simulations targeted moderate-dimensional regimes in which the approximation is expected to be reliable, but we did not include direct checks against more accurate methods. In the revised manuscript we will add a dedicated sensitivity subsection that compares Laplace-approximated marginal likelihood values and resulting shrinkage parameters to MCMC-based estimates on a subset of low- and moderate-dimensional designs, and we will report the observed relative error. These additions will allow readers to assess whether the reported gains could be artifacts of the approximation. revision: yes
Referee: [Methods] Methods / Simulation design: quantitative details required to evaluate the headline claim are missing—number of Monte Carlo replicates, standard errors or confidence bands on the reported accuracy and prediction metrics, exact data-generating processes for the complex random-effect structures, and the precise baseline estimators against which improvement is measured.

Authors: We acknowledge that these quantitative details were not stated with sufficient explicitness. The revised Methods and Simulation sections will report the exact number of Monte Carlo replicates, include standard errors or confidence bands for all accuracy and prediction metrics, provide the full data-generating processes (including the specific covariance matrices and parameter values used for the complex random-effect structures), and clearly identify the baseline estimators (standard REML, separate empirical Bayes procedures, and any other comparators). These additions will make the simulation design fully reproducible and allow direct evaluation of the claimed improvements. revision: yes

Circularity Check

0 steps flagged

No circularity: standard empirical Bayes estimation with external simulation validation

The paper describes a standard empirical Bayes procedure that estimates prior parameters for fixed and random effects by maximizing the marginal likelihood (via Laplace approximation) and then applies the resulting shrinkage. This is a conventional, non-tautological workflow in which the hyperparameter estimates are derived from the data but the subsequent shrinkage and performance claims are not equivalent to the inputs by construction. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the provided description. The numerical simulations constitute external evidence rather than part of the derivation chain itself, so the central claims remain independently testable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the Laplace approximation for the marginal likelihood and on the assumption that maximizing that likelihood yields useful prior parameters for both effect types.

axioms (1)

Domain assumption: the Laplace approximation is sufficiently accurate for marginal likelihood maximization in the targeted LMM settings. Invoked to enable efficient computation for complex random-effect structures.

pith-pipeline@v0.9.0 · 5433 in / 1092 out tokens · 62467 ms · 2026-05-08T02:16:33.391215+00:00 · methodology
