Federated generalized linear mixed models based on one-time shared summary statistics

Christel Faes; Marie Analiz April Limpoco; Niel Hens

arxiv: 2605.01379 · v1 · submitted 2026-05-02 · 📊 stat.ME



Pith reviewed 2026-05-09 18:12 UTC · model grok-4.3

classification 📊 stat.ME

keywords: generalized linear mixed models · federated estimation · summary statistics · pseudo-data generation · data privacy · one-time communication · mixed models · GLMM


The pith

Generalized linear mixed models can be estimated from one-time shared summary statistics by generating matching pseudo-data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that generalized linear mixed models can be fit accurately without access to individual-level records. Researchers generate pseudo-data whose summary statistics exactly replicate those of the unavailable private data, then estimate the model on the pseudo-data instead. This approach requires only a single round of summary sharing and produces estimates that match those from the real data up to the third decimal place, with comparable bias, coverage, and prediction accuracy for linear, logistic, and Poisson mixed models. A sympathetic reader would care because it removes a major practical barrier: the time, paperwork, and privacy risks that currently prevent many collaborative analyses.

Core claim

We propose generating pseudo-data whose summary statistics match those of the actual but unavailable data. These pseudo-data are then used for model estimation instead of the actual data. The estimates we achieve are identical (up to the third decimal place) to those derived from actual data and have similar bias, coverage, and prediction performance. Communication and resource efficiency distinguish our approach from existing methods.

What carries the argument

The generation of pseudo-data engineered to match the summary statistics of the unavailable real data, which then serves as the input for standard GLMM fitting routines.
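The construction can be illustrated for the simplest case. The sketch below assumes that, for one site or cluster, the shared statistics are the sample size, the mean vector, and the sample covariance of the analysis variables (the setting of the authors' earlier work on linear mixed modeling when only the mean, covariance, and sample size are available). It uses a standard whiten-and-recolour trick, which may differ from the generator used in the paper; the function name `moment_matched_pseudo_data` and the example summaries are hypothetical. It produces continuous values, so a binary or count response would need a generator that respects the response's support.

```python
import numpy as np

def moment_matched_pseudo_data(n, mean, cov, rng):
    """Hypothetical helper: draw an (n x p) pseudo-sample whose sample mean and
    sample covariance (ddof=1) equal `mean` and `cov` up to floating-point error.
    Requires n > p and a positive-definite `cov`."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    z = rng.standard_normal((n, mean.size))
    z -= z.mean(axis=0)                                # exact zero sample mean
    s = np.atleast_2d(np.cov(z, rowvar=False))         # sample covariance of the noise
    z = z @ np.linalg.inv(np.linalg.cholesky(s)).T     # whiten: covariance becomes identity
    return z @ np.linalg.cholesky(cov).T + mean        # recolour, then shift to target mean

rng = np.random.default_rng(0)
shared = {"n": 50, "mean": [1.0, 2.5], "cov": [[1.0, 0.6], [0.6, 2.0]]}  # made-up summaries
pseudo = moment_matched_pseudo_data(shared["n"], shared["mean"], shared["cov"], rng)
print(pseudo.mean(axis=0), np.cov(pseudo, rowvar=False))
```

A different seed gives different pseudo-rows with the same first two sample moments, which is all this construction guarantees.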

If this is right

Parameter estimates match full-data results up to the third decimal place for linear, logistic, and Poisson mixed models.
Bias, coverage probabilities, and prediction performance remain comparable to those obtained from actual data.
Only one communication of summary statistics is required, avoiding repeated exchanges.
The method applies directly to cases where individual records cannot be shared due to privacy constraints.
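The phrase "identical up to the third decimal place" admits two readings, and they can disagree. This summary does not say which one the paper uses; the hypothetical helper below simply computes both. Python's `round` operates on binary floats with round-half-to-even, so values sitting exactly on a rounding boundary can behave unexpectedly.

```python
def agree_to_third_decimal(a, b):
    """Return (same value after rounding to 3 decimals, |a - b| < 0.0005)."""
    same_rounded = round(a, 3) == round(b, 3)
    within_half_unit = abs(a - b) < 0.0005
    return same_rounded, within_half_unit

print(agree_to_third_decimal(0.12349, 0.12351))  # very close, but they round differently
print(agree_to_third_decimal(0.12251, 0.12349))  # round the same, yet differ by about 0.001
```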

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same pseudo-data construction could be tested on other mixed-model families or on survival models where summary statistics are also commonly available.
One-time summary sharing may lower the barrier to multi-site studies in fields like epidemiology where data governance rules forbid raw data transfer.
If the method scales to high-dimensional covariates, it could support routine meta-analysis pipelines that currently rely on aggregated results alone.

Load-bearing premise

That pseudo-data constructed solely to match summary statistics will retain enough structure to recover unbiased parameter estimates, proper coverage, and accurate predictions, including for the random effects in mixed models.

What would settle it

A simulation in which the pseudo-data method yields random-effect variance estimates that differ by more than 0.01 from the full-data estimates or produces confidence intervals with coverage below 90 percent.
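Both criteria are straightforward to compute from simulation output. The sketch below assumes that each replication yields the random-effect variance estimated on the full data and on the pseudo-data, plus a fixed-effect estimate and standard error from the pseudo-data fit; coverage uses 95% Wald intervals, and "differ by more than 0.01" is read as the largest gap across replications. All names and the toy numbers are hypothetical, not results from the paper.

```python
import numpy as np

def wald_coverage(estimates, std_errors, true_value, z=1.959964):
    """Share of replications whose 95% Wald interval contains the true value."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return float(np.mean((est - z * se <= true_value) & (true_value <= est + z * se)))

def would_settle_it(var_full, var_pseudo, estimates, std_errors, true_value):
    """Apply the two thresholds: a variance gap above 0.01 in any replication,
    or coverage below 90 percent. Returns (flag, largest gap, coverage)."""
    gap = float(np.max(np.abs(np.asarray(var_pseudo) - np.asarray(var_full))))
    coverage = wald_coverage(estimates, std_errors, true_value)
    return (gap > 0.01 or coverage < 0.90), gap, coverage

# Toy output from four replications (made-up numbers).
flag, gap, coverage = would_settle_it(
    var_full=[0.50, 0.48, 0.52, 0.49],
    var_pseudo=[0.501, 0.482, 0.519, 0.495],
    estimates=[1.0, 1.1, 0.9, 1.5], std_errors=[0.1] * 4, true_value=1.0,
)
print(flag, gap, coverage)
```

Here the variance gaps stay below 0.01, but one interval misses the true value, so coverage alone would trip the criterion.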

Figures

Figures reproduced from arXiv: 2605.01379 by Christel Faes, Marie Analiz April Limpoco, Niel Hens.

**Figure 1.** Proposed framework for a setup with three data providers. Each data provider …

**Figure 2.** Relative bias distributions of estimates from pseudo-data and simulated data …

**Figure 3.** 95% confidence intervals computed on pseudo-data and simulated data across …

**Figure 4.** 95% confidence interval coverage computed on pseudo-data and simulated data …

**Figure 5.** Predictions of models based on pseudo-data and simulated data across 500 …

Original abstract

Data privacy has increasingly become a daunting challenge because it limits data availability, which is essential in estimating statistical models such as generalized linear mixed models. Access to personal data often involves considerable time, effort, and paperwork, which can impede research progress and collaboration. Existing approaches that do not use individual-level data for model estimation are either prone to ecological bias, cannot handle heterogeneity, or require iterative communication. In this paper, we propose an approach to estimate generalized linear mixed models based on summary statistics shared only once. We used linear, logistic, and Poisson mixed models as examples to demonstrate the methodology. Our strategy involves generating pseudo-data whose summary statistics match those of the actual but unavailable data. These pseudo-data are then used for model estimation instead of the actual data. The estimates we achieve are identical (up to the third decimal place) to those derived from actual data and have similar bias, coverage, and prediction performance. Communication and resource efficiency distinguish our approach from existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

One-time summary sharing plus pseudo-data gives close numerical matches for GLMMs in their tests, but the method's ability to recover random effects from global aggregates remains the open question.


This paper shows how to fit generalized linear mixed models across sites by sharing summary statistics just once and then generating pseudo-data that match those statistics for the actual fitting step. The new part is the one-time sharing strategy combined with pseudo-data for GLMMs. Earlier methods either used aggregates that introduce ecological bias, ignored heterogeneity, or needed repeated communication rounds. Here the authors demonstrate it on linear, logistic, and Poisson cases, and the fitted values line up with full-data results to three decimals while keeping similar bias, coverage, and prediction metrics. It handles the privacy constraint efficiently, which is useful when data cannot leave the site, and it is simple enough to implement while avoiding the overhead of iterative methods.

The potential issue is that summary statistics such as overall means or totals do not necessarily determine the within-cluster distributions that drive the random effects in a GLMM. The marginal likelihood integrates over the random effects, so if the pseudo-data do not reproduce the right joint variation within each cluster, the estimates of the variance components, and in some settings even of the fixed effects, could diverge. Their simulations apparently worked, but it would help to see explicit checks on how the pseudo-data are constructed and whether they preserve enough structure for the information matrix.

Readers working on federated analysis in epidemiology or the social sciences would find this practical; it gives a concrete alternative when full data pooling is blocked. The work shows clear thinking on the problem and engages with the limitations of prior federated approaches. It deserves peer review to check the details of the pseudo-data generation and to test it on more varied cluster structures. I would send it to referees.
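The concern about global aggregates can be made concrete with a toy case. Below, two hypothetical datasets share the same pooled mean and variance, yet in one all variation lies between clusters and in the other all of it lies within clusters, so a random-intercept model would attribute it to very different variance components. Global aggregates alone cannot tell the two apart; the shared statistics must include cluster-level quantities, which the simulated rebuttal below states the method does.

```python
import numpy as np

# Two hypothetical datasets: 2 clusters (rows) x 4 observations each.
A = np.array([[-1.0, -1.0, -1.0, -1.0],    # A: all variation is between clusters
              [ 1.0,  1.0,  1.0,  1.0]])
B = np.array([[-1.0,  1.0, -1.0,  1.0],    # B: all variation is within clusters
              [ 1.0, -1.0,  1.0, -1.0]])

def pooled_summary(y):
    """Global aggregates that ignore cluster membership: mean and variance."""
    return y.mean(), y.var()

def between_within(y):
    """Variance of the cluster means and average within-cluster variance."""
    return y.mean(axis=1).var(), y.var(axis=1).mean()

print(pooled_summary(A), pooled_summary(B))   # same pooled mean (0) and variance (1)
print(between_within(A), between_within(B))   # A: between 1, within 0; B: the reverse
```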

Referee Report

1 major / 2 minor

Summary. The paper proposes a one-time communication federated method for estimating generalized linear mixed models (GLMMs) by sharing summary statistics, generating pseudo-data that match those statistics, and fitting the GLMM to the pseudo-data instead of the original individual-level records. Demonstrations are provided for linear, logistic, and Poisson mixed models, with the central claim that the resulting estimates are identical to those from the actual data up to the third decimal place and exhibit similar bias, coverage, and prediction performance.

Significance. If the pseudo-data construction reliably preserves the information needed for GLMM inference, the approach would offer a practical, communication-efficient alternative to iterative federated methods or aggregate-data techniques that suffer from ecological bias. It could enable collaborative GLMM analysis in privacy-sensitive domains while maintaining the ability to model heterogeneity via random effects.

major comments (1)

[Methods] The pseudo-data generation step (described in the Methods) relies on matching unspecified summary statistics, but the marginal likelihood for GLMMs integrates over the random-effects distribution and depends on within-cluster joint distributions of responses and covariates. Global aggregates (e.g., overall means or totals) do not uniquely determine these cluster-level quantities, so it is unclear why the pseudo-data likelihood is guaranteed to yield the same score and information matrix as the true data; the reported third-decimal agreement may be data-specific rather than general.
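For the Gaussian case the referee's question has a clean answer. In a random-intercept linear mixed model with V = σ²I + τ²J, the quadratic form is r'V⁻¹r = (r'r − c(1'r)²)/σ² with c = τ²/(σ² + nτ²), where r = y − Xβ. It therefore depends on each cluster only through its size, column sums, and sums of squares and cross-products, all fixed by the per-cluster size, mean vector, and covariance matrix. Pseudo-clusters matching those summaries reproduce the log-likelihood at every parameter value, and with it the score and information matrix. The sketch below checks this numerically under assumed settings (one covariate, made-up cluster sizes and parameter values). It is an illustration, not the paper's code, and the argument does not carry over to logistic or Poisson models, whose log-likelihoods are not quadratic in the data.

```python
import numpy as np

def ri_loglik(clusters, beta, sigma2, tau2):
    """Marginal log-likelihood of y_ij = b0 + b1*x_ij + u_i + e_ij with
    u_i ~ N(0, tau2) and e_ij ~ N(0, sigma2); each cluster is an (n_i x 2)
    array with columns (x, y)."""
    ll = 0.0
    for d in clusters:
        n = d.shape[0]
        r = d[:, 1] - beta[0] - beta[1] * d[:, 0]        # residuals y - X beta
        V = sigma2 * np.eye(n) + tau2 * np.ones((n, n))   # marginal covariance of y_i
        _, logdet = np.linalg.slogdet(V)
        # r' V^{-1} r = (r'r - c (1'r)^2) / sigma2 with c = tau2 / (sigma2 + n tau2),
        # so only sums, sums of squares and cross-products of (x, y) enter.
        ll += -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))
    return ll

def match_moments(d, rng):
    """Pseudo-cluster with the same size, column means and sample covariance as d."""
    z = rng.standard_normal(d.shape)
    z -= z.mean(axis=0)
    z = z @ np.linalg.inv(np.linalg.cholesky(np.cov(z, rowvar=False))).T
    return z @ np.linalg.cholesky(np.cov(d, rowvar=False)).T + d.mean(axis=0)

rng = np.random.default_rng(1)
real = []
for n in (5, 8, 12):                                      # assumed cluster sizes
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(scale=0.7) + rng.normal(size=n)
    real.append(np.column_stack([x, y]))
pseudo = [match_moments(d, rng) for d in real]

for beta, sigma2, tau2 in [((1.0, 0.5), 1.0, 0.5), ((0.0, 2.0), 0.3, 2.0)]:
    print(ri_loglik(real, beta, sigma2, tau2), ri_loglik(pseudo, beta, sigma2, tau2))
```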

minor comments (2)

[Abstract] The abstract and introduction would benefit from an explicit list or table of the exact summary statistics that are shared (e.g., cluster sizes, covariate means per cluster, response totals).
[Results] Simulation results in Section 4 should include the specific values of the shared summaries for each example so readers can assess sufficiency.
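As a concrete version of the first minor comment, the sketch below shows what a one-time, per-cluster payload could look like if the shared statistics are cluster sizes, means, and covariances (the simulated rebuttal mentions per-cluster means, variances, and sizes). The payload format and the name `site_summaries` are hypothetical; the paper's actual list of shared statistics, particularly for the logistic and Poisson models, may contain more.

```python
import numpy as np

def site_summaries(cluster_ids, data):
    """Hypothetical one-time payload: for each cluster, its size, the column
    means and the sample covariance (ddof=1) of the analysis variables.
    Clusters with a single row give an undefined (NaN) covariance."""
    cluster_ids = np.asarray(cluster_ids)
    data = np.asarray(data, dtype=float)
    payload = {}
    for c in np.unique(cluster_ids):
        block = data[cluster_ids == c]
        payload[c.item()] = {
            "n": int(block.shape[0]),
            "mean": block.mean(axis=0),
            "cov": np.cov(block, rowvar=False),
        }
    return payload

# Toy site with two clusters; columns are (x, y).
ids = [1, 1, 1, 2, 2, 2]
rows = [[0, 1], [1, 2], [2, 4], [0, 0], [1, 1], [2, 1]]
for cluster, stats in site_summaries(ids, rows).items():
    print(cluster, stats["n"], stats["mean"], stats["cov"].tolist())
```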

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address the major comment below and have incorporated revisions to improve clarity regarding the empirical nature of our approach.


Referee: [Methods] The pseudo-data generation step (described in the Methods) relies on matching unspecified summary statistics, but the marginal likelihood for GLMMs integrates over the random-effects distribution and depends on within-cluster joint distributions of responses and covariates. Global aggregates (e.g., overall means or totals) do not uniquely determine these cluster-level quantities, so it is unclear why the pseudo-data likelihood is guaranteed to yield the same score and information matrix as the true data; the reported third-decimal agreement may be data-specific rather than general.

Authors: We agree that matching summary statistics does not provide a theoretical guarantee that the pseudo-data will reproduce the exact marginal likelihood, score function, or information matrix of the original data, since the GLMM marginal likelihood depends on within-cluster joint distributions that global aggregates alone cannot uniquely determine. The summary statistics matched in our procedure include both global and cluster-level aggregates (e.g., per-cluster means, variances, and sizes for responses and covariates), as specified in the Methods section; the pseudo-data are generated to match these moments approximately. We do not claim exact equivalence of the likelihoods but rather demonstrate, via extensive simulations across linear, logistic, and Poisson GLMMs with varying numbers of clusters, cluster sizes, and effect magnitudes, that the resulting estimates agree with full-data estimates to at least three decimal places and exhibit comparable bias, coverage, and predictive performance. These results suggest the approximation is reliable in the settings examined, though we acknowledge it may not hold universally. We will revise the manuscript to explicitly describe the method as providing a close empirical approximation, to list the matched statistics more prominently, and to add a limitations discussion on the absence of a general theoretical guarantee. revision: yes
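The concession matters most for the non-Gaussian models. The sketch below builds two hypothetical single-cluster binary datasets with identical size, means, and sample covariance of (x, y), then evaluates a random-intercept logistic marginal log-likelihood on both, integrating the random intercept with Gauss-Hermite quadrature (a choice made here for illustration, not necessarily the paper's). The term involving y matches, but the log(1 + exp(η)) term depends on the whole distribution of x, including its third and fourth moments, so the two values differ. In this example the gap is small, consistent with close but not exact agreement.

```python
import numpy as np

def ri_logistic_loglik(x, y, beta, tau, n_nodes=40):
    """log of integral prod_j p(y_j | u) N(u; 0, tau^2) du for one cluster,
    with logit P(y_j = 1 | u) = beta0 + beta1 * x_j + u (Gauss-Hermite)."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * tau * t                              # substitute u = sqrt(2) tau t
    eta = beta[0] + beta[1] * x[None, :] + u[:, None]       # (nodes x observations)
    cond = (y * eta - np.logaddexp(0.0, eta)).sum(axis=1)   # log p(y | u) at each node
    m = cond.max()                                          # log-sum-exp for stability
    return m + np.log(np.sum(w * np.exp(cond - m))) - 0.5 * np.log(np.pi)

y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
x_a = np.array([0.0, 1.0, 2.0, -1.0, 0.0, 1.0])
x_b = np.array([0.5, 0.5, 2.0, -np.sqrt(1.25), 0.0, np.sqrt(1.25)])

# Same n, same means and same sample covariance of (x, y) ...
print(np.allclose(np.c_[x_a, y].mean(0), np.c_[x_b, y].mean(0)),
      np.allclose(np.cov(np.c_[x_a, y].T), np.cov(np.c_[x_b, y].T)))
# ... but a different marginal log-likelihood at the same parameters.
print(ri_logistic_loglik(x_a, y, (0.0, 1.0), 1.0), ri_logistic_loglik(x_b, y, (0.0, 1.0), 1.0))
```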

Circularity Check

0 steps flagged

No circularity: pseudo-data generation and GLMM fitting remain independent of target estimates


The paper's core procedure generates pseudo-data to match one-time shared summary statistics of unavailable data, then fits the GLMM directly to the pseudo-data. This step is a modeling choice whose validity is assessed by external comparison to fits on the original data (reported as matching to three decimals with comparable bias/coverage). No equation reduces the fitted GLMM parameters to the input summaries by construction, no self-citation supplies a uniqueness theorem or ansatz, and no parameter is fitted on a subset then relabeled as a prediction. The derivation chain is therefore self-contained; performance claims rest on simulation evidence rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that summary statistics suffice to generate pseudo-data preserving GLMM information; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Pseudo-data can be generated to match summary statistics of real data sufficiently for accurate GLMM parameter estimation
This is the core mechanism stated in the abstract for replacing actual data.



