Privacy-preserving Meta-analysis through Low-Rank Basis Hunting
Pith reviewed 2026-05-08 05:44 UTC · model grok-4.3
The pith
Meta-analysis predicts functions for new populations from study summaries alone by recovering shared low-rank bases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the modeling assumption that every study-specific function is a convex combination of a fixed, low-dimensional set of latent basis functions, the bases can be recovered consistently from noisy study-level estimates by extending the successive projection algorithm to the functional setting and adding a denoising step. Once the bases are recovered, each study's combination weights are modeled as flexible functions of its observed study-level covariates. This permits direct prediction of the target function for a new population whose covariates are known, using only aggregate information and without sharing individual records.
What carries the argument
The shared low-rank structure in which each study's true function lies in the convex hull of a small set of latent basis functions, recovered via a denoised functional extension of the successive projection algorithm.
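A minimal sketch of the classic successive projection algorithm (SPA) on discretized curves may help make the vertex-hunting idea concrete. It assumes every study's estimated function is sampled on a shared grid and approximates the Hilbert-space norm by a Riemann-sum L2 norm; `inner` and `successive_projection` are hypothetical names, and neither the paper's functional extension nor its denoising step is reproduced here.

```python
import math


def inner(u, v, dx):
    """Riemann-sum approximation of the L2 inner product on a uniform grid."""
    return dx * sum(a * b for a, b in zip(u, v))


def successive_projection(curves, K, dx=1.0):
    """Classic SPA applied to curves sampled on a shared grid.

    curves: list of n study-level function estimates (equal-length lists)
    K: number of latent basis functions (a free parameter)
    Returns the indices of the curves selected as estimated vertices.
    """
    residuals = [list(c) for c in curves]
    selected = []
    for _ in range(K):
        # Choose the curve whose residual has the largest squared norm.
        norms = [inner(r, r, dx) for r in residuals]
        j = max(range(len(residuals)), key=lambda i: norms[i])
        if norms[j] == 0.0:
            break  # every curve already lies in the span of the chosen vertices
        selected.append(j)
        # Unit direction of the chosen residual.
        scale = math.sqrt(norms[j])
        u = [x / scale for x in residuals[j]]
        # Project that direction out of every residual (Gram-Schmidt step).
        for r in residuals:
            coef = inner(r, u, dx)
            for t in range(len(r)):
                r[t] -= coef * u[t]
    return selected


# Usage: three vertex curves (1, t, t^2) on [0, 1] plus three mixed studies.
grid = [i / 10 for i in range(11)]
b1 = [1.0 for _ in grid]
b2 = list(grid)
b3 = [t * t for t in grid]


def mix(w):
    return [w[0] * x + w[1] * y + w[2] * z for x, y, z in zip(b1, b2, b3)]


studies = [mix([0.5, 0.5, 0.0]), b1, mix([1 / 3, 1 / 3, 1 / 3]), b2, b3, mix([0.2, 0.3, 0.5])]
vertex_idx = successive_projection(studies, K=3, dx=0.1)
```

In this noiseless example the pure curves are the extreme points of the hull, so SPA selects the three vertex studies; with noisy estimates, plain SPA can be pulled toward outlying curves, which is what a denoising step is meant to guard against.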
If this is right
- Prediction of regression or treatment-effect functions becomes possible for a target population using only its covariate profile and the aggregate estimates from existing studies.
- Each original study can employ its own machine-learning estimator without forcing a common functional form across studies.
- Privacy is maintained because no individual participant records need to leave their originating sites.
- Conformal prediction intervals achieve asymptotically valid marginal coverage under exchangeability plus mild estimation-error bounds.
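The prediction step in the first bullet can be sketched as follows. The sketch assumes each existing study already has an estimated weight vector on the simplex (for instance, from projecting its estimate onto the hull of the recovered bases) and uses Nadaraya-Watson kernel regression as one possible non-parametric weight model. The Gaussian kernel, the bandwidth, and the names `kernel_mixing_weights` and `predict_function` are illustrative assumptions, not the paper's specification.

```python
import math


def kernel_mixing_weights(z_target, study_covariates, study_weights, bandwidth):
    """Nadaraya-Watson estimate of the mixing weights at z_target (Gaussian kernel).

    Each row of study_weights lies on the probability simplex, so their
    kernel-weighted average does too: the prediction stays in the convex hull.
    """
    kern = []
    for z in study_covariates:
        d2 = sum((a - b) ** 2 for a, b in zip(z, z_target))
        kern.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(kern)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    n_bases = len(study_weights[0])
    return [sum(k * w[j] for k, w in zip(kern, study_weights)) / total
            for j in range(n_bases)]


def predict_function(weights, bases):
    """Evaluate sum_k weights[k] * bases[k] at every grid point."""
    return [sum(w * b[t] for w, b in zip(weights, bases)) for t in range(len(bases[0]))]


# Usage: two studies sitting at the two vertices, target covariate halfway between.
bases = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
w_hat = kernel_mixing_weights([0.5], [[0.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]], bandwidth=0.5)
f_target = predict_function(w_hat, bases)
```

A kernel average of simplex vectors is itself a simplex vector, so this choice keeps the predicted function inside the convex hull without extra constraints; a semi-parametric alternative such as a multinomial-logit weight model would need its own fitting step.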
Where Pith is reading between the lines
- The same low-rank convex-hull device could be applied to other functional meta-analysis settings such as survival curves or density estimation.
- If the number of latent bases is allowed to grow slowly with the number of studies, the method might accommodate more heterogeneous populations than currently assumed.
- In domains where data sharing is restricted by regulation, the aggregate-only workflow could enable collaborative modeling that would otherwise be infeasible.
Load-bearing premise
The true functions from the different studies are all convex combinations of the same small collection of latent basis functions.
What would settle it
Collect many studies, apply the basis-hunting procedure, and check whether the recovered bases plus estimated weights can approximate the held-out study functions with error that shrinks at the expected rate as study sample sizes grow; failure of this approximation on real or simulated data with known convex-hull structure would falsify the claim.
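The check described above needs a way to measure how far a held-out study function lies from the convex hull of the recovered bases. One sketch, assuming curves sampled on a shared grid, solves the simplex-constrained least-squares problem by projected gradient descent with a sort-based Euclidean projection onto the simplex. The step size 1/(2 tr G) is a conservative choice that never exceeds the inverse Lipschitz constant, and all function names are hypothetical.

```python
import math


def inner(u, v, dx):
    """Riemann-sum approximation of the L2 inner product on a uniform grid."""
    return dx * sum(a * b for a, b in zip(u, v))


def project_to_simplex(v):
    """Euclidean projection onto {w : w_k >= 0, sum_k w_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index passing this test fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def hull_approximation_error(f, bases, dx=1.0, iters=10000):
    """Distance from curve f to the convex hull of the basis curves.

    Minimizes ||sum_k w_k b_k - f||^2 over the simplex by projected gradient.
    """
    K = len(bases)
    G = [[inner(bj, bk, dx) for bk in bases] for bj in bases]
    c = [inner(bj, f, dx) for bj in bases]
    step = 1.0 / (2.0 * sum(G[k][k] for k in range(K)))  # 1 / (2 trace G) <= 1 / L
    w = [1.0 / K] * K
    for _ in range(iters):
        grad = [2.0 * (sum(G[j][k] * w[k] for k in range(K)) - c[j]) for j in range(K)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(K)])
    resid = [f[t] - sum(w[k] * bases[k][t] for k in range(K)) for t in range(len(f))]
    return math.sqrt(inner(resid, resid, dx)), w


# Usage: a held-out curve inside the hull versus one outside it.
grid = [i / 10 for i in range(11)]
bases = [[1.0] * len(grid), list(grid), [t * t for t in grid]]
inside = [0.2 + 0.3 * t + 0.5 * t * t for t in grid]
outside = [2.0] * len(grid)
err_in, w_in = hull_approximation_error(inside, bases, dx=0.1)
err_out, _ = hull_approximation_error(outside, bases, dx=0.1)
```

Tracking this error across held-out studies as within-study sample sizes grow is one way to run the proposed falsification test.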
Original abstract
A central challenge of meta-analysis is that the populations underlying existing studies often differ from the target population in unknown ways. We study the problem of predicting function-valued quantities, such as regression and conditional average treatment effect functions, for a new target population using only study-level covariates and estimates. We propose MetaHunt, a new meta-analysis methodology based on a shared low-rank structure, in which the true function from each study lies within the convex hull of a small set of latent basis functions. To recover these basis functions, we extend the Successive Projection Algorithm to the functional setting, incorporating a denoised basis-hunting step. We establish consistency of the recovered basis functions under mild regularity conditions. We then model the relationship between study-level covariates and the corresponding mixing weights using flexible semi-parametric or non-parametric methods. MetaHunt is privacy-preserving and enables meta-analytic prediction based on study-level information alone, even when individual-level data are unavailable to analysts. In addition, for each study, functions of interest can be estimated using possibly different machine learning algorithms. For uncertainty quantification, we construct prediction intervals via conformal prediction. We show that, under exchangeability and mild estimation-error conditions, these intervals achieve asymptotically valid marginal coverage. We demonstrate the effectiveness of MetaHunt through both simulation studies and empirical applications.
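The abstract does not specify the conformal score. As an assumption-laden sketch, split conformal prediction can treat held-out calibration studies as exchangeable units, score each by the sup-norm gap between its observed study-level estimate and its predicted curve, and widen the target prediction by the finite-sample quantile of those scores, giving a simultaneous band on the grid. The sup-norm score and the name `conformal_band` are hypothetical; the paper's construction and its handling of estimation error may differ.

```python
import math


def conformal_band(cal_pred, cal_obs, new_pred, alpha=0.1):
    """Split-conformal band around new_pred using sup-norm residual scores.

    cal_pred, cal_obs: predicted and observed curves for held-out calibration studies
    Returns (lower, upper) curves; the band is infinite if there are too few
    calibration studies for the requested level.
    """
    scores = sorted(max(abs(y - p) for p, y in zip(pc, oc))
                    for pc, oc in zip(cal_pred, cal_obs))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample conformal quantile index
    q = math.inf if rank > n else scores[rank - 1]
    return [p - q for p in new_pred], [p + q for p in new_pred]


# Usage: nine calibration studies whose sup-norm scores are 1, 2, ..., 9.
cal_pred = [[0.0, 0.0] for _ in range(9)]
cal_obs = [[float(k), -k / 2] for k in range(1, 10)]
lower, upper = conformal_band(cal_pred, cal_obs, [0.0, 0.0], alpha=0.15)
```

Because the score is a supremum over grid points, the resulting band covers the whole curve at once rather than one evaluation point at a time.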
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MetaHunt, a privacy-preserving meta-analysis method for predicting function-valued quantities (e.g., regression or CATE functions) in a target population using only study-level covariates and estimates. It assumes each study's true function lies in the convex hull of a small set of latent basis functions, recovers these bases via an extension of the Successive Projection Algorithm that incorporates a denoising step, establishes consistency under mild regularity conditions, models the mixing weights with flexible semi- or non-parametric methods, and constructs conformal prediction intervals achieving asymptotic marginal coverage under exchangeability and mild estimation-error conditions. The approach permits heterogeneous machine-learning estimators across studies and is demonstrated via simulations and empirical applications.
Significance. If the consistency and coverage results hold, the work offers a meaningful contribution to meta-analysis by enabling transport of function-valued inferences to new populations while requiring only aggregate data, which is valuable in privacy-constrained settings. The low-rank convex-hull structure provides a structured handle on heterogeneity, the allowance for study-specific estimators adds flexibility, and the combination of basis recovery with conformal inference supplies both point estimates and valid uncertainty quantification. The theoretical claims are presented as conditional on the modeling assumptions rather than unconditional, which is appropriately scoped.
major comments (2)
- [Basis recovery and consistency] Basis recovery section: the consistency of the recovered basis functions is established via the extended Successive Projection Algorithm with a denoising step under mild regularity conditions; the manuscript should explicitly state how the denoising modification alters the original SPA guarantees and whether additional conditions on the functional space or noise level are required beyond those stated in the abstract.
- [Conformal prediction] Conformal prediction section: the asymptotic marginal coverage is claimed under exchangeability and mild estimation-error conditions; the manuscript should clarify whether the approximation error from the low-rank basis recovery is absorbed into the estimation-error term or requires a separate rate condition to ensure the coverage guarantee remains valid.
minor comments (2)
- [Implementation and tuning] The number of latent basis functions is a free parameter; the manuscript would benefit from guidance or a data-driven rule for choosing it in the simulation and application sections.
- [Notation] Notation for estimated versus population quantities (e.g., basis functions and mixing weights) should be made uniform across the theoretical and empirical sections to improve readability.
Simulated Author's Rebuttal
We thank the referee for their careful review and recommendation for minor revision. The comments provide valuable guidance on enhancing the clarity of our theoretical contributions regarding basis recovery and conformal prediction. We respond to each major comment below.
Point-by-point responses
Referee: [Basis recovery and consistency] Basis recovery section: the consistency of the recovered basis functions is established via the extended Successive Projection Algorithm with a denoising step under mild regularity conditions; the manuscript should explicitly state how the denoising modification alters the original SPA guarantees and whether additional conditions on the functional space or noise level are required beyond those stated in the abstract.
Authors: We agree that the manuscript would benefit from an explicit statement on the impact of the denoising modification. Our extension incorporates a denoised basis-hunting step that reduces the effect of noise in the observed functions before applying the successive projections. This modification does not alter the fundamental consistency guarantees of the original SPA; it maintains them under the same mild regularity conditions, provided the denoising is consistent, which holds without additional conditions on the functional space or noise level beyond those in the abstract. In the revised manuscript, we will add a clarifying paragraph in the basis recovery section to detail this relationship. revision: yes
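The response describes the denoising step only as reducing noise before the successive projections. As a stand-in, the sketch below applies neighborhood averaging: each study curve is replaced by the mean of itself and its k nearest neighbors in grid L2 distance. This is an assumed technique chosen for illustration, not the paper's denoiser, and `l2_dist` and `knn_denoise` are hypothetical names.

```python
import math


def l2_dist(u, v, dx):
    """Grid approximation of the L2 distance between two sampled curves."""
    return math.sqrt(dx * sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_denoise(curves, k, dx=1.0):
    """Replace each curve by the pointwise mean of itself and its k nearest neighbors."""
    denoised = []
    for i, ci in enumerate(curves):
        # Rank the other curves by distance to curve i (curve i itself is always kept).
        others = sorted((j for j in range(len(curves)) if j != i),
                        key=lambda j: l2_dist(ci, curves[j], dx))
        group = [i] + others[:k]
        denoised.append([sum(curves[j][t] for j in group) / len(group)
                         for t in range(len(ci))])
    return denoised


# Usage: two tight clusters of noisy curves; each curve moves toward its cluster mean.
noisy = [[0.0, 0.0], [0.2, 0.2], [10.0, 10.0], [10.2, 10.2]]
smoothed = knn_denoise(noisy, k=1)
```

Averaging also pulls curves near a vertex toward the interior of the hull, so the choice of k trades noise reduction against an inward bias in the recovered vertices.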
Referee: [Conformal prediction] Conformal prediction section: the asymptotic marginal coverage is claimed under exchangeability and mild estimation-error conditions; the manuscript should clarify whether the approximation error from the low-rank basis recovery is absorbed into the estimation-error term or requires a separate rate condition to ensure the coverage guarantee remains valid.
Authors: We appreciate this request for clarification. The approximation error arising from the low-rank basis recovery is absorbed into the mild estimation-error conditions. Under the established consistency of the basis functions, this error term vanishes at a sufficient rate to be encompassed within the conditions that ensure the asymptotic marginal coverage of the conformal prediction intervals, without necessitating a separate rate condition. We will revise the conformal prediction section to explicitly note this absorption and its implications for the coverage guarantee. revision: yes
Circularity Check
No significant circularity in derivation chain
The paper explicitly posits the shared low-rank convex-hull structure as the modeling assumption enabling MetaHunt, then derives consistency of the recovered basis functions by extending the Successive Projection Algorithm with a denoising step under mild regularity conditions. The subsequent semi- or non-parametric modeling of mixing weights from study-level covariates and the conformal prediction intervals are built on the recovered bases and do not reduce any target quantity to a fitted input by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps; the internal logic remains self-contained relative to the stated assumptions and external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of latent basis functions
axioms (2)
- domain assumption: The true function from each study lies within the convex hull of a small set of latent basis functions.
- domain assumption: Mild regularity conditions suffice for consistency of the recovered basis functions.