Multi-Site Health Research Integrating Complementary Data Sources: A Scoping Review of Statistical Inference Methods for Vertically Partitioned Data

arxiv: 2604.02595 · v1 · submitted 2026-04-03 · 📊 stat.ME


Marie-Pier Domingue, Simon Lévesque, Anita Burgun, Jean-François Ethier, Félix Camirand Lemyre

Pith reviewed 2026-05-13 19:13 UTC · model grok-4.3

classification 📊 stat.ME

keywords: vertically partitioned data, statistical inference, privacy-preserving analysis, multi-site health data, scoping review, linear regression, logistic regression, data integration


The pith

Vertical methods for statistical inference on partitioned health data rarely combine equivalence to centralized results, efficient communication, and strong privacy protection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This scoping review maps existing vertical methods that let separate institutions run statistical analyses on complementary data for the same people without pooling or sharing individual records. It screened 2887 articles and extracted details from the 30 that were included, focusing on whether results match what a single pooled database would produce, how many communication rounds between sites are needed, and whether privacy is formally assessed. Most methods target linear or logistic regression, yet equivalence is not always addressed, communication often requires multiple exchanges, and few papers supply actual privacy assessments despite describing their work as privacy-preserving. The review concludes that the set of available techniques remains narrow and that most methods do not meet all three practical requirements at the same time.
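To make "results that match a single pooled database" concrete, here is a minimal sketch for ordinary least squares with two sites. It is an illustration of the algebra only, not a method taken from the review: the site names, dimensions, and simulated data are assumptions, and the cross-site products are computed in the clear, whereas a real vertical method would need a secure computation for them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical setup: the same n individuals appear at both sites,
# with rows already linked and stored in the same order.
X_a = rng.normal(size=(n, 2))  # covariates held by site A
X_b = rng.normal(size=(n, 3))  # covariates held by site B
y = (X_a @ np.array([1.0, -2.0])
     + X_b @ np.array([0.5, 0.0, 3.0])
     + rng.normal(size=n))     # outcome, assumed to be held by site A


def pooled_ols(X, y):
    """Reference analysis on the stacked (pooled) design matrix."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def blockwise_ols(X_a, X_b, y):
    """Same estimate, assembled from per-block cross-products.

    X_a.T @ X_a and X_b.T @ X_b are local to one site each.
    X_a.T @ X_b and X_b.T @ y involve data from two sites; they are
    computed in plain NumPy here purely for illustration.
    """
    cross = X_a.T @ X_b
    gram = np.block([[X_a.T @ X_a, cross],
                     [cross.T, X_b.T @ X_b]])
    xty = np.concatenate([X_a.T @ y, X_b.T @ y])
    return np.linalg.solve(gram, xty)


beta_pooled = pooled_ols(np.hstack([X_a, X_b]), y)
beta_blockwise = blockwise_ols(X_a, X_b, y)
```

The two coefficient vectors agree up to floating-point error, which is the sense of "equivalence" used above. The hard part in practice is obtaining the cross-site blocks without revealing individual-level rows.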

Core claim

The scope of existing approaches enabling statistical inference for vertically partitioned data is still relatively limited. Most existing methods do not concurrently achieve results equivalent to centralized analyses, high communication efficiency, and guaranteed protection of individual-level data.

What carries the argument

Systematic extraction of three properties across identified methods: comparability with pooled analysis, efficiency of communication schemes, and confidentiality safeguards.

If this is right

Linear and logistic regression are the most common inference tasks addressed by vertical methods.
Equivalence to pooled analyses is not systematically demonstrated across all proposed methods.
Most methods require multiple communication rounds between participating parties.
Only a minority of articles provide explicit privacy assessments even though nearly all describe their approach as privacy-preserving.
Applications of these methods to real-world vertically partitioned health data remain limited.
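The point about multiple communication rounds can be sketched with a toy logistic regression fitted by gradient descent. This is a hedged illustration of the communication pattern only, under assumed names and simulated data; exchanging partial linear predictors and residuals in the clear, as done here, is not a privacy-preserving protocol and is not claimed to match any method in the review.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def vertical_logistic_gd(site_blocks, y, n_rounds=50, step=0.5):
    """Gradient descent in which each site only touches its own columns."""
    n = len(y)
    betas = [np.zeros(X_k.shape[1]) for X_k in site_blocks]
    rounds = 0
    for _ in range(n_rounds):
        # Each site sends its partial linear predictor X_k @ beta_k.
        eta = sum(X_k @ b_k for X_k, b_k in zip(site_blocks, betas))
        # The outcome holder sends the residual vector back to every site.
        resid = y - sigmoid(eta)
        # Each site updates its own coefficients locally.
        betas = [b_k + step * (X_k.T @ resid) / n
                 for X_k, b_k in zip(site_blocks, betas)]
        rounds += 1  # one exchange per iteration
    return np.concatenate(betas), rounds


def pooled_logistic_gd(X, y, n_rounds=50, step=0.5):
    """The same algorithm run on the pooled design matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        beta = beta + step * (X.T @ (y - sigmoid(X @ beta))) / len(y)
    return beta


rng = np.random.default_rng(1)
n = 300
X_a = rng.normal(size=(n, 2))  # hypothetical site A covariates
X_b = rng.normal(size=(n, 2))  # hypothetical site B covariates
true_beta = np.array([1.0, -1.0, 0.5, 0.0])
y = rng.binomial(1, sigmoid(np.hstack([X_a, X_b]) @ true_beta)).astype(float)

beta_vertical, rounds = vertical_logistic_gd([X_a, X_b], y)
beta_pooled = pooled_logistic_gd(np.hstack([X_a, X_b]), y)
```

Because the likelihood has no closed-form maximizer, an iterative scheme like this needs one exchange per iteration, so the number of rounds grows with the number of iterations required for convergence.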

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New vertical methods should be designed from the start to satisfy equivalence, low communication cost, and verifiable privacy in one package.
Standardized reporting requirements for privacy guarantees would make it easier to compare methods across studies.
The scarcity of real-data applications suggests a need to test promising methods on actual multi-site health datasets to move from theory to practice.

Load-bearing premise

That the database searches and citation screening captured a representative sample of all vertical methods and that the extracted properties were reported consistently enough across papers to support the overall characterization.

What would settle it

Identification of many additional vertical methods that simultaneously deliver exact equivalence to pooled results, require only one communication round, and include formal privacy proofs would indicate the scope is broader than reported.

Figures

Figures reproduced from arXiv: 2604.02595 by Anita Burgun, Félix Camirand Lemyre, Jean-François Ethier, Marie-Pier Domingue, Simon Lévesque.

**Figure 2.** General process in collaborative health data research with vertically partitioned data and main features

**Figure 3.** PRISMA flow chart for article screening process. Detailed inclusion and exclusion criteria are described

**Figure 4.** Main communication workflows for the vertically distributed linear regression and logistic regression

**Figure 5.** Role of privacy among included articles.

**Figure 1.** (a) Centralized data. (b) Horizontally partitioned data (example of two data

**Figure 1.** Extraction template from Covidence (1 of 2)

**Figure 2.** Extraction template from Covidence (2 of 2)

Original abstract

To address the multidimensional nature of health-related questions, advances in health research often require integrating information from various data sources within statistical analyses. When complementary information pertaining to the same set of individuals are distributed across different institutions, vertical methods make it possible to obtain analysis results without sharing or pooling individual-level data. To guide stakeholders toward a transparent use of vertical methods, this study aims to (1) Identify existing vertical methods enabling statistical inference; and (2) Characterize the methodological properties of these methods and the current extent of their use with health data. We conducted a scoping review using four interdisciplinary databases. We then systematically extracted the characteristics of identified vertical methods with respect to comparability with the pooled analysis, efficiency of communication schemes and confidentiality. We additionally screened studies that cited included articles to identify applications on vertically partitioned real-world health data. Among 2887 articles initially screened, 30 were included in the review. Inference for the linear and the logistic regression framework were the most frequent statistical inference tasks undertaken in proposed methods. Equivalence with the pooled analyses was not systematically addressed and most methods required multiple communications between participating parties. Almost all articles described their approach as privacy-preserving, although a minority provided privacy assessments. The scope of existing approaches enabling statistical inference for vertically partitioned data is still relatively limited. Most existing methods do not concurrently achieve results equivalent to centralized analyses, high communication efficiency, and guaranteed protection of individual-level data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This scoping review gives a practical map of vertical inference methods for health data but its headline limitation may be inflated by how it handles papers that skip reporting on equivalence or privacy.


The main thing to know is that this is a scoping review pulling together 30 papers on statistical inference for vertically partitioned health data. It checks how well they match pooled results, how many communication rounds they use, and whether they protect privacy, and concludes that few hit all three at once. Linear and logistic regression dominate the methods they found.

The search across four databases plus citation screening looks systematic on the surface, and extracting those three properties gives readers a quick way to compare options without digging into every original paper. That part is useful for someone choosing or extending a method in multi-site health studies. What is new is just the synthesis and the property coding itself, not any fresh technique or data.

The soft spot is in the big claim about limitations. The abstract already notes that equivalence was not systematically addressed and only a minority of papers gave privacy assessments. If the authors coded missing details as failure to achieve the property, then the finding that most methods fall short could reflect uneven reporting in the literature more than actual shortcomings in the methods. The stress-test note raises exactly this, and it holds up from what is described.

This paper is for researchers working on distributed or federated analysis in health who want an overview of the current toolkit and its gaps. It is not aimed at people outside that niche or looking for new derivations. I would send it to peer review. The search and extraction are solid enough to deserve referee input, even if the interpretation of the gaps needs tightening in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript is a scoping review that screened 2887 articles across four databases and included 30 papers on statistical inference methods for vertically partitioned data. It finds that linear and logistic regression are the most frequent inference tasks, extracts properties related to equivalence with pooled analyses, communication efficiency, and privacy protection, and concludes that the scope of existing approaches remains limited, with most methods failing to concurrently achieve equivalence to centralized results, high communication efficiency, and guaranteed individual-level data protection.

Significance. If the property extraction is shown to be consistent and representative, the review would usefully map the landscape of vertical methods for multi-site health research and identify key gaps, particularly the frequent trade-offs among statistical fidelity, communication overhead, and privacy guarantees. This could inform both method developers and practitioners seeking to integrate complementary data sources without pooling raw records.

major comments (2)

[Abstract and Results] The abstract states that equivalence 'was not systematically addressed' yet concludes that most methods do not achieve results equivalent to centralized analyses. Clarify in the methods or results how non-equivalence was determined when papers did not address it (e.g., whether absence of reporting was coded as failure), and provide the explicit coding rubric or supplementary extraction table used for the 30 included studies.
[Results] Only a minority of articles provided privacy assessments, yet almost all described their approach as privacy-preserving and the headline finding states that most methods lack 'guaranteed protection.' Specify the criteria applied to evaluate privacy guarantees (self-description versus formal assessment) and how this was operationalized across heterogeneous reporting standards.

minor comments (1)

[Methods] Expand the description of the citation screening process and any quality or risk-of-bias considerations applied to the included methods, even if formal bias tools were not used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which identify opportunities to improve transparency in our methods and results. We provide point-by-point responses below and will revise the manuscript to include explicit documentation of our coding process.


Referee: [Abstract and Results] The abstract states that equivalence 'was not systematically addressed' yet concludes that most methods do not achieve results equivalent to centralized analyses. Clarify in the methods or results how non-equivalence was determined when papers did not address it (e.g., whether absence of reporting was coded as failure), and provide the explicit coding rubric or supplementary extraction table used for the 30 included studies.

Authors: We thank the referee for noting this ambiguity. Equivalence was coded only when papers explicitly addressed it (via theoretical proofs of identical estimators or empirical demonstrations of numerical equivalence to pooled results). Papers that omitted any discussion of equivalence were coded strictly as 'not addressed' and were not counted as non-equivalent. The headline conclusion that most methods fail to achieve equivalence is drawn from the subset of papers that did evaluate it (where the majority fell short) together with the observation that unverified methods cannot be assumed equivalent. We will add a supplementary extraction table and a detailed coding rubric in the revised methods section that lists, for each of the 30 studies, the exact status recorded for equivalence, number of communication rounds, and privacy evaluation.

revision: yes

Referee: [Results] Only a minority of articles provided privacy assessments, yet almost all described their approach as privacy-preserving and the headline finding states that most methods lack 'guaranteed protection.' Specify the criteria applied to evaluate privacy guarantees (self-description versus formal assessment) and how this was operationalized across heterogeneous reporting standards.

Authors: We agree that the distinction requires explicit statement. 'Guaranteed protection' was operationalized as the presence of a formal assessment (differential privacy bounds, cryptographic security proofs, or empirical attack-resistance evaluations). Self-descriptions such as 'privacy-preserving' or 'secure' without accompanying formal analysis were recorded but did not qualify as guaranteed protection. This rule was applied uniformly by examining the methods and results sections of each paper. We will revise the methods section to state these criteria verbatim and will add illustrative examples in the results to show how heterogeneous reporting was classified.

revision: yes
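The two coding rules described in the responses above can be written down as a small classification sketch. The category labels, function names, and the four toy records are hypothetical and are not the paper's extraction data; the only thing taken from the text is the logic that a missing evaluation is recorded as "not addressed" rather than as a failure, and that a self-description without formal assessment does not count as guaranteed protection.

```python
from collections import Counter
from enum import Enum


class Equivalence(Enum):
    SHOWN = "shown equivalent to pooled analysis"
    NOT_EQUIVALENT = "evaluated and found not equivalent"
    NOT_ADDRESSED = "not addressed"


def code_equivalence(addressed: bool, equivalent: bool = False) -> Equivalence:
    """Silence on equivalence is its own category, never a failure."""
    if not addressed:
        return Equivalence.NOT_ADDRESSED
    return Equivalence.SHOWN if equivalent else Equivalence.NOT_EQUIVALENT


def code_privacy(self_described: bool, formal_assessment: bool) -> str:
    """Only a formal assessment qualifies as guaranteed protection."""
    if formal_assessment:
        return "formally assessed"
    return "self-described only" if self_described else "no privacy claim"


# Hypothetical toy records for illustration, NOT the 30 included studies.
toy_papers = [
    {"addressed": True, "equivalent": True, "self_described": True, "formal": True},
    {"addressed": True, "equivalent": False, "self_described": True, "formal": False},
    {"addressed": False, "equivalent": False, "self_described": True, "formal": False},
    {"addressed": False, "equivalent": False, "self_described": False, "formal": False},
]

equivalence_tally = Counter(
    code_equivalence(p["addressed"], p["equivalent"]) for p in toy_papers
)
privacy_tally = Counter(
    code_privacy(p["self_described"], p["formal"]) for p in toy_papers
)
```

Keeping "not addressed" as a separate category lets a reader see how much of any headline count comes from papers that evaluated a property and fell short, and how much comes from papers that never reported on it.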

Circularity Check

0 steps flagged

Scoping review of vertical inference methods contains no derivations or predictions


The paper is a descriptive scoping review that screens 2887 articles, includes 30, and extracts methodological properties (equivalence to pooled analysis, communication rounds, privacy assessments) from external literature. It performs no mathematical derivations, statistical predictions, parameter fitting, or modeling steps. No equations, ansatzes, or self-citation chains are used to justify central claims; the characterization follows directly from the screening and coding protocol applied to independent papers. The skeptic's concern about heterogeneous reporting affects the strength of the synthesis but does not create circularity within the paper's own logic.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a scoping review the paper introduces no new free parameters, axioms, or invented entities; it only summarizes properties reported in prior methodological papers.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Among 2887 articles initially screened, 30 were included... Equivalence with the pooled analyses was not systematically addressed and most methods required multiple communications... Almost all articles described their approach as privacy-preserving, although a minority provided privacy assessments.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Three categories of techniques were used... encryption-based methods, methods based on vertically separable quantities and methods based on secure multiparty computations (SMC).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages

[1]

Dash, S.K

S. Dash, S.K. Shakyawar, M. Sharma, S. Kaushik, Big data in healthcare: management, analysis and future prospects, J. Big Data 6 (2019) 54. https://doi.org/10.1186/s40537-019-0217-0

work page doi:10.1186/s40537-019-0217-0 2019
[2]

Chawla, D.A

N.V . Chawla, D.A. Davis, Bringing Big Data to Personalized Healthcare: A Patient- Centered Framework, J. Gen. Intern. Med. 28 (2013) 660–665. https://doi.org/10.1007/s11606-013-2455-8

work page doi:10.1007/s11606-013-2455-8 2013
[3]

Dankar, A

F.K. Dankar, A. Ptitsyn, S.K. Dankar, The development of large-scale de-identified biomedical databases in the age of genomics—principles and challenges, Hum. Genomics 12 (2018) 19. https://doi.org/10.1186/s40246-018-0147-5

work page doi:10.1186/s40246-018-0147-5 2018
[4]

https://nuage.recherche.usherbrooke.ca/en/ (accessed February 9, 2026)

Banques NuAge – Banques de données et d’échantillons biologiques de l’Étude longitudinale québécoise sur la nutrition comme déterminant d’un vieillissement réussi, (n.d.). https://nuage.recherche.usherbrooke.ca/en/ (accessed February 9, 2026)

work page 2026
[5]

Pfitzner, N

B. Pfitzner, N. Steckhan, B. Arnrich, Federated Learning in a Medical Context: A Systematic Literature Review, ACM Trans. Internet Technol. 21 (2021) 1–31. https://doi.org/10.1145/3412357

work page doi:10.1145/3412357 2021
[6]

Camirand Lemyre, S

F. Camirand Lemyre, S. Lévesque, M.-P. Domingue, K. Herrmann, J.-F. Ethier, Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics, JMIR Med. Inform. 12 (2024) e53622. https://doi.org/10.2196/53622

work page doi:10.2196/53622 2024
[7]

J. Bohn, W. Eddings, S. Schneeweiss, Conducting privacy-preserving multivariable propensity score analysis when patient covariate information is stored in separate locations, Am. J. Epidemiol. 185 (2017) 501–510. https://doi.org/10.1093/aje/kww155

work page doi:10.1093/aje/kww155 2017
[8]

Chang, Z

C. Chang, Z. Bu, Q. Long, CEDAR: communication efficient distributed analysis for regressions, Biometrics 79 (2023) 2357–2369. https://doi.org/10.1111/biom.13786

work page doi:10.1111/biom.13786 2023
[9]

Tong, J.M

J. Tong, J.M. Reps, C. Luo, Y . Lu, L. Li, J.M. Ramirez-Anguita, M.T. Brand, S.L. DuVall, T. Falconer, A.M. Fuentes, Unlocking efficiency in real-world collaborative studies: a multi-site international study with one-shot lossless GLMM algorithm, Npj Digit. Med. 8 (2025) 457

work page 2025
[10]

Zhang, D

F. Zhang, D. Kreuter, Y . Chen, S. Dittmer, S. Tull, T. Shadbahr, M. Schut, F. Asselbergs, S. Kar, S. Sivapalaratnam, Recent methodological advances in federated learning for healthcare, Patterns 5 (2024)

work page 2024
[11]

Shukla, S

S. Shukla, S. Doyle, Federated learning in computational pathology: a literature review, J. Med. Imaging 12 (2025) 061412–061412

work page 2025
[12]

Torkzadehmahani, R

R. Torkzadehmahani, R. Nasirigerdeh, D.B. Blumenthal, T. Kacprowski, M. List, J. Matschinske, J. Spaeth, N.K. Wenke, J. Baumbach, Privacy-Preserving Artificial Intelligence Techniques in Biomedicine, Methods Inf. Med. 61 (2022) e12–e27. https://doi.org/10.1055/s-0041-1740630

work page doi:10.1055/s-0041-1740630 2022
[13]

Multimodal biomedical AI.Nature Medicine, 28(9):1773–1784, 2022

J.N. Acosta, G.J. Falcone, P. Rajpurkar, E.J. Topol, Multimodal biomedical AI, Nat. Med. 28 (2022) 1773–1784. https://doi.org/10.1038/s41591-022-01981-2

work page doi:10.1038/s41591-022-01981-2 2022
[15]

Y . Li, X. Jiang, S. Wang, H. Xiong, L. Ohno-Machado, VERTIcal Grid lOgistic regression (VERTIGO), J. Am. Med. Inform. Assoc. 23 (2016) 570–579. https://doi.org/10.1093/jamia/ocv146

work page doi:10.1093/jamia/ocv146 2016
[16]

C. Sun, L. Ippel, A. Dekker, M. Dumontier, J. van Soest, A systematic review on privacy-preserving distributed data mining, Data Sci. 4 (2021) 121–150. https://doi.org/10.3233/DS-210036

work page doi:10.3233/ds-210036 2021
[17]

Y . Liu, Y . Kang, T. Zou, Y . Pu, Y . He, X. Ye, Y . Ouyang, Y .-Q. Zhang, Q. Yang, Vertical Federated Learning: Concepts, Advances, and Challenges, IEEE Trans. Knowl. Data Eng. 36 (2024) 3615–3634. https://doi.org/10.1109/TKDE.2024.3352628

work page doi:10.1109/tkde.2024.3352628 2024
[18]

A. Khan, M. ten Thij, A. Wilbik, Vertical federated learning: a structured literature review, Knowl. Inf. Syst. 67 (2025) 3205–3243. https://doi.org/10.1007/s10115-025- 02356-y

work page doi:10.1007/s10115-025- 2025
[19]

H. Chen, H. Wang, Q. Long, D. Jin, Y . Li, Advancements in Federated Learning: Models, Methods, and Privacy, ACM Comput. Surv. (2024) 3664650. https://doi.org/10.1145/3664650

work page doi:10.1145/3664650 2024
[20]

Z.L. Teo, L. Jin, N. Liu, S. Li, D. Miao, X. Zhang, W.Y . Ng, T.F. Tan, D.M. Lee, K.J. Chua, J. Heng, Y . Liu, R.S.M. Goh, D.S.W. Ting, Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture, Cell Rep. Med. 5 (2024) 101419. https://doi.org/10.1016/j.xcrm.2024.101419

work page doi:10.1016/j.xcrm.2024.101419 2024
[21]

Shmueli, To Explain or to Predict?, Stat

G. Shmueli, To Explain or to Predict?, Stat. Sci. 25 (2010) 289–310

work page 2010
[22]

James, D

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer New York, New York, NY , 2013. https://doi.org/10.1007/978-1-4614-7138- 7

work page doi:10.1007/978-1-4614-7138- 2013
[23]

A wideband tunable, nonreciprocal bandpass filter using magnetostatic 19 surface waves with zero static power consumption,

C. Luo, Md.N. Islam, N.E. Sheils, J. Buresh, J. Reps, M.J. Schuemie, P.B. Ryan, M. Edmondson, R. Duan, J. Tong, A. Marks-Anglin, J. Bian, Z. Chen, T. Duarte-Salles, S. Fernández-Bertolín, T. Falconer, C. Kim, R.W. Park, S.R. Pfohl, N.H. Shah, A.E. Williams, H. Xu, Y . Zhou, E. Lautenbach, J.A. Doshi, R.M. Werner, D.A. Asch, Y . Chen, DLMM as a lossless on...

work page doi:10.1038/s41467- 2022
[24]

Q. Wu, J.M. Reps, L. Li, B. Zhang, Y . Lu, J. Tong, D. Zhang, T. Lumley, M.T. Brand, M. Van Zandt, T. Falconer, X. He, Y . Huang, H. Li, C. Yan, G. Tang, A.E. Williams, F. Wang, J. Bian, B. Malin, G. Hripcsak, M.J. Schuemie, Y . Lu, S. Drew, J. Zhou, D.A. Asch, Y . Chen, COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models...

work page doi:10.1038/s41746-025-01781-1 2025
[26]

Domingue, J.-F

M.-P. Domingue, J.-F. Ethier, J.-P. Morissette, S. Lévesque, A. Burgun, F. Camirand Lemyre, Revisiting VERTIGO and VERTIGO-CI: Identifying confidentiality breaches and introducing a statistically sound, efficient alternative, (2025). https://doi.org/10.21203/rs.3.rs-6933988/v1

work page doi:10.21203/rs.3.rs-6933988/v1 2025
[27]

Bak, V .I

M. Bak, V .I. Madai, L.A. Celi, G.A. Kaissis, R. Cornet, M. Maris, D. Rueckert, A. Buyx, S. McLennan, Federated learning is not a cure-all for data ethics, Nat. Mach. Intell. 6 (2024) 370–372. https://doi.org/10.1038/s42256-024-00813-x

work page doi:10.1038/s42256-024-00813-x 2024
[28]

Munn, M.D.J

Z. Munn, M.D.J. Peters, C. Stern, C. Tufanaru, A. McArthur, E. Aromataris, Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach, BMC Med. Res. Methodol. 18 (2018) 143. https://doi.org/10.1186/s12874-018-0611-x

work page doi:10.1186/s12874-018-0611-x 2018
[29]

Arksey, L

H. Arksey, L. O’Malley, Scoping studies: towards a methodological framework, Int. J. Soc. Res. Methodol. 8 (2005) 19–32. https://doi.org/10.1080/1364557032000119616

work page doi:10.1080/1364557032000119616 2005
[30]

Tricco, Erin Lillie, Wasifa Zarin, Kelly K

A.C. Tricco, E. Lillie, W. Zarin, K.K. O’Brien, H. Colquhoun, D. Levac, D. Moher, M.D.J. Peters, T. Horsley, L. Weeks, S. Hempel, E.A. Akl, C. Chang, J. McGowan, L. Stewart, L. Hartling, A. Aldcroft, M.G. Wilson, C. Garritty, S. Lewin, C.M. Godfrey, M.T. Macdonald, E.V . Langlois, K. Soares-Weiser, J. Moriarty, T. Clifford, Ö. Tunçalp, S.E. Straus, PRISMA...

work page doi:10.7326/m18-0850 2018
[31]

Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering, in: Proc

C. Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering, in: Proc. 18th Int. Conf. Eval. Assess. Softw. Eng., ACM, London England United Kingdom, 2014: pp. 1–10. https://doi.org/10.1145/2601248.2601268

work page doi:10.1145/2601248.2601268 2014
[32]

Carrera-Rivera, W

A. Carrera-Rivera, W. Ochoa, F. Larrinaga, G. Lasa, How-to conduct a systematic literature review: A quick guide for computer science research, MethodsX 9 (2022) 101895. https://doi.org/10.1016/j.mex.2022.101895

work page doi:10.1016/j.mex.2022.101895 2022
[33]

Fienberg, A.F

S.E. Fienberg, A.F. Karr, Y . Nardi, A.B. Slavkovic, Secure Logistic Regression with Multi-Party Distributed Databases, Proc. 56th Sess. ISI (2007)

work page 2007
[35]

M.-C. Liu, N. Zhang, A cryptographic solution to privacy-preserving two-party sign test computation on vertically partitioned data, in: Adv. Mater. Res., 2012: pp. 1249–

work page 2012
[39]

Reiter, C.N

J.P. Reiter, C.N. Kohnen, A.F. Karr, X. Lin, A.P. Sanil, Secure Regression for Vertically Partitioned, Partially Overlapping Data, Proc. Am. Stat. Assoc. (2004)

work page 2004
[40]

Hall, S.E

R. Hall, S.E. Fienberg, Y . Nardi, Secure Multiple Linear Regression Based on Homomorphic Encryption, (2011)

work page 2011
[42]

Kikuchi, C

H. Kikuchi, C. Hamanaga, H. Yasunaga, H. Matsui, H. Hashimoto, Privacy- Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets, in: 2017 IEEE 31st Int. Conf. Adv. Inf. Netw. Appl. AINA, 2017: pp. 1042–

work page 2017
[44]

A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Secure statistical analysis of distributed databases, in: Stat. Methods Counterterrorism Game Theory Model. Syndr. Surveill. Biom. Authentication, 2006: pp. 237–261. https://doi.org/10.1007/0-387-35209- 0_14

work page doi:10.1007/0-387-35209- 2006
[45]

A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Privacy-preserving analysis of vertically partitioned data using secure matrix products, J. Off. Stat. 25 (2009) 125–138

work page 2009
[47]

In: Krebbers, R

S.E. Fienberg, Y . Nardi, A.B. Slavković, Valid statistical analysis for logistic regression with multiple sources, in: Lect. Notes Comput. Sci. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinforma., 2009: pp. 82–94. https://doi.org/10.1007/978- 3-642-10233-2_8

work page doi:10.1007/978- 2009
[50]

J. Kim, W. Li, T. Bath, X. Jiang, L. Ohno-Machado, VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI), Proc. – AMIA Jt. Summits Transl. Sci. 2021 (2021) 355–364

work page 2021
[52]

Hector, P.X.-K

E.C. Hector, P.X.-K. Song, Doubly distributed supervised learning and inference with high-dimensional correlated outcomes, J Mach Learn Res 21 (2020) Article-173

work page 2020
[56]

Snoke, T

J. Snoke, T. Brick, A. Slavković, Accurate Estimation of Structural Equation Models with Remote Partitioned Data, in: Priv. Stat. Databases, Springer International Publishing, Cham, 2016: pp. 190–209

work page 2016
[63]

Aggarwal, P.S

C.C. Aggarwal, P.S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, 2008

work page 2008
[64]

Q. Her, T. Kent, Y . Samizo, A. Slavkovic, Y . Vilk, S. Toh, Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study, JMIR Med. Inform. 9 (2021) e21459. https://doi.org/10.2196/21459

work page doi:10.2196/21459 2021
[65]

Kuo, R.A

T.-T. Kuo, R.A. Gabriel, J. Koola, R.T. Schooley, L. Ohno-Machado, Distributed cross-learning for equitable federated models-privacy-preserving prediction on data from five California hospitals, Nat. Commun. 16 (2025) 1371

work page 2025
[66]

Huang, H

X. Huang, H. Kikuchi, C.-I. Fan, Privacy Preserved Spectral Analysis Using IoT mHealth Biomedical Data for Stress Estimation, in: 2018 IEEE 32nd Int. Conf. Adv. Inf. Netw. Appl. AINA, 2018: pp. 793–800. https://doi.org/10.1109/AINA.2018.00118

work page doi:10.1109/aina.2018.00118 2018
[67]

Y . Li, D. Feng, Y . Sui, H. Li, Y . Song, T. Zhan, G. Cicconetti, M. Jin, H. Wang, I. Chan, X. Wang, Analyzing longitudinal binary data in clinical studies, Contemp. Clin. Trials 115 (2022) 106717. https://doi.org/10.1016/j.cct.2022.106717. Multi-Site Health Research Integrating Complementary Data Sources: A Scoping Review of Statistical Inference Method...

work page doi:10.1016/j.cct.2022.106717 2022
[68]

RESEARCH QUESTION ....................................................................................................... 1

work page
[69]

METHODS .............................................................................................................................. 1 2.1. Keywords ....................................................................................................................... 1 2.1.1. Limits and restrictions ........................................................

work page
[70]

continuous vs binary covariates)

RESEARCH QUESTION 1.1.What existing distributed methods allow conducting statistical inference procedures with vertically partitioned data?  Regarding: Methods for different statistical models; Methods for various settings in terms of privacy; Methods for different data settings (e.g. continuous vs binary covariates). 1.2.What are the characteristics spe...

work page
[71]

privacy-preserving

METHODS 2.1. Keywords In accordance with our previous review, while making sure we capture all potential existing methods, two categories of keywords were targeted, corresponding to the two themes in the research question.  Distributed analyses: o Partitioned  Partitioned  Federated  Distributed  Aggregated  Privacy-preserving  Multiparty  Multipl...

work page
[72]

Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

Vertically partitioned data This paper/study presents a solution to perform inferential statistics with vertically partitioned data. Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

work page
[73]

e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

Inferential Statistics The paper/study does not specifically address inferential statistics (Confidence intervals, Hypothesis testing or Asymptotic normality result). e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

[74]

Methodological contribution The paper/study does not provide a new methodological contribution. e.g., the study is solely an application of a previously developed and presented method

[75]

Published study The paper/study has not been published

[76]

Language The full-text is not available in English or French. 2.4. Data-charting A data-charting form was collectively developed, and extraction will be addressed manually. Data will be independently extracted by two authors (MPD and SL) for all studies. In the case of opposite opinions from the two initial reviewers, the reviewers will discuss and, if ne...

[77]

Snoke J, Brick T, Slavković A. Accurate Estimation of Structural Equation Models with Remote Partitioned Data. In: Privacy in Statistical Databases. Cham: Springer International Publishing; 2016. p. 190–209

[78]

Kim J, Li W, Bath T, Jiang X, Ohno-Machado L. VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI). Proc – AMIA Jt Summits Transl Sci. 2021;2021:355–64

[79]

Dai W, Jiang X, Bonomi L, Li Y, Xiong H, Ohno-Machado L. VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers. IEEE Trans Knowl Data Eng. 2022;34(2):996–1010. doi:10.1109/TKDE.2020.2989301

[80]

Imakura A, Tsunoda R, Kagawa R, Yamagata K, Sakurai T. DC-COX: Data collaboration Cox proportional hazards model for privacy-preserving survival analysis on multiple parties. J Biomed Inform. 2023;137. doi:10.1016/j.jbi.2022.104264

[81]

Fienberg SE, Nardi Y, Slavković AB. Valid statistical analysis for logistic regression with multiple sources. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2009. p. 82–94. doi:10.1007/978-3-642-10233-2_8

[82]

Fienberg SE, Karr AF, Nardi Y, Slavkovic AB. Secure Logistic Regression with Multi-Party Distributed Databases. Proc 56th Sess ISI. 2007

[83]

Slavkovic AB, Nardi Y, Tibbits MM. Secure logistic regression of horizontally and vertically partitioned distributed databases. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2007. p. 723–8. doi:10.1109/ICDMW.2007.114

[84]

Snoke J, Brick TR, Slavković A, Hunte MD. Providing accurate models across private partitioned data: Secure maximum likelihood estimation. Ann Appl Stat. 2018;12(2):877–914. doi:10.1214/18-AOAS1171

[85]

Karr AF, Lin X, Sanil AP, Reiter JP. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J Off Stat. 2009;25(1):125–38

[86]

Karr AF, Lin X, Sanil AP, Reiter JP. Secure statistical analysis of distributed databases. In: Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication. 2006. p. 237–61. doi:10.1007/0-387-35209-0_14

[87]

Reiter JP, Kohnen CN, Karr AF, Lin X, Sanil AP. Secure Regression for Vertically Partitioned, Partially Overlapping Data. Proc Am Stat Assoc. 2004

[88]

Du W, Han YS, Chen S. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In: SIAM Proceedings Series. 2004. p. 222–33. doi:10.1137/1.9781611972740.21

[89]

Liu MC, Zhang N. A solution to privacy-preserving two-party sign test on vertically partitioned data (P22NSTv) using data disguising techniques. In: ICNIT 2010 - 2010 International Conference on Networking and Information Technology. 2010. p. 526–34. doi:10.1109/ICNIT.2010.5508458

[90]

Ricci S, Domingo-Ferrer J, Sánchez D. Privacy-preserving cloud-based statistical analyses on sensitive categorical data. In: International Conference on Modeling Decisions for Artificial Intelligence. 2016. p. 227–38. doi:10.1007/978-3-319-45656-0_19

[91]

Movahedi M, Case BM, Honaker J, Knox A, Li L, Li YP, et al. Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment. In: Proceedings of the 2021 on Cloud Computing Security Workshop. Association for Computing Machinery; 2021. p. 59–69. doi:10.1145/3474123.3486764

[92]

Kikuchi H, Sato T, Sakuma J. Privacy-preserving hypothesis testing for the analysis of epidemiological medical data. In: Proceedings - International Conference on Advanced Information Networking and Applications, AINA. 2014. p. 359–65. doi:10.1109/AINA.2014.46

[93]

Kikuchi H, Hamanaga C, Yasunaga H, Matsui H, Hashimoto H. Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets. In: 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA). 2017. p. 1042–9. doi:10.1109/AINA.2017.52

[94]

Kikuchi H, Hashimoto H, Yasunaga H, Saito T. Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies. In: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications. 2015. p. 510–4. doi:10.1109/AINA.2015.229

[95]

Kikuchi H, Yasunaga H, Matsui H, Fan CI. Efficient Privacy-Preserving Logistic Regression with Iteratively Re-weighted Least Squares. In: 2016 11th Asia Joint Conference on Information Security (AsiaJCIS). 2016. p. 48–54. doi:10.1109/AsiaJCIS.2016.21

[96]

Hall R, Fienberg SE, Nardi Y. Secure Multiple Linear Regression Based on Homomorphic Encryption. 2011

[97]

Du W, Atallah MJ. Privacy-preserving cooperative statistical analysis. In: Proceedings - Annual Computer Security Applications Conference, ACSAC. 2001. p. 102–10. doi:10.1109/ACSAC.2001.991526

[98]

Cock M de, Dowsley R, Nascimento ACA, Newman SC. Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data. In: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery; 2015. p. 3–14. doi:10.1145/2808769.2808774

[99]

Duverle DA, Kawasaki S, Yamada Y, Sakuma J, Tsuda K. Privacy-preserving statistical analysis by exact logistic regression. In: Proceedings - 2015 IEEE Security and Privacy Workshops, SPW 2015. 2015. p. 7–16. doi:10.1109/SPW.2015.14

[100]

Kamphorst B, Rooijakkers T, Veugen T, Cellamare M, Knoors D. Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy. BMC Med Inform Decis Mak. 2022;22(1):49. Located at: 35209883. doi:10.1186/s12911-022-01771-3

[101]

Wu F, Xi B. Differentially Private Causal Inference Under Hierarchical Design. In: 2023 IEEE International Conference on Data Mining Workshops (ICDMW). 2023. p. 1390–9. doi:10.1109/ICDMW60847.2023.00177

[1] [1]

S. Dash, S.K. Shakyawar, M. Sharma, S. Kaushik, Big data in healthcare: management, analysis and future prospects, J. Big Data 6 (2019) 54. https://doi.org/10.1186/s40537-019-0217-0

[2] [2]

N.V. Chawla, D.A. Davis, Bringing Big Data to Personalized Healthcare: A Patient-Centered Framework, J. Gen. Intern. Med. 28 (2013) 660–665. https://doi.org/10.1007/s11606-013-2455-8

[3] [3]

F.K. Dankar, A. Ptitsyn, S.K. Dankar, The development of large-scale de-identified biomedical databases in the age of genomics—principles and challenges, Hum. Genomics 12 (2018) 19. https://doi.org/10.1186/s40246-018-0147-5

[4] [4]

Banques NuAge – Banques de données et d’échantillons biologiques de l’Étude longitudinale québécoise sur la nutrition comme déterminant d’un vieillissement réussi, (n.d.). https://nuage.recherche.usherbrooke.ca/en/ (accessed February 9, 2026)

[5] [5]

B. Pfitzner, N. Steckhan, B. Arnrich, Federated Learning in a Medical Context: A Systematic Literature Review, ACM Trans. Internet Technol. 21 (2021) 1–31. https://doi.org/10.1145/3412357

[6] [6]

F. Camirand Lemyre, S. Lévesque, M.-P. Domingue, K. Herrmann, J.-F. Ethier, Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics, JMIR Med. Inform. 12 (2024) e53622. https://doi.org/10.2196/53622


[7] [7]

J. Bohn, W. Eddings, S. Schneeweiss, Conducting privacy-preserving multivariable propensity score analysis when patient covariate information is stored in separate locations, Am. J. Epidemiol. 185 (2017) 501–510. https://doi.org/10.1093/aje/kww155

[8] [8]

C. Chang, Z. Bu, Q. Long, CEDAR: communication efficient distributed analysis for regressions, Biometrics 79 (2023) 2357–2369. https://doi.org/10.1111/biom.13786

[9] [9]

J. Tong, J.M. Reps, C. Luo, Y. Lu, L. Li, J.M. Ramirez-Anguita, M.T. Brand, S.L. DuVall, T. Falconer, A.M. Fuentes, Unlocking efficiency in real-world collaborative studies: a multi-site international study with one-shot lossless GLMM algorithm, Npj Digit. Med. 8 (2025) 457

[10] [10]

F. Zhang, D. Kreuter, Y. Chen, S. Dittmer, S. Tull, T. Shadbahr, M. Schut, F. Asselbergs, S. Kar, S. Sivapalaratnam, Recent methodological advances in federated learning for healthcare, Patterns 5 (2024)

[11] [11]

S. Shukla, S. Doyle, Federated learning in computational pathology: a literature review, J. Med. Imaging 12 (2025) 061412–061412

[12] [12]

R. Torkzadehmahani, R. Nasirigerdeh, D.B. Blumenthal, T. Kacprowski, M. List, J. Matschinske, J. Spaeth, N.K. Wenke, J. Baumbach, Privacy-Preserving Artificial Intelligence Techniques in Biomedicine, Methods Inf. Med. 61 (2022) e12–e27. https://doi.org/10.1055/s-0041-1740630

[13] [13]

J.N. Acosta, G.J. Falcone, P. Rajpurkar, E.J. Topol, Multimodal biomedical AI, Nat. Med. 28 (2022) 1773–1784. https://doi.org/10.1038/s41591-022-01981-2

[14] [15]

Y. Li, X. Jiang, S. Wang, H. Xiong, L. Ohno-Machado, VERTIcal Grid lOgistic regression (VERTIGO), J. Am. Med. Inform. Assoc. 23 (2016) 570–579. https://doi.org/10.1093/jamia/ocv146


[15] [16]

C. Sun, L. Ippel, A. Dekker, M. Dumontier, J. van Soest, A systematic review on privacy-preserving distributed data mining, Data Sci. 4 (2021) 121–150. https://doi.org/10.3233/DS-210036

[16] [17]

Y. Liu, Y. Kang, T. Zou, Y. Pu, Y. He, X. Ye, Y. Ouyang, Y.-Q. Zhang, Q. Yang, Vertical Federated Learning: Concepts, Advances, and Challenges, IEEE Trans. Knowl. Data Eng. 36 (2024) 3615–3634. https://doi.org/10.1109/TKDE.2024.3352628

[17] [18]

A. Khan, M. ten Thij, A. Wilbik, Vertical federated learning: a structured literature review, Knowl. Inf. Syst. 67 (2025) 3205–3243. https://doi.org/10.1007/s10115-025-02356-y

[18] [19]

H. Chen, H. Wang, Q. Long, D. Jin, Y. Li, Advancements in Federated Learning: Models, Methods, and Privacy, ACM Comput. Surv. (2024) 3664650. https://doi.org/10.1145/3664650

[19] [20]

Z.L. Teo, L. Jin, N. Liu, S. Li, D. Miao, X. Zhang, W.Y. Ng, T.F. Tan, D.M. Lee, K.J. Chua, J. Heng, Y. Liu, R.S.M. Goh, D.S.W. Ting, Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture, Cell Rep. Med. 5 (2024) 101419. https://doi.org/10.1016/j.xcrm.2024.101419

[20] [21]

G. Shmueli, To Explain or to Predict?, Stat. Sci. 25 (2010) 289–310

[21] [22]

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer New York, New York, NY, 2013. https://doi.org/10.1007/978-1-4614-7138-7

[22] [23]

C. Luo, Md.N. Islam, N.E. Sheils, J. Buresh, J. Reps, M.J. Schuemie, P.B. Ryan, M. Edmondson, R. Duan, J. Tong, A. Marks-Anglin, J. Bian, Z. Chen, T. Duarte-Salles, S. Fernández-Bertolín, T. Falconer, C. Kim, R.W. Park, S.R. Pfohl, N.H. Shah, A.E. Williams, H. Xu, Y. Zhou, E. Lautenbach, J.A. Doshi, R.M. Werner, D.A. Asch, Y. Chen, DLMM as a lossless on...

[23] [24]

Q. Wu, J.M. Reps, L. Li, B. Zhang, Y. Lu, J. Tong, D. Zhang, T. Lumley, M.T. Brand, M. Van Zandt, T. Falconer, X. He, Y. Huang, H. Li, C. Yan, G. Tang, A.E. Williams, F. Wang, J. Bian, B. Malin, G. Hripcsak, M.J. Schuemie, Y. Lu, S. Drew, J. Zhou, D.A. Asch, Y. Chen, COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models...

[24] [26]

M.-P. Domingue, J.-F. Ethier, J.-P. Morissette, S. Lévesque, A. Burgun, F. Camirand Lemyre, Revisiting VERTIGO and VERTIGO-CI: Identifying confidentiality breaches and introducing a statistically sound, efficient alternative, (2025). https://doi.org/10.21203/rs.3.rs-6933988/v1

[25] [27]

M. Bak, V.I. Madai, L.A. Celi, G.A. Kaissis, R. Cornet, M. Maris, D. Rueckert, A. Buyx, S. McLennan, Federated learning is not a cure-all for data ethics, Nat. Mach. Intell. 6 (2024) 370–372. https://doi.org/10.1038/s42256-024-00813-x

[26] [28]

Z. Munn, M.D.J. Peters, C. Stern, C. Tufanaru, A. McArthur, E. Aromataris, Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach, BMC Med. Res. Methodol. 18 (2018) 143. https://doi.org/10.1186/s12874-018-0611-x

[27] [29]

H. Arksey, L. O’Malley, Scoping studies: towards a methodological framework, Int. J. Soc. Res. Methodol. 8 (2005) 19–32. https://doi.org/10.1080/1364557032000119616

[28] [30]

A.C. Tricco, E. Lillie, W. Zarin, K.K. O’Brien, H. Colquhoun, D. Levac, D. Moher, M.D.J. Peters, T. Horsley, L. Weeks, S. Hempel, E.A. Akl, C. Chang, J. McGowan, L. Stewart, L. Hartling, A. Aldcroft, M.G. Wilson, C. Garritty, S. Lewin, C.M. Godfrey, M.T. Macdonald, E.V. Langlois, K. Soares-Weiser, J. Moriarty, T. Clifford, Ö. Tunçalp, S.E. Straus, PRISMA...

[29] [31]

C. Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering, in: Proc. 18th Int. Conf. Eval. Assess. Softw. Eng., ACM, London England United Kingdom, 2014: pp. 1–10. https://doi.org/10.1145/2601248.2601268

[30] [32]

A. Carrera-Rivera, W. Ochoa, F. Larrinaga, G. Lasa, How-to conduct a systematic literature review: A quick guide for computer science research, MethodsX 9 (2022) 101895. https://doi.org/10.1016/j.mex.2022.101895

[31] [33]

S.E. Fienberg, A.F. Karr, Y. Nardi, A.B. Slavkovic, Secure Logistic Regression with Multi-Party Distributed Databases, Proc. 56th Sess. ISI (2007)


[32] [35]

M.-C. Liu, N. Zhang, A cryptographic solution to privacy-preserving two-party sign test computation on vertically partitioned data, in: Adv. Mater. Res., 2012: pp. 1249–

[33] [39]

J.P. Reiter, C.N. Kohnen, A.F. Karr, X. Lin, A.P. Sanil, Secure Regression for Vertically Partitioned, Partially Overlapping Data, Proc. Am. Stat. Assoc. (2004)

[34] [40]

R. Hall, S.E. Fienberg, Y. Nardi, Secure Multiple Linear Regression Based on Homomorphic Encryption, (2011)

[35] [42]

H. Kikuchi, C. Hamanaga, H. Yasunaga, H. Matsui, H. Hashimoto, Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets, in: 2017 IEEE 31st Int. Conf. Adv. Inf. Netw. Appl. AINA, 2017: pp. 1042–

[36] [44]

A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Secure statistical analysis of distributed databases, in: Stat. Methods Counterterrorism Game Theory Model. Syndr. Surveill. Biom. Authentication, 2006: pp. 237–261. https://doi.org/10.1007/0-387-35209-0_14


[37] [45]

A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Privacy-preserving analysis of vertically partitioned data using secure matrix products, J. Off. Stat. 25 (2009) 125–138

[38] [47]

S.E. Fienberg, Y. Nardi, A.B. Slavković, Valid statistical analysis for logistic regression with multiple sources, in: Lect. Notes Comput. Sci. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinforma., 2009: pp. 82–94. https://doi.org/10.1007/978-3-642-10233-2_8


[39] [50]

J. Kim, W. Li, T. Bath, X. Jiang, L. Ohno-Machado, VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI), Proc. – AMIA Jt. Summits Transl. Sci. 2021 (2021) 355–364

[40] [52]

E.C. Hector, P.X.-K. Song, Doubly distributed supervised learning and inference with high-dimensional correlated outcomes, J Mach Learn Res 21 (2020) Article-173

[41] [56]

J. Snoke, T. Brick, A. Slavković, Accurate Estimation of Structural Equation Models with Remote Partitioned Data, in: Priv. Stat. Databases, Springer International Publishing, Cham, 2016: pp. 190–209

[42] [63]

C.C. Aggarwal, P.S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, 2008

[43] [64]

Q. Her, T. Kent, Y. Samizo, A. Slavkovic, Y. Vilk, S. Toh, Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study, JMIR Med. Inform. 9 (2021) e21459. https://doi.org/10.2196/21459

[44] [65]

T.-T. Kuo, R.A. Gabriel, J. Koola, R.T. Schooley, L. Ohno-Machado, Distributed cross-learning for equitable federated models-privacy-preserving prediction on data from five California hospitals, Nat. Commun. 16 (2025) 1371

[45] [66]

X. Huang, H. Kikuchi, C.-I. Fan, Privacy Preserved Spectral Analysis Using IoT mHealth Biomedical Data for Stress Estimation, in: 2018 IEEE 32nd Int. Conf. Adv. Inf. Netw. Appl. AINA, 2018: pp. 793–800. https://doi.org/10.1109/AINA.2018.00118

[46] [67]

Y. Li, D. Feng, Y. Sui, H. Li, Y. Song, T. Zhan, G. Cicconetti, M. Jin, H. Wang, I. Chan, X. Wang, Analyzing longitudinal binary data in clinical studies, Contemp. Clin. Trials 115 (2022) 106717. https://doi.org/10.1016/j.cct.2022.106717


[47] [68]

RESEARCH QUESTION

work page

[48] [69]

METHODS .............................................................................................................................. 1 2.1. Keywords ....................................................................................................................... 1 2.1.1. Limits and restrictions ........................................................

work page

[49] [70]

continuous vs binary covariates)

RESEARCH QUESTION 1.1.What existing distributed methods allow conducting statistical inference procedures with vertically partitioned data?  Regarding: Methods for different statistical models; Methods for various settings in terms of privacy; Methods for different data settings (e.g. continuous vs binary covariates). 1.2.What are the characteristics spe...

work page

[50] [71]

privacy-preserving

METHODS 2.1. Keywords In accordance with our previous review, while making sure we capture all potential existing methods, two categories of keywords were targeted, corresponding to the two themes in the research question.  Distributed analyses: o Partitioned  Partitioned  Federated  Distributed  Aggregated  Privacy-preserving  Multiparty  Multipl...

work page

[51] [72]

Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

Vertically partitioned data This paper/study presents a solution to perform inferential statistics with vertically partitioned data. Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

work page

[52] [73]

e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

Inferential Statistics The paper/study does not specifically address inferential statistics (Confidence intervals, Hypothesis testing or Asymptotic normality result). e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

work page

[53] [74]

e.g., the study is solely an application of a previously developed and presented method

Methodologi cal contribution The paper/study does not provide a new methodological contribution. e.g., the study is solely an application of a previously developed and presented method

work page

[54] [75]

Published study The paper/study has not been published

work page

[55] [76]

Language The full-text is not available in English or French. 2.4. Data-charting A data-charting form was collectively developed, and extraction will be addressed manually. Data will be independently extracted by two authors (MPD and SL) for all studies. In the case of opposite opinions from the two initial reviewers, the reviewers will discuss and, if ne...

work page 2015

[56] [77]

Accurate Estimation of Structural Equation Models with Remote Partitioned Data

Snoke J, Brick T, Slavković A. Accurate Estimation of Structural Equation Models with Remote Partitioned Data. In: Privacy in Statistical Databases. Cham: Springer International Publishing; 2016. p. 190–209

work page 2016

[57] [78]

VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI)

Kim J, Li W, Bath T, Jiang X, Ohno-Machado L. VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI). Proc – AMIA Jt Summits Transl Sci. 2021;2021:355–64

work page 2021

[58] [79]

VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers

Dai W, Jiang X, Bonomi L, Li Y , Xiong H, Ohno-Machado L. VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers. IEEE Trans Knowl Data Eng. 2022;34(2):996–1010. doi:10.1109/TKDE.2020.2989301

work page doi:10.1109/tkde.2020.2989301 2022

[59] [80]

Advances in Water Resources 166, 104264

Imakura A, Tsunoda R, Kagawa R, Yamagata K, Sakurai T. DC-COX: Data collaboration Cox proportional hazards model for privacy-preserving survival analysis on multiple parties. J Biomed Inform. 2023;137. doi:10.1016/j.jbi.2022.104264

work page doi:10.1016/j.jbi.2022.104264 2023

[60] [81]

Valid statistical analysis for logistic regression with multiple sources

Fienberg SE, Nardi Y , Slavković AB. Valid statistical analysis for logistic regression with multiple sources. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2009. p. 82–94. doi:10.1007/978-3-642-10233-2_8

work page doi:10.1007/978-3-642-10233-2_8 2009

[61] [82]

Secure Logistic Regression with Multi-Party Distributed Databases

Fienberg SE, Karr AF, Nardi Y , Slavkovic AB. Secure Logistic Regression with Multi-Party Distributed Databases. Proc 56th Sess ISI. 2007

work page 2007

[62] [83]

Secure logistic regression of horizontally and vertically partitioned distributed databases

Slavkovic AB, Nardi Y , Tibbits MM. Secure logistic regression of horizontally and vertically partitioned distributed databases. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2007. p. 723–8. doi:10.1109/ICDMW.2007.114

work page doi:10.1109/icdmw.2007.114 2007

[63] [84]

Providing accurate models across private partitioned data: Secure maximum likelihood estimation

Snoke J, Brick TR, Slavković A, Hunte MD. Providing accurate models across private partitioned data: Secure maximum likelihood estimation. Ann Appl Stat. 2018;12(2):877–914. doi:10.1214/18-AOAS1171

work page doi:10.1214/18-aoas1171 2018

[64] [85]

Privacy-preserving analysis of vertically partitioned data using secure matrix products

Karr AF, Lin X, Sanil AP, Reiter JP. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J Off Stat. 2009;25(1):125–38

work page 2009

[65] [86]

Secure statistical analysis of distributed databases

Karr AF, Lin X, Sanil AP, Reiter JP. Secure statistical analysis of distributed databases. In: Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication. 2006. p. 237–61. doi:10.1007/0-387-35209-0_14

work page doi:10.1007/0-387-35209-0_14 2006

[66] [87]

Secure Regression for Vertically Partitioned, Partially Overlapping Data

Reiter JP, Kohnen CN, Karr AF, Lin X, Sanil AP. Secure Regression for Vertically Partitioned, Partially Overlapping Data. Proc Am Stat Assoc. 2004

work page 2004

[67] [88]

Privacy-preserving multivariate statistical analysis: Linear regression and classification

Du W, Han YS, Chen S. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In: SIAM Proceedings Series. 2004. p. 222–33. doi:10.1137/1.9781611972740.21 6

work page doi:10.1137/1.9781611972740.21 2004

[68] [89]

A solution to privacy-preserving two-party sign test on vertically partitioned data (P22NSTv) using data disguising techniques

Liu MC, Zhang N. A solution to privacy-preserving two-party sign test on vertically partitioned data (P22NSTv) using data disguising techniques. In: ICNIT 2010 - 2010 International Conference on Networking and Information Technology. 2010. p. 526–34. doi:10.1109/ICNIT.2010.5508458

work page doi:10.1109/icnit.2010.5508458 2010

[69] [90]

Privacy-preserving cloud-based statistical analyses on sensitive categorical data

Ricci S, Domingo-Ferrer J, Sánchez D. Privacy-preserving cloud-based statistical analyses on sensitive categorical data. In: International Conference on Modeling Decisions for Artificial Intelligence. 2016. p. 227–38. doi:10.1007/978-3-319-45656-0_19

work page doi:10.1007/978-3-319-45656-0_19 2016

[70] [91]

Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment

Movahedi M, Case BM, Honaker J, Knox A, Li L, Li YP, et al. Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment. In: Proceedings of the 2021 on Cloud Computing Security Workshop. Association for Computing Machinery; 2021. p. 59–69. doi:10.1145/3474123.3486764

work page doi:10.1145/3474123.3486764 2021

[71] [92]

Privacy-preserving hypothesis testing for the analysis of epidemiological medical data

Kikuchi H, Sato T, Sakuma J. Privacy-preserving hypothesis testing for the analysis of epidemiological medical data. In: Proceedings - International Conference on Advanced Information Networking and Applications, AINA. 2014. p. 359–65. doi:10.1109/AINA.2014.46

work page doi:10.1109/aina.2014.46 2014

[72] [93]

Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets

Kikuchi H, Hamanaga C, Yasunaga H, Matsui H, Hashimoto H. Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets. In: 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA). 2017. p. 1042–9. doi:10.1109/AINA.2017.52

work page doi:10.1109/aina.2017.52 2017

[73] [94]

Yasunaga, Saito T

Kikuchi H, Hashimoto H, H. Yasunaga, Saito T. Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies. In: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications. 2015. p. 510–4. doi:10.1109/AINA.2015.229

work page doi:10.1109/aina.2015.229 2015

[74] [95]

Efficient Privacy-Preserving Logistic Regression with Iteratively Re-weighted Least Squares

Kikuchi H, Yasunaga H, Matsui H, Fan CI. Efficient Privacy-Preserving Logistic Regression with Iteratively Re-weighted Least Squares. In: 2016 11th Asia Joint Conference on Information Security (AsiaJCIS). 2016. p. 48–54. doi:10.1109/AsiaJCIS.2016.21

work page doi:10.1109/asiajcis.2016.21 2016

[75] [96]

Secure Multiple Linear Regression Based on Homomorphic Encryption

Hall R, Fienberg SE, Nardi Y. Secure Multiple Linear Regression Based on Homomorphic Encryption. 2011

work page 2011

[76] [97]

Privacy-preserving cooperative statistical analysis

Du W, Atallah MJ. Privacy-preserving cooperative statistical analysis. In: Proceedings - Annual Computer Security Applications Conference, ACSAC. 2001. p. 102–10. doi:10.1109/ACSAC.2001.991526

work page doi:10.1109/acsac.2001.991526 2001

[77] [98]

Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data

De Cock M, Dowsley R, Nascimento ACA, Newman SC. Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data. In: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery; 2015. p. 3–14. doi:10.1145/2808769.2808774

work page doi:10.1145/2808769.2808774 2015

[78] [99]

Privacy-preserving statistical analysis by exact logistic regression

Duverle DA, Kawasaki S, Yamada Y, Sakuma J, Tsuda K. Privacy-preserving statistical analysis by exact logistic regression. In: Proceedings - 2015 IEEE Security and Privacy Workshops, SPW 2015. 2015. p. 7–16. doi:10.1109/SPW.2015.14

work page doi:10.1109/spw.2015.14 2015

[79] [100]

Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy

Kamphorst B, Rooijakkers T, Veugen T, Cellamare M, Knoors D. Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy. BMC Med Inform Decis Mak. 2022;22(1):49. Located at: 35209883. doi:10.1186/s12911-022-01771-3

work page doi:10.1186/s12911-022-01771-3 2022

[80] [101]

Differentially Private Causal Inference Under Hierarchical Design

Wu F, Xi B. Differentially Private Causal Inference Under Hierarchical Design. In: 2023 IEEE International Conference on Data Mining Workshops (ICDMW). 2023. p. 1390–9. doi:10.1109/ICDMW60847.2023.00177

work page doi:10.1109/icdmw60847.2023.00177 2023