
arxiv: 2604.02595 · v1 · submitted 2026-04-03 · 📊 stat.ME

Multi-Site Health Research Integrating Complementary Data Sources: A Scoping Review of Statistical Inference Methods for Vertically Partitioned Data

Pith reviewed 2026-05-13 19:13 UTC · model grok-4.3

classification 📊 stat.ME
keywords vertically partitioned data · statistical inference · privacy-preserving analysis · multi-site health data · scoping review · linear regression · logistic regression · data integration

The pith

Vertical methods for statistical inference on partitioned health data rarely combine equivalence to centralized results, efficient communication, and strong privacy protection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This scoping review maps existing vertical methods that let separate institutions run statistical analyses on complementary data for the same people without pooling or sharing individual records. It screened 2887 articles and extracted details from the 30 that were included, focusing on whether results match what a single pooled database would produce, how many communication rounds between sites are needed, and whether privacy is formally assessed. Most methods target linear or logistic regression, yet equivalence is not always addressed, communication often requires multiple exchanges, and few papers supply actual privacy assessments despite describing their work as privacy-preserving. The review concludes that the set of available techniques remains narrow and that very few methods meet all three practical requirements at the same time.
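To make the equivalence criterion concrete, here is a minimal sketch. It is not a method taken from the review. It assumes two hypothetical sites: `site_a` holds an intercept and one covariate, `site_b` holds a second covariate, and the outcome `y` is simulated. The least-squares normal equations are assembled block by block and compared with the pooled fit. The cross-site block is computed in the clear here purely for illustration; an actual vertical method would need to obtain it through some protected protocol. The helper names `solve` and `cross` and all data are assumptions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def cross(u_cols, v_cols):
    """Matrix of inner products between two lists of data columns."""
    return [[sum(a * b for a, b in zip(u, v)) for v in v_cols] for u in u_cols]

random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * a - 0.5 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

site_a = [[1.0] * n, x1]   # hypothetical site A: intercept and x1
site_b = [x2]              # hypothetical site B: x2

# Pooled reference: every column available in one place.
pooled = site_a + site_b
beta_pooled = solve(cross(pooled, pooled), [r[0] for r in cross(pooled, [y])])

# Vertical assembly: within-site blocks plus the cross-site block.
aa = cross(site_a, site_a)   # computable locally at site A
bb = cross(site_b, site_b)   # computable locally at site B
ab = cross(site_a, site_b)   # cross-site block (in the clear here; assumption)
ba = cross(site_b, site_a)
gram = [aa[0] + ab[0], aa[1] + ab[1], ba[0] + bb[0]]
xty = [r[0] for r in cross(site_a, [y])] + [r[0] for r in cross(site_b, [y])]
beta_vertical = solve(gram, xty)

print("pooled:  ", beta_pooled)
print("vertical:", beta_vertical)
```

Under these assumptions the two coefficient vectors agree up to floating-point error, which is what equivalence with the pooled analysis means for this toy case. The hard part in practice is obtaining the cross-site block without exposing individual records.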

Core claim

The scope of existing approaches enabling statistical inference for vertically partitioned data is still relatively limited. Most existing methods do not concurrently achieve results equivalent to centralized analyses, high communication efficiency, and guaranteed protection of individual-level data.

What carries the argument

Systematic extraction of three properties across identified methods: comparability with pooled analysis, efficiency of communication schemes, and confidentiality safeguards.

If this is right

  • Linear and logistic regression are the most common inference tasks addressed by vertical methods.
  • Equivalence to pooled analyses is not systematically demonstrated across all proposed methods.
  • Most methods require multiple communication rounds between participating parties.
  • Only a minority of articles provide explicit privacy assessments even though nearly all describe their approach as privacy-preserving.
  • Applications of these methods to real-world vertically partitioned health data remain limited.
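The point about communication rounds can be illustrated with a toy sketch. It is not one of the reviewed methods and it is not privacy-preserving: partial linear predictors and residuals are exchanged in the clear, which exposes individual-level quantities. It assumes two hypothetical sites holding one covariate each and a coordinator holding the binary outcome, and it fits a logistic regression by plain gradient ascent. The function name `fit_vertical_logistic`, the step size, and the tolerance are illustrative assumptions.

```python
import math
import random

random.seed(1)
n = 300
x1 = [random.gauss(0, 1) for _ in range(n)]  # covariate held by hypothetical site A
x2 = [random.gauss(0, 1) for _ in range(n)]  # covariate held by hypothetical site B
# Binary outcome held by the coordinator (simulated; true slopes are +1 and -1).
y = [1 if random.random() < 1 / (1 + math.exp(-(0.5 + a - b))) else 0
     for a, b in zip(x1, x2)]

def fit_vertical_logistic(step=1.0, tol=1e-6, max_rounds=5000):
    """Gradient ascent in which every iteration costs one exchange between parties."""
    b0 = b1 = b2 = 0.0
    for rounds in range(1, max_rounds + 1):
        # Each site sends its partial linear predictor to the coordinator.
        part_a = [b1 * v for v in x1]
        part_b = [b2 * v for v in x2]
        # The coordinator adds the intercept and returns residuals y - p.
        resid = [yi - 1 / (1 + math.exp(-(b0 + a + b)))
                 for yi, a, b in zip(y, part_a, part_b)]
        # Each party computes the gradient for its own coefficient locally.
        g0 = sum(resid) / n
        g1 = sum(r * v for r, v in zip(resid, x1)) / n
        g2 = sum(r * v for r, v in zip(resid, x2)) / n
        if max(abs(g0), abs(g1), abs(g2)) < tol:
            return (b0, b1, b2), rounds
        b0, b1, b2 = b0 + step * g0, b1 + step * g1, b2 + step * g2
    return (b0, b1, b2), max_rounds

beta, rounds = fit_vertical_logistic()
print(f"coefficients={beta}, communication rounds={rounds}")
```

Because the logistic likelihood has no closed-form maximizer, any scheme of this iterative kind needs repeated exchanges, whereas a least-squares fit can in principle be assembled from a single set of cross-products.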

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • New vertical methods should be designed from the start to satisfy equivalence, low communication cost, and verifiable privacy in one package.
  • Standardized reporting requirements for privacy guarantees would make it easier to compare methods across studies.
  • The scarcity of real-data applications suggests a need to test promising methods on actual multi-site health datasets to move from theory to practice.

Load-bearing premise

That the database searches and citation screening captured a representative sample of all vertical methods and that the extracted properties were reported consistently enough across papers to support the overall characterization.

What would settle it

Identification of many additional vertical methods that simultaneously deliver exact equivalence to pooled results, require only one communication round, and include formal privacy proofs would indicate the scope is broader than reported.

Figures

Figures reproduced from arXiv: 2604.02595 by Anita Burgun, Félix Camirand Lemyre, Jean-François Ethier, Marie-Pier Domingue, Simon Lévesque.

Figure 1 (p. 5). (a) Centralized data. (b) Horizontally partitioned data (example of two data nodes), where each pattern
Figure 2 (p. 7). General process in collaborative health data research with vertically partitioned data and main features
Figure 3 (p. 11). PRISMA flow chart for article screening process. Detailed inclusion and exclusion criteria are described
Figure 4 (p. 15). Main communication workflows for the vertically distributed linear regression and logistic regression
Figure 5 (p. 17). Role of privacy among included articles.
Figure 1 (p. 35). Extraction template from Covidence (1 of 2)
Figure 2 (p. 36). Extraction template from Covidence (2 of 2)
Original abstract

To address the multidimensional nature of health-related questions, advances in health research often require integrating information from various data sources within statistical analyses. When complementary information pertaining to the same set of individuals are distributed across different institutions, vertical methods make it possible to obtain analysis results without sharing or pooling individual-level data. To guide stakeholders toward a transparent use of vertical methods, this study aims to (1) Identify existing vertical methods enabling statistical inference; and (2) Characterize the methodological properties of these methods and the current extent of their use with health data. We conducted a scoping review using four interdisciplinary databases. We then systematically extracted the characteristics of identified vertical methods with respect to comparability with the pooled analysis, efficiency of communication schemes and confidentiality. We additionally screened studies that cited included articles to identify applications on vertically partitioned real-world health data. Among 2887 articles initially screened, 30 were included in the review. Inference for the linear and the logistic regression framework were the most frequent statistical inference tasks undertaken in proposed methods. Equivalence with the pooled analyses was not systematically addressed and most methods required multiple communications between participating parties. Almost all articles described their approach as privacy-preserving, although a minority provided privacy assessments. The scope of existing approaches enabling statistical inference for vertically partitioned data is still relatively limited. Most existing methods do not concurrently achieve results equivalent to centralized analyses, high communication efficiency, and guaranteed protection of individual-level data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript is a scoping review that screened 2887 articles across four databases and included 30 papers on statistical inference methods for vertically partitioned data. It finds that linear and logistic regression are the most frequently addressed inference tasks, extracts properties related to equivalence with pooled analyses, communication efficiency, and privacy protection, and concludes that the scope of existing approaches remains limited, with most methods failing to concurrently achieve equivalence to centralized results, high communication efficiency, and guaranteed individual-level data protection.

Significance. If the property extraction is shown to be consistent and representative, the review would usefully map the landscape of vertical methods for multi-site health research and identify key gaps, particularly the frequent trade-offs among statistical fidelity, communication overhead, and privacy guarantees. This could inform both method developers and practitioners seeking to integrate complementary data sources without pooling raw records.

major comments (2)
  1. [Abstract and Results] The abstract states that equivalence 'was not systematically addressed' yet concludes that most methods do not achieve results equivalent to centralized analyses. Clarify in the methods or results how non-equivalence was determined when papers did not address it (e.g., whether absence of reporting was coded as failure), and provide the explicit coding rubric or supplementary extraction table used for the 30 included studies.
  2. [Results] Only a minority of articles provided privacy assessments, yet almost all described their approach as privacy-preserving and the headline finding states that most methods lack 'guaranteed protection.' Specify the criteria applied to evaluate privacy guarantees (self-description versus formal assessment) and how this was operationalized across heterogeneous reporting standards.
minor comments (1)
  1. [Methods] Expand the description of the citation screening process and any quality or risk-of-bias considerations applied to the included methods, even if formal bias tools were not used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which identify opportunities to improve transparency in our methods and results. We provide point-by-point responses below and will revise the manuscript to include explicit documentation of our coding process.

Point-by-point responses
  1. Referee: [Abstract and Results] The abstract states that equivalence 'was not systematically addressed' yet concludes that most methods do not achieve results equivalent to centralized analyses. Clarify in the methods or results how non-equivalence was determined when papers did not address it (e.g., whether absence of reporting was coded as failure), and provide the explicit coding rubric or supplementary extraction table used for the 30 included studies.

    Authors: We thank the referee for noting this ambiguity. Equivalence was coded only when papers explicitly addressed it (via theoretical proofs of identical estimators or empirical demonstrations of numerical equivalence to pooled results). Papers that omitted any discussion of equivalence were coded strictly as 'not addressed' and were not counted as non-equivalent. The headline conclusion that most methods fail to achieve equivalence is drawn from the subset of papers that did evaluate it (where the majority fell short) together with the observation that unverified methods cannot be assumed equivalent. We will add a supplementary extraction table and a detailed coding rubric in the revised methods section that lists, for each of the 30 studies, the exact status recorded for equivalence, number of communication rounds, and privacy evaluation. revision: yes
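The coding rule described in this simulated response can be sketched as a small function. This is a hypothetical encoding, not the authors' extraction template: the names `Equivalence` and `code_equivalence`, the boolean inputs, and the three placeholder records are all assumptions invented for illustration and are not data from the review.

```python
from collections import Counter
from enum import Enum

class Equivalence(Enum):
    EQUIVALENT = "equivalent"
    NOT_EQUIVALENT = "not equivalent"
    NOT_ADDRESSED = "not addressed"

def code_equivalence(has_proof, has_empirical_check, matches_pooled):
    """Code a paper's equivalence status; silence is never coded as failure."""
    if not (has_proof or has_empirical_check):
        return Equivalence.NOT_ADDRESSED
    return Equivalence.EQUIVALENT if matches_pooled else Equivalence.NOT_EQUIVALENT

# Placeholder records (invented): (has_proof, has_empirical_check, matches_pooled)
placeholder_papers = [
    (True, False, True),
    (False, True, False),
    (False, False, False),
]
tally = Counter(code_equivalence(*p) for p in placeholder_papers)
for status, count in tally.items():
    print(status.value, count)
```

Keeping "not addressed" as its own category makes it possible to report separately how many methods were shown to fall short and how many were simply never evaluated.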

  2. Referee: [Results] Only a minority of articles provided privacy assessments, yet almost all described their approach as privacy-preserving and the headline finding states that most methods lack 'guaranteed protection.' Specify the criteria applied to evaluate privacy guarantees (self-description versus formal assessment) and how this was operationalized across heterogeneous reporting standards.

    Authors: We agree that the distinction requires explicit statement. 'Guaranteed protection' was operationalized as the presence of a formal assessment (differential privacy bounds, cryptographic security proofs, or empirical attack-resistance evaluations). Self-descriptions such as 'privacy-preserving' or 'secure' without accompanying formal analysis were recorded but did not qualify as guaranteed protection. This rule was applied uniformly by examining the methods and results sections of each paper. We will revise the methods section to state these criteria verbatim and will add illustrative examples in the results to show how heterogeneous reporting was classified. revision: yes
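As a rough illustration of the rule stated in this simulated response, the following sketch separates a paper's self-description from the presence of a formal assessment. The label strings in `FORMAL_ASSESSMENTS` and the function name `has_guaranteed_protection` are hypothetical; they paraphrase the three assessment types named above and are not taken from the paper.

```python
# Hypothetical labels for the three kinds of formal assessment named in the response.
FORMAL_ASSESSMENTS = {
    "differential_privacy_bound",
    "cryptographic_security_proof",
    "empirical_attack_evaluation",
}

def has_guaranteed_protection(self_descriptions, assessments):
    """Return True only if at least one formal assessment is reported.

    self_descriptions (e.g. "privacy-preserving", "secure") are recorded
    by the caller but deliberately ignored when deciding the outcome.
    """
    return bool(FORMAL_ASSESSMENTS & set(assessments))

# A paper that only labels itself privacy-preserving does not qualify.
print(has_guaranteed_protection(["privacy-preserving"], []))
# A paper reporting a cryptographic proof qualifies, whatever it calls itself.
print(has_guaranteed_protection([], ["cryptographic_security_proof"]))
```

The design choice here is that the self-description argument never influences the result, which mirrors the stated rule that labels alone do not count as guaranteed protection.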

Circularity Check

0 steps flagged

Scoping review of vertical inference methods contains no derivations or predictions

Full rationale

The paper is a descriptive scoping review that screens 2887 articles, includes 30, and extracts methodological properties (equivalence to pooled analysis, communication rounds, privacy assessments) from external literature. It performs no mathematical derivations, statistical predictions, parameter fitting, or modeling steps. No equations, ansatzes, or self-citation chains are used to justify central claims; the characterization follows directly from the screening and coding protocol applied to independent papers. The skeptic's concern about heterogeneous reporting affects the strength of the synthesis but does not create circularity within the paper's own logic.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a scoping review the paper introduces no new free parameters, axioms, or invented entities; it only summarizes properties reported in prior methodological papers.

pith-pipeline@v0.9.0 · 5587 in / 1129 out tokens · 44757 ms · 2026-05-13T19:13:01.542542+00:00 · methodology



Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages

  1. [1]

    Dash, S.K

    S. Dash, S.K. Shakyawar, M. Sharma, S. Kaushik, Big data in healthcare: management, analysis and future prospects, J. Big Data 6 (2019) 54. https://doi.org/10.1186/s40537-019-0217-0

  2. [2]

    Chawla, D.A

    N.V . Chawla, D.A. Davis, Bringing Big Data to Personalized Healthcare: A Patient- Centered Framework, J. Gen. Intern. Med. 28 (2013) 660–665. https://doi.org/10.1007/s11606-013-2455-8

  3. [3]

    Dankar, A

    F.K. Dankar, A. Ptitsyn, S.K. Dankar, The development of large-scale de-identified biomedical databases in the age of genomics—principles and challenges, Hum. Genomics 12 (2018) 19. https://doi.org/10.1186/s40246-018-0147-5

  4. [4]

    https://nuage.recherche.usherbrooke.ca/en/ (accessed February 9, 2026)

    Banques NuAge – Banques de données et d’échantillons biologiques de l’Étude longitudinale québécoise sur la nutrition comme déterminant d’un vieillissement réussi, (n.d.). https://nuage.recherche.usherbrooke.ca/en/ (accessed February 9, 2026)

  5. [5]

    Pfitzner, N

    B. Pfitzner, N. Steckhan, B. Arnrich, Federated Learning in a Medical Context: A Systematic Literature Review, ACM Trans. Internet Technol. 21 (2021) 1–31. https://doi.org/10.1145/3412357

  6. [6]

    Camirand Lemyre, S

    F. Camirand Lemyre, S. Lévesque, M.-P. Domingue, K. Herrmann, J.-F. Ethier, Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics, JMIR Med. Inform. 12 (2024) e53622. https://doi.org/10.2196/53622

  7. [7]

    J. Bohn, W. Eddings, S. Schneeweiss, Conducting privacy-preserving multivariable propensity score analysis when patient covariate information is stored in separate locations, Am. J. Epidemiol. 185 (2017) 501–510. https://doi.org/10.1093/aje/kww155

  8. [8]

    Chang, Z

    C. Chang, Z. Bu, Q. Long, CEDAR: communication efficient distributed analysis for regressions, Biometrics 79 (2023) 2357–2369. https://doi.org/10.1111/biom.13786

  9. [9]

    Tong, J.M

    J. Tong, J.M. Reps, C. Luo, Y . Lu, L. Li, J.M. Ramirez-Anguita, M.T. Brand, S.L. DuVall, T. Falconer, A.M. Fuentes, Unlocking efficiency in real-world collaborative studies: a multi-site international study with one-shot lossless GLMM algorithm, Npj Digit. Med. 8 (2025) 457

  10. [10]

    Zhang, D

    F. Zhang, D. Kreuter, Y . Chen, S. Dittmer, S. Tull, T. Shadbahr, M. Schut, F. Asselbergs, S. Kar, S. Sivapalaratnam, Recent methodological advances in federated learning for healthcare, Patterns 5 (2024)

  11. [11]

    Shukla, S

    S. Shukla, S. Doyle, Federated learning in computational pathology: a literature review, J. Med. Imaging 12 (2025) 061412–061412

  12. [12]

    Torkzadehmahani, R

    R. Torkzadehmahani, R. Nasirigerdeh, D.B. Blumenthal, T. Kacprowski, M. List, J. Matschinske, J. Spaeth, N.K. Wenke, J. Baumbach, Privacy-Preserving Artificial Intelligence Techniques in Biomedicine, Methods Inf. Med. 61 (2022) e12–e27. https://doi.org/10.1055/s-0041-1740630

  13. [13]

    Multimodal biomedical AI.Nature Medicine, 28(9):1773–1784, 2022

    J.N. Acosta, G.J. Falcone, P. Rajpurkar, E.J. Topol, Multimodal biomedical AI, Nat. Med. 28 (2022) 1773–1784. https://doi.org/10.1038/s41591-022-01981-2

  14. [15]

    Y . Li, X. Jiang, S. Wang, H. Xiong, L. Ohno-Machado, VERTIcal Grid lOgistic regression (VERTIGO), J. Am. Med. Inform. Assoc. 23 (2016) 570–579. https://doi.org/10.1093/jamia/ocv146

  15. [16]

    C. Sun, L. Ippel, A. Dekker, M. Dumontier, J. van Soest, A systematic review on privacy-preserving distributed data mining, Data Sci. 4 (2021) 121–150. https://doi.org/10.3233/DS-210036

  16. [17]

    Y . Liu, Y . Kang, T. Zou, Y . Pu, Y . He, X. Ye, Y . Ouyang, Y .-Q. Zhang, Q. Yang, Vertical Federated Learning: Concepts, Advances, and Challenges, IEEE Trans. Knowl. Data Eng. 36 (2024) 3615–3634. https://doi.org/10.1109/TKDE.2024.3352628

  17. [18]

    A. Khan, M. ten Thij, A. Wilbik, Vertical federated learning: a structured literature review, Knowl. Inf. Syst. 67 (2025) 3205–3243. https://doi.org/10.1007/s10115-025- 02356-y

  18. [19]

    H. Chen, H. Wang, Q. Long, D. Jin, Y . Li, Advancements in Federated Learning: Models, Methods, and Privacy, ACM Comput. Surv. (2024) 3664650. https://doi.org/10.1145/3664650

  19. [20]

    Z.L. Teo, L. Jin, N. Liu, S. Li, D. Miao, X. Zhang, W.Y . Ng, T.F. Tan, D.M. Lee, K.J. Chua, J. Heng, Y . Liu, R.S.M. Goh, D.S.W. Ting, Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture, Cell Rep. Med. 5 (2024) 101419. https://doi.org/10.1016/j.xcrm.2024.101419

  20. [21]

    Shmueli, To Explain or to Predict?, Stat

    G. Shmueli, To Explain or to Predict?, Stat. Sci. 25 (2010) 289–310

  21. [22]

    James, D

    G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer New York, New York, NY , 2013. https://doi.org/10.1007/978-1-4614-7138- 7

  22. [23]

    A wideband tunable, nonreciprocal bandpass filter using magnetostatic 19 surface waves with zero static power consumption,

    C. Luo, Md.N. Islam, N.E. Sheils, J. Buresh, J. Reps, M.J. Schuemie, P.B. Ryan, M. Edmondson, R. Duan, J. Tong, A. Marks-Anglin, J. Bian, Z. Chen, T. Duarte-Salles, S. Fernández-Bertolín, T. Falconer, C. Kim, R.W. Park, S.R. Pfohl, N.H. Shah, A.E. Williams, H. Xu, Y . Zhou, E. Lautenbach, J.A. Doshi, R.M. Werner, D.A. Asch, Y . Chen, DLMM as a lossless on...

  23. [24]

    Q. Wu, J.M. Reps, L. Li, B. Zhang, Y . Lu, J. Tong, D. Zhang, T. Lumley, M.T. Brand, M. Van Zandt, T. Falconer, X. He, Y . Huang, H. Li, C. Yan, G. Tang, A.E. Williams, F. Wang, J. Bian, B. Malin, G. Hripcsak, M.J. Schuemie, Y . Lu, S. Drew, J. Zhou, D.A. Asch, Y . Chen, COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models...

  24. [26]

    Domingue, J.-F

    M.-P. Domingue, J.-F. Ethier, J.-P. Morissette, S. Lévesque, A. Burgun, F. Camirand Lemyre, Revisiting VERTIGO and VERTIGO-CI: Identifying confidentiality breaches and introducing a statistically sound, efficient alternative, (2025). https://doi.org/10.21203/rs.3.rs-6933988/v1

  25. [27]

    Bak, V .I

    M. Bak, V .I. Madai, L.A. Celi, G.A. Kaissis, R. Cornet, M. Maris, D. Rueckert, A. Buyx, S. McLennan, Federated learning is not a cure-all for data ethics, Nat. Mach. Intell. 6 (2024) 370–372. https://doi.org/10.1038/s42256-024-00813-x

  26. [28]

    Munn, M.D.J

    Z. Munn, M.D.J. Peters, C. Stern, C. Tufanaru, A. McArthur, E. Aromataris, Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach, BMC Med. Res. Methodol. 18 (2018) 143. https://doi.org/10.1186/s12874-018-0611-x

  27. [29]

    Arksey, L

    H. Arksey, L. O’Malley, Scoping studies: towards a methodological framework, Int. J. Soc. Res. Methodol. 8 (2005) 19–32. https://doi.org/10.1080/1364557032000119616

  28. [30]

    Tricco, Erin Lillie, Wasifa Zarin, Kelly K

    A.C. Tricco, E. Lillie, W. Zarin, K.K. O’Brien, H. Colquhoun, D. Levac, D. Moher, M.D.J. Peters, T. Horsley, L. Weeks, S. Hempel, E.A. Akl, C. Chang, J. McGowan, L. Stewart, L. Hartling, A. Aldcroft, M.G. Wilson, C. Garritty, S. Lewin, C.M. Godfrey, M.T. Macdonald, E.V . Langlois, K. Soares-Weiser, J. Moriarty, T. Clifford, Ö. Tunçalp, S.E. Straus, PRISMA...

  29. [31]

    Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering, in: Proc

    C. Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering, in: Proc. 18th Int. Conf. Eval. Assess. Softw. Eng., ACM, London England United Kingdom, 2014: pp. 1–10. https://doi.org/10.1145/2601248.2601268

  30. [32]

    Carrera-Rivera, W

    A. Carrera-Rivera, W. Ochoa, F. Larrinaga, G. Lasa, How-to conduct a systematic literature review: A quick guide for computer science research, MethodsX 9 (2022) 101895. https://doi.org/10.1016/j.mex.2022.101895

  31. [33]

    Fienberg, A.F

    S.E. Fienberg, A.F. Karr, Y . Nardi, A.B. Slavkovic, Secure Logistic Regression with Multi-Party Distributed Databases, Proc. 56th Sess. ISI (2007)

  32. [35]

    M.-C. Liu, N. Zhang, A cryptographic solution to privacy-preserving two-party sign test computation on vertically partitioned data, in: Adv. Mater. Res., 2012: pp. 1249–

  33. [39]

    Reiter, C.N

    J.P. Reiter, C.N. Kohnen, A.F. Karr, X. Lin, A.P. Sanil, Secure Regression for Vertically Partitioned, Partially Overlapping Data, Proc. Am. Stat. Assoc. (2004)

  34. [40]

    Hall, S.E

    R. Hall, S.E. Fienberg, Y . Nardi, Secure Multiple Linear Regression Based on Homomorphic Encryption, (2011)

  35. [42]

    Kikuchi, C

    H. Kikuchi, C. Hamanaga, H. Yasunaga, H. Matsui, H. Hashimoto, Privacy- Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets, in: 2017 IEEE 31st Int. Conf. Adv. Inf. Netw. Appl. AINA, 2017: pp. 1042–

  36. [44]

    A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Secure statistical analysis of distributed databases, in: Stat. Methods Counterterrorism Game Theory Model. Syndr. Surveill. Biom. Authentication, 2006: pp. 237–261. https://doi.org/10.1007/0-387-35209- 0_14

  37. [45]

    A.F. Karr, X. Lin, A.P. Sanil, J.P. Reiter, Privacy-preserving analysis of vertically partitioned data using secure matrix products, J. Off. Stat. 25 (2009) 125–138

  38. [47]

    In: Krebbers, R

    S.E. Fienberg, Y . Nardi, A.B. Slavković, Valid statistical analysis for logistic regression with multiple sources, in: Lect. Notes Comput. Sci. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinforma., 2009: pp. 82–94. https://doi.org/10.1007/978- 3-642-10233-2_8

  39. [50]

    J. Kim, W. Li, T. Bath, X. Jiang, L. Ohno-Machado, VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI), Proc. – AMIA Jt. Summits Transl. Sci. 2021 (2021) 355–364

  40. [52]

    Hector, P.X.-K

    E.C. Hector, P.X.-K. Song, Doubly distributed supervised learning and inference with high-dimensional correlated outcomes, J Mach Learn Res 21 (2020) Article-173

  41. [56]

    Snoke, T

    J. Snoke, T. Brick, A. Slavković, Accurate Estimation of Structural Equation Models with Remote Partitioned Data, in: Priv. Stat. Databases, Springer International Publishing, Cham, 2016: pp. 190–209

  42. [63]

    Aggarwal, P.S

    C.C. Aggarwal, P.S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, 2008

  43. [64]

    Q. Her, T. Kent, Y . Samizo, A. Slavkovic, Y . Vilk, S. Toh, Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study, JMIR Med. Inform. 9 (2021) e21459. https://doi.org/10.2196/21459

  44. [65]

    Kuo, R.A

    T.-T. Kuo, R.A. Gabriel, J. Koola, R.T. Schooley, L. Ohno-Machado, Distributed cross-learning for equitable federated models-privacy-preserving prediction on data from five California hospitals, Nat. Commun. 16 (2025) 1371

  45. [66]

    Huang, H

    X. Huang, H. Kikuchi, C.-I. Fan, Privacy Preserved Spectral Analysis Using IoT mHealth Biomedical Data for Stress Estimation, in: 2018 IEEE 32nd Int. Conf. Adv. Inf. Netw. Appl. AINA, 2018: pp. 793–800. https://doi.org/10.1109/AINA.2018.00118

  46. [67]

    Y . Li, D. Feng, Y . Sui, H. Li, Y . Song, T. Zhan, G. Cicconetti, M. Jin, H. Wang, I. Chan, X. Wang, Analyzing longitudinal binary data in clinical studies, Contemp. Clin. Trials 115 (2022) 106717. https://doi.org/10.1016/j.cct.2022.106717. Multi-Site Health Research Integrating Complementary Data Sources: A Scoping Review of Statistical Inference Method...

  47. [68]

    RESEARCH QUESTION ....................................................................................................... 1

  48. [69]

    METHODS .............................................................................................................................. 1 2.1. Keywords ....................................................................................................................... 1 2.1.1. Limits and restrictions ........................................................

  49. [70]

    continuous vs binary covariates)

    RESEARCH QUESTION 1.1.What existing distributed methods allow conducting statistical inference procedures with vertically partitioned data?  Regarding: Methods for different statistical models; Methods for various settings in terms of privacy; Methods for different data settings (e.g. continuous vs binary covariates). 1.2.What are the characteristics spe...

  50. [71]

    privacy-preserving

    METHODS 2.1. Keywords In accordance with our previous review, while making sure we capture all potential existing methods, two categories of keywords were targeted, corresponding to the two themes in the research question.  Distributed analyses: o Partitioned  Partitioned  Federated  Distributed  Aggregated  Privacy-preserving  Multiparty  Multipl...

  51. [72]

    Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

    Vertically partitioned data This paper/study presents a solution to perform inferential statistics with vertically partitioned data. Examples of papers that would not meet the criteria: the method is 5 presented on horizontally partitioned data only; or the method requires pooling all line- level data

  52. [73]

    e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

    Inferential Statistics The paper/study does not specifically address inferential statistics (Confidence intervals, Hypothesis testing or Asymptotic normality result). e.g., the focus is not on estimation and/or confidence intervals and/or hypothesis testing

  53. [74]

    e.g., the study is solely an application of a previously developed and presented method

    Methodologi cal contribution The paper/study does not provide a new methodological contribution. e.g., the study is solely an application of a previously developed and presented method

  54. [75]

    Published study The paper/study has not been published

  55. [76]

    Language The full-text is not available in English or French. 2.4. Data-charting A data-charting form was collectively developed, and extraction will be addressed manually. Data will be independently extracted by two authors (MPD and SL) for all studies. In the case of opposite opinions from the two initial reviewers, the reviewers will discuss and, if ne...

  56. [77]

    Accurate Estimation of Structural Equation Models with Remote Partitioned Data

    Snoke J, Brick T, Slavković A. Accurate Estimation of Structural Equation Models with Remote Partitioned Data. In: Privacy in Statistical Databases. Cham: Springer International Publishing; 2016. p. 190–209

  57. [78]

    VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI)

    Kim J, Li W, Bath T, Jiang X, Ohno-Machado L. VERTIcal Grid lOgistic regression with Confidence Intervals (VERTIGO-CI). Proc – AMIA Jt Summits Transl Sci. 2021;2021:355–64

  58. [79]

    VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers

    Dai W, Jiang X, Bonomi L, Li Y , Xiong H, Ohno-Machado L. VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers. IEEE Trans Knowl Data Eng. 2022;34(2):996–1010. doi:10.1109/TKDE.2020.2989301

  59. [80]

    Advances in Water Resources 166, 104264

    Imakura A, Tsunoda R, Kagawa R, Yamagata K, Sakurai T. DC-COX: Data collaboration Cox proportional hazards model for privacy-preserving survival analysis on multiple parties. J Biomed Inform. 2023;137. doi:10.1016/j.jbi.2022.104264

  60. [81]

    Valid statistical analysis for logistic regression with multiple sources

    Fienberg SE, Nardi Y , Slavković AB. Valid statistical analysis for logistic regression with multiple sources. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2009. p. 82–94. doi:10.1007/978-3-642-10233-2_8

  61. [82]

    Secure Logistic Regression with Multi-Party Distributed Databases

    Fienberg SE, Karr AF, Nardi Y , Slavkovic AB. Secure Logistic Regression with Multi-Party Distributed Databases. Proc 56th Sess ISI. 2007

  62. [83]

    Secure logistic regression of horizontally and vertically partitioned distributed databases

    Slavkovic AB, Nardi Y , Tibbits MM. Secure logistic regression of horizontally and vertically partitioned distributed databases. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2007. p. 723–8. doi:10.1109/ICDMW.2007.114

  63. [84]

    Providing accurate models across private partitioned data: Secure maximum likelihood estimation

    Snoke J, Brick TR, Slavković A, Hunte MD. Providing accurate models across private partitioned data: Secure maximum likelihood estimation. Ann Appl Stat. 2018;12(2):877–914. doi:10.1214/18-AOAS1171

  64. [85]

    Privacy-preserving analysis of vertically partitioned data using secure matrix products

    Karr AF, Lin X, Sanil AP, Reiter JP. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J Off Stat. 2009;25(1):125–38

  65. [86]

    Secure statistical analysis of distributed databases

    Karr AF, Lin X, Sanil AP, Reiter JP. Secure statistical analysis of distributed databases. In: Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication. 2006. p. 237–61. doi:10.1007/0-387-35209-0_14

  66. [87]

    Secure Regression for Vertically Partitioned, Partially Overlapping Data

    Reiter JP, Kohnen CN, Karr AF, Lin X, Sanil AP. Secure Regression for Vertically Partitioned, Partially Overlapping Data. Proc Am Stat Assoc. 2004

  67. [88]

    Privacy-preserving multivariate statistical analysis: Linear regression and classification

    Du W, Han YS, Chen S. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In: SIAM Proceedings Series. 2004. p. 222–33. doi:10.1137/1.9781611972740.21

  68. [89]

    A solution to privacy-preserving two-party sign test on vertically partitioned data (P22NSTv) using data disguising techniques

    Liu MC, Zhang N. A solution to privacy-preserving two-party sign test on vertically partitioned data (P22NSTv) using data disguising techniques. In: ICNIT 2010 - 2010 International Conference on Networking and Information Technology. 2010. p. 526–34. doi:10.1109/ICNIT.2010.5508458

  69. [90]

    Privacy-preserving cloud-based statistical analyses on sensitive categorical data

    Ricci S, Domingo-Ferrer J, Sánchez D. Privacy-preserving cloud-based statistical analyses on sensitive categorical data. In: International Conference on Modeling Decisions for Artificial Intelligence. 2016. p. 227–38. doi:10.1007/978-3-319-45656-0_19

  70. [91]

    Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment

    Movahedi M, Case BM, Honaker J, Knox A, Li L, Li YP, et al. Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment. In: Proceedings of the 2021 on Cloud Computing Security Workshop. Association for Computing Machinery; 2021. p. 59–69. doi:10.1145/3474123.3486764

  71. [92]

    Privacy-preserving hypothesis testing for the analysis of epidemiological medical data

    Kikuchi H, Sato T, Sakuma J. Privacy-preserving hypothesis testing for the analysis of epidemiological medical data. In: Proceedings - International Conference on Advanced Information Networking and Applications, AINA. 2014. p. 359–65. doi:10.1109/AINA.2014.46

  72. [93]

    Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets

    Kikuchi H, Hamanaga C, Yasunaga H, Matsui H, Hashimoto H. Privacy-Preserving Multiple Linear Regression of Vertically Partitioned Real Medical Datasets. In: 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA). 2017. p. 1042–9. doi:10.1109/AINA.2017.52

  73. [94]

    Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies

    Kikuchi H, Hashimoto H, Yasunaga H, Saito T. Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies. In: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications. 2015. p. 510–4. doi:10.1109/AINA.2015.229

  74. [95]

    Efficient Privacy-Preserving Logistic Regression with Iteratively Re-weighted Least Squares

    Kikuchi H, Yasunaga H, Matsui H, Fan CI. Efficient Privacy-Preserving Logistic Regression with Iteratively Re-weighted Least Squares. In: 2016 11th Asia Joint Conference on Information Security (AsiaJCIS). 2016. p. 48–54. doi:10.1109/AsiaJCIS.2016.21

  75. [96]

    Secure Multiple Linear Regression Based on Homomorphic Encryption

    Hall R, Fienberg SE, Nardi Y. Secure Multiple Linear Regression Based on Homomorphic Encryption. 2011

  76. [97]

    Privacy-preserving cooperative statistical analysis

    Du W, Atallah MJ. Privacy-preserving cooperative statistical analysis. In: Proceedings - Annual Computer Security Applications Conference, ACSAC. 2001. p. 102–10. doi:10.1109/ACSAC.2001.991526

  77. [98]

    Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data

    Cock M de, Dowsley R, Nascimento ACA, Newman SC. Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data. In: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery; 2015. p. 3–14. doi:10.1145/2808769.2808774

  78. [99]

    Privacy-preserving statistical analysis by exact logistic regression

    Duverle DA, Kawasaki S, Yamada Y, Sakuma J, Tsuda K. Privacy-preserving statistical analysis by exact logistic regression. In: Proceedings - 2015 IEEE Security and Privacy Workshops, SPW 2015. 2015. p. 7–16. doi:10.1109/SPW.2015.14

  79. [100]

    Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy

    Kamphorst B, Rooijakkers T, Veugen T, Cellamare M, Knoors D. Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy. BMC Med Inform Decis Mak. 2022;22(1):49. Located at: 35209883. doi:10.1186/s12911-022-01771-3

  80. [101]

    Wu F, Xi B. Differentially Private Causal Inference Under Hierarchical Design. In: 2023 IEEE International Conference on Data Mining Workshops (ICDMW). 2023. p. 1390–9. doi:10.1109/ICDMW60847.2023.00177
