Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers

Daniel M. Jimenez-Gutierrez; Dario Pighin; Enrique Zuazua; Georgios Kellaris; Joaquin Del Rio; Oleksii Sliusarenko; Xabi Uribe-Etxebarria

arXiv: 2604.19219 · v2 · pith:TKADPU2Knew · submitted 2026-04-21 · 💻 cs.CR · cs.AI · cs.DC · cs.LG

Pith reviewed 2026-05-21 00:08 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.DC · cs.LG

keywords: privacy-preserving entity alignment · private set union · vertical federated learning · multi-party computation · noisy matching · intersection privacy · entity resolution

The pith

A multi-party private set union protocol aligns entities for vertical federated learning while hiding which records are shared.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a protocol that lets multiple organizations align their datasets on a common index without revealing which samples they have in common. It generalizes earlier two-party methods to handle more participants and adds support for both exact matches and matches tolerant to typos or formatting differences in identifiers. The approach uses private set union instead of intersection to avoid exposing sensitive relationships between datasets. Proofs of correctness and privacy are given along with complexity analysis for communication and computation. This setup targets practical vertical federated learning tasks such as joint disease modeling or fraud detection across institutions.

Core claim

The Sherpa.ai multi-party PSU protocol for VFL provides privacy-preserving entity alignment by operating on the union of identifiers rather than their intersection, thereby concealing membership information; it offers an order-preserving variant for exact alignment and an unordered variant that tolerates typographical and formatting noise in identifiers, with formal proofs of correctness and privacy plus a universal index mapping from local records to a shared space.

What carries the argument

The Sherpa.ai multi-party private set union protocol, which aligns records on the union of identifiers across parties while keeping intersection membership hidden and supporting both exact and approximate matching.

If this is right

Vertical federated learning becomes feasible across multiple organizations without exposing shared sample relationships.
Alignment works for noisy real-world identifiers such as misspelled names or inconsistent address formats.
Communication scales to more than two parties with lower overhead than running pairwise protocols.
Formal privacy and correctness guarantees apply to both the exact and noisy-matching variants.
Applications include cross-institution healthcare modeling and collaborative fraud detection without central data sharing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Organizations could adopt this alignment step before training joint models, reducing reliance on a trusted intermediary.
The unordered variant might extend to other approximate record-linkage settings beyond the paper's examples.
Integration with existing vertical federated learning frameworks would require only the index-mapping step described.
Testing on real multi-institutional datasets with controlled noise levels would quantify the practical privacy gain.

Load-bearing premise

The protocol can be realized securely under standard multi-party cryptographic assumptions and the unordered variant introduces no new leakage when identifiers contain noise.

What would settle it

An attack recovering intersection membership from protocol messages or outputs with success probability noticeably above random guessing.
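One common way to make this criterion precise is an advantage bound. The formulation below is an assumption for illustration, not a definition taken from the paper: $\mathcal{A}$ is a hypothetical semi-honest adversary, $\mathsf{view}$ is everything it observes during the protocol, $b$ is a uniformly random bit encoding whether a chosen identifier lies in the intersection, and $\lambda$ is the security parameter.

```latex
\mathrm{Adv}_{\mathcal{A}}(\lambda)
  \;=\;
  \left|\, \Pr\bigl[\mathcal{A}(\mathsf{view}) = b\bigr] - \tfrac{1}{2} \,\right|
```

Under this reading, an attack would settle the question if $\mathrm{Adv}_{\mathcal{A}}(\lambda)$ is non-negligible in $\lambda$; the baseline of $\tfrac{1}{2}$ corresponds to random guessing only because $b$ is assumed uniform.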

Figures

Figures reproduced from arXiv: 2604.19219 by Daniel M. Jimenez-Gutierrez, Dario Pighin, Enrique Zuazua, Georgios Kellaris, Joaquin Del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria.

**Figure 2.** Illustration of the PSI protocol. Only the common identifiers (IDs) between the two parties are …

**Figure 3.** Illustration of the PSU protocol. All unique IDs across parties form the union dataset used for …

**Figure 4.** Pipeline of the proposed PSU protocol for multi-party VFL.

**Figure 5.** Scheme describing the main steps of the first part of the Diffie-Hellman protocol employed for PSU.

**Figure 6.** Scheme describing the main steps of the second part of the Diffie-Hellman protocol employed for …
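The Diffie-Hellman-style schemes named in the captions of Figures 5 and 6 rely on commutative encryption: applying two parties' keys in either order gives the same ciphertext. The sketch below is a minimal illustration of that property only, not the paper's protocol. The modulus `P`, the helper names `hash_to_group`, `keygen`, and `encrypt`, and the sample identifier are all assumptions, and the parameters are toy values rather than a secure choice.

```python
import hashlib
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Illustration only; a real
# deployment would use a vetted group, which this sketch does not attempt.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier to an integer in [1, P - 1] (hypothetical helper)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Draw a random secret exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2


def encrypt(value: int, key: int) -> int:
    """Exponentiation cipher: raise the value to the secret key modulo P."""
    return pow(value, key, P)


alice_key, bob_key = keygen(), keygen()
h = hash_to_group("patient-0042")

# (h^a)^b == (h^b)^a mod P, so the order of encryption does not matter.
ab = encrypt(encrypt(h, alice_key), bob_key)
ba = encrypt(encrypt(h, bob_key), alice_key)
assert ab == ba
```

Because both orders agree, two parties holding the same identifier end up with the same doubly encrypted value without either one seeing the other's plaintext identifier or key.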

Original abstract

Federated Learning (FL) enables collaborative model training among multiple parties without centralizing raw data. There are two main paradigms in FL: Horizontal FL (HFL), where all participants share the same feature space but hold different samples, and Vertical FL (VFL), where parties possess complementary features for the same set of samples. A prerequisite for VFL training is privacy-preserving entity alignment (PPEA), which establishes a common index of samples across parties (alignment) without revealing which samples are shared between them. Conventional private set intersection (PSI) achieves alignment but leaks intersection membership, exposing sensitive relationships between datasets. The standard private set union (PSU) mitigates this risk by aligning on the union of identifiers rather than the intersection. However, existing approaches are often limited to two parties or lack support for typo-tolerant matching. In this paper, we introduce the Sherpa.ai multi-party PSU protocol for VFL, a PPEA method that hides intersection membership and enables both exact and noisy matching. The protocol generalizes two-party approaches to multiple parties with low communication overhead and offers two variants: an order-preserving version for exact alignment and an unordered version tolerant to typographical and formatting discrepancies. We prove correctness and privacy, analyze communication and computational (exponentiation) complexity, and formalize a universal index mapping from local records to a shared index space. This multi-party PSU offers a scalable, mathematically grounded protocol for PPEA in real-world VFL deployments, such as multi-institutional healthcare disease detection, collaborative risk modeling between banks and insurers, and cross-domain fraud detection between telecommunications and financial institutions, while preserving intersection privacy.
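To make the difference between intersection and union alignment concrete, here is a plaintext sketch of the target data structure: a shared index over the union of identifiers, with each party mapping its local rows into it. This is not the protocol. It exchanges identifiers in the clear, which is what the paper's method is designed to avoid. The function names `union_index` and `align`, and the party names and identifiers, are hypothetical.

```python
from typing import Dict, List, Optional


def union_index(parties: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each distinct identifier one slot in a shared index space."""
    index: Dict[str, int] = {}
    for name in sorted(parties):
        for ident in parties[name]:
            if ident not in index:
                index[ident] = len(index)
    return index


def align(
    parties: Dict[str, List[str]], index: Dict[str, int]
) -> Dict[str, List[Optional[int]]]:
    """For each party, map every union slot to a local row number or None."""
    aligned: Dict[str, List[Optional[int]]] = {}
    for name, ids in parties.items():
        rows: List[Optional[int]] = [None] * len(index)
        for row, ident in enumerate(ids):
            rows[index[ident]] = row
        aligned[name] = rows
    return aligned


parties = {
    "hospital": ["id7", "id2", "id9"],
    "insurer": ["id2", "id5"],
}
index = union_index(parties)
aligned = align(parties, index)

# PSI would expose exactly this set; union alignment indexes all four IDs.
intersection = set(parties["hospital"]) & set(parties["insurer"])
```

Here the union has four slots while the intersection contains only `id2`. In this plaintext version the `None` entries reveal which records a party lacks, so a private protocol has to hide that information by other means.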

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a multi-party PSU construction for PPEA in VFL that adds noisy matching support and includes proofs plus complexity analysis that hold together under standard assumptions.

The main point is that Sherpa.ai presents a multi-party private set union protocol for privacy-preserving entity alignment in vertical federated learning. It supports both exact and noisy identifier matching while avoiding disclosure of the intersection, and it generalizes prior two-party work with two variants plus a universal index mapping. The authors prove correctness and privacy, then analyze communication and exponentiation costs. This targets practical settings like healthcare data sharing or cross-bank modeling where parties need alignment without leaking relationships.

The construction and formal elements are the clearest additions here. The protocol description, security definitions, and complexity claims line up internally under the semi-honest model with ordinary cryptographic primitives, so the central argument does not collapse on its own terms. The noise-tolerant unordered variant is handled through the design choices rather than ad-hoc fixes.

Soft spots are limited and mostly standard for this area. Security stays in the semi-honest setting, which leaves malicious-party resistance for later work. The low-overhead claim is backed by the analysis but would be stronger with explicit scaling numbers for larger party counts. No circular reasoning or unfalsifiable steps appear in the proofs or mapping.

This paper is aimed at cryptographers and applied researchers working on federated learning in regulated domains. A reader focused on multi-party privacy protocols or VFL infrastructure would pick up usable variants and the index formalization. It has enough formal grounding and addresses a real gap to merit a serious referee. I would send it to peer review, asking for more on multi-party scaling behavior and any concrete noise-tolerance checks.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces the Sherpa.ai multi-party private set union (PSU) protocol for privacy-preserving entity alignment (PPEA) in vertical federated learning (VFL). The protocol lets multiple parties align records on the union of identifiers without disclosing intersection membership. It supports exact matching via an order-preserving variant and noisy matching, tolerant to typographical discrepancies, via an unordered variant, and it generalizes prior two-party techniques with low communication overhead. The manuscript includes proofs of correctness and privacy, analyzes communication and exponentiation complexity, and formalizes a universal index mapping from local records to a shared index space. It discusses applications in multi-institutional healthcare, bank-insurer risk modeling, and cross-domain fraud detection.

Significance. If the security definitions, proofs, and complexity bounds hold under standard assumptions such as the semi-honest model with common cryptographic primitives, the work provides a practical advance for multi-party VFL by solving the intersection-leakage problem of PSI while adding support for noisy identifiers. The low-overhead multi-party generalization and formal index mapping could enable scalable deployments in privacy-sensitive domains where existing two-party PSU methods fall short.

minor comments (3)

Abstract: the claim of 'low communication overhead' is stated without a quantitative comparison to the two-party baselines that the protocol generalizes; adding one sentence with asymptotic or concrete costs would improve context.
Section on the unordered variant: the description of how typographical and formatting discrepancies are handled without creating new leakage channels could include a short worked example of identifier normalization to aid verification.
Complexity analysis: the exponentiation count is reported but a small table juxtaposing the multi-party costs against the referenced two-party protocols would clarify the overhead scaling.
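As an illustration of the kind of worked example the second comment asks for, the sketch below canonicalizes identifiers so that formatting differences (case, accents, punctuation, spacing) collapse to the same string. It is a hypothetical example, not the paper's normalization procedure, which this page does not describe. The function name `normalize_identifier` and the sample names are assumptions. Genuine typos such as transposed or missing letters are not handled here and would need an approximate-matching step.

```python
import re
import unicodedata


def normalize_identifier(raw: str) -> str:
    """Canonicalize an identifier: strip accents, case, and punctuation."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", raw)
    # Drop the combining marks, keeping only the base characters.
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-insensitive comparison form.
    lowered = no_accents.casefold()
    # Keep ASCII letters and digits only; spaces and punctuation are removed.
    return re.sub(r"[^a-z0-9]", "", lowered)


left = normalize_identifier("José  O'Neil-Smith")
right = normalize_identifier("JOSE ONEIL SMITH")
assert left == right == "joseoneilsmith"
```

Any such step would run locally on each party's raw identifiers before cryptographic processing, so whether it creates a leakage channel depends on the matching step that follows, which is the point the referee wants the authors to address.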

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision for our manuscript on Sherpa.ai. We appreciate the recognition of the multi-party PSU protocol's contributions to PPEA in VFL, including support for exact and noisy matching while hiding intersection membership. No specific major comments were listed in the report, so we provide no point-by-point responses below. We will incorporate any minor suggestions during revision.

Circularity Check

0 steps flagged

No significant circularity in protocol construction

The paper presents a cryptographic multi-party PSU protocol for PPEA that generalizes two-party methods, with explicit proofs of correctness/privacy, complexity analysis, and a formalized universal index mapping. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided claims or abstract. The derivation rests on standard semi-honest cryptographic assumptions and formal proofs rather than reducing to prior fitted results or self-referential inputs by construction. This is a self-contained construction paper with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The protocol description relies on standard cryptographic primitives for PSI/PSU and the assumption that noisy matching can be performed privately; no free parameters or new invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Standard cryptographic assumptions underlying private set union protocols hold for the multi-party case.
The privacy and correctness proofs are stated to rest on these background primitives.

pith-pipeline@v0.9.0 · 5876 in / 1321 out tokens · 42797 ms · 2026-05-21T00:08:10.009434+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

We introduce the Sherpa.ai multi-party PSU protocol for VFL, a PPEA method that hides intersection membership and enables both exact and noisy matching... commutative encryption process based on the Diffie–Hellman key exchange principle

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

