arxiv: 2605.26882 · v1 · pith:3VDUE3H2new · submitted 2026-05-26 · 💻 cs.CR

Privacy-Preserving Screening for Record Linkage

Pith reviewed 2026-06-29 16:47 UTC · model grok-4.3

classification 💻 cs.CR
keywords privacy-preserving record linkage · screening · circuit-PSI · oblivious alignment · data collaboration · secure computation · record linkage · scalability

The pith

The Screening-then-Linkage framework adds a lightweight secure screening phase before full privacy-preserving record linkage to handle far larger sets of candidate collaborators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that full privacy-preserving record linkage (PPRL) is too slow and costly when many potential partners must be evaluated, so it introduces a preliminary screening stage that filters candidates with a lighter secure protocol. The authors' system, Appraisal, implements this stage with circuit-PSI and a custom Oblivious Attribute/Feature Alignment protocol that supports approximate and schema-aware comparisons while lowering communication costs. If the approach holds, data owners can screen far larger datasets and candidate pools for collaboration value (the paper reports up to 850 times more records than the prior screening system SFour) while staying within privacy constraints. This matters because regulatory and legal constraints on data sharing make scalable screening a practical prerequisite for data markets with many potential collaborators.

Core claim

The authors establish the Screening-then-Linkage framework and realize it in Appraisal, a circuit-PSI system for privacy-preserving record screening. The Oblivious Attribute/Feature Alignment protocol reconciles approximate matching requirements with circuit-PSI's symmetric-function limits, cutting communication by a factor of 14. Rigorous analysis and experiments show Appraisal accommodates up to 850 times more records than the prior PPRS system SFour under identical constraints and runs 165 times faster than state-of-the-art PPRL, confirming that the screening stage substantially reduces the time to identify valuable collaborators from large pools.

What carries the argument

Screening-then-Linkage framework that runs circuit-PSI-based privacy-preserving record screening first, using the Oblivious Attribute/Feature Alignment protocol to support non-symmetric comparisons before full linkage.
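
The two-phase flow can be sketched in plaintext. This is a minimal sketch under stated assumptions, not the paper's protocol: `screen_score` and `full_linkage` are hypothetical stand-ins, computed in the clear, for the secure screening phase and the full PPRL phase, and exact-key matching is used only to keep the illustration short.

```python
from typing import Dict, List, Set


def screen_score(mine: Set[str], theirs: Set[str]) -> int:
    """Hypothetical stand-in for secure screening: returns only a match count."""
    return len(mine & theirs)


def full_linkage(mine: Set[str], theirs: Set[str]) -> List[str]:
    """Hypothetical stand-in for full PPRL: returns the linked keys themselves."""
    return sorted(mine & theirs)


def screening_then_linkage(
    mine: Set[str], candidates: Dict[str, Set[str]], top_k: int
) -> Dict[str, List[str]]:
    # Phase 1: compute a cheap score for every candidate.
    scores = {name: screen_score(mine, recs) for name, recs in candidates.items()}
    # Keep the top_k candidates (highest score first, ties broken by name).
    shortlist = sorted(scores, key=lambda n: (-scores[n], n))[:top_k]
    # Phase 2: run the expensive linkage only on the shortlist.
    return {name: full_linkage(mine, candidates[name]) for name in shortlist}


mine = {"alice", "bob", "carol", "dave"}
candidates = {
    "org_a": {"alice", "bob", "carol"},
    "org_b": {"erin"},
    "org_c": {"dave", "alice"},
}
result = screening_then_linkage(mine, candidates, top_k=2)
```

In this toy run `org_b` is never linked, which is the saving the framework is after; whether the real screening score ranks candidates as faithfully as this plaintext count is exactly what the paper's evaluation has to show.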

If this is right

  • The framework reduces overall computation time needed to find the most valuable collaborators from large candidate pools.
  • Appraisal supports 850 times more records than the prior PPRS system within the same resource limits.
  • The alignment protocol lowers communication costs by a factor of 14 relative to conventional methods.
  • Security guarantees hold while effectiveness and efficiency are demonstrated in comprehensive evaluations.
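
A back-of-the-envelope cost model illustrates the first bullet. The sketch assumes, purely for illustration, that every candidate costs the same, that screening one candidate is `ratio` times cheaper than fully linking it, and that only `k` of `n` candidates proceed to full linkage. The values `n=100` and `k=5` are hypothetical, and 165 is borrowed from the reported speedup over PPRL, which may not transfer directly to a per-candidate cost ratio.

```python
def end_to_end_speedup(n: int, k: int, ratio: float) -> float:
    """Speedup of screening-then-linkage over linking all n candidates."""
    linkage_cost = 1.0                    # normalized cost of one full linkage
    screening_cost = linkage_cost / ratio  # assumed cost of one screening run
    baseline = n * linkage_cost           # link every candidate
    two_phase = n * screening_cost + k * linkage_cost  # screen all, link k
    return baseline / two_phase


speedup = end_to_end_speedup(n=100, k=5, ratio=165.0)
print(f"{speedup:.1f}x")  # prints "17.8x"
```

Under these assumptions the gain is capped at `n / k` (here 20), so the benefit depends as much on how aggressively the shortlist can be cut as on how cheap screening is.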

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The hybrid screening approach could be adapted to other secure multi-party tasks that currently scale poorly with many candidates.
  • Data markets might adopt such filters to shortlist partners quickly before investing in full linkages.
  • Accuracy-speed trade-offs could be quantified further by testing on additional real-world datasets with known match distributions.

Load-bearing premise

The screening phase must retain low false-negative rates for valuable matches even though it is restricted to symmetric functions from circuit-PSI.

What would settle it

Running Appraisal on a dataset with known valuable collaborator pairs and measuring whether the screening phase discards more than a small fraction of those pairs would show whether accuracy is preserved at scale.
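
The check described above reduces to a false-negative rate over a labeled ground truth. The following is a minimal sketch; the function name, the candidate identifiers, and the ground-truth set are all hypothetical and are not taken from the paper's evaluation.

```python
from typing import Iterable, Set


def screening_false_negative_rate(
    valuable: Set[str], shortlisted: Iterable[str]
) -> float:
    """Fraction of known-valuable candidates that screening discarded."""
    if not valuable:
        raise ValueError("ground truth must contain at least one valuable candidate")
    kept = set(shortlisted)
    missed = valuable - kept  # valuable candidates absent from the shortlist
    return len(missed) / len(valuable)


# Hypothetical ground truth and screening output.
valuable = {"org_a", "org_c", "org_f"}
shortlisted = ["org_a", "org_b", "org_c"]
fnr = screening_false_negative_rate(valuable, shortlisted)
```

Here one of three valuable candidates (`org_f`) is lost, so the rate is one third; a result like that at scale would count against the load-bearing premise.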

Figures

Figures reproduced from arXiv: 2605.26882 by Chenyu Huang, Danqing Huang, Fan Zhang, Huaming Rao, Huangxun Chen, Peng Chen, Yongjun Zhao.

Figure 1. Screening-then-Linkage Framework.
Context excerpt: … the number of linked records with each potential candidate: if the number is too small, it is a waste of time to compute the exact linking record pairs. With this observation, we propose a new framework for data collaboration called Screening-then-Linkage ( …
Figure 2. An example of Appraisal. The records marked in orange right after the feature engineering module are linked …
Figure 3. An example of the information leakage if …
Figure 4. The structure of 1-switch. $r_i$, $r_j$, $r_k$, $r_l$ are the random wire labels generated by the receiver, $b$ is the selection bit from the sender, $x_i$ and $x_j$ are the receiver's inputs, and $y_i$ and $y_j$ are the blinded outputs to the sender.
Context excerpt: … bijection mapping (i.e., one-to-one and onto). Both OPN and ORN are built on the 1-switch, which replicates/permutes the adjacent input value for both parties based on a selection bit, as …
Figure 5. The workflow of Oblivious Feature Alignment (OFA) protocol. “*” represents a random fake entry, the attribute value …
Figure 7. Ideal functionality and protocol of Appraisal.
Context excerpt: … Cuckoo hashing and has access to $\pi^C$. It can adjust the score of the records having the missing attribute value. Taking the linear/logistic model as an example, we denote the weight for missing values as $w^m_j$ for attribute $j$. After both parties have computed $\langle s \rangle^A$, if attribute value $\bar{V}[i, j]$ of record $i$ is absent, $P_0$ can adjust the score via $\langle s \rangle^A_0[i] = \langle \dots$ (truncated in the source)
Figure 8. Accuracy across various datasets, with $P_0$ utilizing one set of data from a given pair.

TABLE III: Runtime of Appraisal with 1/4 threads in seconds.
             Schema-aware                  Schema-agnostic
dataset      LAN            WAN            LAN            WAN
iDash500K    610/155        675/248        /              /
iDash1M      1274/386       1331/492       /              /
DBLP         21/8           217/66         8/3            85/27
ACM          21/7           213/62         8/3            83/26
BNB          5032/1710      5402/2025      2664/846       2861/990
TPL          20165/9328     21317/9934     10558/4559     11281/4855
TPL r…
Figure 9. Running time (in seconds) of OEP, optimization 1, optimization 1+2, and OFA with 1 or 4 threads.
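
The Figure 4 caption describes a 1-switch with receiver-chosen wire labels and a sender-held selection bit. The sketch below simulates the permute mode of such a switch in the clear. It is an illustration under assumptions, not the paper's construction: the XOR blinding, the message layout, and the function names are hypothetical; the oblivious transfer is replaced by plain indexing, which is not secure; and the replicate mode mentioned in the excerpt is not covered.

```python
import secrets
from typing import Tuple

BITS = 32  # assumed label width, chosen only for this sketch

Pair = Tuple[int, int]


def receiver_prepare(x_i: int, x_j: int) -> Tuple[Pair, Tuple[Pair, Pair], Pair]:
    """Receiver blinds its inputs and builds one message per selection bit."""
    r_i, r_j, r_k, r_l = (secrets.randbits(BITS) for _ in range(4))
    blinded = (x_i ^ r_i, x_j ^ r_j)   # what the sender sees on the input wires
    msg_pass = (r_i ^ r_k, r_j ^ r_l)  # used when b = 0 (pass through)
    msg_swap = (r_j ^ r_k, r_i ^ r_l)  # used when b = 1 (swap)
    return blinded, (msg_pass, msg_swap), (r_k, r_l)


def sender_evaluate(blinded: Pair, ot_messages: Tuple[Pair, Pair], b: int) -> Pair:
    """Sender re-blinds the wires according to its selection bit b."""
    a_i, a_j = blinded
    c_k, c_l = ot_messages[b]  # stand-in for 1-out-of-2 OT; NOT secure as written
    if b == 0:
        return a_i ^ c_k, a_j ^ c_l  # y_i = x_i ^ r_k, y_j = x_j ^ r_l
    return a_j ^ c_k, a_i ^ c_l      # y_i = x_j ^ r_k, y_j = x_i ^ r_l


def unblind(outputs: Pair, out_labels: Pair) -> Pair:
    """Remove the output labels (done here only to check correctness)."""
    return outputs[0] ^ out_labels[0], outputs[1] ^ out_labels[1]


blinded, messages, labels = receiver_prepare(7, 9)
straight = unblind(sender_evaluate(blinded, messages, 0), labels)
swapped = unblind(sender_evaluate(blinded, messages, 1), labels)
```

In both branches the sender only ever handles values masked by fresh random labels, which is the property the caption's "blinded output" refers to.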
Original abstract

In an era dominated by big data and machine learning, establishing valuable data collaboration has never been more critical. However, such collaborations must operate under regulatory and legal constraints. Two-party Privacy-Preserving Record Linkage (PPRL) emerges to assess the potential collaboration value and also ensure the privacy and security of the involved data. Nevertheless, the substantial computational and communication overheads associated with PPRL hinder its practical adoption in data markets with numerous potential collaborators. Therefore, we present the Screening-then-Linkage framework, which incorporates a lightweight Screening phase prior to the resource-intensive PPRL phase, i.e., PPRS, to mitigate the scalability issue of PPRL. We propose a circuit-PSI-based system, named Appraisal to realize a secure, effective, and efficient PPRS. To reconcile the approximate matching and/or schema-aware setting required in PPRS with the limitations of the circuit-PSI supporting only symmetric functions, we propose a more communication-efficient secure permutation, i.e., Oblivious Attribute/Feature Alignment protocol tailored for PPRS. This protocol supports a broader range of comparison functions and significantly improves efficiency, i.e., reducing communication costs by a factor of 14 compared to the conventional protocol. Our rigorous analysis and comprehensive empirical evaluations demonstrate the security, effectiveness, and efficiency of Appraisal. Appraisal can accommodate up to $850\times$ more records than the SOTA PPRS system, SFour, within the same constraints. Moreover, it is $165 \times$ faster than SOTA PPRL, indicating the Screening-then-Linkage framework substantially decreases the computation time required to identify the most valuable collaborators from a large pool of candidates.
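
The abstract's point about circuit-PSI supporting only symmetric functions can be made concrete with a plaintext model of the ideal functionality: the parties learn some function of the intersection whose value does not depend on the order of the matched items. The sketch below is an assumption-laden illustration computed in the clear; `ideal_circuit_psi` and the payload layout are hypothetical and provide no security.

```python
from typing import Callable, Dict, List, Set


def ideal_circuit_psi(
    x: Set[str], y_payloads: Dict[str, int], f: Callable[[List[int]], int]
) -> int:
    """Apply f to the payloads of items in both sets, revealing only f's output."""
    # Sorting makes explicit that f receives no positional information,
    # so any f applied here behaves as a symmetric function.
    matched = sorted(y_payloads[item] for item in x if item in y_payloads)
    return f(matched)


x = {"id1", "id2", "id3"}
y_payloads = {"id2": 10, "id3": 5, "id9": 7}

cardinality = ideal_circuit_psi(x, y_payloads, len)  # size of the intersection
payload_sum = ideal_circuit_psi(x, y_payloads, sum)  # sum over matched payloads
```

Cardinality and sum fit this mold. A comparison that must pair a specific attribute of one party's record with the corresponding attribute of the other's does not, which is the gap the alignment protocol is said to close.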

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a Screening-then-Linkage framework to address scalability limitations in two-party Privacy-Preserving Record Linkage (PPRL). It introduces Appraisal, a circuit-PSI-based system for Privacy-Preserving Record Screening (PPRS), and an Oblivious Attribute/Feature Alignment protocol to enable approximate/schema-aware matching under circuit-PSI's symmetric-function restriction. The central claims are that Appraisal supports up to 850× more records than the SOTA PPRS system SFour under equivalent constraints, achieves a 165× speedup over SOTA PPRL, reduces communication by 14× via the new protocol, and is secure, effective, and efficient according to the authors' analysis and empirical evaluations.

Significance. If the empirical scalability claims hold after supplying the missing quantitative bounds, the Screening-then-Linkage approach could meaningfully expand practical deployment of PPRL in data markets by allowing larger candidate pools to be screened before full linkage. The Oblivious Attribute/Feature Alignment protocol's reported 14× communication reduction is a concrete efficiency contribution that stands independently of the headline multipliers.

major comments (2)
  1. [Abstract] The headline claims that Appraisal accommodates up to 850× more records than SFour and runs 165× faster than SOTA PPRL are load-bearing for the Screening-then-Linkage framework, yet the screening phase supplies no closed-form bound on recall loss, no worst-case similarity threshold, and no dataset property (e.g., match-score distribution or schema heterogeneity) that would guarantee the filtered set retains the top collaborators. Without these, the capacity multiplier remains conditional on unstated empirical behavior.
  2. [Abstract / empirical evaluations section] The assertions of 'rigorous analysis and comprehensive empirical evaluations' supporting the 850× and 165× gains are presented without visible error bars, full dataset descriptions, baseline implementation details, or explicit false-negative rates for the screening phase, so the central performance claims rest on unverified empirical statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the abstract claims. We address each major comment below and will make revisions to improve clarity on the empirical basis of our results.

  1. Referee: [Abstract] The headline claims that Appraisal accommodates up to 850× more records than SFour and runs 165× faster than SOTA PPRL are load-bearing for the Screening-then-Linkage framework, yet the screening phase supplies no closed-form bound on recall loss, no worst-case similarity threshold, and no dataset property (e.g., match-score distribution or schema heterogeneity) that would guarantee the filtered set retains the top collaborators. Without these, the capacity multiplier remains conditional on unstated empirical behavior.

    Authors: The 850× and 165× figures are empirical results obtained under the specific datasets, similarity thresholds, and match-score distributions described in the evaluation section. We acknowledge that a general closed-form bound on recall loss is not feasible, as it depends on the underlying data distribution and schema heterogeneity, which are application-specific. In the revision we will explicitly restate these dataset properties, the chosen similarity thresholds, and the observed false-negative rates in both the abstract and the main text so that the conditional nature of the multipliers is clear.
    Revision: yes

  2. Referee: [Abstract / empirical evaluations section] The assertions of 'rigorous analysis and comprehensive empirical evaluations' supporting the 850× and 165× gains are presented without visible error bars, full dataset descriptions, baseline implementation details, or explicit false-negative rates for the screening phase, so the central performance claims rest on unverified empirical statements.

    Authors: We will revise the manuscript to make the supporting details more prominent: error bars will be added to all performance plots, full dataset descriptions and preprocessing steps will be expanded, baseline implementation details (including library versions and hardware) will be listed in a dedicated table, and explicit false-negative rates for the screening phase will be reported alongside the headline multipliers. These elements exist in the full evaluation but will be highlighted in the abstract and evaluation section for easier verification.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: scalability claims are empirical comparisons to external SOTA systems

The paper's central results (850× capacity vs SFour, 165× speedup vs prior PPRL) are presented as outcomes of empirical evaluations and analysis of the Screening-then-Linkage framework plus the new Oblivious Attribute/Feature Alignment protocol. No equations, fitted parameters, or self-citations are shown that reduce these multipliers to internal definitions or prior author work by construction. The protocol is introduced to address circuit-PSI limitations, with efficiency gains (e.g., 14× communication reduction) claimed via direct comparison rather than tautology. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are stated; the work relies on standard cryptographic primitives (circuit-PSI) whose security properties are assumed from prior literature.

