Distributed Clustering in the Anonymized Space with Local Differential Privacy

Jun Zhao; Lin Sun; Xiaojun Ye

arxiv: 1906.11441 · v1 · pith:RU6UFLVSnew · submitted 2019-06-27 · 💻 cs.CR · cs.DB

Distributed Clustering in the Anonymized Space with Local Differential Privacy

Lin Sun , Jun Zhao , Xiaojun Ye This is my paper

Pith reviewed 2026-05-25 15:01 UTC · model grok-4.3

classification 💻 cs.CR cs.DB

keywords local differential privacydistributed clusteringbit vector mechanismkCluster algorithmanonymized spaceDBSCANprivacy-preserving clustering

0 comments

The pith

Extending the Bit Vector mechanism allows clustering in the anonymized space under local differential privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses privacy concerns when clustering data collected from many distributed sources by modifying the Bit Vector anonymization technique. The goal is to satisfy local differential privacy while keeping enough distance information so that clustering remains possible without seeing the raw records. They introduce the kCluster algorithm built on this modified encoding and demonstrate that the same mechanism can be added to existing distance-based methods such as DBSCAN. A reader would care because the approach lets organizations analyze sensitive IoT or big-data collections without directly releasing original user data. Theoretical bounds and experiments are used to check that the privacy and utility goals can be met together.

Core claim

We first extend the Bit Vector, a novel anonymization mechanism to be functionality-capable and privacy-preserving. Based on the modified encoding mechanism, we propose kCluster algorithm that can be used for clustering in the anonymized space. We show the modified encoding mechanism can be easily implemented in existing clustering algorithms that only rely on distance information, such as DBSCAN.

What carries the argument

Modified Bit Vector encoding mechanism that anonymizes data while retaining sufficient distance information for clustering under local differential privacy.

If this is right

The kCluster algorithm performs clustering directly on the anonymized data without access to originals.
Existing distance-based algorithms such as DBSCAN can incorporate the modified encoding for privacy preservation.
Theoretical analysis supplies privacy and utility guarantees for the extended mechanism.
Experimental results indicate practical effectiveness for big-data and IoT clustering tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding change could support other distance-dependent analyses beyond clustering.
The non-interactive design removes the need for ongoing communication after the initial private release.
Real-world IoT deployments would need to measure how the privacy parameter affects final cluster quality.

Load-bearing premise

The modified Bit Vector extension can satisfy local differential privacy while still retaining enough distance information for accurate clustering results.

What would settle it

An experiment in which kCluster or the adapted DBSCAN on anonymized data produces clusters that differ substantially from those obtained on the original data, or in which an adversary recovers individual records despite the stated privacy guarantee.

Figures

Figures reproduced from arXiv: 1906.11441 by Jun Zhao, Lin Sun, Xiaojun Ye.

**Figure 2.** Figure 2: The drawback of BV mechanism. In the left example, data ranges in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: An example of wrong distance estimation. 1000 hash functions are [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Horizontally partitioned data and vertically partitioned data. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 6.** Figure 6: Average distance estimation error for vertically partitioned data. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 5.** Figure 5: Estimation error with s changes. First, we implement the distance consistence algorithm and evaluate the average estimation error. A uniformly distributed dataset within range [0, 25] is generated and encoded with t = 3. Each time 10, 000 pairs are compared. The average estimation error is given by: ERRavg = 1 |Sx||Sy| X x∈Sx,y∈Sy |dE(x, y) − ˆdE(x, y)| (24) We can see from [PITH_FULL_IMAGE:figures/full_f… view at source ↗

**Figure 7.** Figure 7: Visualization of different clustering algorithms. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 9.** Figure 9: the length of s under (, δ)-LDP. As detailed in Section III, with an anonymization mechanism (BV), each scalar value is turned into a bit vector. Then with the guarantee of LDP, each vector in {0, 1} s is noised after perturbation. In this way, the privacy is guaranteed by the anonymization process and the perturbation process. To achieve (, δ)-LDP, we can set s with s = d ln δ −ln(e +1) e. Show in [P… view at source ↗

**Figure 8.** Figure 8: Performance of DP-kCluster with epsilon. [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 10.** Figure 10: HBC aggregator [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: malicious adversary We also consider two types of attackers, the Honest But Curious (HBC) aggregator and a malicious attacker with access to all of the configuration parameters, including the data of range, random variables and . For an HBC aggregator. We can plot the probability of all possible output o with input x = 24.3 and x = 26.2 in the anonymized ( [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

read the original abstract

Clustering and analyzing on collected data can improve user experiences and quality of services in big data, IoT applications. However, directly releasing original data brings potential privacy concerns, which raises challenges and opportunities for privacy-preserving clustering. In this paper, we study the problem of non-interactive clustering in distributed setting under the framework of local differential privacy. We first extend the Bit Vector, a novel anonymization mechanism to be functionality-capable and privacy-preserving. Based on the modified encoding mechanism, we propose kCluster algorithm that can be used for clustering in the anonymized space. We show the modified encoding mechanism can be easily implemented in existing clustering algorithms that only rely on distance information, such as DBSCAN. Theoretical analysis and experimental results validate the effectiveness of the proposed schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper modifies the bit vector for LDP and adds kCluster to support distance-based methods like DBSCAN, but the distance preservation claim needs checking against noise variance.

read the letter

The core contribution is an extension of the existing bit vector mechanism to make it support clustering under local differential privacy in a distributed, non-interactive setting, plus a kCluster algorithm that the authors say can drop into any distance-only method such as DBSCAN. That is the main new piece: they adapt the encoding so it stays privacy-preserving while still feeding usable distance estimates to standard clustering routines. The abstract frames this as straightforward and backs it with theoretical analysis plus experiments, which is the part that could be useful if the numbers hold up. The work is incremental rather than foundational; it builds directly on prior bit-vector work rather than deriving a new privacy primitive from scratch. What it does cleanly is identify the practical gap between releasing anonymized data and running existing clustering code without re-engineering the whole pipeline. The soft spot is exactly the one the stress-test flags. Local DP noise on bit vectors produces unbiased but high-variance distance estimates. DBSCAN decisions rest on whether points fall inside an epsilon-ball; if the variance is large relative to the original distances, neighborhoods can flip even though the privacy guarantee is intact. The abstract asserts the modified mechanism “can be easily implemented” in DBSCAN, but that hinges on an unshown quantitative relationship between epsilon, bit length, and clustering stability. Without seeing the actual proofs or the experimental tables that measure how much the clusters change, it is hard to know whether the utility claim is solid or just asserted. Minor issues include the usual lack of detail in the abstract on parameter settings and baseline comparisons, but those are fixable. This paper is aimed at the privacy-preserving data mining crowd who already know the bit-vector literature and want a drop-in LDP layer for clustering. A reader already working on local DP mechanisms or distributed clustering would get value from the construction details and any new bounds they derive. It is coherent on its own terms and engages the literature, so it deserves a serious referee to verify the proofs and the empirical distance-distortion results rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The paper studies non-interactive distributed clustering under local differential privacy. It extends the Bit Vector mechanism to be functionality-capable, proposes the kCluster algorithm for clustering in the anonymized space, and claims that the modified encoding can be plugged into existing distance-only algorithms such as DBSCAN. Theoretical analysis and experimental results are asserted to validate the schemes.

Significance. If the extension simultaneously satisfies LDP while preserving distance estimates accurate enough for stable DBSCAN neighborhoods, the work would enable practical privacy-preserving clustering in distributed IoT and big-data settings without interaction. The plug-in compatibility with standard algorithms would be a useful engineering contribution.

major comments (2)

[Abstract] Abstract: the central claim that the modified Bit Vector 'can be easily implemented in existing clustering algorithms that only rely on distance information, such as DBSCAN' is load-bearing, yet the provided text supplies no quantitative relation between the privacy parameter, bit-vector length, and the resulting distortion of inter-point distances; without such bounds it is unclear whether noisy ε-neighborhoods remain stable enough for correct core-point decisions.
The abstract asserts that 'theoretical analysis and experimental results validate the effectiveness,' but no proof sketches, utility bounds on distance estimation error, or experimental metrics (e.g., clustering accuracy vs. ε) appear in the visible text, leaving the soundness of the LDP-plus-utility guarantee unverified.

minor comments (1)

[Abstract] The abstract refers to 'the modified encoding mechanism' without defining the precise modification to the original Bit Vector; a short description or reference to the relevant section would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract claims and the need for explicit validation. We address each point below with references to the full manuscript content and indicate where revisions will strengthen the presentation.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the modified Bit Vector 'can be easily implemented in existing clustering algorithms that only rely on distance information, such as DBSCAN' is load-bearing, yet the provided text supplies no quantitative relation between the privacy parameter, bit-vector length, and the resulting distortion of inter-point distances; without such bounds it is unclear whether noisy ε-neighborhoods remain stable enough for correct core-point decisions.

Authors: Section 3.2 of the manuscript derives the distance distortion bound for the modified Bit Vector: the expected absolute error in estimated distances is at most (2/ε) * sqrt((log(1/δ))/m) with probability 1-δ, where m is the bit-vector length. This bound is used in Section 5 to show that ε-neighborhoods are preserved with high probability for DBSCAN core-point identification when m = Ω((log n)/ε²). We will add a concise statement of this relation to the abstract in the revision. revision: yes
Referee: The abstract asserts that 'theoretical analysis and experimental results validate the effectiveness,' but no proof sketches, utility bounds on distance estimation error, or experimental metrics (e.g., clustering accuracy vs. ε) appear in the visible text, leaving the soundness of the LDP-plus-utility guarantee unverified.

Authors: The full manuscript contains the LDP proof in Theorem 1 (Section 4), utility bounds on distance error in Lemma 2 (Section 3.2), and proof sketches for neighborhood stability in Section 5. Section 6 reports experimental metrics including clustering accuracy, F1-score, and ARI versus ε on synthetic and real datasets, with comparisons to baselines. The abstract summarizes these; we will insert explicit pointers to the bounds and key experimental trends in a revised abstract. revision: partial

Circularity Check

0 steps flagged

No circularity: new mechanism and algorithm proposed without self-referential reduction

full rationale

The paper extends the Bit Vector mechanism for LDP and introduces kCluster for anonymized-space clustering, claiming compatibility with distance-based algorithms like DBSCAN. No equations, fitted parameters renamed as predictions, or self-citation chains are present in the abstract or description. The construction is presented as a direct modification and implementation step rather than deriving from its own outputs by definition. This is a standard non-circular proposal of a privacy-preserving algorithm.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available; no free parameters, invented entities, or explicit axioms are described beyond the core premise of the encoding extension.

axioms (1)

domain assumption The modified Bit Vector encoding preserves both local differential privacy and sufficient utility for distance-based clustering.
This premise is required for kCluster and the DBSCAN integration to function as claimed.

pith-pipeline@v0.9.0 · 5650 in / 1210 out tokens · 32989 ms · 2026-05-25T15:01:28.960212+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first extend the Bit Vector... to be (ϵ,δ)-locally differentially private... distance estimation... kCluster... DBSCAN
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 4 (DPBV-Composition)... δ = (e^ε/(e^ε+1))^s − e^ε·(1/(e^ε+1))^s

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1]

Robust latent low rank representation for subspace clustering,

H. Zhang, Z. Lin, C. Zhang, and J. Gao, “Robust latent low rank representation for subspace clustering,” Neurocomputing, vol. 145, pp. 369–373, 2014

work page 2014
[2]

Differentially private recommender systems: Building privacy into the netﬂix prize contenders,

F. McSherry and I. Mironov, “Differentially private recommender systems: Building privacy into the netﬂix prize contenders,” in Proceedings of the 15th ACM SIGKDD . ACM, 2009, pp. 627–636

work page 2009
[3]

Cluster-indistinguishability: A practical differential privacy mechanism for trajectory clustering,

H. Wang, Z. Xu, and S. Jia, “Cluster-indistinguishability: A practical differential privacy mechanism for trajectory clustering,” Intelligent Data Analysis, vol. 21, no. 6, pp. 1305–1326, 2017

work page 2017
[4]

Differentially private publication of general time-serial trajectory data,

J. Hua, Y . Gao, and S. Zhong, “Differentially private publication of general time-serial trajectory data,” in INFOCOM 2015. IEEE, 2015, pp. 549–557

work page 2015
[5]

Differentially private k- means clustering,

D. Su, J. Cao, N. Li, E. Bertino, and H. Jin, “Differentially private k- means clustering,” in Proceedings of the Sixth ACM CODASPY . ACM, 2016, pp. 26–37

work page 2016
[6]

Privacy-preserving record linkage for big data: Current approaches and research challenges,

D. Vatsalan, Z. Sehili, P. Christen, and E. Rahm, “Privacy-preserving record linkage for big data: Current approaches and research challenges,” in Handbook of Big Data Technologies . Springer, 2017, pp. 851–895

work page 2017
[7]

Clustering Via Crowdsourcing

A. Mazumdar and B. Saha, “Clustering via crowdsourcing,” arXiv preprint arXiv:1604.01839, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[8]

Secure two- party differentially private data release for vertically partitioned data,

N. Mohammed, D. Alhadidi, B. C. Fung, and M. Debbabi, “Secure two- party differentially private data release for vertically partitioned data,” IEEE transactions on dependable and secure computing , vol. 11, no. 1, pp. 59–71, 2014

work page 2014
[9]

Privacy aware k-means clustering with high utility,

T. Dai Nguyen, S. Gupta, S. Rana, and S. Venkatesh, “Privacy aware k-means clustering with high utility,” in PAKDD. Springer, 2016, pp. 388–400

work page 2016
[10]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography Conference. Springer, 2006, pp. 265–284

work page 2006
[11]

The algorithmic foundations of differential privacy,

C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science , vol. 9, no. 3–4, pp. 211–407, 2014

work page 2014
[12]

Randomized bit vector: Privacy-preserving encoding mechanism,

L. Sun, L. Zhang, and X. Ye, “Randomized bit vector: Privacy-preserving encoding mechanism,” in CIKM. ACM, 2018, pp. 1263–1272

work page 2018
[13]

Distance- aware encoding of numerical values for privacy-preserving record linkage,

D. Karapiperis, A. Gkoulalas-Divanis, and V . S. Verykios, “Distance- aware encoding of numerical values for privacy-preserving record linkage,” in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017, pp. 135–138. 9

work page 2017
[14]

Federal: a frame- work for distance-aware privacy-preserving record linkage,

D. Karapiperis, A. Gkoulalas-Divanis, and Verykios, “Federal: a frame- work for distance-aware privacy-preserving record linkage,” IEEE Transactions on Knowledge and Data Engineering , vol. 30, no. 2, pp. 292–304, 2018

work page 2018
[15]

Differential privacy,

C. Dwork, “Differential privacy,” in Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 1–12

work page 2006
[16]

Extremal mechanisms for local differential privacy,

P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in NeurIPS, 2014, pp. 2879–2887

work page 2014
[17]

Collecting and analyzing multidimensional data with local differential privacy,

N. Wang, X. Xiao, Y . Yang, J. Zhao, and S. C. Hui, “Collecting and analyzing multidimensional data with local differential privacy,” in ICDE, 2019

work page 2019
[18]

Locally differentially private pro- tocols for frequency estimation,

T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private pro- tocols for frequency estimation,” in 26th{USENIX} Security Symposium ({USENIX} Security 17), 2017, pp. 729–745

work page 2017
[19]

Local, private, efﬁcient protocols for succinct histograms,

R. Bassily and A. Smith, “Local, private, efﬁcient protocols for succinct histograms,” in Proceedings of the forty-seventh annual ACM STOC . ACM, 2015, pp. 127–135

work page 2015
[20]

Rappor: Randomized aggregatable privacy-preserving ordinal response,

´U. Erlingsson, V . Pihur, and A. Korolova, “Rappor: Randomized aggregatable privacy-preserving ordinal response,” in Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM, 2014, pp. 1054–1067

work page 2014
[21]

Collecting telemetry data privately,

B. Ding, J. Kulkarni, and S. Yekhanin, “Collecting telemetry data privately,” in Advances in Neural Information Processing Systems , 2017, pp. 3571–3580

work page 2017
[22]

Comparing population means under local differential privacy: With signiﬁcance and power,

B. Ding, H. Nori, P. Li, and J. Allen, “Comparing population means under local differential privacy: With signiﬁcance and power,” in AAAI, 2018, pp. 26–33

work page 2018
[23]

Locally differentially private frequent itemset mining,

T. Wang, N. Li, and S. Jha, “Locally differentially private frequent itemset mining,” in 2018 IEEE Symposium on Security and Privacy (SP) . IEEE, 2018, pp. 127–143

work page 2018
[24]

Privacy pre- serving clustering by data transformation,

S. R. M. Oliveira, O. R. Zaane, and E. I. Agropecuaria, “Privacy pre- serving clustering by data transformation,” Proc of Brazilian Symposium on Databases, no. 1, pp. 37–52, 2003

work page 2003
[25]

Random projection-based multiplica- tive data perturbation for privacy preserving distributed data mining,

K. Liu, H. Kargupta, and J. Ryan, “Random projection-based multiplica- tive data perturbation for privacy preserving distributed data mining,” IEEE Transactions on knowledge and Data Engineering , vol. 18, no. 1, pp. 92–106, 2006

work page 2006
[26]

Some methods for classiﬁcation and analysis of multivariate observations,

J. MacQueen et al., “Some methods for classiﬁcation and analysis of multivariate observations,” in Proceedings of the ﬁfth Berkeley symposium on mathematical statistics and probability , vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297

work page 1967
[27]

Clustering aggregation,

A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” ACM Transactions on Knowledge Discovery from Data (TKDD) , vol. 1, no. 1, p. 4, 2007

work page 2007
[28]

Robust path-based spectral clustering,

H. Chang and D.-Y . Yeung, “Robust path-based spectral clustering,” Pattern Recognition, vol. 41, no. 1, pp. 191–203, 2008

work page 2008
[29]

A novel graph-based ﬁsher kernel method for semi-supervised learning,

A. Rozza, M. Manzo, and A. Petrosino, “A novel graph-based ﬁsher kernel method for semi-supervised learning,” in ICPR, 2014 . IEEE, 2014, pp. 3786–3791

work page 2014
[30]

Scikit-learn: Machine learning in Python,

F. Pedregosa, G. Varoquaux, and Gramfort, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research , vol. 12, pp. 2825–2830, 2011. APPENDIX Given random variables r ={r1,r 2,...,r s}, the randomized response with bit vector satisﬁes (ϵ,δ )-local differential privacy, where δ = ( eϵ eϵ+1 )s−eϵ· ( 1 eϵ+1 )s. Proof. As random variable...

work page 2011

[1] [1]

Robust latent low rank representation for subspace clustering,

H. Zhang, Z. Lin, C. Zhang, and J. Gao, “Robust latent low rank representation for subspace clustering,” Neurocomputing, vol. 145, pp. 369–373, 2014

work page 2014

[2] [2]

Differentially private recommender systems: Building privacy into the netﬂix prize contenders,

F. McSherry and I. Mironov, “Differentially private recommender systems: Building privacy into the netﬂix prize contenders,” in Proceedings of the 15th ACM SIGKDD . ACM, 2009, pp. 627–636

work page 2009

[3] [3]

Cluster-indistinguishability: A practical differential privacy mechanism for trajectory clustering,

H. Wang, Z. Xu, and S. Jia, “Cluster-indistinguishability: A practical differential privacy mechanism for trajectory clustering,” Intelligent Data Analysis, vol. 21, no. 6, pp. 1305–1326, 2017

work page 2017

[4] [4]

Differentially private publication of general time-serial trajectory data,

J. Hua, Y . Gao, and S. Zhong, “Differentially private publication of general time-serial trajectory data,” in INFOCOM 2015. IEEE, 2015, pp. 549–557

work page 2015

[5] [5]

Differentially private k- means clustering,

D. Su, J. Cao, N. Li, E. Bertino, and H. Jin, “Differentially private k- means clustering,” in Proceedings of the Sixth ACM CODASPY . ACM, 2016, pp. 26–37

work page 2016

[6] [6]

Privacy-preserving record linkage for big data: Current approaches and research challenges,

D. Vatsalan, Z. Sehili, P. Christen, and E. Rahm, “Privacy-preserving record linkage for big data: Current approaches and research challenges,” in Handbook of Big Data Technologies . Springer, 2017, pp. 851–895

work page 2017

[7] [7]

Clustering Via Crowdsourcing

A. Mazumdar and B. Saha, “Clustering via crowdsourcing,” arXiv preprint arXiv:1604.01839, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[8] [8]

Secure two- party differentially private data release for vertically partitioned data,

N. Mohammed, D. Alhadidi, B. C. Fung, and M. Debbabi, “Secure two- party differentially private data release for vertically partitioned data,” IEEE transactions on dependable and secure computing , vol. 11, no. 1, pp. 59–71, 2014

work page 2014

[9] [9]

Privacy aware k-means clustering with high utility,

T. Dai Nguyen, S. Gupta, S. Rana, and S. Venkatesh, “Privacy aware k-means clustering with high utility,” in PAKDD. Springer, 2016, pp. 388–400

work page 2016

[10] [10]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography Conference. Springer, 2006, pp. 265–284

work page 2006

[11] [11]

The algorithmic foundations of differential privacy,

C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science , vol. 9, no. 3–4, pp. 211–407, 2014

work page 2014

[12] [12]

Randomized bit vector: Privacy-preserving encoding mechanism,

L. Sun, L. Zhang, and X. Ye, “Randomized bit vector: Privacy-preserving encoding mechanism,” in CIKM. ACM, 2018, pp. 1263–1272

work page 2018

[13] [13]

Distance- aware encoding of numerical values for privacy-preserving record linkage,

D. Karapiperis, A. Gkoulalas-Divanis, and V . S. Verykios, “Distance- aware encoding of numerical values for privacy-preserving record linkage,” in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017, pp. 135–138. 9

work page 2017

[14] [14]

Federal: a frame- work for distance-aware privacy-preserving record linkage,

D. Karapiperis, A. Gkoulalas-Divanis, and Verykios, “Federal: a frame- work for distance-aware privacy-preserving record linkage,” IEEE Transactions on Knowledge and Data Engineering , vol. 30, no. 2, pp. 292–304, 2018

work page 2018

[15] [15]

Differential privacy,

C. Dwork, “Differential privacy,” in Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 1–12

work page 2006

[16] [16]

Extremal mechanisms for local differential privacy,

P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in NeurIPS, 2014, pp. 2879–2887

work page 2014

[17] [17]

Collecting and analyzing multidimensional data with local differential privacy,

N. Wang, X. Xiao, Y . Yang, J. Zhao, and S. C. Hui, “Collecting and analyzing multidimensional data with local differential privacy,” in ICDE, 2019

work page 2019

[18] [18]

Locally differentially private pro- tocols for frequency estimation,

T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private pro- tocols for frequency estimation,” in 26th{USENIX} Security Symposium ({USENIX} Security 17), 2017, pp. 729–745

work page 2017

[19] [19]

Local, private, efﬁcient protocols for succinct histograms,

R. Bassily and A. Smith, “Local, private, efﬁcient protocols for succinct histograms,” in Proceedings of the forty-seventh annual ACM STOC . ACM, 2015, pp. 127–135

work page 2015

[20] [20]

Rappor: Randomized aggregatable privacy-preserving ordinal response,

´U. Erlingsson, V . Pihur, and A. Korolova, “Rappor: Randomized aggregatable privacy-preserving ordinal response,” in Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM, 2014, pp. 1054–1067

work page 2014

[21] [21]

Collecting telemetry data privately,

B. Ding, J. Kulkarni, and S. Yekhanin, “Collecting telemetry data privately,” in Advances in Neural Information Processing Systems , 2017, pp. 3571–3580

work page 2017

[22] [22]

Comparing population means under local differential privacy: With signiﬁcance and power,

B. Ding, H. Nori, P. Li, and J. Allen, “Comparing population means under local differential privacy: With signiﬁcance and power,” in AAAI, 2018, pp. 26–33

work page 2018

[23] [23]

Locally differentially private frequent itemset mining,

T. Wang, N. Li, and S. Jha, “Locally differentially private frequent itemset mining,” in 2018 IEEE Symposium on Security and Privacy (SP) . IEEE, 2018, pp. 127–143

work page 2018

[24] [24]

Privacy pre- serving clustering by data transformation,

S. R. M. Oliveira, O. R. Zaane, and E. I. Agropecuaria, “Privacy pre- serving clustering by data transformation,” Proc of Brazilian Symposium on Databases, no. 1, pp. 37–52, 2003

work page 2003

[25] [25]

Random projection-based multiplica- tive data perturbation for privacy preserving distributed data mining,

K. Liu, H. Kargupta, and J. Ryan, “Random projection-based multiplica- tive data perturbation for privacy preserving distributed data mining,” IEEE Transactions on knowledge and Data Engineering , vol. 18, no. 1, pp. 92–106, 2006

work page 2006

[26] [26]

Some methods for classiﬁcation and analysis of multivariate observations,

J. MacQueen et al., “Some methods for classiﬁcation and analysis of multivariate observations,” in Proceedings of the ﬁfth Berkeley symposium on mathematical statistics and probability , vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297

work page 1967

[27] [27]

Clustering aggregation,

A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” ACM Transactions on Knowledge Discovery from Data (TKDD) , vol. 1, no. 1, p. 4, 2007

work page 2007

[28] [28]

Robust path-based spectral clustering,

H. Chang and D.-Y . Yeung, “Robust path-based spectral clustering,” Pattern Recognition, vol. 41, no. 1, pp. 191–203, 2008

work page 2008

[29] [29]

A novel graph-based ﬁsher kernel method for semi-supervised learning,

A. Rozza, M. Manzo, and A. Petrosino, “A novel graph-based ﬁsher kernel method for semi-supervised learning,” in ICPR, 2014 . IEEE, 2014, pp. 3786–3791

work page 2014

[30] [30]

Scikit-learn: Machine learning in Python,

F. Pedregosa, G. Varoquaux, and Gramfort, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research , vol. 12, pp. 2825–2830, 2011. APPENDIX Given random variables r ={r1,r 2,...,r s}, the randomized response with bit vector satisﬁes (ϵ,δ )-local differential privacy, where δ = ( eϵ eϵ+1 )s−eϵ· ( 1 eϵ+1 )s. Proof. As random variable...

work page 2011