arxiv: 1204.2606 · v1 · pith:5U3UPOBYnew · submitted 2012-04-12 · 💻 cs.DS · cs.CY · cs.DB · cs.SI

Privacy via the Johnson-Lindenstrauss Transform

classification: cs.DS, cs.CY, cs.DB, cs.SI
keywords: data, user, users, distance, lower-dimensional, party, privacy, distances

Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask whether party A can publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method projects each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adds Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy: the more privacy is desired, the larger the variance of the Gaussian noise must be. Further, we show how to approximate the true distances between users using only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods such as randomized response and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is broader than preserving distances, this work shows that computing distances with privacy is an achievable goal.
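
The pipeline the abstract describes (project, perturb, then estimate distances from the perturbed data) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's construction: the projection uses an Achlioptas-style sparse matrix with entries sqrt(3/k) * {+1, 0, -1}, the function names are hypothetical, and `sigma` is left as a free parameter because the abstract does not state how the noise variance is calibrated to a privacy level. The estimator subtracts the expected noise contribution, 2 * k * sigma^2, which is one natural debiasing step and may differ from the estimator analyzed in the paper.

```python
import math
import random


def sparse_jl_matrix(k, d, rng):
    """Return a k x d sparse random projection (assumed Achlioptas-style).

    Each entry is sqrt(3/k) * {+1, 0, -1} with probabilities 1/6, 2/3, 1/6,
    so every entry has variance 1/k and E[||P x||^2] = ||x||^2.
    """
    scale = math.sqrt(3.0 / k)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)]
            for _ in range(k)]


def publish(bits, P, sigma, rng):
    """Party A's side: project a user's bit vector, then add
    independent N(0, sigma^2) noise to each of the k coordinates."""
    return [sum(p * b for p, b in zip(row, bits)) + rng.gauss(0.0, sigma)
            for row in P]


def estimate_sq_distance(y_u, y_v, sigma):
    """Party B's side: estimate ||x_u - x_v||^2 from published vectors.

    The two independent noise vectors add 2 * k * sigma^2 to the squared
    distance in expectation, so that amount is subtracted.
    """
    k = len(y_u)
    raw = sum((a - b) ** 2 for a, b in zip(y_u, y_v))
    return raw - 2 * k * sigma ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, sigma = 64, 32, 1.0          # illustrative sizes, not from the paper
    P = sparse_jl_matrix(k, d, rng)    # shared by all users
    x_u = [rng.randint(0, 1) for _ in range(d)]
    x_v = [rng.randint(0, 1) for _ in range(d)]
    true_sq = sum((a - b) ** 2 for a, b in zip(x_u, x_v))
    est_sq = estimate_sq_distance(publish(x_u, P, sigma, rng),
                                  publish(x_v, P, sigma, rng), sigma)
    print("true squared distance:", true_sq)
    print("estimated squared distance:", est_sq)
```

A single estimate is noisy, and the estimate can even be negative for nearby users; the debiasing only guarantees correctness on average. Increasing `sigma` makes each estimate less accurate, which reflects the privacy and accuracy trade-off the abstract mentions.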

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Differentially Private Motif-Preserving Multi-modal Hashing

    cs.IR 2026-05 unverdicted novelty 7.0

    DMP-MH clips degrees to control triangle sensitivity, synthesizes an edge-DP graph with Noisy Mirror Descent, and distills it into dual-stream hash networks, beating private baselines by up to 11.4 mAP on MIRFlickr-25...

  2. Differentially Private Spectral Graph Clustering: Balancing Privacy, Accuracy, and Efficiency

    cs.IT 2025-10 conditional novelty 7.0

    A matrix shuffling mechanism for edge-differentially private spectral clustering achieves Õ(1/n) misclassification error via privacy amplification and a unified Davis-Kahan plus margin analysis, outperforming Analyze ...