pith. sign in

arxiv: 1907.01695 · v1 · pith:D5YGPBGDnew · submitted 2019-07-03 · 💻 cs.SI · cs.IR

On the Privacy of dK-Random Graphs

Pith reviewed 2026-05-25 10:11 UTC · model grok-4.3

classification 💻 cs.SI cs.IR
keywords privacyde-anonymizationdK-seriessocial networksgraph anonymizationmachine learning attacksstructural propertiesre-identification
0
0 comments X

The pith

The vulnerability of dK-anonymized social graphs to de-anonymization depends on the original graph's structural properties and the attacker's starting node set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates why dK-anonymized versions of real social graphs remain susceptible to machine learning de-anonymization attacks. It demonstrates that an attacker's success rate can be tied to the graph characteristics of the initial subset of nodes used to launch the attack. The work further measures how different orders of dK-series perform as anonymization tools and isolates the original graph traits that heighten exposure. These findings matter for any effort to release social network data while limiting re-identification risks. The results indicate that anonymization choices need to account for specific graph features rather than relying on a single method across all datasets.

Core claim

The paper establishes that dK-random graphs preserve enough structural information from the source networks to enable effective de-anonymization, with attacker strength predictable from the degree and connectivity traits of the seed node subset and with overall protection varying by the dK order applied and by properties such as degree distribution and clustering in the original graph.

What carries the argument

dK-series, which produce random graphs that match the original graph's joint degree distributions up to order d, used here to generate anonymized versions whose resistance to ML attacks is then tested.

If this is right

  • Attackers who begin with nodes sharing high-degree or high-clustering traits achieve higher re-identification rates.
  • Higher-order dK-series provide measurably stronger protection than lower-order ones, yet none achieve complete anonymity.
  • Graphs whose original degree sequences or clustering coefficients fall into certain ranges remain more exposed after anonymization.
  • The relative strength of an attacker can be estimated in advance from the structural profile of its starting node subset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Data curators could pre-screen graphs for the identified vulnerable traits before selecting an anonymization order.
  • The same structural signals may affect the performance of other graph anonymization approaches beyond dK-series.
  • Extending the analysis to dynamic or weighted graphs could reveal whether the vulnerability patterns persist over time.

Load-bearing premise

The machine learning de-anonymization attacks tested here represent realistic threats and the dK graphs were produced without hidden biases that favor the reported outcomes.

What would settle it

A controlled experiment in which de-anonymization success rates show no correlation with the measured graph properties of the seed sets or the original networks would falsify the claimed dependence.

Figures

Figures reproduced from arXiv: 1907.01695 by Adriana Iamnitchi, Sameera Horawalavithana.

Figure 1
Figure 1. Figure 1: Graph mining system flow to generate node pairs [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Average structural distance between the original [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Average percentage of nodes with degree 1 in the [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy of prediction of identical pairs over dif [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy of prediction of identical pairs across [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Kernel Density Estimate (KDE) of the accuracy of predictions over identical pairs across several graphs. Results are [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
read the original abstract

Real social network datasets provide significant benefits for understanding phenomena such as information diffusion or network evolution. Yet the privacy risks raised from sharing real graph datasets, even when stripped of user identity information, are significant. Previous research shows that many graph anonymization techniques fail against existing graph de-anonymization attacks. However, the specific reason for the success of such de-anonymization attacks is yet to be understood. This paper systematically studies the structural properties of real graphs that make them more vulnerable to machine learning-based techniques for de-anonymization. More precisely, we study the boundaries of anonymity based on the structural properties of real graph datasets in terms of how their dK-based anonymized versions resist (or fail) to various types of attacks. Our experimental results lead to three contributions. First, we identify the strength of an attacker based on the graph characteristics of the subset of nodes from which it starts the de-anonymization attack. Second, we quantify the relative effectiveness of dK-series for graph anonymization. And third, we identify the properties of the original graph that make it more vulnerable to de-anonymization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents an experimental study of privacy risks in real social network graphs under dK-based anonymization. It claims that structural properties of the original graphs and of the node subsets used to seed machine-learning de-anonymization attacks determine attacker success, that the relative effectiveness of different dK-series can be quantified, and that certain graph properties render the data more vulnerable to de-anonymization.

Significance. If the experimental claims hold, the work would supply concrete, actionable guidance on when dK-anonymization is or is not sufficient, which is directly relevant to data-sharing practices in network science. The three listed contributions are falsifiable and address a recognized gap between theoretical anonymity guarantees and practical attack performance.

major comments (1)
  1. Abstract: the manuscript asserts experimental results on attacker strength, dK effectiveness, and graph vulnerability but supplies no datasets, attack implementations, metrics, controls, or statistical details. This absence is load-bearing for all three claimed contributions because the results cannot be reproduced or evaluated for bias in attack selection or dK generation procedure.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the report and the opportunity to respond. Below we address the major comment point by point.

read point-by-point responses
  1. Referee: [—] Abstract: the manuscript asserts experimental results on attacker strength, dK effectiveness, and graph vulnerability but supplies no datasets, attack implementations, metrics, controls, or statistical details. This absence is load-bearing for all three claimed contributions because the results cannot be reproduced or evaluated for bias in attack selection or dK generation procedure.

    Authors: The abstract is a concise summary and does not enumerate implementation details, which is standard. The full manuscript supplies the requested information in the Experimental Setup (Section 4) and Results (Section 5) sections: the specific real-world social network datasets, the machine-learning de-anonymization attack procedures and parameter choices, the evaluation metrics (including success rates and statistical significance tests), controls for bias, and the dK-series generation procedure. These sections contain sufficient information for reproduction and for assessing potential bias. We therefore maintain that the three contributions are supported by the body of the work rather than the abstract alone. If the editor prefers, we can add one sentence to the abstract summarizing the experimental scope. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a purely experimental study with no equations, fitted parameters, derivations, or self-referential mathematical claims. Its three contributions (attacker strength from node-subset characteristics, relative dK effectiveness, and graph-property vulnerability) are presented as outcomes of empirical evaluation of ML de-anonymization attacks on dK-anonymized graphs. No load-bearing step reduces by construction to its own inputs, and the abstract supplies no self-citation chains or ansatzes that could create circularity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract; the work is an empirical evaluation of existing dK techniques.

pith-pipeline@v0.9.0 · 5724 in / 1010 out tokens · 45494 ms · 2026-05-25T10:11:11.415910+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

  1. [1]

    A Face Is Exposed for AOL Searcher No

    2006. A Face Is Exposed for AOL Searcher No. 4417749. http://www.nytimes. com/2006/08/09/technology/09aol.html. (2006)

  2. [2]

    Researchers reverse Netflix anonymization

    2007. Researchers reverse Netflix anonymization. http://www.securityfocus.com/ news/11497. (2007)

  3. [3]

    Web-Strip: Intimate data from federal politi- cians for sale

    2016. Web-Strip: Intimate data from federal politi- cians for sale. https://daserste.ndr.de/panorama/aktuell/ Web-Strip-Intimate-data-from-federal-politicians-for-sale,nacktimnetz114. html. (2016)

  4. [4]

    Charu C Aggarwal, Yao Li, and S Yu Philip. 2011. On the hardness of graph anonymization. In Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 1002–1007

  5. [5]

    Carolyn J Anderson, Stanley Wasserman, and Bradley Crouch. 1999. A p* primer: Logit models for social networks. Social networks 21, 1 (1999), 37–66

  6. [6]

    Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. 2007. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganog- raphy. In Proceedings of the 16th international conference on World Wide Web . ACM, 181–190

  7. [7]

    Xenofontas Dimitropoulos, Dmitri Krioukov, Amin Vahdat, and George Riley

  8. [8]

    ACM Trans- actions on Modeling and Computer Simulation (TOMACS) 19, 4 (2009), 17

    Graph annotations in modeling complex network topologies. ACM Trans- actions on Modeling and Computer Simulation (TOMACS) 19, 4 (2009), 17

  9. [9]

    Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differ- ential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407

  10. [10]

    M. Fire, R. Puzis, and Y. Elovici. 2012. Link Prediction in Highly Fractional Data Sets. Handbook of Computational Approaches to Counterterrorism (2012)

  11. [11]

    Minas Gjoka, Maciej Kurant, and Athina Markopoulou. 2013. 2.5 k-graphs: from sampling to generation. In INFOCOM, 2013 Proceedings IEEE . IEEE, 1968–1976

  12. [12]

    Virgil Griffith and Markus Jakobsson. 2005. Messin’with texas deriving mother’s maiden names using public records. InApplied Cryptography and Network Security. Springer, 91–103

  13. [13]

    Gábor György Gulyás, Benedek Simon, and Sándor Imre. 2016. An Efficient and Robust Social Network De-anonymization Attack. InProceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society . ACM, 1–11

  14. [14]

    Peter J Haas. 2016. Data-stream sampling: basic techniques and results. In Data Stream Management. Springer, 13–44

  15. [15]

    Keith Henderson, Brian Gallagher, Lei Li, Leman Akoglu, Tina Eliassi-Rad, Hang- hang Tong, and Christos Faloutsos. 2011. It’s who you know: graph mining using recursive structural features. InProceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining . ACM, 663–671

  16. [16]

    Shouling Ji, Weiqing Li, Neil Zhenqiang Gong, Prateek Mittal, and Raheem A Beyah. 2015. On Your Social Network De-anonymizablity: Quantification and Large Scale Evaluation with Seed Knowledge.. In NDSS

  17. [17]

    Shouling Ji, Weiqing Li, Prateek Mittal, Xin Hu, and Raheem A Beyah. 2015. SecGraph: A Uniform and Open-source Evaluation System for Graph Data Anonymization and De-anonymization.. In USENIX Security Symposium. 303– 318

  18. [18]

    Shouling Ji, Weiqing Li, Mudhakar Srivatsa, and Raheem Beyah. 2014. Structural data de-anonymization: Quantification, practice, and implications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security . ACM, 1040–1053

  19. [19]

    Shouling Ji, Weiqing Li, Mudhakar Srivatsa, and Raheem Beyah. 2016. Struc- tural data de-anonymization: Theory and practice. IEEE/ACM Transactions on Networking 24, 6 (2016), 3523–3536

  20. [21]

    In International Conference on Information Security

    Structure based data de-anonymization of social networks and mobility traces. In International Conference on Information Security . Springer, 237–254

  21. [22]

    Shouling Ji, Weiqing Li, Mudhakar Srivatsa, Jing Selena He, and Raheem Beyah

  22. [23]

    ACM Transactions on Information and System Security (TISSEC) 18, 4 (2016), 12

    General graph data de-anonymization: From mobility traces to social networks. ACM Transactions on Information and System Security (TISSEC) 18, 4 (2016), 12

  23. [24]

    Shouling Ji, Weiqing Li, Shukun Yang, Prateek Mittal, and Raheem Beyah. 2016. On the relative de-anonymizability of graph data: Quantification and evalua- tion. In Computer Communications, IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on. IEEE, 1–9

  24. [25]

    Shouling Ji, Prateek Mittal, and Raheem Beyah. 2016. Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: A survey.IEEE Communications Surveys & Tutorials (2016)

  25. [26]

    Nitish Korula and Silvio Lattanzi. 2014. An efficient reconciliation algorithm for social networks. Proceedings of the VLDB Endowment 7, 5 (2014), 377–388

  26. [27]

    Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters.ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 1 (2007), 2

  27. [28]

    Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In Advances in neural information processing systems . 539–547

  28. [29]

    Changchang Liu and Prateek Mittal. 2016. LinkMirage: Enabling Privacy- preserving Analytics on Social Relationships.. In NDSS

  29. [30]

    Kun Liu and Evimaria Terzi. 2008. Towards identity anonymization on graphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 93–106

  30. [31]

    Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. 2006. Sys- tematic topology analysis and generation using degree correlations. In ACM SIGCOMM Computer Communication Review , Vol. 36. ACM, 135–146

  31. [32]

    Arvind Narayanan, Elaine Shi, and Benjamin IP Rubinstein. 2011. Link prediction by de-anonymization: How we won the kaggle social network challenge. InNeural Networks (IJCNN), The 2011 International Joint Conference on . IEEE, 1825–1834

  32. [33]

    Arvind Narayanan and Vitaly Shmatikov. 2009. De-anonymizing social networks. In Security and Privacy, 2009 30th IEEE Symposium on . IEEE, 173–187

  33. [34]

    Shirin Nilizadeh, Apu Kapadia, and Yong-Yeol Ahn. 2014. Community-enhanced de-anonymization of online social networks. In Proceedings of the 2014 acm sigsac conference on computer and communications security . ACM, 537–548

  34. [35]

    Chiara Orsini, Marija M Dankulov, Pol Colomer-de Simón, Almerima Jamakovic, Priya Mahadevan, Amin Vahdat, Kevin E Bassler, Zoltán Toroczkai, Marián Boguñá, Guido Caldarelli, et al. 2015. Quantifying randomness in real networks. Nature communications 6 (2015), 8627

  35. [36]

    Pedram Pedarsani, Daniel R Figueiredo, and Matthias Grossglauser. 2013. A bayesian method for matching two similar graphs without seeds. In Communica- tion, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on . IEEE, 1598–1607

  36. [37]

    Pedram Pedarsani and Matthias Grossglauser. 2011. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining . ACM, 1235–1243

  37. [38]

    Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2012. A workflow for differentially-private graph synthesis. In Proceedings of the 2012 ACM workshop on Workshop on online social networks . ACM, 13–18

  38. [39]

    Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2014. Calibrating data to sensitivity in private data analysis: a platform for differentially-private analysis of weighted datasets. Proceedings of the VLDB Endowment 7, 8 (2014), 637–648

  39. [40]

    Rossi and Nesreen K

    Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. InProceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. http://networkrepository.com

  40. [41]

    Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y Zhao

  41. [42]

    In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference

    Sharing graphs using differentially private graph models. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference . ACM, 81–98

  42. [43]

    Tiago A Schieber, Laura Carpi, Albert Díaz-Guilera, Panos M Pardalos, Cristina Masoller, and Martín G Ravetti. 2017. Quantification of network structural dissimilarities. Nature communications 8 (2017), 13928

  43. [44]

    Kumar Sharad. 2016. Learning to de-anonymize social networks. Ph.D. Dissertation. Computer Laboratory, University of Cambridge

  44. [45]

    Kumar Sharad. 2016. True Friends Let You Down: Benchmarking Social Graph Anonymization Schemes. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec ’16). ACM, New York, NY, USA, 93–104. https: //doi.org/10.1145/2996758.2996765

  45. [46]

    Kumar Sharad and George Danezis. 2013. De-anonymizing d4d datasets. In Workshop on Hot Topics in Privacy Enhancing Technologies, Bloomington, Indiana, USA. 10

  46. [47]

    Kumar Sharad and George Danezis. 2014. An automated social graph de- anonymization technique. In Proceedings of the 13th Workshop on Privacy in the Electronic Society. ACM, 47–58

  47. [48]

    Reza Shokri. 2015. Privacy games: Optimal user-centric data obfuscation. Pro- ceedings on Privacy Enhancing Technologies 2015, 2 (2015), 299–315

  48. [49]

    Mudhakar Srivatsa and Mike Hicks. 2012. Deanonymizing mobility traces: Using social network as a side-channel. In Proceedings of the 2012 ACM conference on Computer and communications security . ACM, 628–637

  49. [50]

    George Theodorakopoulos, Reza Shokri, Carmela Troncoso, Jean-Pierre Hubaux, and Jean-Yves Le Boudec. 2014. Prolonging the hide-and-seek game: Optimal trajectory privacy for location-based services. InProceedings of the 13th Workshop on Privacy in the Electronic Society . ACM, 73–82

  50. [51]

    Yue Wang and Xintao Wu. 2013. Preserving differential privacy in degree- correlation based graph generation. Transactions on data privacy 6, 2 (2013), 127

  51. [52]

    Lyudmila Yartseva and Matthias Grossglauser. 2013. On the performance of percolation graph matching. In Proceedings of the first ACM conference on Online social networks. ACM, 119–130