Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies
Pith reviewed 2026-05-20 04:56 UTC · model grok-4.3
The pith
Knowledge graph embeddings allow inference of sensitive user attributes even when those attributes are not explicitly stored in the graph.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reasoning over knowledge graph embeddings enables attribute inference attacks that deduce sensitive user information not explicitly stored, and a framework that applies post-processing sanitization techniques to the embeddings mitigates these privacy risks.
What carries the argument
Post-processing sanitization applied to KGE outputs to limit attribute inference while preserving utility.
If this is right
- Adversaries can deduce sensitive attributes from the outputs of common KGE models even without direct access to those attributes.
- Randomization-based sanitization on embeddings lowers the success rate of inference attacks.
- Sanitization creates a measurable trade-off between privacy gains and reduced accuracy in recommendation systems that rely on the embeddings.
- Basic randomization is insufficient, so more advanced sanitization methods are needed to improve the privacy-utility balance.
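A minimal sketch of what randomization-based sanitization could look like, assuming independent Gaussian noise added to each embedding coordinate. The page does not specify the paper's actual noise distribution or parameters, so `sanitize`, `mean_abs_shift`, and `sigma` are hypothetical names chosen for illustration.

```python
import random


def sanitize(embeddings, sigma, seed=0):
    """Return a noisy copy of `embeddings` (a list of equal-length float lists).

    Assumption: independent Gaussian noise N(0, sigma^2) per coordinate.
    The randomization scheme used in the paper is not specified on this page.
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def mean_abs_shift(original, noisy):
    """Average absolute per-coordinate change, a crude proxy for distortion."""
    total = sum(abs(a - b) for u, v in zip(original, noisy) for a, b in zip(u, v))
    count = sum(len(u) for u in original)
    return total / count


if __name__ == "__main__":
    # Toy two-dimensional embeddings, purely illustrative.
    toy = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
    for sigma in (0.0, 0.1, 1.0):
        print(sigma, round(mean_abs_shift(toy, sanitize(toy, sigma)), 3))
```

Larger `sigma` distorts the vectors more. This should make inference harder, but it also degrades any recommendation that depends on distances between embeddings, which is the trade-off the bullets above describe.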
Where Pith is reading between the lines
- The same inference risk may appear in other embedding-based systems outside knowledge graphs, such as those used for user profiles in social networks.
- Embedding models could be retrained with privacy constraints built in rather than relying only on later fixes.
- Standards for evaluating privacy in learned representations may need to include tests for indirect attribute leakage.
Load-bearing premise
The embeddings retain enough non-sensitive structure for reliable inference of sensitive attributes, and randomization can be tuned to cut leakage without destroying usefulness for tasks like recommendations.
What would settle it
A controlled experiment on a public KG where no correlation appears between embedding vectors and held-out sensitive attributes after removing obvious confounders, or where any effective randomization destroys recommendation accuracy.
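One way to run such an experiment is to train a simple probe on the embeddings and compare its accuracy against a majority-class baseline. The sketch below uses a nearest-centroid probe as a stand-in. It is not the paper's attack model, and all function names are hypothetical.

```python
from collections import Counter


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(train_x, train_y):
    """Compute one centroid per sensitive-attribute value."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    return {label: centroid(vs) for label, vs in groups.items()}


def predict(centroids, x):
    """Guess the attribute value whose centroid is closest to embedding x."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def accuracy(preds, truth):
    """Fraction of predictions that match the held-out labels."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def majority_baseline(train_y, test_y):
    """Accuracy of always guessing the most common training label."""
    top = Counter(train_y).most_common(1)[0][0]
    return accuracy([top] * len(test_y), test_y)
```

If the probe's held-out accuracy stays at the majority baseline once confounders are controlled, that would count against the leakage claim. A clear gap above the baseline would support it.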
Original abstract
Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding (KGE) techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post-processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization-based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.
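To make "inferring missing information" concrete, here is a small sketch of how a translational KGE model scores candidate facts. It uses the TransE scoring rule (Bordes et al., 2013) only as a familiar example. This page does not say which model the paper evaluates, and the entity names and vectors are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail.

    Scores closer to zero mean the triple (head, relation, tail) is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Sort candidate tail names from most to least plausible.

    `candidates` maps a hypothetical entity name to its embedding vector.
    """
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    user = [0.0, 0.0]       # hypothetical user embedding
    likes = [1.0, 0.0]      # hypothetical relation embedding
    movies = {"movie_a": [1.0, 0.1], "movie_b": [3.0, 3.0]}
    print(rank_tails(user, likes, movies))
```

A ranked list like this is the kind of seemingly non-sensitive output an attribute inference attack would take as input.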
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates privacy risks in knowledge graph embeddings (KGE), showing that attribute inference attacks can deduce sensitive user attributes from KGE outputs even when such data is not explicitly stored in the graph. It proposes a post-processing sanitization framework applying randomization-based techniques to KGE outputs to reduce leakage, while evaluating the resulting trade-off against recommendation quality. Preliminary results are presented to support attack effectiveness and the sanitization approach.
Significance. If the central claims hold after addressing evaluation gaps, the work identifies a relevant privacy vulnerability in KGE-based reasoning over incomplete KGs, which are widely used in recommendation and knowledge enrichment services. The proposed sanitization framework offers a practical, post-hoc defense that could be adopted without retraining embeddings. Explicit credit is due for focusing on the utility-privacy trade-off in a preliminary study, though stronger isolation of embedding-specific leakage would increase its contribution to privacy-preserving KG research.
major comments (3)
- Abstract and preliminary results section: the claims of attack effectiveness and sanitization trade-offs rest on unspecified preliminary results with no quantitative metrics, baselines, or error analysis provided, which is load-bearing for assessing whether the attacks succeed reliably or the randomization preserves downstream utility.
- Attack methodology (likely §4 or equivalent): the evaluation does not compare attribute inference performance on KGE outputs against equivalent inference directly on the raw KG structure or non-embedding ML models. This leaves open whether observed leakage arises from KGE-specific retained structure or from pre-existing correlations in the original graph, directly impacting the central claim that embeddings introduce the privacy risk.
- Sanitization framework (likely §5): details on how randomization parameters are chosen and tuned are insufficient to determine whether the approach reduces sensitive attribute leakage without destroying non-sensitive relational patterns needed for recommendation tasks.
minor comments (2)
- Introduction: a few sentences describing the distinction between explicitly stored and inferred sensitive attributes could be rephrased for precision and to avoid potential reader confusion.
- Related work: verify that recent papers on membership or attribute inference in graph embeddings are cited to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our preliminary study. The feedback identifies important gaps in evaluation rigor and methodological clarity that we will address in the revision. Below we respond to each major comment.
Point-by-point responses
- Referee: Abstract and preliminary results section: the claims of attack effectiveness and sanitization trade-offs rest on unspecified preliminary results with no quantitative metrics, baselines, or error analysis provided, which is load-bearing for assessing whether the attacks succeed reliably or the randomization preserves downstream utility.
  Authors: We agree that the current presentation of preliminary results is insufficiently detailed. In the revised manuscript we will expand the relevant sections to report concrete quantitative metrics for the attribute inference attacks (precision, recall, F1, and AUC) together with appropriate baselines and error analysis. For the sanitization experiments we will likewise include explicit utility metrics (e.g., NDCG, precision@K) and privacy-utility trade-off curves so that the effectiveness claims can be properly evaluated.
  Revision: yes
- Referee: Attack methodology (likely §4 or equivalent): the evaluation does not compare attribute inference performance on KGE outputs against equivalent inference directly on the raw KG structure or non-embedding ML models. This leaves open whether observed leakage arises from KGE-specific retained structure or from pre-existing correlations in the original graph, directly impacting the central claim that embeddings introduce the privacy risk.
  Authors: This is a substantive methodological concern. To isolate embedding-specific leakage we will add new experiments that apply the same attribute inference attack to (i) the raw knowledge graph using graph-based or rule-based methods and (ii) non-embedding supervised models trained on the original graph features. The revised evaluation will report these comparisons side-by-side with the KGE results, allowing readers to assess whether the observed leakage is amplified by the embedding process itself.
  Revision: yes
- Referee: Sanitization framework (likely §5): details on how randomization parameters are chosen and tuned are insufficient to determine whether the approach reduces sensitive attribute leakage without destroying non-sensitive relational patterns needed for recommendation tasks.
  Authors: We acknowledge the lack of transparency regarding parameter selection. The revised manuscript will describe the randomization procedure in detail, including the noise distributions employed, the grid-search or heuristic procedure used to select parameter values, and ablation results that quantify the effect of each parameter on both attribute-inference leakage and downstream recommendation quality. This will make the utility-privacy trade-off reproducible and easier to interpret.
  Revision: yes
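The utility metrics named in the first response (precision@K and NDCG) can be sketched as follows, assuming binary relevance where an item is either relevant to the user or not. Graded relevance would need a different gain term. The function names are illustrative and not taken from the paper.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def dcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain with binary gains and a log2 position discount."""
    return sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )


def ndcg_at_k(ranked, relevant, k):
    """DCG normalized by the best achievable DCG; 0.0 when nothing is relevant."""
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg_at_k(ranked, relevant, k) / ideal if ideal > 0 else 0.0
```

Computing these on recommendations derived from the original and from the sanitized embeddings, at several noise levels, would yield the utility side of the promised trade-off curves.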
Circularity Check
No significant circularity in empirical attack-defense framework
The paper describes an empirical investigation of attribute inference attacks on KGE outputs and a post-processing sanitization framework to mitigate leakage, with preliminary results on privacy-utility trade-offs. No equations, parameter-fitting procedures, or derivations are referenced in the abstract or summary that could reduce a claimed prediction or result to an input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The central claims rest on experimental evaluation rather than self-referential definitions or renamed known patterns. This is a standard applied privacy paper whose reasoning chain does not exhibit the enumerated circularity patterns.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "The attacker exploits correlations between recommended movies and demographic attributes encoded in the knowledge graph"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.