
arxiv: 2605.19644 · v1 · pith:MSAMYOFHnew · submitted 2026-05-19 · 💻 cs.CR · cs.LG

Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies

Pith reviewed 2026-05-20 04:56 UTC · model grok-4.3

keywords knowledge graphs · embeddings · privacy attacks · attribute inference · sanitization · recommendation systems · data leakage · post-processing defenses

The pith

Knowledge graph embeddings allow inference of sensitive user attributes even when those attributes are not explicitly stored in the graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard knowledge graph embedding techniques, used to fill in missing facts and power services like recommendations, create outputs from which adversaries can deduce private details about users. It demonstrates that attribute inference attacks succeed on these outputs despite the absence of explicit sensitive data. To counter this, the authors introduce a framework that applies post-processing sanitization, including randomization, directly to the embedding results. Experiments reveal both the success of the attacks and the resulting trade-off between reduced privacy leakage and lower quality in downstream recommendation tasks. The work argues that more sophisticated sanitization methods will be required to manage this exposure effectively.

Core claim

Reasoning over knowledge graph embeddings enables attribute inference attacks that deduce sensitive user information not explicitly stored, and a framework that applies post-processing sanitization techniques to the embeddings mitigates these privacy risks.
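The attack described in the core claim can be illustrated with a minimal sketch. It is not the paper's attack model, which the abstract does not specify. Everything here is an assumption for illustration: the synthetic embeddings, the hidden direction that encodes a binary attribute, the `leak_strength` parameter, and the nearest-centroid classifier standing in for the adversary. The adversary is assumed to know the attribute for a subset of users and to predict it for the rest from the embedding vectors alone.

```python
import random

def make_synthetic_embeddings(n_users, dim, leak_strength, seed=0):
    """Hypothetical data: Gaussian vectors shifted along a hidden unit
    direction, with the sign of the shift set by a binary sensitive attribute."""
    rng = random.Random(seed)
    raw = [rng.gauss(0, 1) for _ in range(dim)]
    norm = sum(x * x for x in raw) ** 0.5
    direction = [x / norm for x in raw]
    users = []
    for _ in range(n_users):
        attr = rng.randint(0, 1)
        sign = 1.0 if attr == 1 else -1.0
        vec = [rng.gauss(0, 1) + sign * leak_strength * d for d in direction]
        users.append((vec, attr))
    return users

def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid_attack(train, test):
    """Fit one centroid per attribute value on users whose attribute the
    adversary already knows, then predict the attribute of the other users."""
    c0 = centroid([v for v, a in train if a == 0])
    c1 = centroid([v for v, a in train if a == 1])
    correct = 0
    for vec, attr in test:
        pred = 1 if sq_dist(vec, c1) < sq_dist(vec, c0) else 0
        correct += pred == attr
    return correct / len(test)

users = make_synthetic_embeddings(400, 16, leak_strength=2.0)
train, test = users[:200], users[200:]
accuracy = nearest_centroid_attack(train, test)
print(f"attack accuracy: {accuracy:.2f}")
```

The attribute never appears as a feature; it is recoverable only because it correlates with the geometry of the vectors. Setting `leak_strength=0.0` removes that correlation and should bring the attacker back to roughly chance level.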

What carries the argument

Post-processing sanitization applied to KGE outputs to limit attribute inference while preserving utility.

If this is right

  • Adversaries can deduce sensitive attributes from the outputs of common KGE models even without direct access to those attributes.
  • Randomization-based sanitization on embeddings lowers the success rate of inference attacks.
  • Sanitization creates a measurable trade-off between privacy gains and reduced accuracy in recommendation systems that rely on the embeddings.
  • Basic randomization is insufficient, so more advanced sanitization methods are needed to improve the privacy-utility balance.
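The privacy-utility trade-off in the points above can be sketched as follows. The paper's actual randomization procedure is not given in the abstract, so this example assumes independent Gaussian noise added to every embedding coordinate. The synthetic data, the `evaluate` helper, the nearest-centroid attacker, and the top-k overlap used as a utility proxy are all hypothetical choices made for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add_gaussian_noise(vec, sigma, rng):
    """Assumed randomization step: perturb each coordinate independently."""
    return [x + rng.gauss(0, sigma) for x in vec]

def top_k(user_vec, item_vecs, k):
    """Indices of the k items with the highest dot-product score."""
    scores = [(dot(user_vec, item), idx) for idx, item in enumerate(item_vecs)]
    return {idx for _, idx in sorted(scores, reverse=True)[:k]}

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attack_accuracy(vectors, attrs, n_train):
    """Nearest-centroid attacker fitted on the first n_train users."""
    known = list(zip(vectors[:n_train], attrs[:n_train]))
    c0 = mean([v for v, a in known if a == 0])
    c1 = mean([v for v, a in known if a == 1])
    w = [p - q for p, q in zip(c1, c0)]
    mid = [(p + q) / 2 for p, q in zip(c1, c0)]
    hits = 0
    for v, a in zip(vectors[n_train:], attrs[n_train:]):
        pred = 1 if dot([x - m for x, m in zip(v, mid)], w) > 0 else 0
        hits += pred == a
    return hits / (len(vectors) - n_train)

def evaluate(sigma, seed=0, n_users=300, n_items=50, dim=8, k=5):
    """Return (attack accuracy, top-k overlap with unsanitized rankings)."""
    rng = random.Random(seed)
    attrs = [rng.randint(0, 1) for _ in range(n_users)]
    # Hypothetical leak: the first coordinate is shifted by the attribute.
    users = [[(1.5 if a else -1.5) + rng.gauss(0, 1)]
             + [rng.gauss(0, 1) for _ in range(dim - 1)] for a in attrs]
    items = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
    noisy = [add_gaussian_noise(u, sigma, rng) for u in users]
    overlap = sum(len(top_k(u, items, k) & top_k(z, items, k))
                  for u, z in zip(users, noisy)) / (k * n_users)
    return attack_accuracy(noisy, attrs, n_users // 2), overlap

for sigma in (0.0, 0.5, 1.0, 2.0, 5.0):
    acc, overlap = evaluate(sigma)
    print(f"sigma={sigma:<4} attack accuracy={acc:.2f} top-k overlap={overlap:.2f}")
```

Because the noise is isotropic, it degrades the attribute signal and the recommendation signal together, which is one way to read the claim that basic randomization gives a poor balance. A more targeted method would need to perturb the attribute-bearing directions more than the rest.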

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inference risk may appear in other embedding-based systems outside knowledge graphs, such as those used for user profiles in social networks.
  • Embedding models could be retrained with privacy constraints built in rather than relying only on later fixes.
  • Standards for evaluating privacy in learned representations may need to include tests for indirect attribute leakage.

Load-bearing premise

The embeddings retain enough non-sensitive structure for reliable inference of sensitive attributes, and randomization can be tuned to cut leakage without destroying usefulness for tasks like recommendations.

What would settle it

A controlled experiment on a public KG where no correlation appears between embedding vectors and held-out sensitive attributes after removing obvious confounders, or where any effective randomization destroys recommendation accuracy.
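One way to check for the "no correlation" outcome is a label-permutation test, sketched below on synthetic data. This is an assumed evaluation design, not a procedure from the paper. The test statistic (squared distance between the two group centroids), the function names, and the data are hypothetical, and the sketch does not attempt the confounder removal that a real experiment would need.

```python
import random

def centroid_gap(vectors, labels):
    """Squared distance between the mean vectors of the two attribute groups."""
    g0 = [v for v, y in zip(vectors, labels) if y == 0]
    g1 = [v for v, y in zip(vectors, labels) if y == 1]
    m0 = [sum(col) / len(g0) for col in zip(*g0)]
    m1 = [sum(col) / len(g1) for col in zip(*g1)]
    return sum((a - b) ** 2 for a, b in zip(m0, m1))

def permutation_p_value(vectors, labels, n_perm=500, seed=0):
    """Share of label shufflings whose gap is at least the observed gap.
    A small value suggests the embeddings carry attribute information."""
    rng = random.Random(seed)
    observed = centroid_gap(vectors, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if centroid_gap(vectors, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)

rng = random.Random(1)
labels = [i % 2 for i in range(100)]
# Hypothetical leaky embeddings: the first coordinate depends on the label.
leaky = [[(1.0 if y else -1.0) + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(3)] for y in labels]
# Hypothetical embeddings drawn independently of the label.
independent = [[rng.gauss(0, 1) for _ in range(4)] for _ in labels]

p_leaky = permutation_p_value(leaky, labels)
p_independent = permutation_p_value(independent, labels)
print(f"p (leaky) = {p_leaky:.3f}, p (independent) = {p_independent:.3f}")
```

A p-value that is not small for real embeddings would be consistent with the falsifying scenario, though a centroid-based statistic only detects mean shifts and could miss nonlinear leakage.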

Original abstract

Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding (KGE) techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post-processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization-based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript investigates privacy risks in knowledge graph embeddings (KGE), showing that attribute inference attacks can deduce sensitive user attributes from KGE outputs even when such data is not explicitly stored in the graph. It proposes a post-processing sanitization framework applying randomization-based techniques to KGE outputs to reduce leakage, while evaluating the resulting trade-off against recommendation quality. Preliminary results are presented to support attack effectiveness and the sanitization approach.

Significance. If the central claims hold after addressing evaluation gaps, the work identifies a relevant privacy vulnerability in KGE-based reasoning over incomplete KGs, which are widely used in recommendation and knowledge enrichment services. The proposed sanitization framework offers a practical, post-hoc defense that could be adopted without retraining embeddings. Explicit credit is due for focusing on the utility-privacy trade-off in a preliminary study, though stronger isolation of embedding-specific leakage would increase its contribution to privacy-preserving KG research.

major comments (3)
  1. Abstract and preliminary results section: the claims of attack effectiveness and sanitization trade-offs rest on unspecified preliminary results with no quantitative metrics, baselines, or error analysis provided, which is load-bearing for assessing whether the attacks succeed reliably or the randomization preserves downstream utility.
  2. Attack methodology (likely §4 or equivalent): the evaluation does not compare attribute inference performance on KGE outputs against equivalent inference directly on the raw KG structure or non-embedding ML models. This leaves open whether observed leakage arises from KGE-specific retained structure or from pre-existing correlations in the original graph, directly impacting the central claim that embeddings introduce the privacy risk.
  3. Sanitization framework (likely §5): details on how randomization parameters are chosen and tuned are insufficient to determine whether the approach reduces sensitive attribute leakage without destroying non-sensitive relational patterns needed for recommendation tasks.
minor comments (2)
  1. Introduction: a few sentences describing the distinction between explicitly stored and inferred sensitive attributes could be rephrased for precision and to avoid potential reader confusion.
  2. Related work: verify that recent papers on membership or attribute inference in graph embeddings are cited to better situate the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our preliminary study. The feedback identifies important gaps in evaluation rigor and methodological clarity that we will address in the revision. Below we respond to each major comment.

  1. Referee: Abstract and preliminary results section: the claims of attack effectiveness and sanitization trade-offs rest on unspecified preliminary results with no quantitative metrics, baselines, or error analysis provided, which is load-bearing for assessing whether the attacks succeed reliably or the randomization preserves downstream utility.

    Authors: We agree that the current presentation of preliminary results is insufficiently detailed. In the revised manuscript we will expand the relevant sections to report concrete quantitative metrics for the attribute inference attacks (precision, recall, F1, and AUC) together with appropriate baselines and error analysis. For the sanitization experiments we will likewise include explicit utility metrics (e.g., NDCG, precision@K) and privacy-utility trade-off curves so that the effectiveness claims can be properly evaluated.
    Revision: yes

  2. Referee: Attack methodology (likely §4 or equivalent): the evaluation does not compare attribute inference performance on KGE outputs against equivalent inference directly on the raw KG structure or non-embedding ML models. This leaves open whether observed leakage arises from KGE-specific retained structure or from pre-existing correlations in the original graph, directly impacting the central claim that embeddings introduce the privacy risk.

    Authors: This is a substantive methodological concern. To isolate embedding-specific leakage we will add new experiments that apply the same attribute inference attack to (i) the raw knowledge graph using graph-based or rule-based methods and (ii) non-embedding supervised models trained on the original graph features. The revised evaluation will report these comparisons side-by-side with the KGE results, allowing readers to assess whether the observed leakage is amplified by the embedding process itself.
    Revision: yes

  3. Referee: Sanitization framework (likely §5): details on how randomization parameters are chosen and tuned are insufficient to determine whether the approach reduces sensitive attribute leakage without destroying non-sensitive relational patterns needed for recommendation tasks.

    Authors: We acknowledge the lack of transparency regarding parameter selection. The revised manuscript will describe the randomization procedure in detail, including the noise distributions employed, the grid-search or heuristic procedure used to select parameter values, and ablation results that quantify the effect of each parameter on both attribute-inference leakage and downstream recommendation quality. This will make the utility-privacy trade-off reproducible and easier to interpret.
    Revision: yes
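The utility metrics named in the first response (NDCG and precision@K) can be computed as in the sketch below. It uses the standard definitions with linear gain and a log2 position discount. The function names and the toy ranking are hypothetical, and the paper may use a different gain function or cutoff.

```python
import math

def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked_items[:k] if item in relevant) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: position p (0-based) is divided by log2(p + 2)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_items, relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(item, 0.0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: graded relevance for one user and one recommended ranking.
relevance = {"a": 3.0, "b": 2.0, "c": 1.0}
ranking = ["b", "x", "a", "c"]

p_at_3 = precision_at_k(ranking, set(relevance), k=3)
ndcg_at_3 = ndcg_at_k(ranking, relevance, k=3)
print(f"precision@3 = {p_at_3:.3f}, NDCG@3 = {ndcg_at_3:.3f}")
```

Computing these metrics on rankings produced before and after sanitization, at each noise level, would give the points of the privacy-utility trade-off curve the authors promise.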

Circularity Check

0 steps flagged

No significant circularity in empirical attack-defense framework


The paper describes an empirical investigation of attribute inference attacks on KGE outputs and a post-processing sanitization framework to mitigate leakage, with preliminary results on privacy-utility trade-offs. No equations, parameter-fitting procedures, or derivations are referenced in the abstract or summary that could reduce a claimed prediction or result to an input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The central claims rest on experimental evaluation rather than self-referential definitions or renamed known patterns. This is a standard applied privacy paper whose reasoning chain does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work implicitly assumes that KGE outputs encode inferable sensitive signals and that sanitization parameters can be chosen without further justification.

pith-pipeline@v0.9.0 · 5710 in / 1036 out tokens · 62353 ms · 2026-05-20T04:56:46.493474+00:00 · methodology


