The sameAs Problem: A Survey on Identity Management in the Web of Data
Pith reviewed 2026-05-24 16:22 UTC · model grok-4.3
The pith
Incorrect sameAs links disrupt data reuse across the Web of Data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper takes as its starting point that identity management in the Web of Data is broken, a finding it attributes to several earlier studies of owl:sameAs statements. The survey maps the current state of solutions, draws out their main weaknesses, and lists the open challenges that remain.
What carries the argument
The owl:sameAs statement, which asserts that two names denote the same entity and thereby links data for reuse.
If this is right
- Knowledge graphs from different sources cannot be merged reliably without correct identity links.
- Deductive systems built on the Web of Data will propagate errors from faulty sameAs statements.
- Data reuse across the Web of Data remains limited until identity weaknesses are resolved.
- Future identity solutions must address the specific weaknesses identified in current approaches.
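To make the error-propagation point concrete, here is a minimal Python sketch that is not taken from the paper. It computes the identity sets implied by a list of sameAs links with a union-find structure, under the standard reading of owl:sameAs as reflexive, symmetric and transitive. The function name `same_as_closure` and the example names (`a:Paris`, `c:Paris_Texas`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def same_as_closure(links):
    """Return the identity sets implied by sameAs links.

    Each set is an equivalence class of the reflexive, symmetric and
    transitive closure of the given (name, name) pairs.
    """
    parent = {}

    def find(x):
        # Register unseen names as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two identity sets

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: two cities, each named in two datasets.
correct_links = [
    ("a:Paris", "b:Paris_France"),
    ("c:Paris_Texas", "d:Paris_TX"),
]
# One erroneous link between the two cities.
faulty_links = correct_links + [("b:Paris_France", "c:Paris_Texas")]

for identity_set in same_as_closure(correct_links):
    print("correct:", identity_set)
for identity_set in same_as_closure(faulty_links):
    print("faulty: ", identity_set)
```

With the single erroneous link added, the two previously separate identity sets collapse into one, so a reasoner that trusts sameAs would infer every statement about one city for the other.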
Where Pith is reading between the lines
- Applications such as federated query engines or linked-data browsers may need explicit handling of uncertain identity rather than assuming sameAs is reliable.
- Similar identity management difficulties could arise in other decentralized data environments outside the Web of Data.
- Verification or qualification mechanisms for identity statements may be required in addition to the existing binary sameAs relation.
Load-bearing premise
The body of prior work reviewed accurately represents the prevalence and nature of the sameAs problem today.
What would settle it
A large-scale audit of public datasets that finds most sameAs statements to be correct and shows no measurable negative effects on downstream data applications.
Original abstract
In a decentralised knowledge representation system such as the Web of Data, it is common and indeed desirable for different knowledge graphs to overlap. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Whilst the deductive value of such identity statements can be extremely useful in enhancing various knowledge-based systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of Data. With several works already proven that identity in the Web is broken, this survey investigates the current state of this "sameAs problem". An open discussion highlights the main weaknesses suffered by solutions in the literature, and draws open challenges to be faced in the future.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey investigates the 'sameAs problem' in the Web of Data: the use of owl:sameAs to link overlapping knowledge graphs, the deductive utility of correct identity statements, and the wide-ranging effects of incorrect identity assertions. It reviews prior works demonstrating that identity management on the Web is broken, discusses the main weaknesses of existing solutions, and identifies open challenges for future work.
Significance. If the survey faithfully represents the cited literature, it consolidates knowledge on a practically important issue for decentralized Semantic Web systems and can usefully direct attention to open challenges in identity management.
major comments (1)
- [Introduction / Survey methodology] The manuscript does not describe the search strategy, inclusion/exclusion criteria, or temporal scope used to select the reviewed works. This information is required to evaluate whether the cited body of literature accurately represents the current state of the sameAs problem (as asserted in the abstract and introduction).
minor comments (1)
- [Abstract] The abstract states that 'several works already proven that identity in the Web is broken' but does not cite those works; the introduction should provide explicit references at this point.
Simulated Author's Rebuttal
We thank the referee for this constructive comment on our survey. We agree that a clear description of the literature selection process is necessary for a survey paper and will incorporate it in the revision.
Point-by-point responses
- Referee: [Introduction / Survey methodology] The manuscript does not describe the search strategy, inclusion/exclusion criteria, or temporal scope used to select the reviewed works. This information is required to evaluate whether the cited body of literature accurately represents the current state of the sameAs problem (as asserted in the abstract and introduction).
  Authors: We acknowledge the omission. The original manuscript focused on synthesizing known issues and challenges rather than on the systematic review protocol. In the revised version we will add a new subsection (likely in Section 1 or as a dedicated 'Survey Methodology' section) that explicitly states the search strategy (e.g., Google Scholar, DBLP, Semantic Web venues), the keywords and Boolean queries employed, the temporal scope (papers up to mid-2019), and the inclusion/exclusion criteria applied to ensure the cited works represent the state of the sameAs problem.
  Revision: yes
Circularity Check
No significant circularity: survey of external literature
Full rationale
This is a literature survey whose central claim (incorrect identity statements can have wide-ranging effects) is supported by citations to prior external demonstrations that 'identity in the Web is broken.' No new equations, fitted parameters, predictions, uniqueness theorems, or ansatzes are introduced by the authors. The load-bearing condition is accurate representation of the cited body of work, which is external to the present paper and therefore does not reduce to self-definition or self-citation chains within this manuscript. No circular steps are present.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Paper passage: "identity is reflexive, symmetrical and transitive also follows from Leibniz’s Law... a = b → (∀ψ∈Ψ)(ψ(a) = ψ(b))"
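The quoted passage elides the step from Leibniz's Law to the properties of identity. The following is one possible way to fill that gap, offered as a sketch and not as the paper's own derivation. It assumes that reflexivity (a = a) is given and that the property set Ψ contains, for every name c, the property ψ_c(x) := (x = c); both assumptions are introduced here for illustration.

```latex
% Assumptions (not from the quoted passage): a = a holds for every a,
% and \Psi contains \psi_c(x) := (x = c) for every name c.
\begin{align*}
\text{Leibniz's Law:}\quad
  & a = b \rightarrow (\forall \psi \in \Psi)\,(\psi(a) = \psi(b)) \\
\text{Symmetry:}\quad
  & \text{suppose } a = b.\ \text{Since } \psi_a(a) \text{ holds and } \psi_a(a) = \psi_a(b),\
    \psi_a(b) \text{ holds, i.e. } b = a. \\
\text{Transitivity:}\quad
  & \text{suppose } a = b \text{ and } b = c.\ \text{Since } \psi_c(b) \text{ holds and } \psi_c(a) = \psi_c(b),\
    \psi_c(a) \text{ holds, i.e. } a = c.
\end{align*}
```

Under these assumptions identity behaves as an equivalence relation, which is why a single incorrect sameAs statement can merge entire sets of names.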
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.