Extracting Novel Facts from Tables for Knowledge Graph Completion (Extended version)
Pith reviewed 2026-05-25 12:49 UTC · model grok-4.3
The pith
A graphical model using entity similarities extracts more novel facts from tables than methods biased toward redundant information.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that table interpretation based on a scalable graphical model using entity similarities, with KG embeddings used for additional disambiguation, produces more novel facts for knowledge graph completion, attains higher recall than existing methods, and resists the common bias toward redundant extractions. The method requires no domain-specific assumptions about the target KG and allows fine-grained control over the precision-recall trade-off.
What carries the argument
Scalable graphical model using entity similarities, augmented by KG embeddings for cell-value ranking and disambiguation.
If this is right
- Tables become a richer source of unique information that can be added to knowledge graphs.
- The precision-recall balance of extracted facts can be adjusted directly without changing the underlying model.
- The technique can be applied to any knowledge graph without custom rules or domain knowledge.
- Fewer facts that are already present in the graph are extracted, reducing redundancy in the completion process.
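The adjustable precision-recall balance mentioned in the list above can be illustrated with a confidence threshold. The following is a minimal sketch, not the paper's mechanism: it assumes each extracted fact carries a confidence score, and the names `precision_recall_at` and `SCORED`, the toy scores, and the correctness labels are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical scored extractions: (fact id, confidence, is_correct).
# The scores and labels are invented for illustration only.
Scored = Tuple[str, float, bool]

SCORED: List[Scored] = [
    ("f1", 0.9, True),
    ("f2", 0.8, True),
    ("f3", 0.6, False),
    ("f4", 0.4, True),
]


def precision_recall_at(threshold: float, scored: List[Scored]) -> Tuple[float, float]:
    """Keep facts with confidence >= threshold and return (precision, recall).

    Recall is measured against the correct facts among all candidates,
    which is an assumption of this sketch.
    """
    kept = [s for s in scored if s[1] >= threshold]
    total_correct = sum(1 for s in scored if s[2])
    true_pos = sum(1 for s in kept if s[2])
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / total_correct if total_correct else 0.0
    return precision, recall


if __name__ == "__main__":
    for t in (0.7, 0.3):
        print(t, precision_recall_at(t, SCORED))
```

Raising the threshold keeps fewer, more confident facts (higher precision, lower recall), while lowering it does the opposite, and the scoring model itself stays unchanged.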
Where Pith is reading between the lines
- Knowledge graphs populated this way could contain a broader range of relations and therefore support a wider set of downstream queries.
- The emphasis on novelty might reduce the tendency of automated KG construction to reinforce existing data distributions.
- The same graphical model could be tested on other structured sources such as spreadsheets or CSV files to check whether the novelty advantage generalizes.
Load-bearing premise
Entity similarities captured in a graphical model, together with KG embeddings, can surface novel facts from tables without requiring assumptions about the target graph or introducing new extraction biases.
What would settle it
Apply the method and a baseline to the same collection of tables that have been manually annotated with ground-truth novel versus redundant facts, then measure whether the new method produces a measurably higher fraction of novel extractions and higher recall.
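The comparison described above could be scored roughly as follows. This is a sketch under stated assumptions: facts are modelled as (subject, predicate, object) triples, "novel" means "correct and not already in the KG", and the function name `evaluate` and all toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


def evaluate(extracted: Set[Fact], gold: Set[Fact], kg: Set[Fact]) -> Dict[str, float]:
    """Return recall and the fraction of correct extractions that are novel.

    A correct extraction counts as novel when it is absent from the existing KG.
    """
    correct = extracted & gold
    novel_correct = correct - kg
    return {
        "recall": len(correct) / len(gold) if gold else 0.0,
        "novel_fraction": len(novel_correct) / len(correct) if correct else 0.0,
    }


# Toy data, invented for illustration.
KG: Set[Fact] = {("a", "p", "b")}
GOLD: Set[Fact] = {("a", "p", "b"), ("a", "q", "c"), ("d", "p", "e"), ("d", "q", "f")}
NEW_METHOD: Set[Fact] = {("a", "p", "b"), ("a", "q", "c"), ("d", "p", "e")}
BASELINE: Set[Fact] = {("a", "p", "b"), ("a", "q", "c")}

if __name__ == "__main__":
    print("new method:", evaluate(NEW_METHOD, GOLD, KG))
    print("baseline:  ", evaluate(BASELINE, GOLD, KG))
```

On this toy data the hypothetical new method scores higher on both measures; the claim would be supported only if the same pattern held on manually annotated tables.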
Original abstract
We propose a new end-to-end method for extending a Knowledge Graph (KG) from tables. Existing techniques tend to interpret tables by focusing on information that is already in the KG, and therefore tend to extract many redundant facts. Our method aims to find more novel facts. We introduce a new technique for table interpretation based on a scalable graphical model using entity similarities. Our method further disambiguates cell values using KG embeddings as additional ranking method. Other distinctive features are the lack of assumptions about the underlying KG and the enabling of a fine-grained tuning of the precision/recall trade-off of extracted facts. Our experiments show that our approach has a higher recall during the interpretation process than the state-of-the-art, and is more resistant against the bias observed in extracting mostly redundant facts since it produces more novel extractions.
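To make the idea of using KG embeddings as an additional ranking signal concrete, here is a minimal sketch. The abstract does not say which embedding model is used, so the translation-based score (distance between head + relation and tail) is an assumption chosen for illustration, and the vectors, entity names, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def translation_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """Euclidean distance between (head + relation) and tail; lower is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_candidates(head: Vector, relation: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Order candidate entities for an ambiguous cell from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: translation_distance(head, relation, candidates[name]),
    )


# Toy 2-d embeddings, invented for illustration.
HEAD = [1.0, 0.0]
RELATION = [0.0, 1.0]
CANDIDATES = {
    "Paris_Texas": [3.0, 0.0],
    "Paris_France": [1.0, 1.0],
}

if __name__ == "__main__":
    print(rank_candidates(HEAD, RELATION, CANDIDATES))
```

Because such embeddings are trained on the KG itself, a ranking of this kind inherits the KG's statistics, which is the concern the referee report raises.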
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an end-to-end method for extending a knowledge graph from tables. It introduces a scalable graphical model driven by entity similarities for table interpretation, augmented by KG embeddings for cell-value disambiguation. Distinctive features claimed include the absence of assumptions about the target KG, a tunable precision/recall trade-off, higher recall than prior art during interpretation, and greater resistance to redundancy bias via more novel extractions.
Significance. If the experimental claims hold after addressing the noted inconsistency, the work would offer a practical advance in table-driven KG completion by shifting extraction toward novel facts rather than redundant ones already present in the graph. The graphical-model-plus-embeddings design and explicit precision/recall control could be broadly useful for KG maintenance tasks.
major comments (1)
- [Abstract] The central claim of a 'lack of assumptions about the underlying KG' is undercut by the explicit addition of 'KG embeddings as additional ranking method' for disambiguation. Because these embeddings are learned from the same KG being extended, they necessarily import that KG's distributional statistics (entity popularity, relation frequencies); any reported gain in novel-fact extraction therefore cannot be attributed solely to the graphical model on entity similarities.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for highlighting this point about our abstract. We address the concern below.
- Referee: [Abstract] The central claim of a 'lack of assumptions about the underlying KG' is undercut by the explicit addition of 'KG embeddings as additional ranking method' for disambiguation. Because these embeddings are learned from the same KG being extended, they necessarily import that KG's distributional statistics (entity popularity, relation frequencies); any reported gain in novel-fact extraction therefore cannot be attributed solely to the graphical model on entity similarities.
  Authors: We agree that the use of KG embeddings incorporates distributional statistics from the KG and that this should be acknowledged when claiming a lack of assumptions. The phrase 'lack of assumptions about the underlying KG' in the abstract was intended to indicate that the method does not require the KG to possess a fixed schema, complete coverage of table entities, or domain-specific relation types, unlike some prior table-interpretation approaches. The embeddings function as an optional auxiliary ranking signal for cell disambiguation rather than a core requirement. Nevertheless, we accept that the current wording is imprecise and risks overstating the assumption-free nature of the full pipeline. We will revise the abstract to clarify the intended meaning of 'lack of assumptions,' explicitly note the auxiliary role of embeddings, and ensure that performance gains are attributed primarily to the graphical model on entity similarities.
  Revision: yes
Circularity Check
No circularity: experimental method proposal with independent empirical claims
The paper describes an end-to-end table interpretation method using a graphical model on entity similarities plus KG embeddings for disambiguation. It reports experimental results on recall and novel-fact extraction rates versus baselines. No equations, derivations, or fitted parameters are presented that reduce the claimed performance gains to quantities defined inside the same work. The 'lack of assumptions' phrasing is a methodological claim, not a self-referential derivation. Self-citations are absent from the provided text and not load-bearing. The work is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- precision/recall tuning parameter
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: scalable graphical model using entity similarities... KG embeddings as additional ranking method... lack of assumptions about the underlying KG
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: EntitySimilarity(e1,e2) = sum LinkScore(r,v) over shared links; LBP update qe = product (LS)
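The quoted passage describes entity similarity as a sum of link scores over the links two entities share. A minimal sketch of that shape follows. It assumes a link is a (relation, value) pair and, because the passage does not define LinkScore, substitutes a hypothetical inverse-frequency score, so rarer shared links count for more. The LBP update is not sketched here, and the toy KG and all names are invented for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (relation, value)


def link_scores(kg: Dict[str, Set[Link]]) -> Dict[Link, float]:
    """Hypothetical LinkScore: 1 / (number of entities that have the link)."""
    counts = Counter(link for links in kg.values() for link in links)
    return {link: 1.0 / n for link, n in counts.items()}


def entity_similarity(e1: str, e2: str, kg: Dict[str, Set[Link]],
                      scores: Dict[Link, float]) -> float:
    """Sum the link scores over the links shared by e1 and e2."""
    shared = kg[e1] & kg[e2]
    return sum(scores[link] for link in shared)


# Toy KG, invented for illustration.
TOY_KG: Dict[str, Set[Link]] = {
    "A": {("type", "City"), ("country", "NL")},
    "B": {("type", "City"), ("country", "NL")},
    "C": {("type", "City"), ("country", "FR")},
}
SCORES = link_scores(TOY_KG)

if __name__ == "__main__":
    print(entity_similarity("A", "B", TOY_KG, SCORES))
    print(entity_similarity("A", "C", TOY_KG, SCORES))
```

With this scoring, A and B come out more similar than A and C, because they share the less common country link in addition to the ubiquitous type link.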
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.