arxiv: 1907.00083 · v2 · pith:JHXVXJ6Znew · submitted 2019-06-28 · 💻 cs.IR · cs.DB

Extracting Novel Facts from Tables for Knowledge Graph Completion (Extended version)

Pith reviewed 2026-05-25 12:49 UTC · model grok-4.3

classification 💻 cs.IR cs.DB
keywords knowledge graph completion · table interpretation · novel fact extraction · graphical models · entity disambiguation · knowledge graphs

The pith

A graphical model using entity similarities extracts more novel facts from tables than methods biased toward redundant information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an end-to-end technique to extend knowledge graphs by interpreting tables in a way that prioritizes novel facts over redundant ones already present in the graph. It relies on a scalable graphical model driven by entity similarities for the core interpretation step and applies KG embeddings as a secondary ranking method to disambiguate cell values. The approach makes no assumptions about the structure or content of the target knowledge graph and supports explicit tuning of the precision-recall balance. Experiments indicate that the method achieves higher recall during interpretation and yields a larger share of novel extractions compared with prior work.

Core claim

The central claim is that table interpretation based on a scalable graphical model using entity similarities, with KG embeddings used for additional disambiguation, produces more novel facts for knowledge graph completion, attains higher recall than existing methods, and resists the common bias toward redundant extractions, all while requiring no domain-specific assumptions about the target KG and allowing fine-grained control over the precision-recall trade-off.

What carries the argument

Scalable graphical model using entity similarities, augmented by KG embeddings for cell-value ranking and disambiguation.
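
As a rough illustration of how KG embeddings could rank candidate cell values, the sketch below scores triples in the style of TransE, the embedding method named in the Figure 1 excerpt: a triple (h, r, t) is considered plausible when h + r lies close to t. The toy vectors, the entity and relation names, and the `rank_candidates` helper are all hypothetical; this is not the paper's training or ranking pipeline.

```python
import math

# Toy 2-d embeddings; every name and value here is made up for illustration.
ENTITY = {
    "Amsterdam": [0.9, 0.1],
    "Netherlands": [1.0, 1.0],
    "Belgium": [0.2, 1.4],
}
RELATION = {"capitalOf": [0.1, 0.9]}


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; smaller means more plausible under TransE."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_candidates(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(candidates, key=lambda tail: transe_distance(head, relation, tail))


ranking = rank_candidates("Amsterdam", "capitalOf", ["Belgium", "Netherlands"])
print(ranking)
```

In a real setting the vectors would be learned from the KG, which is exactly why the referee comment further down treats the embeddings as a source of KG-specific bias.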

If this is right

  • Tables become a richer source of unique information that can be added to knowledge graphs.
  • The precision-recall balance of extracted facts can be adjusted directly without changing the underlying model.
  • The technique can be applied to any knowledge graph without custom rules or domain knowledge.
  • Fewer facts that are already present in the graph are extracted, reducing redundancy in the completion process.
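
One plausible way to read the adjustable precision-recall balance is as a confidence threshold applied to scored extractions after the model has run. The sketch below assumes each extracted fact carries a confidence score and that a gold set of correct facts is available; the facts, scores, and the `precision_recall_at` helper are hypothetical and not taken from the paper.

```python
def precision_recall_at(scored, gold, threshold):
    """Precision and recall of the facts whose confidence is >= threshold."""
    kept = {fact for fact, score in scored if score >= threshold}
    correct = kept & gold
    precision = len(correct) / len(kept) if kept else 1.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical extractions: (subject, predicate, object) with a confidence.
SCORED = [
    (("a", "p", "x"), 0.9),
    (("b", "p", "y"), 0.6),
    (("c", "p", "z"), 0.3),
]
GOLD = {("a", "p", "x"), ("c", "p", "z")}

# Sweeping the threshold changes the trade-off without re-running the model.
for threshold in (0.8, 0.5, 0.2):
    print(threshold, precision_recall_at(SCORED, GOLD, threshold))
```

Lowering the threshold admits more facts, which raises recall here and lets precision vary; the model that produced the scores is left untouched.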

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Knowledge graphs populated this way could contain a broader range of relations and therefore support a wider set of downstream queries.
  • The emphasis on novelty might reduce the tendency of automated KG construction to reinforce existing data distributions.
  • The same graphical model could be tested on other structured sources such as spreadsheets or CSV files to check whether the novelty advantage generalizes.

Load-bearing premise

Entity similarities captured in a graphical model, together with KG embeddings, can surface novel facts from tables without requiring assumptions about the target graph or introducing new extraction biases.

What would settle it

Apply the method and a baseline to the same collection of tables that have been manually annotated with ground-truth novel versus redundant facts, then measure whether the new method produces a measurably higher fraction of novel extractions and higher recall.
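
A minimal sketch of that comparison, assuming facts are represented as (subject, predicate, object) tuples and that annotators have supplied the set of correct facts missing from the KG. The `novelty_report` helper and all data below are hypothetical.

```python
def novelty_report(extracted, kg, gold_novel):
    """Split extractions into redundant and novel, and score recall on novel facts."""
    redundant = extracted & kg   # already present in the KG
    novel = extracted - kg       # not yet in the KG
    novel_fraction = len(novel) / len(extracted) if extracted else 0.0
    novel_recall = len(novel & gold_novel) / len(gold_novel) if gold_novel else 0.0
    return {
        "redundant": len(redundant),
        "novel_fraction": novel_fraction,
        "novel_recall": novel_recall,
    }


# Hypothetical KG, gold annotations, and two systems' extractions.
KG = {("e1", "p", "v1"), ("e2", "p", "v2")}
GOLD_NOVEL = {("e3", "p", "v3"), ("e4", "p", "v4")}
METHOD = {("e1", "p", "v1"), ("e3", "p", "v3"), ("e4", "p", "v4")}
BASELINE = {("e1", "p", "v1"), ("e2", "p", "v2"), ("e3", "p", "v3")}

method_report = novelty_report(METHOD, KG, GOLD_NOVEL)
baseline_report = novelty_report(BASELINE, KG, GOLD_NOVEL)
print(method_report)
print(baseline_report)
```

The claim would be supported if, on the same annotated tables, the method's `novel_fraction` and `novel_recall` both exceed the baseline's by a margin larger than annotation noise.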

Figures

Figures reproduced from arXiv: 1907.00083 by Benno Kruit, Jacopo Urbani, Peter Boncz.

Figure 1: Schematic representation of our method. … network using the potential functions in order to determine the final distribution of the random variables. KG Embeddings. We also make use of latent representations of the KG [20] to filter out incorrect extractions. In particular, we consider TransE [3], one of the most popular methods in this category. The main idea of TransE is to “embed” each entity and relation…
Figure 2: Row-entity evaluation scores and precision-recall tradeoff for the T2D
Figure 3: Row-entity evaluation scores and precision-recall tradeoff of our approach
Figure 4: The novel and redundant precision-recall tradeoff for the T2D-v2 dataset

Original abstract

We propose a new end-to-end method for extending a Knowledge Graph (KG) from tables. Existing techniques tend to interpret tables by focusing on information that is already in the KG, and therefore tend to extract many redundant facts. Our method aims to find more novel facts. We introduce a new technique for table interpretation based on a scalable graphical model using entity similarities. Our method further disambiguates cell values using KG embeddings as additional ranking method. Other distinctive features are the lack of assumptions about the underlying KG and the enabling of a fine-grained tuning of the precision/recall trade-off of extracted facts. Our experiments show that our approach has a higher recall during the interpretation process than the state-of-the-art, and is more resistant against the bias observed in extracting mostly redundant facts since it produces more novel extractions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes an end-to-end method for extending a knowledge graph from tables. It introduces a scalable graphical model driven by entity similarities for table interpretation, augmented by KG embeddings for cell-value disambiguation. Distinctive features claimed include the absence of assumptions about the target KG, a tunable precision/recall trade-off, higher recall than prior art during interpretation, and greater resistance to redundancy bias via more novel extractions.

Significance. If the experimental claims hold after addressing the noted inconsistency, the work would offer a practical advance in table-driven KG completion by shifting extraction toward novel facts rather than redundant ones already present in the graph. The graphical-model-plus-embeddings design and explicit precision/recall control could be broadly useful for KG maintenance tasks.

major comments (1)
  1. [Abstract] The central claim of a 'lack of assumptions about the underlying KG' is undercut by the explicit addition of 'KG embeddings as additional ranking method' for disambiguation. Because these embeddings are learned from the same KG being extended, they necessarily import that KG's distributional statistics (entity popularity, relation frequencies); any reported gain in novel-fact extraction therefore cannot be attributed solely to the graphical model on entity similarities.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful review and for highlighting this point about our abstract. We address the concern below.

Point-by-point responses
  1. Referee: [Abstract] The central claim of a 'lack of assumptions about the underlying KG' is undercut by the explicit addition of 'KG embeddings as additional ranking method' for disambiguation. Because these embeddings are learned from the same KG being extended, they necessarily import that KG's distributional statistics (entity popularity, relation frequencies); any reported gain in novel-fact extraction therefore cannot be attributed solely to the graphical model on entity similarities.

    Authors: We agree that the use of KG embeddings incorporates distributional statistics from the KG and that this should be acknowledged when claiming a lack of assumptions. The phrase 'lack of assumptions about the underlying KG' in the abstract was intended to indicate that the method does not require the KG to possess a fixed schema, complete coverage of table entities, or domain-specific relation types, unlike some prior table-interpretation approaches. The embeddings function as an optional auxiliary ranking signal for cell disambiguation rather than a core requirement. Nevertheless, we accept that the current wording is imprecise and risks overstating the assumption-free nature of the full pipeline. We will revise the abstract to clarify the intended meaning of 'lack of assumptions,' explicitly note the auxiliary role of embeddings, and ensure that performance gains are attributed primarily to the graphical model on entity similarities.

    revision: yes

Circularity Check

0 steps flagged

No circularity: experimental method proposal with independent empirical claims

The paper describes an end-to-end table interpretation method using a graphical model on entity similarities plus KG embeddings for disambiguation. It reports experimental results on recall and novel-fact extraction rates versus baselines. No equations, derivations, or fitted parameters are presented that reduce the claimed performance gains to quantities defined inside the same work. The 'lack of assumptions' phrasing is a methodological claim, not a self-referential derivation. Self-citations are absent from the provided text and not load-bearing. The work is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described beyond the high-level mention of a tunable precision/recall trade-off.

free parameters (1)
  • precision/recall tuning parameter
    The method enables fine-grained tuning of the precision/recall trade-off, implying at least one tunable parameter whose value affects extracted facts.

pith-pipeline@v0.9.0 · 5667 in / 1108 out tokens · 31435 ms · 2026-05-25T12:49:40.045432+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1] Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A nucleus for a web of open data. In: The semantic web, pp. 722–735. Springer (2007)

  2. [2] Bhagavatula, C.S., Noraset, T., Downey, D.: TabEL: Entity Linking in Web Tables. In: Proceedings of ISWC. pp. 425–441 (2015)

  3. [3] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating Embeddings for Modeling Multi-relational Data. In: Proceedings of NIPS. pp. 2787–2795 (2013)

  4. [4] Cafarella, M., Halevy, A., Lee, H., Madhavan, J., Yu, C., Wang, D.Z., Wu, E.: Ten years of webtables. Proceedings of VLDB 11(12), 2140–2149 (2018)

  5. [5] Cannaviccio, M., Barbosa, D., Merialdo, P.: Towards Annotating Relational Data on the Web with Language Models. In: Proceedings of WWW. pp. 1307–1316 (2018)

  6. [6] Dong, X.L., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge Vault: a Web-scale Approach to Probabilistic Knowledge Fusion. In: Proceedings of KDD. pp. 601–610 (2014)

  7. [7] Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings. In: Proceedings of ISWC. pp. 260–277 (2017)

  8. [8] Efthymiou, V., Hassanzadeh, O., Sadoghi, M., Rodriguez-Muro, M.: Annotating Web Tables Through Ontology Matching. In: Proceedings of OM at ISWC. pp. 229–230 (2016)

  9. [9] Ermilov, I., Ngomo, A.C.N.: TAIPAN: Automatic Property Mapping for Tabular Data. In: Proceedings of EKAW. pp. 163–179 (2016)

  10. [10] Hassanzadeh, O., Ward, M.J., Rodriguez-Muro, M., Srinivas, K.: Understanding a Large Corpus of Web Tables Through Matching with Knowledge Bases: an Empirical Study. In: Proceedings of OM at ISWC. pp. 25–34 (2015)

  11. [11] Hayes, P.: RDF Semantics. W3C Recommendation. Available at http://www.w3.org/TR/rdf-mt/ (2004)

  12. [12] Ibrahim, Y., Riedewald, M., Weikum, G.: Making Sense of Entities and Quantities in Web Tables. In: Proceedings of CIKM. pp. 1703–1712 (2016)

  13. [13] Ji, H., Grishman, R.: Knowledge base population: Successful approaches and challenges. In: Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1. pp. 1148–1158. Association for Computational Linguistics (2011)

  14. [14] Kruit, B., Boncz, P., Urbani, J.: Extracting New Knowledge from Web Tables: Novelty or Confidence? In: Proceedings of KBCOM (2018)

  15. [15] Kruit, B., Boncz, P., Urbani, J.: Extracting Novel Facts from Tables for Knowledge Graph Completion (Extended version). arXiv e-prints arXiv:1907.00083 (2019)

  16. [16] Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and Searching Web Tables Using Entities, Types and Relationships. PVLDB 3(1-2), 1338–1347 (2010)

  17. [17] Mulwad, V., Finin, T., Joshi, A.: Semantic Message Passing for Generating Linked Data from Tables. In: Proceedings of ISWC. pp. 363–378 (2013)

  18. [18] Muñoz, E., Hogan, A., Mileo, A.: Using Linked Data to Mine RDF from Wikipedia’s Tables. In: Proceedings of WSDM. pp. 533–542 (2014)

  19. [19] Neumaier, S., Umbrich, J., Parreira, J.X., Polleres, A.: Multi-level Semantic Labelling of Numerical Values. In: Proceedings of ISWC. pp. 428–445 (2016)

  20. [20] Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review of relational machine learning for knowledge graphs. Proceedings of the IEEE 104(1), 11–33 (2016)

  21. [21] Pearl, J.: Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann Publishers Inc. (1989)

  22. [22] Pham, M., Alse, S., Knoblock, C.A., Szekely, P.: Semantic Labeling: A Domain-independent Approach. In: Proceedings of ISWC. pp. 446–462 (2016)

  23. [23] Ran, C., Shen, W., Wang, J., Zhu, X.: Domain-Specific Knowledge Base Enrichment Using Wikipedia Tables. In: Proceedings of ICDM. pp. 349–358 (2015)

  24. [24] Recht, B., Re, C., Wright, S., Niu, F.: Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems. pp. 693–701 (2011)

  25. [25] Riedel, S., Yao, L., McCallum, A., Marlin, B.M.: Relation Extraction with Matrix Factorization and Universal Schemas. In: Proceedings of HLT-NAACL (2013)

  26. [26] Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of WIMS. p. 10 (2015)

  27. [27] Ritze, D., Lehmberg, O., Oulabi, Y., Bizer, C.: Profiling the Potential of Web Tables for Augmenting Cross-domain Knowledge Bases. In: Proceedings of WWW. pp. 251–261 (2016)

  28. [28] Sekhavat, Y.A., Paolo, F.D., Barbosa, D., Merialdo, P.: Knowledge Base Augmentation using Tabular Data. In: Proceedings of LDOW at WWW (2014)

  29. [29] Sun, H., Ma, H., He, X., Yih, W.T., Su, Y., Yan, X.: Table Cell Search for Question Answering. In: Proceedings of WWW. pp. 771–782 (2016)

  30. [30] Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Recovering Semantics of Tables on the Web. PVLDB 4, 528–538 (2011)

  31. [31] Wang, J., Shao, B., Wang, H.: Understanding Tables on the Web. In: ER. vol. 1, pp. 141–155 (2010)

  32. [32] Yakout, M., Ganjam, K., Chakrabarti, K., Chaudhuri, S.: InfoGather: Entity Augmentation and Attribute Discovery by Holistic Matching with Web Tables. In: Proceedings of SIGMOD. pp. 97–108 (2012)

  33. [33] Zhang, Z.: Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8(6), 921–957 (2017)