
arxiv: 2507.03617 · v2 · submitted 2025-07-04 · 💻 cs.CL

EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge

Pith reviewed 2026-05-19 05:59 UTC · model grok-4.3

classification 💻 cs.CL
keywords: knowledge graphs · Wikidata · Wikipedia · dataset · benchmark · knowledge update · emerging knowledge · text to graph

The pith

A benchmark dataset pairs 233K Wikipedia passages with 1.45 million Wikidata edits across seven yearly snapshots to test knowledge-graph updates from new text.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a dataset that aligns evolving Wikipedia text with the precise add, delete, and update operations each passage would trigger on a particular snapshot of Wikidata. Traditional extraction methods pull facts from text without regard to what the graph already contains, so the new resource forces models to decide updates while taking the current graph state into account. If the alignment method works, researchers gain a large-scale testbed for training systems that keep structured knowledge current as facts emerge in unstructured sources. The resulting collection covers 233K passages and 1.45 million edits spanning 2019 to 2025, exposing concrete integration difficulties that static extraction pipelines do not address.

Core claim

The paper introduces a construction method that produces Wikidata snapshots at yearly intervals together with Wikipedia passages paired to the exact edit operations those passages induce on each snapshot. The resulting resource contains 233K aligned passages and 1.45 million edits over seven snapshots from 2019 to 2025 and is released as a public benchmark for the task of state-aware knowledge-graph updating.
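To make "edit operations between yearly snapshots" concrete, here is a minimal Python sketch that diffs two snapshots represented as sets of (subject, relation, object) triples. Everything in it is an assumption for illustration, not the paper's pipeline: the function name `diff_snapshots`, the toy entities, and especially the rule that a (subject, relation) pair losing exactly one object and gaining exactly one object counts as an update. Real Wikidata statements also carry qualifiers and ranks, which this sketch ignores.

```python
from collections import defaultdict
from typing import NamedTuple

# A triple is (subject, relation, object); plain strings stand in for Wikidata IDs.
Triple = tuple[str, str, str]


class SnapshotDiff(NamedTuple):
    added: set[Triple]
    deleted: set[Triple]
    updated: set[tuple[Triple, Triple]]  # (old triple, new triple)


def diff_snapshots(old: set[Triple], new: set[Triple]) -> SnapshotDiff:
    """Hypothetical diff between two yearly KG snapshots.

    Assumption: when a (subject, relation) pair loses exactly one object and
    gains exactly one object, the change is treated as an update; every other
    change is a plain addition or deletion.
    """
    removed = old - new
    inserted = new - old

    # Group the changed triples by their (subject, relation) key.
    removed_by_key: dict[tuple[str, str], list[Triple]] = defaultdict(list)
    inserted_by_key: dict[tuple[str, str], list[Triple]] = defaultdict(list)
    for t in removed:
        removed_by_key[(t[0], t[1])].append(t)
    for t in inserted:
        inserted_by_key[(t[0], t[1])].append(t)

    updated: set[tuple[Triple, Triple]] = set()
    for key, olds in removed_by_key.items():
        news = inserted_by_key.get(key, [])
        if len(olds) == 1 and len(news) == 1:
            updated.add((olds[0], news[0]))

    # Triples consumed by an update are not reported as adds or deletes.
    paired_old = {o for o, _ in updated}
    paired_new = {n for _, n in updated}
    return SnapshotDiff(
        added=inserted - paired_new,
        deleted=removed - paired_old,
        updated=updated,
    )


# Toy snapshots with hypothetical entities and relations.
snap_2019 = {
    ("CityX", "head_of_government", "PersonA"),
    ("CityX", "country", "CountryY"),
    ("PersonA", "occupation", "politician"),
}
snap_2020 = {
    ("CityX", "head_of_government", "PersonB"),
    ("CityX", "country", "CountryY"),
    ("PersonB", "occupation", "politician"),
}
d = diff_snapshots(snap_2019, snap_2020)
```

A diff like this only yields candidate edits; pairing each edit with a Wikipedia passage that expresses it is the separate alignment step the paper contributes.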

What carries the argument

The alignment of each Wikipedia passage to the specific add, delete, or update operations it induces on a fixed Wikidata snapshot at a given year.
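The difference between state-aware updating and plain extraction can be illustrated with a toy classifier that decides, for one fact extracted from a passage, which operation it would trigger on a given snapshot. This is a hypothetical sketch, not the EMERGE labeling procedure: the `Statement` type, its `holds` flag for facts the text marks as no longer true, and the caller-supplied set of single-valued relations are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class Statement:
    """A fact extracted from a passage (hypothetical representation)."""
    triple: Triple
    holds: bool = True  # False if the text says the fact no longer holds


def classify(stmt: Statement, snapshot: set[Triple],
             single_valued: set[str]) -> tuple[str, Optional[Triple]]:
    """Decide which operation `stmt` would trigger on `snapshot`.

    Returns (operation, replaced_triple), where operation is one of
    "existing", "add", "update", "delete" or "none", and replaced_triple is
    the snapshot triple that an update overwrites (None otherwise).
    """
    s, r, _ = stmt.triple
    if not stmt.holds:
        # A retraction only changes the KG if the fact is still present.
        return ("delete", None) if stmt.triple in snapshot else ("none", None)
    if stmt.triple in snapshot:
        return ("existing", None)  # already known: no edit needed
    if r in single_valued:
        # Linear scan for clarity; a real system would index by (s, r).
        for old in snapshot:
            if old[0] == s and old[1] == r:
                return ("update", old)  # the new value replaces the old one
    return ("add", None)


snapshot = {
    ("CityX", "head_of_government", "PersonA"),
    ("CityX", "country", "CountryY"),
    ("CityX", "twinned_with", "CityW"),
}
facts = [
    Statement(("CityX", "head_of_government", "PersonB")),
    Statement(("CityX", "country", "CountryY")),
    Statement(("CityX", "twinned_with", "CityZ")),
    Statement(("CityX", "twinned_with", "CityW"), holds=False),
]
single = {"head_of_government", "country"}
ops = [classify(f, snapshot, single) for f in facts]
```

A state-independent extractor would emit all four statements as new triples. Conditioning on the snapshot turns them into an update, a no-op (an existing triple, cf. the X-Triples in Figure 3), an addition, and a deletion.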

If this is right

  • Models can now be trained and evaluated on the joint problem of extracting knowledge and deciding how it should modify an existing graph.
  • The benchmark reveals specific failure modes when new text contradicts or extends the current graph structure.
  • Yearly snapshots allow temporal studies of how update difficulty changes as the underlying graph grows.
  • Public release enables direct comparison of update strategies across research groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same snapshot-and-alignment technique could be applied to other large KGs to create comparable benchmarks without manual annotation.
  • Finer time granularity than yearly snapshots might expose short-term update patterns that the current dataset cannot capture.
  • The resource could support research on detecting when text implies a relation should be removed rather than added or updated.

Load-bearing premise

The edit operations that a Wikipedia passage would induce on a particular KG snapshot can be identified and labeled reliably enough to create aligned training pairs.

What would settle it

A controlled experiment in which models trained on the new dataset produce no higher accuracy or consistency when predicting required edits on held-out text-KG pairs than models trained only on standard information-extraction objectives.
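One plausible way to score "accuracy when predicting required edits" is exact-match precision, recall and F1 over (operation, triple) pairs, micro-averaged across held-out passages. The metric, the `edit_prf` helper, and the simplification that an update is recorded by its new triple only are assumptions for illustration; this page does not specify the paper's evaluation metrics. The toy comparison shows why an extraction-only baseline, which labels every fact as an addition, is penalized.

```python
Triple = tuple[str, str, str]
Edit = tuple[str, Triple]  # (operation, triple), e.g. ("add", (s, r, o))


def edit_prf(predicted: list[set[Edit]], gold: list[set[Edit]]) -> dict[str, float]:
    """Micro-averaged exact-match precision, recall and F1 over edits.

    predicted[i] and gold[i] hold the edits for the i-th held-out passage.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same passages")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)  # correct operation on the correct triple
        fp += len(pred - ref)  # spurious or mislabeled edits
        fn += len(ref - pred)  # required edits that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold edits for a single passage.
t_new = ("CityX", "head_of_government", "PersonB")
t_twin = ("CityX", "twinned_with", "CityZ")
gold = [{("update", t_new), ("add", t_twin)}]
state_aware = [{("update", t_new), ("add", t_twin)}]
extraction_only = [{("add", t_new), ("add", t_twin)}]  # labels everything "add"

print(edit_prf(state_aware, gold))      # → {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
print(edit_prf(extraction_only, gold))  # → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```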

Figures

Figures reproduced from arXiv: 2507.03617 by Daniel Daza, Edoardo Barba, Ira Assent, Klim Zaporojets, Paul Groth, Roberto Navigli.

Figure 1. Illustration of one instance in EMERGE. The …
Figure 2. Illustration of EMERGE creation pipeline. First, the …
Figure 3. The distribution of the TKGU operations defined in Section 3 in EMERGE. We also include detection of existing triples (X-Triples), which does not entail any operation on KG.
4.4 Dataset extension: EMERGE is an automatically constructed dataset, which we plan to extend using quarterly snapshots of Wikipedia and Wikidata, following the pipeline described in Section 4 and illustrated in …
Original abstract

Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we study the problem of automatically updating KGs over time in response to evolving knowledge in unstructured textual sources. Addressing this problem requires identifying a wide range of update operations based on the state of an existing KG at a given time and the information extracted from text. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. The resulting dataset comprises 233K Wikipedia passages aligned with a total of 1.45 million KG edits over 7 different yearly snapshots of Wikidata from 2019 to 2025. Our experimental results highlight key challenges in updating KG snapshots based on emerging textual knowledge, particularly in integrating knowledge expressed in text with the existing KG structure. These findings position the dataset as a valuable benchmark for future research. Our dataset and model implementations are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces EMERGE, a benchmark for updating knowledge graphs with emerging textual knowledge. It proposes a construction method that aligns 233K Wikipedia passages with 1.45 million induced edit operations (add/delete/update) across 7 yearly Wikidata snapshots (2019–2025), and presents experiments that highlight challenges in state-aware integration of textual knowledge with existing KG structure. The dataset and implementations are released publicly.

Significance. If the induced-edit labels prove reliable, the benchmark would be a valuable contribution for research on dynamic KG updating, as it supplies large-scale, temporally aligned text–edit pairs that explicitly condition on KG snapshot state. This goes beyond standard IE and supports evaluation of methods that must decide add/delete/update relative to current KG content. Public release and use of real Wikidata/Wikipedia sources are clear strengths.

major comments (2)
  1. Dataset construction (abstract and §3): the procedure that extracts candidate facts from each Wikipedia passage, aligns them to Wikidata entities/relations, and labels the precise update type (add, delete, update) relative to a given yearly snapshot is presented without any precision/recall figures, human validation, or error analysis. Because the 1.45M labeled edits are the core of the benchmark, the lack of validation at this step bears directly on the claim that EMERGE is a usable resource.
  2. Experiments (§4): results are described only at a high level as “highlighting key challenges.” Concrete details on the models or baselines tested, the exact metrics (e.g., edit-type accuracy, entity-linking F1), and quantitative evidence for the claimed difficulties would be needed to substantiate that the dataset exposes non-trivial problems.
minor comments (2)
  1. Abstract: it reports the dataset size and the high-level construction but omits any mention of how edit operations are identified or validated; a single sentence on this point would improve clarity.
  2. Notation: ensure consistent terminology for “induced edit,” “update operation,” and “KG snapshot” across sections and figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and indicate the changes we will make in the revision.

Point-by-point responses
  1. Referee: Dataset construction (abstract and §3): the procedure that extracts candidate facts from each Wikipedia passage, aligns them to Wikidata entities/relations, and labels the precise update type (add, delete, update) relative to a given yearly snapshot is presented without any precision/recall figures, human validation, or error analysis. Because the 1.45M labeled edits are the core of the benchmark, the lack of validation at this step bears directly on the claim that EMERGE is a usable resource.

    Authors: We agree that explicit validation of the induced-edit labeling process is necessary to support the benchmark's usability. The construction in §3 relies on automated alignment between Wikipedia passages and Wikidata snapshots, but we did not report precision/recall or human validation in the submitted version. In the revision we will add a new subsection with human evaluation on a sampled subset of the 1.45M edits, together with precision/recall figures for the fact extraction, alignment, and update-type labeling steps, plus a brief error analysis. Revision: yes.

  2. Referee: Experiments (§4): results are described only at a high level as “highlighting key challenges.” Concrete details on the models or baselines tested, the exact metrics (e.g., edit-type accuracy, entity-linking F1), and quantitative evidence for the claimed difficulties would be needed to substantiate that the dataset exposes non-trivial problems.

    Authors: We accept that the experimental results in §4 are summarized at too high a level. The current text focuses on qualitative observations of integration challenges. In the revised version we will expand this section to specify the models and baselines evaluated, report exact metrics including edit-type accuracy and entity-linking F1, and present quantitative tables and analysis that demonstrate the non-trivial difficulties the dataset reveals. Revision: yes.
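The human evaluation promised in the first response could, for instance, sample edits stratified by operation type, so that rare operation types are not swamped by frequent ones, and report each type's label precision with a confidence interval. The sketch below is an illustrative protocol rather than the authors' stated one: the helper names, the toy edit records, the per-stratum sample size, and the choice of the Wilson score interval are all assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")


def stratified_sample(items: list[T], key: Callable[[T], str],
                      per_stratum: int, seed: int = 0) -> dict[str, list[T]]:
    """Draw up to `per_stratum` items, without replacement, from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    strata: dict[str, list[T]] = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    return {k: rng.sample(v, min(per_stratum, len(v))) for k, v in strata.items()}


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%."""
    if n == 0:
        return (0.0, 1.0)  # no judgements: the proportion is unconstrained
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Hypothetical edit records: (edit_id, operation).
edits = [(i, "add") for i in range(1000)] + [(i, "delete") for i in range(1000, 1050)]
sample = stratified_sample(edits, key=lambda e: e[1], per_stratum=200)
print({op: len(items) for op, items in sample.items()})  # → {'add': 200, 'delete': 50}

# Suppose annotators judged 190 of the 200 sampled "add" labels correct.
lo, hi = wilson_interval(190, 200)
print(f"precision in [{lo:.3f}, {hi:.3f}]")  # → precision in [0.910, 0.973]
```

The Wilson interval is used here instead of the plain normal approximation because it stays inside [0, 1] and behaves better when precision is close to 1, as one would hope for automatically derived labels.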

Circularity Check

0 steps flagged

No circularity: dataset built from external public sources

Full rationale

The paper constructs its benchmark by aligning publicly available Wikidata snapshots (2019-2025) with Wikipedia passages and the edit operations those passages induce on each snapshot. This process draws on external, independently verifiable data rather than on fitted parameters, self-definitional loops, or load-bearing self-citations. No derivation step reduces to its own inputs by construction: the resulting 233K passages and 1.45M edits are outputs of an alignment procedure applied to outside sources, so the construction can be checked against those sources rather than against its own results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions about the reliability of Wikidata as ground truth and the feasibility of mapping text to discrete edit operations; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • Domain assumption: Wikidata snapshots accurately capture the state of the knowledge graph at each yearly point.
    The dataset construction uses these snapshots as the baseline against which text-induced edits are defined.
  • Domain assumption: Wikipedia passages can be aligned to produce identifiable and labelable edit operations on a given snapshot.
    This alignment is the core step that creates the 233K text-edit pairs.

pith-pipeline@v0.9.0 · 5743 in / 1304 out tokens · 37342 ms · 2026-05-19T05:59:33.895071+00:00 · methodology



Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 1 internal anchor

  1. [1]

    URL: " 'urlintro :=

    ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

  2. [2]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

  3. [3]

    Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. https://doi.org/10.18653/v1/2021.naacl-main.278 Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training . In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technol...

  4. [4]

    Jacqueline Aguilar, Charley Beller, Paul McNamee, Benjamin Van Durme, Stephanie Strassel, Zhiyi Song, and Joe Ellis. 2014. https://aclanthology.org/W14-2907.pdf A comparison of the events and relations across ace, ere, tac-kbp, and framenet annotation standards . In Proceedings of the 2nd Workshop on EVENTS: Definition, Detection, Coreference, and Represe...

  5. [5]

    Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, and Andrew McCallum. 2017. https://doi.org/10.18653/v1/S17-2091 Semeval 2017 task 10: Scienceie-extracting keyphrases and relations from scientific publications . In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 546--555

  6. [6]

    Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen, and Ningyu Zhang. 2024. https://dl.acm.org/doi/full/10.1145/3641850 Codekgc: Code language model for generative knowledge graph construction . ACM Transactions on Asian and Low-Resource Language Information Processing, 23(3):1--16

  7. [7]

    Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. https://doi.org/10.1145/1376616.1376746 Freebase: a collaboratively created graph database for structuring human knowledge . In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247--1250

  8. [8]

    Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html Translating embeddings for modeling multi-relational data . In Advances in neural information processing systems, pages 2787--2795

  9. [9]

    Elizabeth Boschee, Jennifer Lautenschlager, Sean O’Brien, Steve Shellman, James Starz, and Michael Ward. 2015. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28075 ICEWS coded event data . Harvard Dataverse, 12

  10. [10]

    Pere-Llu \' s Huguet Cabot and Roberto Navigli. 2021. https://doi.org/10.18653/v1/2021.findings-emnlp.204 REBEL : Relation extraction by end-to-end language generation . In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2370--2381

  11. [11]

    Arun Chaganty, Ashwin Paranjape, Percy Liang, and Christopher D Manning. 2017. https://doi.org/10.18653/v1/D17-1109 Importance sampling for unbiased on-demand evaluation of knowledge base population . In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1038--1048

  12. [12]

    Nancy Chinchor and Elaine Marsh. 1998. https://catalog.ldc.upenn.edu/docs/LDC2001T02/guidelines.IEtask42.ps Muc-7 information extraction task definition . In Proceeding of the 1998 Message Understanding Conference (MUC-7), pages 359--367

  13. [13]

    John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S Rosen, Gerbrand Ceder, Kristin A Persson, and Anubhav Jain. 2024. https://doi.org/https://doi.org/10.1038/s41467-024-45563-x Structured information extraction from scientific text with large language models . Nature Communications, 15(1):1418

  14. [14]

    Shib Sankar Dasgupta, Swayambhu Nath Ray, and Partha Talukdar. 2018. https://doi.org/10.18653/v1/D18-1225 Hyte: Hyperplane-based temporally aware knowledge graph embedding . In Proceedings of the 2018 conference on empirical methods in natural language processing, pages 2001--2011

  15. [15]

    Daniel Daza, Michael Cochez, and Paul Groth. 2021. https://doi.org/10.1145/3442381.3450141 Inductive entity representations from text via link prediction . In Proceedings of the Web Conference 2021, pages 798--808

  16. [16]

    Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. https://doi.org/https://doi.org/10.1609/aaai.v32i1.11573 Convolutional 2d knowledge graph embeddings . In Proceedings of the AAAI conference on artificial intelligence, volume 32

  17. [17]

    Bhuwan Dhingra, Jeremy R Cole, Julian Martin Eisenschlos, Daniel Gillick, Jacob Eisenstein, and William W Cohen. 2022. https://doi.org/10.1162/tacl_a_00459 Time-aware language models as temporal knowledge bases . Transactions of the Association for Computational Linguistics, 10:257--273

  18. [18]

    Bayu Distiawan, Gerhard Weikum, Jianzhong Qi, and Rui Zhang. 2019. https://doi.org/10.18653/v1/P19-1023 Neural relation extraction for knowledge base enrichment . In Proceedings of the 2019 Conference of the Association for Computational Linguistics, pages 229--240

  19. [19]

    Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Elena Simperl, and Frederique Laforest. 2019. https://aclanthology.org/L18-1544.pdf T-rex: A large scale alignment of natural language with knowledge base triples

  20. [20]

    Evgeniy Gabrilovich, Michael Ringgaard, and Amarnag Subramanya. 2013. https://lemurproject.org/clueweb12/FACC1/ Facc1: Freebase annotation of clueweb corpora

  21. [21]

    Luis Gal \'a rraga, Geremy Heitz, Kevin Murphy, and Fabian M Suchanek. 2014. https://doi.org/10.1145/2661829.2662073 Canonicalizing open knowledge bases . In Proceedings of the 23rd acm international conference on conference on information and knowledge management, pages 1679--1688

  22. [22]

    Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie Zhou. 2019. https://doi.org/10.18653/v1/D19-1649 Fewrel 2.0: Towards more challenging few-shot relation classification . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing ...

  23. [23]

    Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. https://doi.org/10.18653/v1/W17-3518 The webnlg challenge: Generating text from rdf data . In Proceedings of the 10th International Conference on Natural Language Generation, pages 124--133

  24. [24]

    Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.674 Grammar-constrained decoding for structured nlp tasks without finetuning . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10932--10952

  25. [25]

    Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2018. https://doi.org/10.18653/v1/D18-1514 FewRel : A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), pages 4803--4809

  26. [26]

    Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid \'O S \'e aghdha, Sebastian Pad \'o , Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. https://aclanthology.org/S10-1006.pdf Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals . In Proceedings of the 5th International Workshop o...

  27. [27]

    Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, and Minjoon Seo. 2022 a . https://doi.org/10.18653/v1/2022.emnlp-main.418 T emporal W iki: A lifelong benchmark for training and evaluating ever-evolving language models . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, ...

  28. [28]

    Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, and Minjoon Seo. 2022 b . https://openreview.net/forum?id=vfsRB5MImo9 Towards continual knowledge learning of language models . In ICLR

  29. [29]

    Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, and Joe Ellis. 2010. https://blender.cs.illinois.edu/paper/kbp2010overview.pdf Overview of the TAC 2010 knowledge base population track . In Proceedings of the 2010 Text Analysis Conference (TAC 2010), pages 1--25

  30. [30]

    Pengcheng Jiang, Jiacheng Lin, Zifeng Wang, Jimeng Sun, and Jiawei Han. 2024. https://doi.org/10.18653/v1/2024.naacl-long.155 Genres: Rethinking evaluation for generative relation extraction in the era of large language models . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Lang...

  31. [31]

    Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, and Robert West. 2022. https://doi.org/10.18653/v1/2022.naacl-main.342 G en IE : Generative information extraction . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4626--4643, Seattle, Un...

  32. [32]

    Martin Josifoski, Marija Sakota, Maxime Peyrard, and Robert West. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.96 Exploiting asymmetry for synthetic training data generation: S ynth IE and the case of information extraction . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1555--1574, Singapore. Associ...

  33. [33]

    Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A Smith, Yejin Choi, Kentaro Inui, et al. 2024. https://proceedings.neurips.cc/paper_files/paper/2023/file/9941624ef7f867a502732b5154d30cb7-Paper-Datasets_and_Benchmarks.pdf Realtime qa: What's the answer right now? Advances in Neural Information Processing Systems, 36

  34. [34]

    Jinyoung Kim, Dayoon Ko, and Gunhee Kim. 2024 a . https://doi.org/10.18653/v1/2024.emnlp-main.762 D ynamic ER : Resolving emerging mentions to dynamic entities for RAG . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13752--13770, Miami, Florida, USA. Association for Computational Linguistics

  35. [35]

    Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, and Se-Young Yun. 2024 b . https://doi.org/10.18653/v1/2024.naacl-long.302 Carpe diem: On the evaluation of world knowledge in lifelong language models . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human ...

  36. [36]

    Dayoon Ko, Jinyoung Kim, Hahyeon Choi, and Gunhee Kim. 2024. https://doi.org/10.18653/v1/2024.acl-long.181 G row OVER : How can LLM s adapt to growing real-world knowledge? In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3282--3308, Bangkok, Thailand. Association for Computational L...

  37. [37]

    Timoth \'e e Lacroix, Guillaume Obozinski, and Nicolas Usunier. 2020. https://openreview.net/forum?id=rke2P1BFwS Tensor decompositions for temporal knowledge base completion . arXiv preprint arXiv:2004.04926

  38. [38]

    Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. https://doi.org/10.18653/v1/D17-1018 End-to-end neural coreference resolution . In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 188--197, Copenhagen, Denmark. Association for Computational Linguistics

  39. [39]

    u ttler, Mike Lewis, Wen-tau Yih, Tim Rockt \

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K \"u ttler, Mike Lewis, Wen-tau Yih, Tim Rockt \"a schel, et al. 2020. https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html Retrieval-augmented generation for knowledge-intensive nlp tasks . In Proceedings of th...

  40. [40]

    Belinda Z Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig, and Jacob Andreas. 2024. https://arxiv.org/pdf/2406.11830 Language modeling with editable external knowledge . arXiv preprint arXiv:2406.11830

  41. [41]

    Jiao Li, Yueping Sun, Robin J Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Thomas C Wiegers, and Zhiyong Lu. 2016. https://doi.org/10.1093/database/baw068 Biocreative v cdr task corpus: a resource for chemical disease relation extraction . Database, 2016

  42. [42]

    Adam Liska, Tomas Kocisky, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, D’Autume Cyprien De Masson, Tim Scholtes, Manzil Zaheer, Susannah Young, et al. 2022. https://proceedings.mlr.press/v162/liska22a.html Streaming QA : A benchmark for adaptation to new knowledge over time in question answering models . In International Conference on M...

  43. [43]

    Yi Luan, Luheng He, Mari Ostendorf, and Hannaneh Hajishirzi. 2018. https://doi.org/10.18653/v1/D18-1360 Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3219--3232

  44. [44]

    Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, and Zhiyong Lu. 2022. https://doi.org/10.1093/bib/bbac282 Bio RED : a rich biomedical relation extraction dataset . Briefings in Bioinformatics, 23(5):bbac282

  45. [45]

    Xin Lv, Yankai Lin, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, and Jie Zhou. 2022. https://doi.org/10.18653/v1/2022.findings-acl.282 Do pre-trained models benefit knowledge graph completion? a reliable evaluation and a reasonable approach . In Findings of the Association for Computational Linguistics: ACL 2022, pages 3570--3581, Dublin, Ireland....

  46. [46]

    Farzaneh Mahdisoltani, Joanna Biega, and Fabian Suchanek. 2014. https://imt.hal.science/hal-01699874/ Yago3: A knowledge base from multilingual wikipedias . In 7th biennial conference on innovative data systems research. CIDR Conference

  47. [47]

    Katerina Margatina, Shuai Wang, Yogarshi Vyas, Neha Anna John, Yassine Benajiba, and Miguel Ballesteros. 2023. https://doi.org/10.18653/v1/2023.eacl-main.211 Dynamic benchmarking of masked language models on temporal concept drift with multiple views . In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Lingu...

  48. [48]

    Filipe Mesquita, Matteo Cannaviccio, Jordan Schmidek, Paramita Mirza, and Denilson Barbosa. 2019. https://doi.org/10.18653/v1/D19-1069 K nowledge N et: A benchmark dataset for knowledge base population . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language ...

  49. [49]

    George A Miller. 1995. https://doi.org/10.1145/219717.219748 Wordnet: a lexical database for english . Communications of the ACM, 38(11):39--41

  50. [50]

    Yasumasa Onoe, Michael Zhang, Eunsol Choi, and Greg Durrett. 2022. https://doi.org/10.18653/v1/2022.findings-naacl.52 Entity cloze by date: What LM s know about unseen entities . In Findings of the Association for Computational Linguistics: NAACL 2022, pages 693--702, Seattle, United States. Association for Computational Linguistics

  51. [51]

    Yasumasa Onoe, Michael Zhang, Shankar Padmanabhan, Greg Durrett, and Eunsol Choi. 2023. https://doi.org/10.18653/v1/2023.acl-long.300 Can LM s learn new entities from descriptions? challenges in propagating injected knowledge . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5469--5...

  52. [52]

    Riccardo Orlando, Pere-Llu \'i s Huguet Cabot, Edoardo Barba, and Roberto Navigli. 2024. https://doi.org/10.18653/v1/2024.findings-acl.839 R e L i K : Retrieve and L in K , fast and accurate entity linking and relation extraction on an academic budget . In Findings of the Association for Computational Linguistics: ACL 2024, pages 14114--14132, Bangkok, Th...

  53. [53]

    Heiko Paulheim. 2016. https://doi.org/10.3233/SW-160218 Knowledge graph refinement: A survey of approaches and evaluation methods . Semantic web, 8(3):489--508

  54. [54]

    Simon Razniewski, Hiba Arnaout, Shrestha Ghosh, and Fabian Suchanek. 2024. https://doi.org/10.1145/3639563 Completeness, recall, and negation in open-world knowledge bases: A survey . ACM Computing Surveys, 56(6):1--42

  55. [55]

    Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. https://doi.org/https://doi.org/10.1007/978-3-642-15939-8_10 Modeling relations and their mentions without labeled text . In Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148--163

  56. [56]

    Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Nandana Mihindukulasooriya, Owen Cornec, and Alfio Massimiliano Gliozzo. 2023. https://doi.org/https://doi.org/10.1609/aaai.v37i13.27084 Knowgl: Knowledge generation and linking from text . In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 16476--16478

  57. [57]

    Dan Roth and Wen-tau Yih. 2004. https://aclanthology.org/W04-2401.pdf A linear programming formulation for global inference in natural language tasks . Technical report, Illinois Univ at Urbana-Champaign Dept of Computer Science

  58. [58]

    Tara Safavi and Danai Koutra. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.669 C o DE x: A C omprehensive K nowledge G raph C ompletion B enchmark . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8328--8350, Online. Association for Computational Linguistics

  59. [59]

    Tong Shen, Fu Zhang, and Jingwei Cheng. 2022. https://doi.org/https://doi.org/10.1016/j.knosys.2022.109597 A comprehensive overview of knowledge graph completion . Knowledge-Based Systems, 255:109597

  60. [60]

    Zhiyi Song, Ann Bies, Stephanie Strassel, Tom Riese, Justin Mott, Joe Ellis, Jonathan Wright, Seth Kulick, Neville Ryant, and Xiaoyi Ma. 2015. https://aclanthology.org/W15-0812.pdf From light to rich ere: annotation of entities, relations, and events . In Proceedings of the the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation...

  61. [61]

    Budhitama Subagdja, D Shanthoshigaa, Zhaoxia Wang, and Ah-Hwee Tan. 2024. https://doi.org/10.1145/3640313 Machine learning for refining knowledge graphs: A survey . ACM Computing Surveys, 56(6):1--38

  62. [62]

    TAC-KBP. 2022. https://tac.nist.gov/tracks/index.html TAC-KBP home page

  63. [63]

    Kristina Toutanova and Danqi Chen. 2015. https://aclanthology.org/W15-4007.pdf Observed versus latent features for knowledge base and text inference . In Proceedings of the 3rd workshop on continuous vector space models and their compositionality, pages 57--66

  64. [64]

    Denny Vrandečić and Markus Krötzsch. 2014. https://doi.org/10.1145/2629489 Wikidata: a free collaborative knowledgebase . Communications of the ACM, 57(10):78--85

  65. [65]

    Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, and Thang Luong. 2024. https://doi.org/10.18653/v1/2024.findings-acl.813 FreshLLMs: Refreshing large language models with search engine augmentation . In Findings of the Association for Computational Linguistics: ACL 2024, pages 13697--...

  66. [66]

    Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. 2006. ACE 2005 multilingual training corpus. Linguistic Data Consortium, Philadelphia, 57

  67. [67]

    Liang Wang, Wei Zhao, Zhuoyu Wei, and Jingming Liu. 2022. https://doi.org/10.18653/v1/2022.acl-long.295 SimKGC: Simple contrastive knowledge graph completion with pre-trained language models . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4281--4294, Dublin, Ireland. Associatio...

  68. [68]

    Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2021. https://doi.org/10.1162/tacl_a_00360 KEPLER: A unified model for knowledge embedding and pre-trained language representation . Transactions of the Association for Computational Linguistics, 9:176--194

  69. [69]

    Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, et al. 2024. https://openreview.net/forum?id=sKYHBTAxVa LiveBench: A challenging, contamination-free LLM benchmark . arXiv preprint arXiv:2406.19314

  70. [70]

    Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, and Gholamreza Haffari. 2024a. https://arxiv.org/pdf/2402.01364 Continual learning for large language models: A survey . arXiv preprint arXiv:2402.01364

  71. [71]

    Xiaobao Wu, Liangming Pan, William Yang Wang, and Anh Tuan Luu. 2024b. https://doi.org/10.18653/v1/2024.emnlp-main.843 AKEW: Assessing knowledge editing in the wild . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15118--15133, Miami, Florida, USA. Association for Computational Linguistics

  72. [72]

    Xiaobao Wu, Liangming Pan, Yuxi Xie, Ruiwen Zhou, Shuai Zhao, Yubo Ma, Mingzhe Du, Rui Mao, Anh Tuan Luu, and William Yang Wang. 2024c. https://arxiv.org/pdf/2412.13670 AntiLeak-Bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge . arXiv preprint arXiv:2412.13670

  73. [73]

    Rui Xing, Jie Luo, and Tengwei Song. 2020. https://doi.org/10.1186/s12859-020-03889-5 BioRel: towards large-scale biomedical relation extraction . BMC Bioinformatics, 21:1--13

  74. [74]

    Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, and William Yang Wang. 2018. https://doi.org/10.18653/v1/D18-1223 One-shot relational learning for knowledge graphs . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1980--1990, Brussels, Belgium. Association for Computational Linguistics

  75. [75]

    Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, and Enhong Chen. 2024. https://doi.org/10.1007/s11704-024-40555-y Large language models for generative information extraction: A survey . Frontiers of Computer Science, 18(6):186357

  76. [76]

    Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. https://aclanthology.org/P19-1074 DocRED: A large-scale document-level relation extraction dataset . In Proceedings of the 2019 Annual Meeting of the Association for Computational Linguistics (ACL 2019), pages 764--777

  77. [77]

    Klim Zaporojets, Johannes Deleu, Chris Develder, and Thomas Demeester. 2021. https://doi.org/10.1016/j.ipm.2021.102563 DWIE: An entity-centric dataset for multi-task document-level information extraction . Information Processing & Management, 58(4):102563

  78. [78]

    Urchade Zaratiana, Nadi Tomeh, Pierre Holat, and Thierry Charnois. 2024. https://doi.org/10.1609/aaai.v38i17.29919 An autoregressive text-to-graph framework for joint entity and relation extraction . In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19477--19487

  79. [79]

    Bowen Zhang and Harold Soh. 2024. https://doi.org/10.18653/v1/2024.emnlp-main.548 Extract, define, canonicalize: An LLM-based framework for knowledge graph construction . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9820--9836, Miami, Florida, USA. Association for Computational Linguistics

  80. [80]

    Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. 2017. https://doi.org/10.18653/v1/D17-1004 Position-aware attention and supervised data improve slot filling . In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 35--45, Copenhagen, Denmark. Association for Computational Linguistics
