EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge
Pith reviewed 2026-05-19 05:59 UTC · model grok-4.3
The pith
A benchmark dataset pairs 233K Wikipedia passages with 1.45 million Wikidata edits across seven yearly snapshots to test knowledge-graph updates from new text.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces a construction method that produces Wikidata snapshots at yearly intervals together with Wikipedia passages paired to the exact edit operations those passages induce on each snapshot. The resulting resource contains 233K aligned passages and 1.45 million edits over seven snapshots from 2019 to 2025 and is released as a public benchmark for the task of state-aware knowledge-graph updating.
What carries the argument
The alignment of each Wikipedia passage to the specific add, delete, or update operations it induces on a fixed Wikidata snapshot at a given year.
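One way to picture the alignment target is as a diff between two snapshots. The sketch below is illustrative only and is not the paper's construction procedure: the `Edit` record, the `diff_snapshots` function, and the rule that exactly one removed value plus one added value for the same subject and relation counts as an update are all assumptions, and the entity and relation names are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class Edit:
    op: str                           # "add", "delete", or "update"
    triple: Triple                    # triple after the edit (or the deleted triple)
    old_object: Optional[str] = None  # previous object, set only for "update"


def diff_snapshots(old: Set[Triple], new: Set[Triple]) -> List[Edit]:
    """Derive edit operations that turn snapshot `old` into snapshot `new`."""
    added = defaultdict(list)
    removed = defaultdict(list)
    for s, r, o in new - old:
        added[(s, r)].append(o)
    for s, r, o in old - new:
        removed[(s, r)].append(o)

    edits: List[Edit] = []
    for key in sorted(set(added) | set(removed)):
        s, r = key
        adds = sorted(added.get(key, []))
        dels = sorted(removed.get(key, []))
        if len(adds) == 1 and len(dels) == 1:
            # Assumed heuristic: one value replaced by another is an update.
            edits.append(Edit("update", (s, r, adds[0]), old_object=dels[0]))
        else:
            edits.extend(Edit("delete", (s, r, o)) for o in dels)
            edits.extend(Edit("add", (s, r, o)) for o in adds)
    return edits


# Placeholder data, not taken from the benchmark.
old_kg = {("E1", "leader", "E10"), ("E2", "country", "E30")}
new_kg = {("E1", "leader", "E11"), ("E3", "instance_of", "E5")}
for edit in diff_snapshots(old_kg, new_kg):
    print(edit)
```

A diff like this only yields graph-level edits; attributing each edit to the passage that expresses it is the harder step the benchmark addresses, and it is not modeled here.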
If this is right
- Models can now be trained and evaluated on the joint problem of extracting knowledge and deciding how it should modify an existing graph.
- The benchmark reveals specific failure modes when new text contradicts or extends the current graph structure.
- Yearly snapshots allow temporal studies of how update difficulty changes as the underlying graph grows.
- Public release enables direct comparison of update strategies across research groups.
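To make "state-aware" concrete, the following minimal sketch shows how the same extracted triple can call for different operations depending on what the graph already contains. The function name `decide_operation`, the `single_valued` set of relations that hold at most one object, and the "no-op" label are hypothetical and chosen for illustration; the source does not describe such a rule.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def decide_operation(kg: Set[Triple], triple: Triple, single_valued: Set[str]) -> str:
    """Pick an operation for an extracted triple given the current KG state."""
    s, r, o = triple
    existing = {obj for subj, rel, obj in kg if subj == s and rel == r}
    if o in existing:
        return "no-op"    # the graph already states this fact
    if existing and r in single_valued:
        return "update"   # assumed: a single-valued relation gets its value replaced
    return "add"          # new fact, or one more value for a multi-valued relation


# Placeholder graph state and relations.
kg = {("E1", "capital", "E2"), ("E1", "member_of", "E8")}
single_valued = {"capital"}
print(decide_operation(kg, ("E1", "capital", "E2"), single_valued))
print(decide_operation(kg, ("E1", "capital", "E3"), single_valued))
print(decide_operation(kg, ("E1", "member_of", "E9"), single_valued))
```

A state-independent extraction pipeline would emit the same triple in all three cases, which is the contrast the benchmark is designed to measure.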
Where Pith is reading between the lines
- The same snapshot-and-alignment technique could be applied to other large KGs to create comparable benchmarks without manual annotation.
- Finer time granularity than yearly snapshots might expose short-term update patterns that the current dataset cannot capture.
- The resource could support research on detecting when text implies a relation should be removed rather than added or updated.
Load-bearing premise
The edit operations that a Wikipedia passage would induce on a particular KG snapshot can be identified and labeled reliably enough to create aligned training pairs.
What would settle it
A controlled experiment in which models trained on the new dataset predict the required edits on held-out text-KG pairs no more accurately or consistently than models trained only on standard information-extraction objectives.
Original abstract
Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we study the problem of automatically updating KGs over time in response to evolving knowledge in unstructured textual sources. Addressing this problem requires identifying a wide range of update operations based on the state of an existing KG at a given time and the information extracted from text. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. The resulting dataset comprises 233K Wikipedia passages aligned with a total of 1.45 million KG edits over 7 different yearly snapshots of Wikidata from 2019 to 2025. Our experimental results highlight key challenges in updating KG snapshots based on emerging textual knowledge, particularly in integrating knowledge expressed in text with the existing KG structure. These findings position the dataset as a valuable benchmark for future research. Our dataset and model implementations are publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EMERGE, a benchmark for updating knowledge graphs with emerging textual knowledge. It proposes a construction method that aligns 233K Wikipedia passages with 1.45 million induced edit operations (add/delete/update) across 7 yearly Wikidata snapshots (2019–2025), and presents experiments that highlight challenges in state-aware integration of textual knowledge with existing KG structure. The dataset and implementations are released publicly.
Significance. If the induced-edit labels prove reliable, the benchmark would be a valuable contribution for research on dynamic KG updating, as it supplies large-scale, temporally aligned text–edit pairs that explicitly condition on KG snapshot state. This goes beyond standard IE and supports evaluation of methods that must decide add/delete/update relative to current KG content. Public release and use of real Wikidata/Wikipedia sources are clear strengths.
major comments (2)
- [Dataset construction] Dataset construction (abstract and §3): the procedure that extracts candidate facts from each Wikipedia passage, aligns them to Wikidata entities/relations, and labels the precise update type (add, delete, update) relative to a given yearly snapshot is presented without any precision/recall figures, human validation, or error analysis. Because the 1.45 M labeled edits are the core of the benchmark, lack of validation on this step is load-bearing for the claim that EMERGE is a usable resource.
- [Experiments] Experiments (§4): results are described only at a high level as “highlighting key challenges.” Concrete details on the models or baselines tested, the exact metrics (e.g., edit-type accuracy, entity-linking F1), and quantitative evidence for the claimed difficulties would be needed to substantiate that the dataset exposes non-trivial problems.
minor comments (2)
- [Abstract] Abstract: reports dataset size and high-level construction but omits any mention of how edit operations are identified or validated; a single sentence on this point would improve clarity.
- [Notation] Notation: ensure consistent terminology for “induced edit,” “update operation,” and “KG snapshot” across sections and figures.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and indicate the changes we will make in the revision.
-
Referee: [Dataset construction] Dataset construction (abstract and §3): the procedure that extracts candidate facts from each Wikipedia passage, aligns them to Wikidata entities/relations, and labels the precise update type (add, delete, update) relative to a given yearly snapshot is presented without any precision/recall figures, human validation, or error analysis. Because the 1.45 M labeled edits are the core of the benchmark, lack of validation on this step is load-bearing for the claim that EMERGE is a usable resource.
Authors: We agree that explicit validation of the induced-edit labeling process is necessary to support the benchmark's usability. The construction in §3 relies on automated alignment between Wikipedia passages and Wikidata snapshots, but we did not report precision/recall or human validation in the submitted version. In the revision we will add a new subsection with human evaluation on a sampled subset of the 1.45 M edits, together with precision/recall figures for the fact extraction, alignment, and update-type labeling steps, plus a brief error analysis. revision: yes
-
Referee: [Experiments] Experiments (§4): results are described only at a high level as “highlighting key challenges.” Concrete details on the models or baselines tested, the exact metrics (e.g., edit-type accuracy, entity-linking F1), and quantitative evidence for the claimed difficulties would be needed to substantiate that the dataset exposes non-trivial problems.
Authors: We accept that the experimental results in §4 are summarized at too high a level. The current text focuses on qualitative observations of integration challenges. In the revised version we will expand this section to specify the models and baselines evaluated, report exact metrics including edit-type accuracy and entity-linking F1, and present quantitative tables and analysis that demonstrate the non-trivial difficulties the dataset reveals. revision: yes
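The metrics named in this exchange can be sketched as set comparisons between predicted and reference edits. This is one assumed formulation, not the paper's evaluation code: edits are represented as `(op, subject, relation, object)` tuples, precision, recall, and F1 require an exact match on all four fields, and edit-type accuracy is computed only over triples that appear in both sets, with at most one operation per triple.

```python
from typing import Set, Tuple

EditTuple = Tuple[str, str, str, str]  # (op, subject, relation, object)


def precision_recall_f1(predicted: Set[EditTuple], gold: Set[EditTuple]):
    """Exact-match precision, recall, and F1 over edit tuples."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def edit_type_accuracy(predicted: Set[EditTuple], gold: Set[EditTuple]) -> float:
    """Share of triples found in both sets whose operation label agrees."""
    gold_ops = {(s, r, o): op for op, s, r, o in gold}
    pred_ops = {(s, r, o): op for op, s, r, o in predicted}
    shared = gold_ops.keys() & pred_ops.keys()
    if not shared:
        return 0.0
    return sum(gold_ops[t] == pred_ops[t] for t in shared) / len(shared)


# Placeholder edits, not benchmark data.
gold = {("add", "E1", "rel", "E2"), ("delete", "E3", "rel", "E4")}
pred = {("add", "E1", "rel", "E2"), ("add", "E3", "rel", "E4")}
print(precision_recall_f1(pred, gold))
print(edit_type_accuracy(pred, gold))
```

The same functions could score a human-validated sample of the automatically labeled edits, with the human labels as `gold`.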
Circularity Check
No circularity: dataset built from external public sources
The paper constructs its benchmark by aligning publicly available Wikidata snapshots (2019-2025) with Wikipedia passages and the edit operations those passages induce on each snapshot. This process draws on external, independently verifiable data rather than any fitted parameters, self-definitional loops, or load-bearing self-citations. No derivation step reduces to its own inputs by construction; the resulting 233K passages and 1.45M edits are outputs of an alignment procedure applied to outside sources, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Wikidata snapshots accurately capture the state of the knowledge graph at each yearly point.
- domain assumption: Wikipedia passages can be aligned to produce identifiable and labelable edit operations on a given snapshot.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
We propose a method for construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
five text-driven knowledge graph updating (TKGU) operations ... Emerging triples (E-Triples), Emerging entities and triples (EE-Triples), ... Deprecated triples (D-Triples)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.