
arxiv: 2605.21369 · v1 · pith:WWF5JHJJnew · submitted 2026-05-20 · 💻 cs.CL

Findings of the Fifth Shared Task on Multilingual Coreference Resolution: Expanding Datasets for Long-Range Entities

Pith reviewed 2026-05-21 04:35 UTC · model grok-4.3

classification 💻 cs.CL
keywords: multilingual coreference resolution · shared task · long-range entities · CorefUD · LLM-based systems · mention identification · identity clustering

The pith

Traditional systems outperformed LLM approaches in the fifth multilingual coreference resolution shared task focused on long-range entities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports results from a shared task in which teams built systems that identify mentions and cluster them into coreference chains across many languages, with special attention to chains that span long distances in a text. The task added five new datasets and two languages via CorefUD version 1.4, bringing the collection to 27 datasets in 19 languages. Ten systems competed, four of them based on large language models, yet traditional methods still achieved the highest scores. The organizers read the results as a sign that LLM-based systems may soon become competitive. A reader might care because accurate coreference resolution lets machines track entities reliably across long conversations or documents.

Core claim

The fifth edition of the multilingual coreference resolution shared task emphasized long-range entities by incorporating five new datasets and two additional languages through CorefUD v1.4, resulting in 27 datasets across 19 languages. Ten systems participated, including four LLM-based approaches, and while traditional systems maintained the lead in mention identification and identity-based clustering, the LLM systems showed significant potential to challenge them in future editions.

What carries the argument

The shared-task evaluation setup on long-range coreference chains using the expanded CorefUD v1.4 collection, which compares performance of traditional and LLM-based systems on mention detection and clustering.

If this is right

  • LLM-based systems will likely narrow or close the performance gap with traditional methods in subsequent shared tasks.
  • The added long-range datasets will support more targeted development of systems that handle distant entity references.
  • Increased language coverage will improve the robustness of multilingual coreference tools.
  • Future editions may attract more LLM participants as their potential becomes clearer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Stronger long-range coreference handling could improve downstream applications such as multi-document summarization and dialogue systems.
  • The task setup might be extended to test cross-lingual transfer of coreference knowledge between the added languages.
  • If LLMs scale well here, similar shared tasks could incorporate them earlier for other discourse phenomena.

Load-bearing premise

That the new datasets and languages added via CorefUD v1.4 adequately represent genuine long-range coreference phenomena, and that the shared-task evaluation metrics allow a fair comparison between traditional and LLM-based systems.

What would settle it

An analysis of the new datasets showing that their coreference chains do not span substantially greater distances than those in earlier CorefUD releases, or a future edition of the task in which an LLM-based system outperforms all traditional systems on the same metrics.
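The first test can be made concrete by measuring how far each gold entity stretches. A minimal sketch follows, under assumptions stated here rather than taken from the paper: a document is a plain list of entities, each a list of (start, end) token offsets (a hypothetical representation, not the CorefUD file format); an entity's range is the token distance from its first mention's start to its last mention's end; and singletons are skipped by default. Summarizing each document by the 95th percentile of these ranges mirrors the "p95 entity range" axis of Figure 4, though the paper's exact definition may differ.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a mention is a (start, end) token span with an
# inclusive end; an entity is a list of mentions; a document is a list of entities.
Mention = Tuple[int, int]
Entity = List[Mention]


def entity_range(entity: Entity) -> int:
    """Token distance from the first mention's start to the last mention's end (assumed definition)."""
    first_start = min(start for start, _ in entity)
    last_end = max(end for _, end in entity)
    return last_end - first_start


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between closest ranks (NumPy's default convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list")
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def p95_entity_range(document: List[Entity], include_singletons: bool = False) -> float:
    """95th percentile of entity ranges in one document; singletons skipped unless requested."""
    ranges = [entity_range(e) for e in document if include_singletons or len(e) > 1]
    return percentile(ranges, 95)


if __name__ == "__main__":
    doc = [[(0, 1), (40, 40), (300, 301)], [(5, 5), (9, 9)], [(12, 13)]]
    print(p95_entity_range(doc))
```

Comparing the distribution of per-document p95 values for the new CorefUD v1.4 datasets against the older ones would show whether the additions really carry longer chains. A high percentile is used instead of the maximum so that a single stray mention does not dominate the summary.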

Figures

Figures reproduced from arXiv: 2605.21369 by Anna Nedoluzhko, Daniel Zeman, Jakub Sido, Martin Popel, Michal Novák, Milan Straka, Miloslav Konopík, Ondřej Pražák, Zdeněk Žabokrtský.

Figure 1: Serialization of a Spanish example sentence from …
Figure 2: Plot with results for individual languages in the primary metric (CoNLL F…
Figure 3: CoNLL F1 scores of the best systems and the …
Figure 4: Performance of the systems on documents as their p95 entity range increases. Each data point represents …
Figure 5: Evolution of Codabench Submissions in the evaluation phase. The submissions to the LLM and Uncon…

Original abstract

This paper describes the fifth edition of the Shared Task on Multilingual Coreference Resolution, held in conjunction with the CODI-CRAC 2026 workshop. Building on previous iterations, the task required participants to develop systems capable of mention identification and identity-based coreference clustering. The 2026 edition specifically emphasizes long-range entities, defined as coreferential chains spanning significant distances, across many words and sentences. The task expanded its linguistic scope by incorporating five new datasets and two additional languages. These additions leverage version 1.4 of CorefUD, a harmonized multilingual collection comprising 27 datasets in 19 languages. In total, ten systems participated, including four LLM-based approaches (three fine-tuned models and one few-shot approach). While traditional systems still maintained their lead, LLMs demonstrated significant potential, suggesting they may soon challenge established approaches in future editions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript reports findings from the fifth Shared Task on Multilingual Coreference Resolution (CODI-CRAC 2026), which expands prior editions by emphasizing long-range coreference chains and adding five new datasets plus two languages via CorefUD v1.4 (now 27 datasets in 19 languages). Ten systems participated (six traditional, four LLM-based); traditional systems led on standard metrics while LLMs are described as showing 'significant potential' for future editions.

Significance. If the reported outcomes are robust, the work supplies a timely multilingual benchmark that tracks the transition toward LLM participation in coreference and isolates long-range phenomena as a distinct evaluation focus. This can guide dataset curation and system design in multilingual NLP, especially if future iterations incorporate the distance-specific diagnostics the current edition appears to lack.

major comments (1)
  1. [Abstract and Results] The central claim that LLMs 'demonstrated significant potential' specifically for long-range entities rests on aggregate CoNLL-style F1 scores alone. No distance-binned F1, chain-span histograms, or long-chain-specific metrics are reported, so the attribution of LLM promise to distant coreference (rather than mention detection or short-range clustering) remains untested and load-bearing for the paper's emphasis on long-range phenomena.
minor comments (1)
  1. [Introduction] Clarify the exact quantitative criteria used to define 'long-range' entities (e.g., minimum sentence or token span) so readers can assess how well the new CorefUD v1.4 additions actually instantiate the targeted phenomenon.
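For readers weighing the major comment, it helps to see what an aggregate CoNLL-style score contains. Conventionally, CoNLL F1 is the unweighted mean of three F1 scores: link-based MUC, mention-based B³ and entity-based CEAF-e. The sketch below implements them from their standard definitions on exactly matching mention ids. It is not the shared task's official scorer, whose mention-matching and singleton settings are not described on this page, and its brute-force CEAF-e alignment only suits toy inputs (reference scorers use the Kuhn-Munkres algorithm instead).

```python
from itertools import permutations
from typing import FrozenSet, Hashable, List, Tuple

# An entity (cluster) is a set of mention ids; mentions match only if ids are equal.
Cluster = FrozenSet[Hashable]
Scores = Tuple[float, float, float]  # (precision, recall, F1)


def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def muc(key: List[Cluster], response: List[Cluster]) -> Scores:
    """Link-based MUC: counts the links needed to rejoin each entity's pieces."""
    def recall(gold: List[Cluster], pred: List[Cluster]) -> float:
        covered = set().union(*pred)
        num = den = 0
        for k in gold:
            # Parts of k under pred; a mention absent from pred is its own part.
            parts = sum(1 for c in pred if k & c) + len(k - covered)
            num += len(k) - parts
            den += len(k) - 1
        return num / den if den else 0.0

    r, p = recall(key, response), recall(response, key)
    return p, r, f1(p, r)


def b_cubed(key: List[Cluster], response: List[Cluster]) -> Scores:
    """Mention-based B-cubed: overlap of each mention's gold and predicted entities."""
    def score(a: List[Cluster], b: List[Cluster]) -> float:
        num = sum(len(x & y) ** 2 / len(x) for x in a for y in b)
        den = sum(len(x) for x in a)
        return num / den if den else 0.0

    r, p = score(key, response), score(response, key)
    return p, r, f1(p, r)


def ceaf_e(key: List[Cluster], response: List[Cluster]) -> Scores:
    """Entity-based CEAF: phi4 similarity under the best one-to-one entity alignment."""
    def phi4(x: Cluster, y: Cluster) -> float:
        return 2 * len(x & y) / (len(x) + len(y))

    small, big = (key, response) if len(key) <= len(response) else (response, key)
    # Brute force over injective alignments of the smaller side: toy inputs only.
    best = max(
        (sum(phi4(small[i], big[j]) for i, j in enumerate(assign))
         for assign in permutations(range(len(big)), len(small))),
        default=0.0,
    )
    r = best / len(key) if key else 0.0
    p = best / len(response) if response else 0.0
    return p, r, f1(p, r)


def conll_f1(key: List[Cluster], response: List[Cluster]) -> float:
    """Unweighted mean of the MUC, B-cubed and CEAF-e F1 scores."""
    return sum(metric(key, response)[2] for metric in (muc, b_cubed, ceaf_e)) / 3


if __name__ == "__main__":
    key = [frozenset("abc"), frozenset("de")]
    response = [frozenset("ab"), frozenset("cde")]
    print(round(conll_f1(key, response), 4))  # 0.7333
```

Because the three metrics are averaged over all entities of a document, this single number cannot show where errors occur, which is the substance of the referee's objection: a system could lose much of its score on long chains and still look competitive overall.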

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the findings of the fifth Shared Task on Multilingual Coreference Resolution. We address the major comment point by point below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract and Results] The central claim that LLMs 'demonstrated significant potential' specifically for long-range entities rests on aggregate CoNLL-style F1 scores alone. No distance-binned F1, chain-span histograms, or long-chain-specific metrics are reported, so the attribution of LLM promise to distant coreference (rather than mention detection or short-range clustering) remains untested and load-bearing for the paper's emphasis on long-range phenomena.

    Authors: We agree with the referee that the current manuscript relies on aggregate CoNLL F1 scores to support the statement that LLMs showed significant potential, without providing distance-binned or long-chain-specific breakdowns. Although the task definition and dataset expansion explicitly targeted long-range coreference, our analysis of the ten participating systems (six traditional, four LLM-based) did not include these finer-grained diagnostics. This is a valid observation that limits the strength of the attribution to long-range handling. In the revised manuscript we will add distance-binned F1 results (e.g., separating chains spanning >5 sentences) and chain-span histograms to clarify where LLM approaches show relative strengths or weaknesses compared with traditional systems. revision: yes
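The promised distance-binned results could take several forms. One minimal sketch, with choices that are assumptions rather than the organizers' plan: gold entities are bucketed by the number of sentences from their first to their last mention (inclusive), the >5-sentence cut-off from the response serves as a hypothetical threshold, and MUC recall is reported per bucket. Restricting MUC recall to a subset of gold entities is well defined, because each gold entity's contribution depends only on that entity and the system's partition. The mention-id representation and the sentence_of mapping are illustrative, not part of any CorefUD tooling.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional

# An entity is a set of mention ids; sentence_of maps each mention id to its sentence index.
Cluster = FrozenSet[Hashable]


def sentence_span(entity: Cluster, sentence_of: Dict[Hashable, int]) -> int:
    """Number of sentences from the entity's first mention to its last, inclusive."""
    sentences = [sentence_of[m] for m in entity]
    return max(sentences) - min(sentences) + 1


def muc_recall(gold: List[Cluster], pred: List[Cluster]) -> Optional[float]:
    """MUC recall over the given gold entities; None if they contain no links to recover."""
    covered = set().union(*pred)
    num = den = 0
    for k in gold:
        # Parts of k under pred; a mention the system missed is its own part.
        parts = sum(1 for c in pred if k & c) + len(k - covered)
        num += len(k) - parts
        den += len(k) - 1
    return num / den if den else None


def binned_muc_recall(
    key: List[Cluster],
    response: List[Cluster],
    sentence_of: Dict[Hashable, int],
    threshold: int = 5,  # hypothetical cut-off: "long" means spanning more than 5 sentences
) -> Dict[str, Optional[float]]:
    """Split gold entities into short/long buckets and score MUC recall per bucket."""
    bins: Dict[str, List[Cluster]] = {"short": [], "long": []}
    for entity in key:
        label = "long" if sentence_span(entity, sentence_of) > threshold else "short"
        bins[label].append(entity)
    return {label: muc_recall(entities, response) for label, entities in bins.items()}


if __name__ == "__main__":
    sentence_of = {"m1": 0, "m2": 1, "m3": 9, "m4": 2, "m5": 3}
    key = [frozenset({"m1", "m2", "m3"}), frozenset({"m4", "m5"})]
    response = [frozenset({"m1", "m2"}), frozenset({"m4", "m5"})]
    print(binned_muc_recall(key, response, sentence_of))  # {'short': 1.0, 'long': 0.5}
```

Precision does not split as cleanly, because a single predicted entity can merge a short and a long gold chain; a fuller analysis would need an explicit rule for assigning predicted entities to buckets.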

Circularity Check

0 steps flagged

Shared-task findings paper reports external participant results on public data with no self-referential derivations

full rationale

This is a standard shared-task findings paper describing task setup, new datasets from CorefUD v1.4, participant submissions (including independent LLM systems), and aggregate scores. No equations, fitted parameters, or theoretical derivations exist. Claims about LLM potential derive from observed competition outcomes on held-out test data rather than any reduction to author-defined inputs or self-citations. The long-range emphasis is a task definition, not a result derived from prior author work in a circular manner. Self-contained against external benchmarks and public releases.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical shared-task report containing no mathematical derivations, fitted parameters, or postulated entities. It rests on standard assumptions of shared-task evaluation such as consistent annotation guidelines across languages.

pith-pipeline@v0.9.0 · 5733 in / 1042 out tokens · 31690 ms · 2026-05-21T04:35:04.266688+00:00 · methodology


