AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

Adrien Coulet (HeKA | U1346); Fabien Maury (Imagine - U1163; HeKA | U1346); Maud de Dieuleveult (Imagine - U1163); Solène Grosdidier

arxiv: 2606.13051 · v1 · pith:5VWHGB7Anew · submitted 2026-06-11 · 💻 cs.AI

Pith reviewed 2026-06-27 06:34 UTC · model grok-4.3

classification 💻 cs.AI

keywords autoimmunity · annotated corpus · named entity recognition · information extraction · biomedical NLP · autoantibodies · PubMed abstracts

The pith

Fine-tuning on a corpus of 115 annotated PubMed abstracts improves named entity recognition for autoimmunity entities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AAbAAC, a collection of 115 PubMed abstracts manually marked up for autoimmune diseases, autoantibodies, their molecular targets, body locations, clinical signs, and the relationships among them. It tests several named entity recognition methods on this corpus and then fine-tunes models, recording the expected performance lift. The work addresses the persistent shortfall of general models in narrow biomedical areas where entity types and relations carry domain-specific complexity. A reader would see value in the demonstration that modest, focused annotation can adapt extraction tools to one such area without requiring massive new data.

Core claim

The authors construct and release AAbAAC to support information extraction in autoimmunity. They show that the corpus allows evaluation of existing NER systems and that fine-tuning on the annotations improves recognition of the targeted entities. Relations are annotated in the corpus, but the abstract reports experiments on NER only.

What carries the argument

AAbAAC corpus: 115 manually annotated abstracts containing entities for autoimmune diseases, autoantibodies, targets, locations, and clinical signs plus their relationships, used both to benchmark NER and to fine-tune models.
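Fine-tuning a token-classification NER model on span annotations usually requires converting the spans into per-token tags first. The sketch below shows one common way to do that, the BIO scheme. It is an illustration only: the label names (`AUTOANTIBODY`, `DISEASE`), the `Span` type, and the example sentence are assumptions made here, not the corpus schema or the authors' preprocessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token of the entity (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # hypothetical label name, not taken from the corpus


def spans_to_bio(n_tokens: int, spans: list[Span]) -> list[str]:
    """Convert non-overlapping token spans into BIO tags."""
    tags = ["O"] * n_tokens
    for span in spans:
        if not 0 <= span.start < span.end <= n_tokens:
            raise ValueError(f"span out of range: {span}")
        if any(tag != "O" for tag in tags[span.start:span.end]):
            raise ValueError(f"overlapping span: {span}")
        # First token of the entity gets B-, the remaining tokens get I-.
        tags[span.start] = f"B-{span.label}"
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.label}"
    return tags


# Made-up example sentence, already tokenized.
tokens = ["Anti-dsDNA", "antibodies", "in", "systemic", "lupus", "erythematosus"]
spans = [Span(0, 2, "AUTOANTIBODY"), Span(3, 6, "DISEASE")]
bio_tags = spans_to_bio(len(tokens), spans)
```

Plain BIO cannot represent nested or overlapping entities, so the sketch rejects them; a corpus that allows such spans would need a different encoding.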

If this is right

Fine-tuned models recognize autoimmunity entities more accurately than untuned general models.
Targeted annotation of roughly a hundred abstracts can narrow performance gaps in specialized biomedical subfields.
The released corpus supplies training data and a benchmark for further computational work on autoimmunity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same annotation strategy could be repeated for other narrow biomedical topics that mix molecular and clinical terms.
Extracted relations from the corpus could be assembled into small knowledge graphs linking autoantibodies to diseases and signs.
The corpus offers a ready test set for checking whether larger language models retain their general advantage after domain adaptation.
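The knowledge-graph extension above can be made concrete with a very small sketch: extracted relations stored as (head, relation, tail) triples and indexed by head entity. The triples and relation names (`marker_of`, `targets`, `has_sign`) are illustrative assumptions, not entries or relation types taken from AAbAAC.

```python
from collections import defaultdict

# Illustrative triples only; relation names are hypothetical,
# not the relation types defined by the corpus.
triples = [
    ("anti-dsDNA", "marker_of", "systemic lupus erythematosus"),
    ("anti-dsDNA", "targets", "double-stranded DNA"),
    ("systemic lupus erythematosus", "has_sign", "malar rash"),
]


def build_graph(triples):
    """Index triples by head entity as an adjacency list."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def neighbors(graph, node, relation=None):
    """Return the tails linked to `node`, optionally filtered by relation."""
    return [
        tail
        for rel, tail in graph.get(node, [])
        if relation is None or rel == relation
    ]


graph = build_graph(triples)
targets_of_anti_dsdna = neighbors(graph, "anti-dsDNA", "targets")
```

A plain dictionary is enough at this scale; a dedicated graph library would only matter once path queries or larger graphs are needed.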

Load-bearing premise

The 115 abstracts stand in for the full range of autoimmunity literature and the manual labels correctly identify the entities and relations without meaningful bias or omission.

What would settle it

Fine-tuning any standard NER model on AAbAAC produces no gain, or a loss, in F1 score when tested on a fresh set of autoimmunity abstracts that were not used in annotation or training.
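The comparison described above can be sketched as entity-level precision, recall, and F1 under exact span matching, computed for a baseline and a fine-tuned model on the same held-out set. The tuple layout `(doc_id, start, end, label)`, the label names, and every value below are toy assumptions for illustration; they are not results from the paper.

```python
def prf1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span+label match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy annotations as (doc_id, start, end, label); all values are made up.
gold = {(0, 0, 2, "AUTOANTIBODY"), (0, 3, 6, "DISEASE"), (1, 4, 5, "SIGN")}
baseline_pred = {(0, 3, 6, "DISEASE")}
finetuned_pred = {(0, 0, 2, "AUTOANTIBODY"), (0, 3, 6, "DISEASE"), (1, 1, 2, "SIGN")}

baseline_f1 = prf1(gold, baseline_pred)[2]
finetuned_f1 = prf1(gold, finetuned_pred)[2]
# A non-positive difference on fresh abstracts would count against the claim.
f1_gain = finetuned_f1 - baseline_f1
```

Exact matching is the strictest choice; relaxed (overlap-based) matching would give higher scores and should be reported separately if used.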

Figures

Figures reproduced from arXiv: 2606.13051 by Adrien Coulet (HeKA | U1346), Fabien Maury (Imagine - U1163, HeKA | U1346), Maud de Dieuleveult (Imagine - U1163), Solène Grosdidier.

Original abstract

Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domainspecific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after finetuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper releases a small new annotated corpus for autoimmunity NER that fills a narrow gap but follows standard practices.

This paper's core offering is the AAbAAC corpus: 115 manually annotated PubMed abstracts covering entities like autoimmune diseases, autoantibodies, their targets, body locations, and clinical signs, along with their relationships. The authors use it to benchmark some NER methods and then fine-tune models, reporting the expected gains.

It does a solid job of addressing a specific gap in biomedical information extraction where general models often underperform due to domain complexity. Making the annotations available publicly on GitHub is a practical step, and documenting the improvement after fine-tuning confirms the value of such targeted resources for specialized areas.

On the downside, the abstract provides no numbers on the actual performance improvements, no scores for inter-annotator agreement, and no information on the annotation guidelines or process. This leaves the strength of the central claim somewhat open until the full details are checked. The corpus is also limited in size, which is common for manual annotation projects but restricts broader claims about its utility.

Readers who would get value from this are those working on information extraction in biomedicine, particularly anyone needing data for autoimmunity or similar rare or complex domains. It could help with building or evaluating domain-adapted models.

The paper shows clear thinking in identifying the need and executing a standard annotation and evaluation pipeline. I recommend sending it for peer review so that experts can assess the annotation quality and suggest ways to extend or validate the resource.

Referee Report

3 major / 2 minor

Summary. The paper presents AAbAAC, a manually annotated corpus of 115 PubMed abstracts in the autoimmunity domain. Entities (autoimmune diseases, autoantibodies, molecular targets, body locations, clinical signs) and relations are annotated; the corpus is used both to benchmark existing NER methods and to fine-tune models, with the abstract claiming an expected improvement in NER performance after fine-tuning. The resource is released on GitHub.

Significance. A verified small-scale annotated corpus for a specialized biomedical subdomain could support domain-adaptation experiments and lower the barrier for information extraction work in autoimmunity, where generalist models often underperform. Reproducible release of the data itself is a concrete contribution even if the reported gains are modest.

major comments (3)

[Abstract] Abstract: the central claim that fine-tuning produces 'expected improvement in NER performance' is unsupported by any quantitative results (F1, precision, recall, or statistical tests before vs. after fine-tuning), preventing evaluation of whether the corpus actually demonstrates utility.
[Methods] Methods/Results: no inter-annotator agreement scores, annotation guidelines, or details on how the 115 abstracts were selected and annotated are supplied, which directly affects the weakest assumption that the annotations accurately capture entities and relations without significant bias or error.
[Results] Evaluation section: the description of which NER methods were evaluated and how the fine-tuning experiments were conducted (train/test splits, baseline models, hyperparameters) is absent, making the 'utility demonstration' impossible to reproduce or assess.
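For the second comment, one widely used agreement score is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below computes it over token-level labels. It is a simplification under stated assumptions: the label sequences are invented, and nothing here indicates which agreement measure the authors used. Token-level kappa also ignores span boundaries, so span-aware measures may be more appropriate for entity annotation.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented token-level labels from two hypothetical annotators.
annotator_a = ["DISEASE", "O", "O", "SIGN", "O", "DISEASE"]
annotator_b = ["DISEASE", "O", "SIGN", "SIGN", "O", "O"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 4 of 6 tokens (observed 24/36) while chance agreement is 13/36, so kappa is (24/36 - 13/36) / (1 - 13/36) = 11/23, about 0.48.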

minor comments (2)

[Abstract] Abstract contains minor typographical issues: 'domainspecific' should be 'domain-specific' and 'finetuning' should be 'fine-tuning'.
The GitHub release is welcome, but the paper should include a concise description of the annotation schema, entity/relation definitions, and file formats so readers can use the corpus without first inspecting the repository.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address each major comment below. The primary contribution of the work is the release of the AAbAAC corpus; we will revise the manuscript to improve clarity, reproducibility, and support for all claims.

Referee: [Abstract] Abstract: the central claim that fine-tuning produces 'expected improvement in NER performance' is unsupported by any quantitative results (F1, precision, recall, or statistical tests before vs. after fine-tuning), preventing evaluation of whether the corpus actually demonstrates utility.

Authors: We acknowledge that the abstract asserts an improvement without accompanying quantitative evidence in the current manuscript. The fine-tuning experiments were performed, but the numerical results and statistical comparisons were omitted from the text. In revision we will either remove the unsupported claim from the abstract or add the missing F1/precision/recall figures and significance tests to the results section so that the claim is properly substantiated. revision: yes
Referee: [Methods] Methods/Results: no inter-annotator agreement scores, annotation guidelines, or details on how the 115 abstracts were selected and annotated are supplied, which directly affects the weakest assumption that the annotations accurately capture entities and relations without significant bias or error.

Authors: The referee correctly identifies that these methodological details are absent. We will add a dedicated Methods subsection that includes (1) the annotation guidelines used, (2) inter-annotator agreement statistics (Cohen’s kappa or equivalent), and (3) the precise PubMed query and selection criteria applied to obtain the 115 abstracts. These additions will be included in the revised manuscript. revision: yes
Referee: [Results] Evaluation section: the description of which NER methods were evaluated and how the fine-tuning experiments were conducted (train/test splits, baseline models, hyperparameters) is absent, making the 'utility demonstration' impossible to reproduce or assess.

Authors: We agree that the current Evaluation section lacks the necessary experimental details for reproducibility. In the revision we will expand this section to specify the NER models tested, the train/validation/test splits used, the baseline systems, all hyperparameters, and the exact fine-tuning protocol. This will allow readers to replicate the reported experiments. revision: yes
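The train/validation/test split mentioned in the last response is easy to make reproducible when it is done at the document level with a fixed seed. The sketch below is one way to do that for 115 documents. The 70/15/15 fractions, the seed, and the `doc000`-style identifiers are assumptions chosen for illustration; the paper's actual split is not described in the abstract.

```python
import random


def split_documents(doc_ids, train_frac=0.7, dev_frac=0.15, seed=13):
    """Shuffle document IDs with a fixed seed and cut them into three parts.

    Splitting by document keeps all sentences of one abstract in the same
    partition, which avoids leakage between training and test data.
    """
    ids = sorted(doc_ids)  # sort first so the result ignores input order
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_dev = int(len(ids) * dev_frac)
    train = ids[:n_train]
    dev = ids[n_train:n_train + n_dev]
    test = ids[n_train + n_dev:]
    return train, dev, test


# Hypothetical identifiers standing in for the 115 abstracts.
doc_ids = [f"doc{i:03d}" for i in range(115)]
train, dev, test = split_documents(doc_ids)
```

With these assumed fractions the partitions contain 80, 17, and 18 documents. Publishing the resulting ID lists alongside the corpus would remove any dependence on the random number generator.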

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical resource-creation and evaluation study: it describes manual annotation of 115 PubMed abstracts into the AAbAAC corpus, reports baseline NER performance on that corpus, and shows the expected performance lift after fine-tuning. No equations, first-principles derivations, fitted parameters later re-labeled as predictions, or load-bearing self-citations appear in the text. The central claim (utility demonstrated by measurable NER improvement) is directly supported by the new annotations and standard domain-adaptation experiments; it does not reduce to any prior definition or self-referential input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard practices in corpus annotation and machine learning fine-tuning without introducing new mathematical parameters or entities.

