AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction
Pith reviewed 2026-06-27 06:34 UTC · model grok-4.3
The pith
Fine-tuning on a corpus of 115 manually annotated PubMed abstracts is reported to improve named entity recognition for autoimmunity entities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct and release AAbAAC to support information extraction in autoimmunity. They report that the corpus can be used to evaluate existing NER systems and that fine-tuning NER models on the annotations improves recognition of the targeted entity types. Relations are annotated as well, but the abstract describes evaluation only for NER.
What carries the argument
AAbAAC corpus: 115 manually annotated abstracts containing entities for autoimmune diseases, autoantibodies, targets, locations, and clinical signs plus their relationships, used both to benchmark NER and to fine-tune models.
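As a rough illustration of what such annotations could look like in memory, the sketch below models the five entity types and one relation between two entities. The class names, field names, label strings, relation name, and example sentence are all assumptions made for illustration; the actual schema and file format are defined in the AAbAAC repository and are not described on this page.

```python
from dataclasses import dataclass

# The five entity types named in the abstract; the label strings are hypothetical.
ENTITY_TYPES = {
    "AUTOIMMUNE_DISEASE",
    "AUTOANTIBODY",
    "TARGET",
    "LOCATION",
    "CLINICAL_SIGN",
}


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str

    def __post_init__(self):
        # Reject labels outside the assumed schema and malformed spans.
        if self.label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.label}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must satisfy 0 <= start < end")


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical relation name


# Invented example sentence, not taken from the corpus.
text = "Anti-dsDNA antibodies are associated with systemic lupus erythematosus."
antibody = Entity(0, 21, "AUTOANTIBODY")
disease = Entity(42, 70, "AUTOIMMUNE_DISEASE")
relation = Relation(antibody, disease, "MARKER_OF")

print(text[antibody.start:antibody.end])
print(text[disease.start:disease.end])
```

Storing character offsets rather than the surface string keeps each annotation tied to one position in the abstract, which matters when the same term occurs several times.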
If this is right
- Models fine-tuned on the corpus recognize autoimmunity entities more accurately than untuned general models.
- Targeted annotation of roughly a hundred abstracts can narrow performance gaps in specialized biomedical subfields.
- The released corpus supplies training data and a benchmark for further computational work on autoimmunity.
Where Pith is reading between the lines
- The same annotation strategy could be repeated for other narrow biomedical topics that mix molecular and clinical terms.
- Extracted relations from the corpus could be assembled into small knowledge graphs linking autoantibodies to diseases and signs.
- The corpus offers a ready test set for checking whether larger language models keep their advantage over smaller models once the smaller models are fine-tuned on domain data.
Load-bearing premise
The 115 abstracts stand in for the full range of autoimmunity literature and the manual labels correctly identify the entities and relations without meaningful bias or omission.
What would settle it
Fine-tuning any standard NER model on AAbAAC produces no gain, or a loss, in F1 score when tested on a fresh set of autoimmunity abstracts that were not used in annotation or training.
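The comparison described above can be sketched as an entity-level F1 computed before and after fine-tuning on the same held-out abstracts. This is a minimal sketch under stated assumptions: exact span-and-label matching is assumed as the scoring rule (the page does not say which criterion the paper uses), `prf1` is a hypothetical helper, and the gold and predicted spans are invented toy data, not results from the paper.

```python
def prf1(gold, pred):
    """Exact-match precision, recall and F1 over (doc_id, start, end, label) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # a prediction counts only if span and label both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy annotations for two held-out documents.
gold = {
    ("d1", 0, 21, "AUTOANTIBODY"),
    ("d1", 42, 70, "AUTOIMMUNE_DISEASE"),
    ("d2", 5, 12, "CLINICAL_SIGN"),
    ("d2", 30, 41, "TARGET"),
}
baseline_pred = {
    ("d1", 42, 70, "AUTOIMMUNE_DISEASE"),
    ("d2", 5, 12, "LOCATION"),  # right span, wrong label: counted as an error
}
finetuned_pred = {
    ("d1", 0, 21, "AUTOANTIBODY"),
    ("d1", 42, 70, "AUTOIMMUNE_DISEASE"),
    ("d2", 5, 12, "CLINICAL_SIGN"),
}

f1_base = prf1(gold, baseline_pred)[2]
f1_tuned = prf1(gold, finetuned_pred)[2]
print(f"baseline F1={f1_base:.3f}  fine-tuned F1={f1_tuned:.3f}  "
      f"delta={f1_tuned - f1_base:+.3f}")
```

A delta at or below zero on abstracts never seen during annotation or training would be the outcome that counts against the claim.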
Original abstract
Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domainspecific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after finetuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents AAbAAC, a manually annotated corpus of 115 PubMed abstracts in the autoimmunity domain. Entities (autoimmune diseases, autoantibodies, molecular targets, body locations, clinical signs) and relations are annotated; the corpus is used both to benchmark existing NER methods and to fine-tune models, with the abstract claiming an expected improvement in NER performance after fine-tuning. The resource is released on GitHub.
Significance. A verified small-scale annotated corpus for a specialized biomedical subdomain could support domain-adaptation experiments and lower the barrier for information extraction work in autoimmunity, where generalist models often underperform. Reproducible release of the data itself is a concrete contribution even if the reported gains are modest.
major comments (3)
- [Abstract] Abstract: the central claim that fine-tuning produces 'expected improvement in NER performance' is unsupported by any quantitative results (F1, precision, recall, or statistical tests before vs. after fine-tuning), preventing evaluation of whether the corpus actually demonstrates utility.
- [Methods] Methods/Results: no inter-annotator agreement scores, annotation guidelines, or details on how the 115 abstracts were selected and annotated are supplied, which directly affects the weakest assumption that the annotations accurately capture entities and relations without significant bias or error.
- [Results] Evaluation section: the description of which NER methods were evaluated and how the fine-tuning experiments were conducted (train/test splits, baseline models, hyperparameters) is absent, making the 'utility demonstration' impossible to reproduce or assess.
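For the agreement scores requested in the second comment, Cohen's kappa over token-level labels is one common choice. The sketch below is illustrative only: the page does not state which agreement measure the authors used, `cohens_kappa` is a hypothetical helper, and the two label sequences are invented. Token-level kappa is also a simplification for span annotation, since it ignores boundary disagreements that span-aware measures capture.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented token-level labels for one short sentence ("O" = outside any entity).
annotator_a = ["O", "AUTOANTIBODY", "AUTOANTIBODY", "O", "DISEASE", "O", "O", "DISEASE"]
annotator_b = ["O", "AUTOANTIBODY", "O", "O", "DISEASE", "O", "O", "O"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```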
minor comments (2)
- [Abstract] Abstract contains minor typographical issues: 'domainspecific' should be 'domain-specific' and 'finetuning' should be 'fine-tuning'.
- The GitHub release is welcome, but the paper should include a concise description of the annotation schema, entity/relation definitions, and file formats so readers can use the corpus without first inspecting the repository.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback. We address each major comment below. The primary contribution of the work is the release of the AAbAAC corpus; we will revise the manuscript to improve clarity, reproducibility, and support for all claims.
- Referee: [Abstract] Abstract: the central claim that fine-tuning produces 'expected improvement in NER performance' is unsupported by any quantitative results (F1, precision, recall, or statistical tests before vs. after fine-tuning), preventing evaluation of whether the corpus actually demonstrates utility.
  Authors: We acknowledge that the abstract asserts an improvement without accompanying quantitative evidence in the current manuscript. The fine-tuning experiments were performed, but the numerical results and statistical comparisons were omitted from the text. In revision we will either remove the unsupported claim from the abstract or add the missing F1/precision/recall figures and significance tests to the results section so that the claim is properly substantiated.
  Revision: yes
- Referee: [Methods] Methods/Results: no inter-annotator agreement scores, annotation guidelines, or details on how the 115 abstracts were selected and annotated are supplied, which directly affects the weakest assumption that the annotations accurately capture entities and relations without significant bias or error.
  Authors: The referee correctly identifies that these methodological details are absent. We will add a dedicated Methods subsection that includes (1) the annotation guidelines used, (2) inter-annotator agreement statistics (Cohen’s kappa or equivalent), and (3) the precise PubMed query and selection criteria applied to obtain the 115 abstracts. These additions will be included in the revised manuscript.
  Revision: yes
- Referee: [Results] Evaluation section: the description of which NER methods were evaluated and how the fine-tuning experiments were conducted (train/test splits, baseline models, hyperparameters) is absent, making the 'utility demonstration' impossible to reproduce or assess.
  Authors: We agree that the current Evaluation section lacks the necessary experimental details for reproducibility. In the revision we will expand this section to specify the NER models tested, the train/validation/test splits used, the baseline systems, all hyper-parameters, and the exact fine-tuning protocol. This will allow readers to replicate the reported experiments.
  Revision: yes
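One way the promised train/validation/test split could be made reproducible is to split at the document level with a fixed seed, so that no abstract contributes sentences to more than one partition. The sketch below is an assumption-laden illustration: the 70/15/15 ratios, the seed, the `split_documents` helper, and the placeholder document IDs are all hypothetical and are not taken from the paper.

```python
import random


def split_documents(doc_ids, train=0.7, dev=0.15, seed=13):
    """Deterministically split document IDs into train, dev and test lists."""
    ids = sorted(doc_ids)               # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)    # local RNG, so global random state is untouched
    n_train = int(len(ids) * train)
    n_dev = int(len(ids) * dev)
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]


# Placeholder IDs standing in for the 115 abstracts.
doc_ids = [f"doc{i:03d}" for i in range(115)]
train_ids, dev_ids, test_ids = split_documents(doc_ids)
print(len(train_ids), len(dev_ids), len(test_ids))
```

Publishing the resulting ID lists alongside the corpus would let others reuse exactly the same partitions without depending on the random number generator.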
Circularity Check
No significant circularity
The paper is an empirical resource-creation and evaluation study: it describes manual annotation of 115 PubMed abstracts into the AAbAAC corpus, evaluates baseline NER methods on that corpus, and describes the expected performance lift after fine-tuning. No equations, first-principles derivations, fitted parameters later re-labeled as predictions, or load-bearing self-citations appear in the text. The central claim (utility demonstrated by NER improvement) rests on the new annotations and standard domain-adaptation experiments; it does not reduce to any prior definition or self-referential input by construction.