Team Fusion@SU @ BC8 SympTEMIST track: transformer-based approach for symptom recognition and linking
Pith reviewed 2026-05-10 19:14 UTC · model grok-4.3
The pith
The choice of knowledge base has the highest impact on accuracy for transformer-based symptom recognition and linking
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The presented approach fine-tunes a RoBERTa-based token-level classifier augmented with BiLSTM and CRF layers on an augmented training set for symptom named entity recognition. Entity linking is achieved by generating candidates with the cross-lingual SapBERT XLMR-Large model and computing cosine similarity against a knowledge base. The choice of knowledge base has the highest impact on model accuracy.
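The CRF layer in the recognition model can be read as a constraint on which tag sequences are allowed. A minimal sketch of the Viterbi decoding step such a layer performs at inference time is given here, assuming a BIO tagging scheme. The tag names, the emission scores, and the transition matrix are illustrative assumptions and are not taken from the paper.

```python
# Illustrative only: the tag set and all scores are assumptions, not the paper's values.
TAGS = ["O", "B-SYM", "I-SYM"]
NEG = -1e4  # large penalty standing in for a forbidden transition

# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j].
# "O" -> "I-SYM" is penalised because an inside tag needs a preceding begin tag.
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B-SYM
    [0.0, 0.0, 0.0],  # from I-SYM
]


def viterbi(emissions, transitions=TRANSITIONS):
    """Return the highest-scoring tag index sequence for per-token emission scores."""
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []               # per step: best previous tag for each current tag
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


def decode(emissions):
    """Map the best tag index sequence back to tag names."""
    return [TAGS[i] for i in viterbi(emissions)]


# Token-wise argmax would give O, I-SYM, I-SYM; the transition penalty repairs it.
toy_emissions = [[2.0, 0.5, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
print(decode(toy_emissions))
```

In a trained model the emission scores would come from the RoBERTa and BiLSTM layers and the transition scores would be learned rather than set by hand.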
What carries the argument
SapBERT-based candidate generation followed by cosine similarity matching against a knowledge base, which controls the precision of symptom entity linking after initial recognition by the RoBERTa-BiLSTM-CRF model.
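The linking step can be sketched as a nearest-neighbour ranking by cosine similarity. This is a toy illustration under stated assumptions: the three-dimensional vectors stand in for SapBERT embeddings, and the concept codes `C001` to `C003` and the function names are hypothetical, not the paper's.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(mention_vec, kb, top_k=3):
    """Rank knowledge base entries by similarity to the mention embedding.

    kb maps a concept code to the embedding of its term.
    Returns up to top_k (code, similarity) pairs, best first.
    """
    scored = [(code, cosine(mention_vec, vec)) for code, vec in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for term embeddings; a real system would encode KB terms with the model.
TOY_KB = {
    "C001": [1.0, 0.0, 0.0],
    "C002": [0.0, 1.0, 0.0],
    "C003": [0.7, 0.7, 0.0],
}
toy_mention = [0.9, 0.1, 0.0]

for code, sim in rank_candidates(toy_mention, TOY_KB):
    print(code, round(sim, 3))
```

Because the ranking can only return codes present in `kb`, a concept missing from the knowledge base can never be linked correctly, which is one way to read the claimed sensitivity to knowledge base choice.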
If this is right
- Augmenting the training set improves coverage for symptom named entity recognition.
- Cosine similarity on SapBERT embeddings provides an effective way to rank knowledge base candidates for linking.
- Transformer models can be adapted effectively for medical symptom tasks through fine-tuning and additional sequence layers.
- The overall performance depends more on knowledge base selection than on other components of the pipeline.
Where Pith is reading between the lines
- Curating or selecting appropriate medical knowledge bases may yield greater returns for clinical NLP applications than optimizing embedding models alone.
- Similar pipelines could apply to linking other clinical entities like medications or diagnoses by using domain-specific knowledge bases.
- Testing the approach on real-world clinical notes with varying symptom terminology would validate its robustness beyond the shared task data.
Load-bearing premise
The augmented training set sufficiently covers the distribution of symptoms in unseen test data and cosine similarity on SapBERT embeddings reliably identifies correct entity links.
What would settle it
Evaluating the model on test data containing symptoms not represented in the augmented training set or using a knowledge base with mismatched terminology would reveal if accuracy holds or declines.
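One way to run such a check is to score linking accuracy separately for test mentions that do and do not appear in the training data. The sketch below assumes exact, case-insensitive string matching to decide what counts as "seen"; the mentions, codes, and the function name `split_accuracy` are made up for illustration.

```python
def split_accuracy(train_mentions, test_items):
    """Linking accuracy for test mentions seen vs. unseen in training.

    test_items is a list of (mention, gold_code, predicted_code) triples.
    Returns {"seen": accuracy or None, "unseen": accuracy or None}.
    """
    seen_vocab = {m.lower() for m in train_mentions}
    buckets = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total]
    for mention, gold, pred in test_items:
        key = "seen" if mention.lower() in seen_vocab else "unseen"
        buckets[key][1] += 1
        buckets[key][0] += int(gold == pred)
    return {k: (c / t if t else None) for k, (c, t) in buckets.items()}


# Hypothetical data, for illustration only.
toy_train = ["fiebre", "dolor abdominal"]
toy_test = [
    ("Fiebre", "C1", "C1"),
    ("dolor abdominal", "C2", "C2"),
    ("disnea", "C3", "C9"),
    ("cefalea", "C4", "C4"),
]
print(split_accuracy(toy_train, toy_test))
```

A large gap between the two buckets would suggest that the augmented training set does not cover the test distribution well.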
Original abstract
This paper presents a transformer-based approach to solving the SympTEMIST named entity recognition (NER) and entity linking (EL) tasks. For NER, we fine-tune a RoBERTa-based (1) token-level classifier with BiLSTM and CRF layers on an augmented train set. Entity linking is performed by generating candidates using the cross-lingual SapBERT XLMR-Large (2), and calculating cosine similarity against a knowledge base. The choice of knowledge base proves to have the highest impact on model accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes a transformer-based pipeline for the SympTEMIST shared task on symptom NER and entity linking. NER is performed by fine-tuning a RoBERTa model augmented with BiLSTM and CRF layers on an augmented training set. Entity linking generates candidates via cross-lingual SapBERT (XLMR-Large) embeddings and ranks them by cosine similarity against a knowledge base. The central empirical claim is that the choice of knowledge base produces the largest accuracy delta among the components tested during system development.
Significance. If the ablation-style comparisons are reproducible and the test-set results hold, the work supplies a concrete, task-specific demonstration that KB coverage and alignment dominate performance in biomedical symptom linking. This is consistent with broader EL literature and could usefully inform resource selection for similar low-resource medical NER/EL settings. The pipeline itself is standard; the value lies in the reported sensitivity ranking rather than architectural novelty.
major comments (1)
- The abstract and system description assert that KB choice has the highest impact, yet no accuracy figures, delta values, or ablation table are supplied to quantify this ranking relative to other design choices (e.g., augmentation strategy or CRF layer). Without these numbers the central empirical claim cannot be verified or compared to prior SympTEMIST submissions.
minor comments (2)
- The augmentation procedure for the training set is mentioned but not detailed (size, method, or coverage statistics); adding this information would improve reproducibility.
- Consider including a brief error analysis or example of KB-induced linking failures to illustrate why the chosen KB outperforms alternatives.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the central claim requires explicit quantitative support and will revise the manuscript accordingly.
- Referee: The abstract and system description assert that KB choice has the highest impact, yet no accuracy figures, delta values, or ablation table are supplied to quantify this ranking relative to other design choices (e.g., augmentation strategy or CRF layer). Without these numbers the central empirical claim cannot be verified or compared to prior SympTEMIST submissions.
  Authors: We acknowledge that the manuscript currently states the KB choice has the highest impact without providing the supporting ablation numbers or table. In the revised version we will add a dedicated ablation subsection (or table) that reports exact accuracy scores and deltas for the full pipeline versus variants that remove augmentation, remove the CRF layer, or swap knowledge bases. These numbers will directly substantiate the ranking and enable comparison with other SympTEMIST submissions.
  Revision: yes
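The ablation comparison requested here amounts to a small calculation: the score of the full pipeline minus the score of each variant, sorted by magnitude. A sketch follows, with the caveat that every number in it is a placeholder chosen for illustration; none of them are results from the paper, and the variant names are assumptions.

```python
def ablation_deltas(full_score, variant_scores):
    """Return (variant, delta) pairs sorted by absolute delta, largest first.

    delta = full_score - variant_score, so a positive delta means the
    variant performs worse than the full pipeline.
    """
    deltas = {name: round(full_score - s, 4) for name, s in variant_scores.items()}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# PLACEHOLDER scores only: not reported in the paper or anywhere else.
placeholder_full = 50.0
placeholder_variants = {
    "no augmentation": 48.0,
    "no CRF layer": 49.0,
    "alternative KB": 40.0,
}

for name, delta in ablation_deltas(placeholder_full, placeholder_variants):
    print(f"{name}: {delta:+.1f}")
```

Reporting the sorted deltas alongside the raw scores would make the claimed ranking of components directly checkable.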
Circularity Check
No significant circularity identified
The paper describes an empirical pipeline for NER (RoBERTa + BiLSTM-CRF on augmented data) and EL (SapBERT embeddings + cosine similarity to KB) in the SympTEMIST shared task. The central claim that KB choice has the highest impact on accuracy is presented as an observation from ablation-style comparisons during system development, with no equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations. No self-definitional loops, ansatzes smuggled via citation, or renaming of known results occur; the work is a standard applied description of model choices and empirical results, fully self-contained without reducing any claim to its own inputs by construction.