ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language

Andrzej Czyżewski; Franciszek Górski

arxiv: 2601.09722 · v2 · pith:4AR5QD6Gnew · submitted 2025-12-27 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-21 15:48 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords Polish medical texts · LLM annotation · knowledge distillation · BERT classifiers · clinical text classification · multilingual models · medical NLP

The pith

A multilingual LLM annotates Polish medical texts to train compact classifiers reaching F1 above 0.80.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an annotation framework that uses the multilingual Llama 3.1 model as a teacher to label a large corpus of Polish medical texts across five clinical categories. This sidesteps the shortage of annotated data: labels are generated automatically, and only a portion of them is verified manually to create a test set. The resulting dataset trains three BERT-based classifiers, with DistilBERT performing best at F1 scores over 0.80 for every category and over 0.93 for three of them. These smaller models serve as practical alternatives to large language models because they require far less memory and run much faster while maintaining high accuracy.
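The labeling flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `teacher_label` and `human_verify` are hypothetical callables standing in for the Llama 3.1 prompt and the manual check, and the split fractions are placeholder assumptions because the split ratios are not given here.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (document text, label)


def build_splits(
    texts: List[str],
    teacher_label: Callable[[str], str],      # hypothetical wrapper around the teacher LLM
    human_verify: Callable[[str, str], str],  # hypothetical: returns the human-approved label
    test_fraction: float = 0.1,               # assumed value, not taken from the paper
    val_fraction: float = 0.1,                # assumed value, not taken from the paper
    seed: int = 0,
) -> Dict[str, List[Example]]:
    """Label every text with the teacher and hold out a human-verified test set."""
    rng = random.Random(seed)
    order = list(range(len(texts)))
    rng.shuffle(order)

    n_test = int(len(texts) * test_fraction)
    n_val = int(len(texts) * val_fraction)

    splits: Dict[str, List[Example]] = {"train": [], "val": [], "test": []}
    for rank, idx in enumerate(order):
        text = texts[idx]
        label = teacher_label(text)
        if rank < n_test:
            # Only the held-out portion is passed through the manual check.
            splits["test"].append((text, human_verify(text, label)))
        elif rank < n_test + n_val:
            splits["val"].append((text, label))
        else:
            # Training labels come straight from the teacher, unverified.
            splits["train"].append((text, label))
    return splits
```

Keeping the verified documents out of the training and validation splits is what lets the test set act as an external check on the teacher's labels.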

Core claim

Using a pretrained multilingual LLM to annotate Polish medical texts creates labeled data that trains DistilBERT, BioBERT, and HerBERT classifiers to high F1 scores, yielding compact models that function as efficient alternatives to direct LLM use for clinical text categorization.

What carries the argument

The ADMEDTAGGER framework, which treats a multilingual LLM as a teacher model and distills its expert knowledge into smaller BERT-based classifiers for the Polish medical language.

If this is right

The trained classifiers enable multi-class categorization of Polish medical texts without needing full-scale LLMs at inference time.
Resource use drops sharply: the classifiers are nearly 500 times smaller than the LLM, use about 300 times less GPU VRAM, and run inference several hundred times faster.
The framework scales annotation for other low-resource medical domains by combining LLM labeling with limited human checks.
DistilBERT emerges as the strongest of the three architectures tested on this Polish clinical dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same teacher-student setup could extend to other languages that lack large annotated medical corpora.
Periodic human audits of LLM-generated labels might further improve classifier robustness over time.
Deploying the compact models in clinical software could support real-time Polish text processing where compute is limited.

Load-bearing premise

The labels generated by Llama 3.1 are sufficiently accurate and unbiased to train reliable classifiers even though only part of them received manual verification.

What would settle it

Independent expert re-annotation of a held-out portion of the corpus or direct comparison of model outputs against fresh human labels on unseen documents would confirm or refute the reported F1 scores.
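A comparison of classifier outputs against fresh human labels comes down to per-label F1. The sketch below computes it in plain Python. The function names are illustrative, and which averaging the authors used to arrive at one F1 figure per clinical category is not stated here, so the macro average is only an assumption.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Return the F1 score of each label, treating it as the positive class in turn."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)
```

Running this once with the verified test labels and once with independent re-annotations of unseen documents would show whether the reported scores survive a change of annotator.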

Figures

Figures reproduced from arXiv: 2601.09722 by Andrzej Czyżewski, Franciszek Górski.

**Figure 1.** Diagram of the ADMEDVOICE system.

**Figure 2.** Diagram of the whole ADMEDTAGGER methodology.

**Figure 3.** Prompt construction for the Radiology Clinical

**Figure 4.** Datasets’ labels distribution for each clinical category.

**Figure 5.** DistilBERT’s confusion matrices for each clinical category.

Original abstract

In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is part of a larger project called ADMEDVOICE, within which we collected an extensive corpus of medical texts representing five clinical categories - Radiology, Oncology, Cardiology, Hypertension, and Pathology. Using this data, we had to develop a multi-class classifier, but the fundamental problem turned out to be the lack of resources for annotating an adequate number of texts. Therefore, in our solution, we used the multilingual Llama3.1 model to annotate an extensive corpus of medical texts in Polish. Using our limited annotation resources, we verified only a portion of these labels, creating a test set from them. The data annotated in this way were then used for training and validation of 3 different types of classifiers based on the BERT architecture - the distilled DistilBERT model, BioBERT fine-tuned on medical data, and HerBERT fine-tuned on the Polish language corpus. Among the models we trained, the DistilBERT model achieved the best results, reaching an F1 score > 0.80 for each clinical category and an F1 score > 0.93 for 3 of them. In this way, we obtained a series of highly effective classifiers that represent an alternative to large language models, due to their nearly 500 times smaller size, 300 times lower GPU VRAM consumption, and several hundred times faster inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

They used Llama 3.1 to label Polish medical texts and trained DistilBERT to solid F1 on a human-verified test set, but the paper leaves the teacher label accuracy unquantified.

The main thing to know is that this work shows how to use a large multilingual LLM to label Polish medical documents across five categories and then train much smaller models that still deliver solid performance on a human-verified test set.

They gathered a corpus in radiology, oncology, cardiology, hypertension, and pathology. Annotation was the blocker, so they let Llama 3.1 do the bulk of the labeling. A subset got manual checks to form the test set. From there they fine-tuned DistilBERT, BioBERT, and HerBERT. DistilBERT won with F1 scores over 0.80 on every category and over 0.93 on three of them. The payoff is models that are hundreds of times smaller and faster.

What works here is the end-to-end recipe for a low-resource setting. The size and speed comparisons are concrete, and the choice of models makes sense for the domain and language. It is a direct application of distillation rather than a new method, but the Polish medical focus fills a gap.

The weaker part is the lack of numbers on how closely the Llama labels matched the human ones on the verified samples. The abstract notes that only a portion was checked, but without inter-annotator agreement or per-category error rates, the training data quality stays opaque. That makes it harder to judge how much the results depend on clean labels versus the test set being forgiving. If the full paper has those figures, they need to be added or highlighted.

This is aimed at practitioners who need deployable medical taggers in languages like Polish where labeled data is scarce. A reader looking for baselines or a starting pipeline would find it useful. It is worth sending for peer review. The results are specific enough to be checked, and the practical angle is clear. Reviewers can ask for the missing agreement data and any sensitivity checks on the labels.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces ADMEDTAGGER, an annotation framework that employs the multilingual Llama 3.1 model to annotate a corpus of Polish medical texts across five clinical categories: Radiology, Oncology, Cardiology, Hypertension, and Pathology. Due to limited annotation resources, only a portion of the LLM-generated labels is manually verified to form a test set, while the annotated data is used to train and validate three BERT-based classifiers: DistilBERT, BioBERT, and HerBERT. The DistilBERT model achieves the best performance with F1 scores exceeding 0.80 for each category and 0.93 for three of them, positioning these compact models as efficient alternatives to large language models.

Significance. If the results hold with confirmed label quality, this work provides a practical method for addressing data scarcity in Polish medical NLP by distilling LLM knowledge into lightweight classifiers. It could enable efficient, deployable tools for clinical text classification with substantially lower computational costs than full LLMs.

major comments (1)

[Annotation framework and experimental setup] The performance claims rest on training data produced by Llama 3.1 with only partial manual verification for the test set. No quantitative details are supplied on annotation error rates, per-category agreement between Llama 3.1 labels and human verification, or error analysis on the verified subset (see the annotation framework description and results reporting). This omission is load-bearing for the central distillation claim, as unquantified label noise could affect what the downstream models actually learn.

minor comments (1)

[Abstract] The abstract states F1 > 0.80 for each category and > 0.93 for three but does not indicate the exact scale of the verified test set or the split ratios used for training/validation; adding these numbers would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for highlighting an important aspect of our experimental design. We address the major comment below and describe the changes we will make in revision.

Referee: [Annotation framework and experimental setup] The performance claims rest on training data produced by Llama 3.1 with only partial manual verification for the test set. No quantitative details are supplied on annotation error rates, per-category agreement between Llama 3.1 labels and human verification, or error analysis on the verified subset (see the annotation framework description and results reporting). This omission is load-bearing for the central distillation claim, as unquantified label noise could affect what the downstream models actually learn.

Authors: We agree that quantitative characterization of label quality is necessary to fully support the distillation claim. The manuscript states that only a portion of the Llama 3.1-generated labels was manually verified to form the test set, but it does not report agreement statistics or error analysis. In the revised manuscript we will add a dedicated subsection under the annotation framework that reports, for each of the five categories: (i) the exact number of samples that received human verification, (ii) the percentage agreement between the Llama 3.1 label and the human annotator, and (iii) a concise error analysis of the main types of discrepancies observed. These additions will allow readers to assess the degree of label noise present in the training data and will directly address the concern that unquantified noise could undermine the reported performance of the distilled classifiers. The high F1 scores obtained on the human-verified test set remain the primary empirical support for the effectiveness of the approach, but we accept that the requested metrics are required for a complete evaluation.

revision: yes
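The per-category agreement figures promised in the response could be produced along the following lines. This is a hedged sketch: the input layout (a dict mapping each clinical category to paired teacher and human labels) and the function names are assumptions, and Cohen's kappa is included here only as a common chance-corrected complement to raw percentage agreement, not as something the authors committed to reporting.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which the two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both annotators labeled independently at their own rates.
    p_e = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sides used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


def agreement_report(
    verified: Dict[str, List[Tuple[str, str]]]  # category -> [(teacher_label, human_label)]
) -> Dict[str, Dict[str, float]]:
    """Summarize sample count, raw agreement, and kappa for each clinical category."""
    report: Dict[str, Dict[str, float]] = {}
    for category, pairs in verified.items():
        teacher = [t for t, _ in pairs]
        human = [h for _, h in pairs]
        report[category] = {
            "n": float(len(pairs)),
            "agreement": percent_agreement(teacher, human),
            "kappa": cohens_kappa(teacher, human),
        }
    return report
```

Tabulating the disagreeing pairs per category would give the starting point for the error analysis in item (iii).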

Circularity Check

0 steps flagged

No significant circularity; evaluation is independent of training labels

The paper's core claim rests on training classifiers (including DistilBERT) on Llama 3.1-generated labels for Polish medical texts across five categories, with only a portion of labels manually verified to form a held-out test set. Reported F1 scores (>0.80 for every category, >0.93 for three of them) are measured directly on this separate human-verified test set. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the provided text that would reduce the performance numbers to the LLM labels by construction. The derivation chain is therefore self-contained against an external benchmark (manual verification), with any label-noise concerns falling under correctness risk rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central result rests on the unverified assumption that LLM-generated labels are high-quality enough for training; no free parameters or invented entities are introduced beyond standard model choices.

axioms (1)

Domain assumption: Llama 3.1 can generate sufficiently accurate category labels for Polish medical texts to train effective classifiers.
Invoked when the authors use the generated annotations for training and validation without reporting full verification statistics.


[1] [1]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Dis- tilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[2] [2]

A comprehensive survey on knowledge distillation.arXiv preprint arXiv:2503.12067, 2025

Amir M Mansourian et al. A comprehensive survey on knowledge distillation.arXiv preprint arXiv:2503.12067, 2025

work page arXiv 2025

[3] [3]

A survey on knowledge distilla- tion: Recent advancements.ScienceDirect, Novem- ber 2024

N Moslemi et al. A survey on knowledge distilla- tion: Recent advancements.ScienceDirect, Novem- ber 2024

work page 2024

[4] [4]

Survey on knowledge dis- tillation for large language models: Methods, evalua- tion, and application.ACM Transactions on Intelli- gent Systems and Technology, 2024

ACM Transactions Team. Survey on knowledge dis- tillation for large language models: Methods, evalua- tion, and application.ACM Transactions on Intelli- gent Systems and Technology, 2024

work page 2024

[5] [5]

Coun- terclockwise block-by-block knowledge distillation for neural network compression.Scientific Reports, 15(1):91152, 2025

Xiaohui Zhang, Wei Li, Jian Wang, et al. Coun- terclockwise block-by-block knowledge distillation for neural network compression.Scientific Reports, 15(1):91152, 2025

work page 2025

[6] [6]

Uncertainty-based knowledge distillation for bayesian deep neural network compression

Mina Hemmatian, Ali Shahzadi, and Saeed Mozaf- fari. Uncertainty-based knowledge distillation for bayesian deep neural network compression. International Journal of Approximate Reasoning, 173:109289, 2024

work page 2024

[7] [7]

Knowl- edge distillation: Enhancing neural network com- pression with integrated gradients.arXiv preprint arXiv:2503.13008, 2025

David E Hernandez, Torbjörn Nordling, et al. Knowl- edge distillation: Enhancing neural network com- pression with integrated gradients.arXiv preprint arXiv:2503.13008, 2025

work page arXiv 2025

[8] [8]

Knowledge distillation: A survey

Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. International Journal of Computer Vision, 129:1789– 1819, 2021

work page 2021

[9] [9]

A Survey on Knowledge Distillation of Large Language Models

Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, and Tianyi Zhou. A survey on knowledge distil- lation of large language models.arXiv preprint arXiv:2402.13116, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[10] [10]

Bert: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. 10 APREPRINT- JANUARY16, 2026 InProceedings of NAACL-HLT, pages 4171–4186, 2019

work page 2026

[11] M Fields et al. A survey of text classification with transformers: How wide? how large? how long? how accurate? how expensive? how safe? IEEE Access, 2024

[12] Karim Gasmi. Improving bert-based model for medical text classification with an optimization algorithm. In Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, volume 1653, Cham, 2022. Springer

[13] S Patel et al. Survey of transformers and towards ensemble learning using transformers for natural language processing. PMC, 2024

[14] Hussein T Al-Natsheh et al. Rethinking of bert sentence embedding for text classification. Neural Computing and Applications, August 2024

[15] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. How to fine-tune bert for text classification? arXiv preprint arXiv:1905.05583, 2019

[16] Yuxin Li, Wei Zhang, Hao Wang, et al. Short-text sentiment classification model based on bert and dual-stream transformer gated attention mechanism. Electronics, 14(19):3904, 2025

[17] Imad El Maaroufi, Youssef Mellah, Karim El Kharki, et al. Lnlf-bert: Transformer for long document classification with multiple attention levels. IEEE Access, 2024

[18] J Jiang et al. Are we really making much progress in text classification? a comparative review. arXiv preprint arXiv:2204.03954, 2025

[19] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. Biobert: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020

[20] Omid Rohanian, Mohammadmahdi Nouriborji, Hannah Jauncey, Samaneh Kouchaki, Farhad Nooralahzadeh, Lei Clifton, Laura Merson, and David A Clifton. Lightweight transformers for clinical natural language processing. Natural Language Engineering, pages 1–28, 2023

[21] Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342, 2020

[22] Yifan Peng, Shankai Yan, and Zhiyong Lu. Transfer learning in biomedical natural language processing: An evaluation of bert and elmo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474, 2019

[23] Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digital Medicine, 4(1):1–13, 2021

[24] ClinRaGen Research Team. Knowledge-augmented multimodal clinical rationale generation for disease diagnosis with small language models. arXiv preprint arXiv:2411.07611, 2025

[25] Eunkyung Kim, Kai Huang, Yu Xing, Xiaoqian Jiang, et al. Attention mechanism with bert for content annotation and categorization of pregnancy-related questions on a community q&a site. In AMIA Annual Symposium Proceedings, volume 2020, pages 625–634. American Medical Informatics Association, 2021

[26] Luca Putelli, Alfonso E Gerevini, Alberto Lavelli, Tahir Mehmood, and Ivan Serina. On the behaviour of bert's attention for the classification of medical reports. CEUR Workshop Proceedings, 3277, 2022

[27] Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, and Matthew McDermott. Publicly available clinical bert embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78. ACL, 2019

[28] G Wang et al. Comparison of bert implementations for natural language processing of narrative medical documents. ScienceDirect, 2020

[29] Ethan Tanner et al. Large language model influence on diagnostic reasoning: A randomized clinical trial. JAMA Network Open, 7(10), 2024

[30] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023

[31] Shuang Wang, Zhenyu Zhao, Xi Ouyang, Qian Wang, and Dinggang Shen. Large language models for disease diagnosis: a scoping review. npj Artificial Intelligence, 1(1):11, 2025

[32] Shuo Liu, Yanglin Pan, Kejia Wang, Xiaodong Jia, et al. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal, 2025

[33] Hajar Sakai and Sarah S Lam. Kdh-mltc: Knowledge distillation for healthcare multi-label text classification. arXiv preprint arXiv:2505.07162, 2025

[34] M Hasan et al. Medical prediction using dischargebert and corebert. Referenced in KDH-MLTC, 2025

[35] Giannis Karamanolakis, Daniel Hsu, and Luis Gravano. Cross-lingual text classification with minimal resources by transferring a sparse teacher. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3604–3622, 2020

[36] Y Cho et al. Dsg-kd: Knowledge distillation from domain-specific to general language models. arXiv preprint arXiv:2409.14904, 2024

[37] Ziqing Yang et al. Cross-lingual text classification with multilingual distillation and zero-shot-aware training. arXiv preprint arXiv:2202.13654, 2022

[38] G Katsogiannis-Meimarakis and G Koutrika. Multilingual text categorization and sentiment analysis: A comparative analysis of the utilization of multilingual approaches for classifying twitter data. Neural Computing and Applications, May 2023

[39] Moritz Laurer, Wouter van Atteveldt, Andreu Casas, and Kasper Welbers. Cross-lingual classification of political texts using multilingual sentence embeddings. Political Analysis, 31(3), January 2023

[40] Raviraj Joshi et al. Universal cross-lingual text classification. arXiv preprint arXiv:2406.11028, June 2024

[41] Robert Mroczkowski, Piotr Rybak, Alina Wróblewska, and Ireneusz Gawlik. Herbert: Efficiently pretrained transformer-based language model for polish. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 1–10, Kiyv, Ukraine, April 2021. Association for Computational Linguistics

[42] Piotr Rybak, Robert Mroczkowski, Janusz Tracz, and Ireneusz Gawlik. Klej: Comprehensive benchmark for polish language understanding. arXiv preprint arXiv:2005.00630, 2020

[43] Jan Kocoń, Monika Zaśko-Zielińska, Piotr Miłkowski, Arkadiusz Janz, and Maciej Piasecki. Wroclaw corpus of consumer reviews sentiment (WCCRS), 2019. CLARIN-PL digital repository

[44] Górnośląskie Centrum Medyczne Research Team. Deep learning analysis of polish electronic health records for diagnosis prediction in patients with cardiovascular diseases. Personalized Medicine, 12(6):869, May 2022

[45] AssistMED Project Team. Practical use case of natural language processing for observational clinical research data retrieval from electronic health records: Assistmed project. Polish Archives of Internal Medicine, 2024

[46] Marta Zielonka, Andrzej Czyżewski, Dariusz Szplit, Beata Graff, Anna Szyndler, Mariusz Budzisz, and Krzysztof Narkiewicz. Machine learning tools match physician accuracy in multilingual text annotation. Scientific Reports, 15(1):5487, 2025

[47] Andrzej Czyżewski, Sebastian Cygert, Karolina Marciniuk, Maciej Szczodrak, Arkadiusz Harasimiuk, Piotr Odya, Marina Galanina, Piotr Szczuko, Bożena Kostek, Beata Graff, et al. A comprehensive polish medical speech dataset for enhancing automatic medical dictation. Scientific Data, 12(1):1436, 2025