RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings
Pith reviewed 2026-05-10 00:34 UTC · model grok-4.3
The pith
Reinforcement learning selects more useful samples than standard active learning for clinical transfer learning under scarcity and imbalance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RADS is a sample selection strategy that uses reinforcement learning to adaptively choose the most informative samples from a small, imbalanced pool for domain adaptation in clinical natural language processing tasks. Experiments on real-world clinical datasets demonstrate that this leads to enhanced model transferability and better handling of extreme class imbalance compared to conventional active learning techniques such as uncertainty and diversity sampling.
What carries the argument
RADS, an RL-based adaptive sampler that learns a policy to select samples maximizing post-fine-tuning performance in the target clinical domain.
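The review gives no pseudocode for this policy, but the shape of such a selector can be sketched as a toy REINFORCE loop: score candidates, sample one, observe a reward standing in for post-fine-tuning validation gain, and update the policy. Everything below (the pool, the linear softmax policy, the stand-in reward) is a hypothetical illustration, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

pool = rng.normal(size=(40, 2))           # candidate samples (toy 2-D features)
w_true = np.array([1.0, -0.5])            # hidden direction of "usefulness"
# Stand-in for the true reward (in RADS: improvement after fine-tuning on the pick).
informativeness = 1.0 / (1.0 + np.exp(-(pool @ w_true)))

theta = np.zeros(2)                       # linear scoring policy over sample features
baseline = 0.0                            # running reward baseline for variance reduction

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(500):
    probs = softmax(pool @ theta)
    i = rng.choice(len(pool), p=probs)
    reward = informativeness[i]           # observed reward for the selected sample
    grad = pool[i] - probs @ pool         # grad of log pi(i) for a linear softmax policy
    theta += 0.1 * (reward - baseline) * grad
    baseline = 0.9 * baseline + 0.1 * reward

top5 = np.argsort(pool @ theta)[-5:]      # samples the learned policy ranks highest
```

After training, the policy's top-ranked samples should be more informative than the pool average, which is the property the paper claims for RADS-selected samples.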
If this is right
- Selected samples lead to higher performance metrics on downstream clinical tasks like classification.
- The strategy remains effective even with extreme class imbalance in the available data.
- Transfer learning becomes more reliable in low-resource clinical settings without additional data collection.
- RADS outperforms uncertainty sampling and diversity sampling baselines on multiple datasets.
Where Pith is reading between the lines
- This method could be tested on non-clinical low-resource NLP tasks to check broader applicability.
- Different reward functions for the RL agent might further improve selection quality in future iterations.
- Combining RADS with other transfer learning techniques like data augmentation could yield additional gains.
Load-bearing premise
The reward signal used to train the reinforcement learning agent can guide it toward genuinely informative samples instead of being overwhelmed by the class imbalance or outliers in the tiny initial dataset.
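One way to see why this premise matters is to compare reward candidates directly: a macro-averaged F1 reward, unlike accuracy, refuses to credit a policy whose selections only teach the model the majority class. A self-contained illustration (the 9:1 split and the majority-only predictor are made up for the example):

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1: every class counts equally,
    # so a collapsed minority class drags the score down.
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# 9:1 imbalance; a classifier that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
reward = macro_f1(y_true, y_pred)                                      # ~0.47
```

An accuracy reward would tell the agent its selections are fine (0.9); the macro-F1 reward exposes the ignored minority class, which is the kind of signal the premise requires.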
What would settle it
The claim would be settled against RADS if, on a new low-resource and highly imbalanced clinical dataset, models fine-tuned on RADS-selected samples showed no improvement over, or worse results than, those using uncertainty sampling, as measured by standard metrics such as F1 score.
Original abstract
A common strategy in transfer learning is few-shot fine-tuning, but its success is highly dependent on the quality of the samples selected as training examples. Active learning methods such as uncertainty sampling and diversity sampling can select useful samples. However, under extremely low-resource and class-imbalanced conditions, they often favor outliers rather than truly informative samples, resulting in degraded performance. In this paper, we introduce RADS (Reinforcement Adaptive Domain Sampling), a robust sample selection strategy using reinforcement learning (RL) to identify the most informative samples. Experimental evaluations on several real-world clinical datasets show that our sample selection strategy enhances model transferability while maintaining robust performance under extreme class imbalance compared to traditional methods.
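The abstract's criticism of uncertainty sampling is concrete enough to demonstrate: the heuristic ranks candidates purely by predictive uncertainty (here, entropy), so a lone outlier the model happens to be confused about beats every in-distribution candidate. The probability values below are hypothetical:

```python
import math

def entropy(p):
    # Shannon entropy of a predicted class distribution (natural log).
    return -sum(q * math.log(q) for q in p if q > 0)

# Predicted class probabilities for a small candidate pool.
# Index 3 is an outlier: maximally uncertain, but not necessarily informative.
pool_probs = [
    [0.95, 0.05],
    [0.90, 0.10],
    [0.85, 0.15],
    [0.50, 0.50],
]
scores = [entropy(p) for p in pool_probs]
pick = max(range(len(scores)), key=scores.__getitem__)  # uncertainty sampling picks index 3
```

Under extreme imbalance, such maximally uncertain points are disproportionately likely to be noise or outliers, which is the failure mode RADS is built to avoid.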
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RADS (Reinforcement Adaptive Domain Sampling), an RL-based sample selection method for improving transfer learning in low-resource, class-imbalanced clinical NLP settings. It claims that uncertainty and diversity sampling often select outliers under extreme imbalance, while RADS learns a policy to identify more informative samples, yielding better transferability and robustness on real clinical datasets.
Significance. If the empirical claims hold with proper controls, RADS could address a practical bottleneck in clinical transfer learning by providing a more stable alternative to standard active learning heuristics when labeled data is both scarce and skewed.
Major comments (2)
- [Abstract] The central claim that 'experimental evaluations on several real world clinical datasets show our sample selection strategy enhances model transferability while maintaining robust performance under extreme class imbalance' is asserted without any metrics, baselines, dataset sizes, imbalance ratios, or statistical tests, so the claim cannot be evaluated.
- [Method] RL component: no description is given of the reward function, state representation, or any balancing/diversity term that would prevent the policy gradient from being dominated by majority-class performance or outlier gradients in tiny initial pools; this premise is load-bearing for the claim that RL avoids the failure modes of uncertainty/diversity sampling.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and have made revisions to strengthen the paper where the comments identify areas for improvement.
Point-by-point responses
Referee: [Abstract] The central claim that 'experimental evaluations on several real world clinical datasets show our sample selection strategy enhances model transferability while maintaining robust performance under extreme class imbalance' is asserted without any metrics, baselines, dataset sizes, imbalance ratios, or statistical tests, so the claim cannot be evaluated.
Authors: We agree that the original abstract is too high-level and does not allow readers to immediately assess the strength of the central claim. In the revised manuscript we have updated the abstract to include concrete details: specific performance metrics (F1 improvements over baselines), the clinical datasets used along with their sizes and imbalance ratios, and a brief reference to the statistical tests performed. This revision makes the claim directly evaluable from the abstract while remaining concise. revision: yes
Referee: [Method] RL component: no description is given of the reward function, state representation, or any balancing/diversity term that would prevent the policy gradient from being dominated by majority-class performance or outlier gradients in tiny initial pools; this premise is load-bearing for the claim that RL avoids the failure modes of uncertainty/diversity sampling.
Authors: The referee is correct that a precise description of the RL components is necessary to substantiate the paper's claims. The initial submission's Method section did not provide sufficient detail on these elements. We have substantially expanded the Method section to explicitly define (1) the state representation (model embeddings combined with a summary of the labeled pool's class distribution), (2) the reward function (improvement in macro-F1 on a held-out validation set plus a diversity penalty term), and (3) an explicit balancing mechanism within the policy objective that down-weights majority-class gradients and outlier influence. These additions directly explain how the learned policy mitigates the outlier and imbalance problems observed with uncertainty and diversity sampling. revision: yes
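The components described in this response can be sketched directly. The embedding dimension, binary class set, and penalty weight `lam` below are hypothetical stand-ins, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def build_state(embedding, labeled_counts):
    """State = candidate embedding concatenated with the labeled pool's
    class-distribution summary (binary classes assumed for illustration)."""
    total = sum(labeled_counts.values()) or 1
    dist = np.array([labeled_counts.get(c, 0) / total for c in (0, 1)])
    return np.concatenate([embedding, dist])

def reward(macro_f1_after, macro_f1_before, candidate, labeled_embs, lam=0.1):
    """Reward = held-out macro-F1 gain minus a diversity penalty that
    discourages near-duplicates of already-labeled samples (lam hypothetical)."""
    gain = macro_f1_after - macro_f1_before
    if len(labeled_embs) == 0:
        return gain
    sims = labeled_embs @ candidate / (
        np.linalg.norm(labeled_embs, axis=1) * np.linalg.norm(candidate) + 1e-9
    )
    return gain - lam * float(sims.max())

emb = rng.normal(size=4)
state = build_state(emb, {0: 8, 1: 2})
# A candidate identical to a labeled sample: the diversity penalty
# outweighs the small F1 gain, yielding a negative reward.
r = reward(0.62, 0.55, emb, np.stack([emb, rng.normal(size=4)]))
```

Exposing the class distribution in the state and using a macro-F1 reward with a similarity penalty is the mechanism by which the policy could, as claimed, resist majority-class domination and outlier selection.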
Circularity Check
No significant circularity; the method is an application of standard RL without self-referential reductions.
Full rationale
The paper introduces RADS as a reinforcement learning strategy for sample selection in low-resource clinical transfer learning. No equations, fitted parameters, or derivations are presented that reduce by construction to the inputs (e.g., no self-definitional reward functions or predictions that are statistically forced from the same data). The central claims rest on experimental evaluations on external real-world datasets rather than on any load-bearing self-citation chain, uniqueness theorem from the authors, or ansatz smuggled via prior work. This is a standard methodological contribution with independent empirical content; no circular steps are identifiable from the provided text.