arXiv preprint arXiv:2311.09774
4 Pith papers cite this work.
citing papers explorer
- Fully Open Meditron: An Auditable Pipeline for Clinical LLMs
  Presents the first fully open pipeline for clinical LLMs that unifies eight public QA datasets with clinician-vetted synthetic data from guidelines and vignettes, achieving improved performance on medical benchmarks while enabling full auditability.
- Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
  Freezing deep layers and training shallow layers during continued pre-training of LLMs outperforms full fine-tuning and the opposite allocation on C-Eval and CMMLU, guided by a new layer-sensitivity diagnostic.
- LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
  LLM agents iteratively generate and optimize data processing strategies for fine-tuning, delivering over 80% win rates versus unprocessed data and 65% versus LLM-based AutoML baselines while cutting search time by up to 10x.
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.