Large Language Models for Medical Forecasting -- Foresight 2

Daniel Bean; James Teo; Joshua Au Yeung; Richard J. Dobson; Zeljko Kraljevic

arxiv: 2412.10848 · v1 · pith:NKRFQTOXnew · submitted 2024-12-14 · 💻 cs.CL · cs.AI · cs.LG

Zeljko Kraljevic, Joshua Au Yeung, Daniel Bean, James Teo, Richard J. Dobson

classification 💻 cs.CL cs.AI cs.LG

keywords biomedical, data, fine-tuned, model, forecasting, foresight, hospital, improvement

Foresight 2 (FS2) is a large language model fine-tuned on hospital data for modelling patient timelines (GitHub 'removed for anon'). It can understand patients' clinical notes and predict SNOMED codes for a wide range of biomedical use cases, including diagnosis suggestions, risk forecasting, and procedure and medication recommendations. FS2 is trained on the free text portion of the MIMIC-III dataset, firstly through extracting biomedical concepts and then creating contextualised patient timelines, upon which the model is then fine-tuned. The results show significant improvement over the previous state-of-the-art for the next new biomedical concept prediction (P/R - 0.73/0.66 vs 0.52/0.32) and a similar improvement specifically for the next new disorder prediction (P/R - 0.69/0.62 vs 0.46/0.25). Finally, on the task of risk forecast, we compare our model to GPT-4-turbo (and a range of open-source biomedical LLMs) and show that FS2 performs significantly better on such tasks (P@5 - 0.90 vs 0.65). This highlights the need to incorporate hospital data into LLMs and shows that small models outperform much larger ones when fine-tuned on high-quality, specialised data.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Training Large Language Models to Predict Clinical Events
cs.LG 2026-05 unverdicted novelty 5.0

Training a LoRA adapter on 6,900 examples derived from MIMIC-III notes reduces expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145 for clinical event prediction.