Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

2024 · cs.LG · arXiv 2409.02136


abstract

This study compared the performance of classical feature-based machine learning models (CMLs) and large language models (LLMs) in predicting COVID-19 mortality using high-dimensional tabular data from 9,134 patients across four hospitals. Seven CML models, including XGBoost and random forest (RF), were evaluated alongside eight LLMs, such as GPT-4 and Mistral-7b, which performed zero-shot classification on text-converted structured data. Additionally, Mistral-7b was fine-tuned using the QLoRA approach. XGBoost and RF demonstrated superior performance among CMLs, achieving F1 scores of 0.87 and 0.83 for internal and external validation, respectively. GPT-4 led the LLM category with an F1 score of 0.43, while fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, yielding a stable F1 score of 0.74 during external validation. Although LLMs showed moderate performance in zero-shot classification, fine-tuning substantially enhanced their effectiveness, potentially bridging the gap with CML models. However, CMLs still outperformed LLMs in handling high-dimensional tabular data tasks. This study highlights the potential of both CMLs and fine-tuned LLMs in medical predictive modeling, while emphasizing the current superiority of CMLs for structured data analysis.
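The abstract describes zero-shot classification on text-converted structured data, scored with F1. The following is a minimal sketch of what such a conversion and scoring step might look like; it is not the authors' pipeline. The field names, the prompt wording, and the helper names `row_to_prompt` and `f1_score` are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional


def row_to_prompt(row: Mapping[str, Optional[object]]) -> str:
    """Serialize one tabular record into a plain-text zero-shot prompt.

    Assumption: each feature is rendered as "name: value"; the actual
    serialization used in the paper is not given in the abstract.
    """
    facts = "; ".join(
        f"{name}: {'unknown' if value is None else value}"
        for name, value in row.items()
    )
    return (
        f"Patient record: {facts}.\n"
        "Question: Did this patient die? Answer yes or no."
    )


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # With no positive predictions and no positive labels, report 0.0
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical record; these feature names are not taken from the study.
example = {"age": 71, "oxygen_saturation": None}
prompt = row_to_prompt(example)
```

Because F1 combines precision and recall, a model that almost never predicts the positive class scores poorly no matter how accurate it is on the negative class, which is consistent with the low zero-shot scores reported above.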

representative citing papers

Deterministic Decisions for High-Stakes AI. A Zero-Egress Pipeline with the Deployability of RAG and the Accuracy of Machine Learning

cs.LG · 2026-06-28 · unverdicted · novelty 5.0

Zero-shot LLMs exhibit intervention bias in educational advising, over-recommending actions by 43 percentage points, while supervised DT and XGBoost models achieve near-zero calibration error and macro-F1 of 0.79.

citing papers explorer


Deterministic Decisions for High-Stakes AI. A Zero-Egress Pipeline with the Deployability of RAG and the Accuracy of Machine Learning

cs.LG · 2026-06-28 · unverdicted · none · ref 33 · internal anchor
