Publicly Available Clinical
5 Pith papers cite this work.
years
2026: 5
citing papers explorer
- Compared to What? Baselines and Metrics for Counterfactual Prompting
  Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
- HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
  HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
- PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
  PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
- Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
  Fine-tuned e5_large LLM reaches 0.866 F1_micro on ICD classification of 145k Spanish psychiatric texts, outperforming BoW, TF-IDF, and other transformers.
- Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes
  TF-IDF with LGBM achieved the highest AUC-ROC of 0.80 and best balance in predicting next-day discharge from clinical notes, outperforming fine-tuned compact LLMs like DistilGPT-2.