Chaudhari, and Jean-Benoit Delbrouck
6 Pith papers cite this work, all 6 from 2026; all 6 are currently unverdicted.
Citing papers explorer
- A Benchmark for Hallucination Detection in VLMs for Gastrointestinal Endoscopy
  ReXTrust, a white-box method, achieves the highest AUC (peak 93.0) on Gut-VLM across five VLMs, outperforming alternatives by statistically significant margins, while black-box and some gray-box methods collapse on certain models.
- SAGE: An Expert-Annotated South Asian GI Endoscopy Dataset for Multimodal Learning and Hallucination Analysis
  Introduces the SAGE South Asian GI endoscopy dataset and reports large performance drops in multi-class classifiers and large multimodal models due to geographic population shift.
- RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports
  RadSEM is a constrained LLM-assisted metric that rewrites radiology reports into atomic finding sentences, applies contradiction-constrained many-to-many matching, and computes an abnormal-focused weighted F1 score.
- Revisiting LLM Adaptation for 3D CT Report Generation: A Study of Scaling and Diagnostic Priors
  RAD3D-Prefix is a diagnostic-prior conditioning framework for 3D CT report generation that integrates image embeddings with multi-label classification logits, showing that freezing larger LLMs and training only projection layers outperforms fine-tuning across scales.
- Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning
  ESC-RL improves RL for radiology report generation via group-wise evidence-aware rewards (GEAR) and LLM-driven self-correcting preference learning (SPL), reaching state-of-the-art results on two chest X-ray datasets.
- CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation
  CXRMate-2 improves chest X-ray report generation via temporal embeddings and tractable RL, delivering metric gains and a 45% acceptability rate in radiologist review, with no significant preference difference on most findings.