CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 53728–53741
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
representative citing papers
MediEval benchmark reveals LLM failures like hallucinated support and truth inversion in medical reasoning, while CoRFu fine-tuning raises macro-F1 by 16.4 points and removes truth inversion errors.
Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.