A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.
Med42-v2: A suite of clinical llms
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
MHGraphBench is a new PrimeKG-derived benchmark that exposes a recognition-to-judgment gap in 15 LLMs on mental health tasks while stressing that results measure KG agreement under constrained interfaces, not clinical capability.
MedHopQA introduces a 1,000-question two-hop biomedical QA benchmark where retrieval-augmented systems reach 89% conceptual accuracy, outperforming zero-shot baselines by over 20 points.
Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.
FairEnc reduces demographic biases in VLMs for glaucoma detection via LLM-generated synthetic text and dual-level visual debiasing while preserving diagnostic accuracy across datasets.
TheraAgent uses iterative agentic refinement with an integrated clinical judge to produce more accurate, complete, and safer treatment plans than standard LLMs.
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.
ECG foundation models for signal interpretation and medical LLMs for reasoning can be integrated into agentic systems for real-time cardiovascular intelligence on edge devices.
citing papers explorer
-
What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework
A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.
-
MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models
MHGraphBench is a new PrimeKG-derived benchmark that exposes a recognition-to-judgment gap in 15 LLMs on mental health tasks while stressing that results measure KG agreement under constrained interfaces, not clinical capability.
-
Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
MedHopQA introduces a 1,000-question two-hop biomedical QA benchmark where retrieval-augmented systems reach 89% conceptual accuracy, outperforming zero-shot baselines by over 20 points.
-
Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.
-
FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection
FairEnc reduces demographic biases in VLMs for glaucoma detection via LLM-generated synthetic text and dual-level visual debiasing while preserving diagnostic accuracy across datasets.
-
TheraAgent: Self-Improving Therapeutic Agent for Precise and Comprehensive Treatment Planning
TheraAgent uses iterative agentic refinement with an integrated clinical judge to produce more accurate, complete, and safer treatment plans than standard LLMs.
-
Medical Reasoning with Large Language Models: A Survey and MR-Bench
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
-
Automated Auditing of Hospital Discharge Summaries for Care Transitions
An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.
-
ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge: A Review and Outlook
ECG foundation models for signal interpretation and medical LLMs for reasoning can be integrated into agentic systems for real-time cardiovascular intelligence on edge devices.