Med42-v2: A Suite of Clinical LLMs

Marco AF Pimentel · 2024 · arXiv 2408.06142

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

cs.CL · 2026-05-19 · accept · novelty 7.0

A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.

MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

cs.CL · 2026-05-15 · conditional · novelty 7.0

MHGraphBench is a new PrimeKG-derived benchmark that exposes a recognition-to-judgment gap in 15 LLMs on mental health tasks while stressing that results measure KG agreement under constrained interfaces, not clinical capability.

Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

MedHopQA introduces a 1,000-question two-hop biomedical QA benchmark where retrieval-augmented systems reach 89% conceptual accuracy, outperforming zero-shot baselines by over 20 points.

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

cs.CL · 2025-02-11 · unverdicted · novelty 7.0

Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.

FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

FairEnc reduces demographic biases in VLMs for glaucoma detection via LLM-generated synthetic text and dual-level visual debiasing while preserving diagnostic accuracy across datasets.

TheraAgent: Self-Improving Therapeutic Agent for Precise and Comprehensive Treatment Planning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

TheraAgent uses iterative agentic refinement with an integrated clinical judge to produce more accurate, complete, and safer treatment plans than standard LLMs.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

Automated Auditing of Hospital Discharge Summaries for Care Transitions

cs.AI · 2026-04-07 · unverdicted · novelty 4.0

An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.

ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge: A Review and Outlook

eess.SP · 2026-04-02 · unverdicted · novelty 3.0

ECG foundation models for signal interpretation and medical LLMs for reasoning can be integrated into agentic systems for real-time cardiovascular intelligence on edge devices.

citing papers explorer

Showing 9 of 9 citing papers.

What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

cs.CL · 2026-05-19 · accept · none · ref 68

A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.

MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

cs.CL · 2026-05-15 · conditional · none · ref 23

MHGraphBench is a new PrimeKG-derived benchmark that exposes a recognition-to-judgment gap in 15 LLMs on mental health tasks while stressing that results measure KG agreement under constrained interfaces, not clinical capability.

Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

cs.CL · 2026-05-12 · unverdicted · none · ref 48

MedHopQA introduces a 1,000-question two-hop biomedical QA benchmark where retrieval-augmented systems reach 89% conceptual accuracy, outperforming zero-shot baselines by over 20 points.

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

cs.CL · 2025-02-11 · unverdicted · none · ref 14

Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.

FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection

cs.CV · 2026-05-06 · unverdicted · none · ref 11

FairEnc reduces demographic biases in VLMs for glaucoma detection via LLM-generated synthetic text and dual-level visual debiasing while preserving diagnostic accuracy across datasets.

TheraAgent: Self-Improving Therapeutic Agent for Precise and Comprehensive Treatment Planning

cs.AI · 2026-05-07 · unverdicted · none · ref 1

TheraAgent uses iterative agentic refinement with an integrated clinical judge to produce more accurate, complete, and safer treatment plans than standard LLMs.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · none · ref 63

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

Automated Auditing of Hospital Discharge Summaries for Care Transitions

cs.AI · 2026-04-07 · unverdicted · none · ref 12

An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.

ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge: A Review and Outlook

eess.SP · 2026-04-02 · unverdicted · none · ref 17

ECG foundation models for signal interpretation and medical LLMs for reasoning can be integrated into agentic systems for real-time cardiovascular intelligence on edge devices.
