A novel playbook for pragmatic trial operations to monitor and evaluate ambient artificial intelligence in clinical practice. NEJM AI
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
A multi-channel governance framework for a deployed ambient AI scribe achieved measurable improvements in clinician-validated performance and feedback quality through continuous rubric evaluation, live monitoring, and controlled experiments.
citing papers explorer
- MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction
  MedicalBench is a benchmark for implicit medical concept extraction and sentence-level evidence retrieval built from MIMIC-IV discharge summaries with human verification to test LLM reasoning on unstated medical ideas.
- End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians
  A multi-channel governance framework for a deployed ambient AI scribe achieved measurable improvements in clinician-validated performance and feedback quality through continuous rubric evaluation, live monitoring, and controlled experiments.