Agen- tehr: Advancing autonomous clinical decision-making via retrospective summarization

Yusheng Liao, Chuan Xuan, Yutong Cai, Lina Yang, Zhe Chen, Yanfeng Wang, Yu Wang · 2026 · arXiv 2601.13918

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

cs.AI · 2026-05-04 · conditional · novelty 8.0

PhysicianBench is a new benchmark of 100 physician-reviewed, execution-grounded tasks in live EHR environments where the best LLM agent reaches only 46% success and open-source models reach 19%.

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.

CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

CodeClinic benchmark demonstrates that LLM-generated Python skill libraries from clinical guidelines enhance consistency and reduce token consumption by up to 40% compared to zero-shot approaches on MIMIC-IV based tasks.

citing papers explorer

Showing 3 of 3 citing papers.

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments cs.AI · 2026-05-04 · conditional · none · ref 20
PhysicianBench is a new benchmark of 100 physician-reviewed, execution-grounded tasks in live EHR environments where the best LLM agent reaches only 46% success and open-source models reach 19%.
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning cs.CL · 2026-05-19 · unverdicted · none · ref 17
ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.
CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents cs.AI · 2026-05-10 · unverdicted · none · ref 13
CodeClinic benchmark demonstrates that LLM-generated Python skill libraries from clinical guidelines enhance consistency and reduce token consumption by up to 40% compared to zero-shot approaches on MIMIC-IV based tasks.

Agen- tehr: Advancing autonomous clinical decision-making via retrospective summarization

fields

years

verdicts

representative citing papers

citing papers explorer