npj Digital Medicine

Evaluation of causal reasoning for large language models in contextualized clinical scenarios of laboratory test interpretation · DOI 10.1038/s41746-026-02632-3

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

CLExEval introduces a human-annotated evaluation framework built on 40 rare cases. It identifies verbosity bias, a hidden knowledge paradox, and a 68.6% reasoning-to-output mismatch in LLMs, and shows that LLM-as-a-Judge evaluation overestimates reliability.

citing papers explorer

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning cs.CL · 2026-06-30 · unverdicted · none · ref 19