arXiv preprint arXiv:2504.00993 (2025)

Wu, J · 2025 · arXiv 2504.00993

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

EpiGraph: Building Generalists for Evidence-Intensive Epilepsy Reasoning in the Wild

cs.AI · 2026-05-10 · conditional · novelty 7.0 · 2 refs

EpiGraph creates a heterogeneous epilepsy knowledge graph that boosts LLM performance on clinical reasoning tasks by 30-41% in pharmacogenomics when used with Graph-RAG.

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

RadThinking releases a large longitudinal CT VQA dataset stratified into foundation perception questions, single-rule reasoning questions, and compositional multi-step chains grounded in clinical reporting standards for cancer screening.

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

CLR-voyance reformulates inpatient reasoning as a POMDP with clinician-validated outcome rubrics, yielding an 8B model that outperforms larger frontier models on the authors' new benchmark.

BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents

cs.AI · 2026-05-07 · conditional · novelty 6.0

BioMedArena releases a standardized toolkit with 147 biomedical benchmarks, 75 tools, and six harnesses that achieve SOTA results on eight tasks with a +15.03 percentage point average lift.

Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning

cs.CL · 2026-03-29 · unverdicted · novelty 6.0

A new counterfactual multi-agent framework improves LLM diagnostic accuracy by quantifying confidence shifts from edited clinical findings and guiding specialist discussions.

Structured In-context Environment Scaling for Large Language Model Reasoning

cs.CL · 2025-09-27 · conditional · novelty 6.0

The SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.

Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning

cs.AI · 2026-05-18 · conditional · novelty 5.0

QCEA reformulates entity alignment as a query-conditioned ranking task with semantic encoding, graph learning, and direction-aware transformation to handle context-dependent, asymmetric correspondences in medical knowledge graphs.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution

cs.CV · 2026-04-29 · unverdicted · novelty 5.0 · 2 refs

MedSynapse-V proposes a latent diagnostic memory evolution framework using Meta Query, Causal Counterfactual Refinement, and Intrinsic Memory Transition to improve medical VLM diagnostic accuracy over chain-of-thought methods.

VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs

cs.CL · 2026-04-25 · unverdicted · novelty 5.0

VeriLLMed is an interactive visual debugging tool that maps LLM diagnostic reasoning to knowledge graphs to identify and categorize relation, branch, and missing errors.

DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI

cs.AI · 2026-04-16 · unverdicted · novelty 5.0

DeepER-Med introduces a three-module agentic AI workflow for evidence-based medical research that outperforms production platforms on a new expert-curated dataset of 100 questions and matches clinical recommendations in seven of eight real-world cases.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

citing papers explorer

EpiGraph: Building Generalists for Evidence-Intensive Epilepsy Reasoning in the Wild
cs.AI · 2026-05-10 · conditional · none · ref 19 · 2 links

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
cs.CL · 2026-05-19 · unverdicted · none · ref 1

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
cs.CV · 2026-05-11 · unverdicted · none · ref 109

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics
cs.CL · 2026-05-10 · unverdicted · none · ref 44

BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents
cs.AI · 2026-05-07 · conditional · none · ref 23

Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning
cs.CL · 2026-03-29 · unverdicted · none · ref 25

Structured In-context Environment Scaling for Large Language Model Reasoning
cs.CL · 2025-09-27 · conditional · none · ref 23

Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning
cs.AI · 2026-05-18 · conditional · none · ref 32

ReMedi: Reasoner for Medical Clinical Prediction
cs.CL · 2026-05-02 · unverdicted · none · ref 34

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
cs.CV · 2026-04-29 · unverdicted · none · ref 52 · 2 links

VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs
cs.CL · 2026-04-25 · unverdicted · none · ref 64

DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI
cs.AI · 2026-04-16 · unverdicted · none · ref 32

Medical Reasoning with Large Language Models: A Survey and MR-Bench
cs.CL · 2026-03-17 · accept · none · ref 24
