MedPRMBench is the first fine-grained benchmark for process reward models in medical reasoning, featuring 6500 questions, 13000 chains, 113910 step labels, and a baseline that improves downstream QA accuracy by 3.2-6.7 points.
MedS3: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision.arXiv preprint arXiv:2501.12051.2025
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 3years
2026 3representative citing papers
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
citing papers explorer
-
MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning
MedPRMBench is the first fine-grained benchmark for process reward models in medical reasoning, featuring 6500 questions, 13000 chains, 113910 step labels, and a baseline that improves downstream QA accuracy by 3.2-6.7 points.
-
ReMedi: Reasoner for Medical Clinical Prediction
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
-
Medical Reasoning with Large Language Models: A Survey and MR-Bench
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.