MedS3: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision.arXiv preprint arXiv:2501.12051.2025

Jiang S, Liao Y , Chen Z, Zhang Y , Wang Y , Wang Y · 2025 · arXiv 2501.12051

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

MedPRMBench is the first fine-grained benchmark for process reward models in medical reasoning, featuring 6500 questions, 13000 chains, 113910 step labels, and a baseline that improves downstream QA accuracy by 3.2-6.7 points.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

citing papers explorer

Showing 3 of 3 citing papers.

MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning cs.CL · 2026-04-19 · unverdicted · none · ref 8
MedPRMBench is the first fine-grained benchmark for process reward models in medical reasoning, featuring 6500 questions, 13000 chains, 113910 step labels, and a baseline that improves downstream QA accuracy by 3.2-6.7 points.
ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 48
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
Medical Reasoning with Large Language Models: A Survey and MR-Bench cs.CL · 2026-03-17 · accept · none · ref 49
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

MedS3: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision.arXiv preprint arXiv:2501.12051.2025

fields

years

verdicts

representative citing papers

citing papers explorer