arXiv preprint arXiv:2502.12671 , year=

Wang, B · 2025 · arXiv 2502.12671

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning

cs.LG · 2026-01-28 · unverdicted · novelty 6.0

LLM agents iteratively generate and optimize data processing strategies for fine-tuning, delivering over 80% win rates versus unprocessed data and 65% versus LLM-based AutoML baselines while cutting search time by up to 10x.

Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

Multi-turn evidence seeking reduces LLM diagnostic accuracy by 12.75% and supporting-evidence quality by 24.36% versus full-context evaluation in a new OSCE-inspired benchmark across 468 cases and 15 models.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 5 of 5 citing papers.

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning cs.LG · 2026-01-28 · unverdicted · none · ref 46
LLM agents iteratively generate and optimize data processing strategies for fine-tuning, delivering over 80% win rates versus unprocessed data and 65% versus LLM-based AutoML baselines while cutting search time by up to 10x.
Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support cs.AI · 2026-05-21 · unverdicted · none · ref 44
Multi-turn evidence seeking reduces LLM diagnostic accuracy by 12.75% and supporting-evidence quality by 24.36% versus full-context evaluation in a new OSCE-inspired benchmark across 468 cases and 15 models.
ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 47
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
Medical Reasoning with Large Language Models: A Survey and MR-Bench cs.CL · 2026-03-17 · accept · none · ref 42
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 220
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

arXiv preprint arXiv:2502.12671 , year=

fields

years

verdicts

representative citing papers

citing papers explorer