arXiv preprint arXiv:2501.09213

FinemedLM-o1: Enhancing medical knowledge reasoning ability of LLM from supervised fine-tuning to test-time training · arXiv 2501.09213

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

cs.AI · 2026-06-09 · conditional · novelty 6.0

Across 504 configurations on five-year ADRD prediction, rationale-based supervised fine-tuning consistently degrades performance relative to label-only fine-tuning, despite high-quality rationales validated by experts.

MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.

citing papers explorer

Showing 1 of 1 citing paper after filters.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · none · ref 17

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.
