Preprint, arXiv:2501.09213

Finemedlm-o1: Enhancing the medical reasoning ability of llm from supervised fine-tuning to test-time training · arXiv 2501.09213

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.

citing papers explorer

Showing 2 of 2 citing papers.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation cs.CL · 2025-07-28 · accept · none · ref 17
MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 14
MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.

Preprint, arXiv:2501.09213

fields

years

verdicts

representative citing papers

citing papers explorer