arXiv preprint arXiv:2404.14779

Christophe, Cl · 2006 · arXiv 2404.14779

5 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

cs.LG · 2026-05-18 · conditional · novelty 8.0

Conformal Selective Acting (CSA) fills a gap in conformal methods by providing per-round, pathwise-valid selective risk bounds for adaptive RLVR LLM streams under predictable updates and isotonic calibration.

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Federated PEFT on LLMs across healthcare and finance datasets performs close to centralized training and beats isolated local training under non-IID conditions.

MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs

cs.CL · 2025-12-23 · unverdicted · novelty 6.0

MediEval benchmark reveals LLM failures like hallucinated support and truth inversion in medical reasoning, while CoRFu fine-tuning raises macro-F1 by 16.4 points and removes truth inversion errors.

PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment

cs.CL · 2025-08-07 · unverdicted · novelty 6.0

PrinciplismQA benchmark reveals significant gaps in LLMs' clinical ethical reasoning despite high knowledge accuracy.

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

cs.CL · 2024-12-25 · unverdicted · novelty 6.0

HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.
