Conformal Selective Acting (CSA) fills a gap in conformal methods by providing per-round, pathwise-valid selective risk bounds for adaptive RLVR LLM streams under predictable updates and isotonic calibration.
arXiv preprint arXiv:2404.14779
5 Pith papers cite this work.
Representative citing papers
Federated PEFT on LLMs across healthcare and finance datasets performs close to centralized training and beats isolated local training under non-IID conditions.
MediEval benchmark reveals LLM failures like hallucinated support and truth inversion in medical reasoning, while CoRFu fine-tuning raises macro-F1 by 16.4 points and removes truth inversion errors.
PrinciplismQA benchmark reveals significant gaps in LLMs' clinical ethical reasoning despite high knowledge accuracy.
HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.
Citing papers explorer
- MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs
- PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs