MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts: 3 unverdicted
citing papers explorer
- What Makes a Medical Checker Trainable? Diagnosing Signal Collapse and Reward Hacking in Checker-Guided RAG for Biomedical QA
  Empirical comparison of four NLI checkers as process rewards in GRPO-trained medical RAG shows log-prob scoring collapses to neutral labels while moderate local classifiers improve BERTScore without reward hacking.
- CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
  CLIN-LLM combines uncertainty-calibrated BioBERT classification with retrieval-augmented FLAN-T5 generation and safety post-processing to reach 98% accuracy on clinical cases while cutting unsafe antibiotic suggestions by 67%.
- Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design
  A fine-tuned LLM called Perovskite-R1, built from curated perovskite literature and material libraries, proposes precursor additives and designs with some experimental validation showing improved stability and performance.