ThinkTuning: Instilling Cognitive Reflections without Distillation
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: support 1
verdicts: UNVERDICTED 2
citing papers explorer
- Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
  Mid-training LLMs on self-generated diverse reasoning paths improves subsequent RL performance on mathematical benchmarks and OOD tasks.
- RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems
  RECAP is an inference-time framework using cognitive appraisal theory to enhance emotional alignment and transparency in medical dialogue systems across model scales.