Qa-lign: Aligning llms through constitutionally decomposed qa.arXiv preprint arXiv:2506.08123

Jacob Dineen, Aswin RRV, Qin Liu, Zhikun Xu, Xiao Ye, Ming Shen, Zhaonan Li, Shijie Lu, Chitta Baral, Muhao Chen, et al · 2025 · arXiv 2506.08123

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

cs.LG · 2025-07-23 · unverdicted · novelty 6.0

RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.

RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems

cs.CL · 2025-09-12 · unverdicted · novelty 5.0

RECAP is an inference-time framework using cognitive appraisal theory to enhance emotional alignment and transparency in medical dialogue systems across model scales.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 4 of 4 citing papers.

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution cs.CL · 2026-04-03 · unverdicted · none · ref 8
Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains cs.LG · 2025-07-23 · unverdicted · none · ref 7
RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.
RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems cs.CL · 2025-09-12 · unverdicted · none · ref 16
RECAP is an inference-time framework using cognitive appraisal theory to enhance emotional alignment and transparency in medical dialogue systems across model scales.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 108
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

Qa-lign: Aligning llms through constitutionally decomposed qa.arXiv preprint arXiv:2506.08123

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer