Mitigating the safety-utility trade-off in LLM alignment via adaptive safe context learning, 2026 c

Yanbo Wang, Minzheng Wang, Jian Liang, Lu Wang, Yongcan Yu, Ran He · 2026 · arXiv 2602.13562

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.

citing papers explorer

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · none · ref 60
