Mixed-objective reward models underperform single-objective ones because shared neurons support one objective while negatively affecting the other, creating alignment tension.
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 2
verdicts
UNVERDICTED: 2
representative citing papers
Synthesizes literature into a four-stage lifecycle framework for cyberbullying governance from detection through proactive intervention.
citing papers explorer
- Understanding helpfulness and harmless tension in reward models
  Mixed-objective reward models underperform single-objective ones because shared neurons support one objective while negatively affecting the other, creating alignment tension.
- Cyberbullying Governance on Social Media: A Unified Framework from Content Identification to Intervention
  Synthesizes literature into a four-stage lifecycle framework for cyberbullying governance from detection through proactive intervention.