Harmful generation in LLMs relies on a compact, unified set of weights that alignment compresses and that are distinct from benign capabilities, explaining emergent misalignment.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Longitudinal analysis of Reddit posts shows human-AI romance discourse evolving from intimate personal stories to focus on platform governance, technical problems, and societal impacts.
citing papers explorer
-
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Harmful generation in LLMs relies on a compact, unified set of weights that alignment compresses and that are distinct from benign capabilities, explaining emergent misalignment.
-
Technically Love: The Evolution of Human-AI Romance Discourse on Reddit
Longitudinal analysis of Reddit posts shows human-AI romance discourse evolving from intimate personal stories to focus on platform governance, technical problems, and societal impacts.