pith. sign in

Title resolution pending

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

citation-role summary

method 1

citation-polarity summary

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

roles

method 1

polarities

use method 1

representative citing papers

Why Do Large Language Models Generate Harmful Content?

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

Causal mediation analysis shows harmful LLM outputs arise in late layers from MLP failures and gating neurons, with early layers handling harm context detection and signal propagation.

citing papers explorer

Showing 1 of 1 citing paper.

  • Why Do Large Language Models Generate Harmful Content? cs.AI · 2026-04-13 · unverdicted · none · ref 26

    Causal mediation analysis shows harmful LLM outputs arise in late layers from MLP failures and gating neurons, with early layers handling harm context detection and signal propagation.