Breaking bad tokens: Detoxification of LLMs using sparse autoencoders

Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, Hari Sundaram · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

State Contamination in Memory-Augmented LLM Agents

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.

citing papers explorer

Showing 1 of 1 citing paper.

State Contamination in Memory-Augmented LLM Agents cs.AI · 2026-05-16 · unverdicted · none · ref 9
Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.

Breaking bad tokens: Detoxification of LLMs using sparse autoencoders

fields

years

verdicts

representative citing papers

citing papers explorer