Breaking bad tokens: Detoxification of LLMs using sparse autoencoders

1 Pith paper cites this work. Polarity classification is still indexing.

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

State Contamination in Memory-Augmented LLM Agents

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.

citing papers explorer

Showing 1 of 1 citing paper.

  • State Contamination in Memory-Augmented LLM Agents cs.AI · 2026-05-16 · unverdicted · none · ref 9