The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.
Detoxifying Large Language Models via Knowledge Editing , booktitle =
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 3verdicts
UNVERDICTED 3roles
background 1polarities
unclear 1representative citing papers
Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.
A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.
citing papers explorer
-
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.
-
A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models
A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.