Detoxifying Large Language Models via Knowledge Editing

Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen · 2024 · DOI 10.18653/v1/2024.acl-long.171

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Memory in the Age of AI Agents

cs.CL · 2025-12-15 · unverdicted · novelty 6.0

The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.

A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

cs.CL · 2026-06-24 · unverdicted · novelty 1.0

A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

cs.CL · 2026-05-27 · unverdicted · none · ref 40

Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.

A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

cs.CL · 2026-06-24 · unverdicted · none · ref 55

A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.
