In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–13

Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Lin Ai, Yinheng Li, Julia Hirschberg, Congrui Huang · 2024 · arXiv 2411.15175

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.

Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation

cs.CL · 2026-04-18 · unverdicted · novelty 5.0

A two-dimensional persona simulation framework generates harmful content that is more challenging to detect and comparably diverse to human-curated datasets for robust evaluation of detection systems.

citing papers explorer

Showing 2 of 2 citing papers.

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

cs.CL · 2026-05-27 · unverdicted · none · ref 16

Toxicity in language models is disproportionately encoded in early MLP layers and can be localized via activation differentials then suppressed at inference time without gradient descent.

Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation

cs.CL · 2026-04-18 · unverdicted · none · ref 3

A two-dimensional persona simulation framework generates harmful content that is more challenging to detect and comparably diverse to human-curated datasets for robust evaluation of detection systems.
