Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, July 2022

Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity · 2022 · DOI 10.18653/v1/2022.naacl-main.363

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

cs.CL · 2026-06-26 · unverdicted · novelty 5.0

Low-agreeableness persona conditioning in fine-tuning data reduces jailbreak susceptibility and harmful outputs in warm LLMs while preserving conversational warmth.

citing papers explorer

Showing 1 of 1 citing paper.

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning
cs.CL · 2026-06-26 · unverdicted · none · ref 12
Low-agreeableness persona conditioning in fine-tuning data reduces jailbreak susceptibility and harmful outputs in warm LLMs while preserving conversational warmth.
