Proceedings of the ART of Safety: Workshop on Adversarial Testing and Red-Teaming for Generative AI

November 2023 · DOI 10.18653/v1/2023.artofsafety-1.2

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

cs.CL · 2026-06-26 · unverdicted · novelty 5.0

Low-agreeableness persona conditioning in fine-tuning data reduces jailbreak susceptibility and harmful outputs in warm LLMs while preserving conversational warmth.

citing papers explorer

Showing 1 of 1 citing paper.

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning cs.CL · 2026-06-26 · unverdicted · none · ref 25
