Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality https://lmsys.org/blog/2023-03-30-vicuna/
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CR (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI
Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.