You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content

Xinlei He, Savvas Zannettou, Yun Shen, Yang Zhang · 2023 · arXiv 2308.05596

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Reinforcement learning with a multi-part reward teaches LLMs to output independent, meaning-preserving sentence edits that raise argument appropriateness close to full rewriting.

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

cs.CR · 2025-07-08 · unverdicted · novelty 6.0

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.

citing papers explorer

Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning

cs.CL · 2026-04-14 · unverdicted · none · ref 13

Reinforcement learning with a multi-part reward teaches LLMs to output independent, meaning-preserving sentence edits that raise argument appropriateness close to full rewriting.

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

cs.CR · 2025-07-08 · unverdicted · none · ref 29

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.
