Exploring social bias in chatbots using stereotype knowledge
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2023 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Llama Guard is an instruction-tuned Llama2-7b model that performs multi-class safety classification on prompts and responses, matching or exceeding existing moderation tools on benchmarks while supporting taxonomy customization.