An algorithm audit finds that OpenAI moderation, Llama Guard, and Shield Gemma frequently flag content from real therapy sessions as undesirable.
1 Pith paper cites this work.
Fields: cs.HC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
AI Content Moderation in Therapy Conversations