ToxiREX is a new dataset of 128k Reddit comments in six languages, with hierarchical annotations for implicit toxicity in conversational context, based on an existing reasoning schema.
and Levinstein, Benjamin A
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LLMs show structured attribute-driven decisions that a behavioral model can predict, but self-reports recover those drivers only partially, indicating superficial beliefs.
citing papers explorer
- Superficial Beliefs in LLM Decision-Making
LLMs show structured attribute-driven decisions that a behavioral model can predict, but self-reports recover those drivers only partially, indicating superficial beliefs.