The MultiSoc-4D benchmark shows that LLMs annotating Bengali social media exhibit instruction-induced label collapse: they prefer fallback labels and miss 79% of hate speech and 75% of sarcasm instances, with high raw agreement but near-zero kappa.
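The combination of high agreement and near-zero kappa can look contradictory, so a small illustration may help. The sketch below computes raw agreement and Cohen's kappa for two hypothetical annotators who both fall back to a "none" label almost everywhere. The label names, the 100-item toy data, and the function name `agreement_and_kappa` are assumptions made for illustration only; they are not taken from the benchmark.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (raw agreement, Cohen's kappa) for two equal-length label lists."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label; kappa is undefined,
        # and 0.0 is returned here by convention (an assumption of this sketch).
        return p_o, 0.0
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: each annotator flags only 4 of 100 items as "hate",
# and the flagged items do not overlap.
annotator_a = ["hate"] * 4 + ["none"] * 96
annotator_b = ["none"] * 4 + ["hate"] * 4 + ["none"] * 92

p_o, kappa = agreement_and_kappa(annotator_a, annotator_b)
print(f"raw agreement = {p_o:.2f}, kappa = {kappa:.3f}")
```

In this toy case the two annotators agree on 92 of 100 items, yet almost all of that agreement is what chance alone would produce given how often both choose the fallback label, so kappa lands close to zero.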
Performance for transformer-based monolingual and multilingual models is presented in Table 13.
1 Pith paper cites this work.
MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media