WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification

David Smith; Jackie Lee; Lingyu Gao; Meghan Jemison; Will Monroe

arxiv: 2605.26070 · v1 · pith:2MLGVM52new · submitted 2026-05-25 · 💻 cs.CL

keywords: multilingual, speaker-attribute, annotation, LLMs, classification, collaborative, framework, labels

Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a human-large language model (LLM) collaborative re-annotation framework for stabilizing multilingual speaker-attribute labels under practical resource constraints. Starting from a noisy corpus, we use LLMs to surface recurring annotation rationales through iterative interaction with experts, and apply disagreement-focused sampling for targeted re-annotation. Using this framework, we construct WhoSaidIt, a multilingual dataset covering nine speaker-attribute labels. We quantify divergence between original and revised annotations, benchmark recent LLMs, and analyze the effect of explicit rationales on model behavior. Our results reveal substantial cross-lingual differences in annotation decisions and demonstrate both the strengths and limitations of LLMs in speaker-attribute classification.
