SemEval-2016 Task 6: Detecting Stance in Tweets
4 Pith papers cite this work. Polarity classification of these citations is still being indexed.
Fields: cs.CL (4)
Verdicts: UNVERDICTED (4)

Representative citing papers
- On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
  LLMs correct only 34.8% of zero-shot annotation errors via prompting; Definition-Specific Familiarity correlates positively with performance (partial r = +0.41), while memorization metrics do not.
- Content Fuzzing for Escaping Information Cocoons on Digital Social Media
  ContentFuzz uses an LLM, guided by a stance model's confidence, to rewrite posts so that the machine-assigned labels flip while the intent perceived by human readers is unchanged; it is tested across four models and three datasets in two languages.
- How people talk about each other: Modeling Generalized Intergroup Bias and Emotion
  Introduces the first interpersonal emotion dataset built from congressional tweets and shows that jointly modeling interpersonal group relationships and emotions with a neural model improves performance on both tasks.
- Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit
  Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming the best zero-shot LLM (0.50), with the largest gap on detecting belief propagation.