The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
5 Pith papers cite this work.
Citation roles: background (2)
citing papers explorer
- LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
  LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.
- From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives?
  LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.
- How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
  LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.
- The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation
  A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
  This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.