The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

Calderon, N · 2025 · DOI 10.18653/v1/2025.acl-long.782

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2

citation-polarity summary

background: 1 · support: 1

representative citing papers

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

cs.CL · 2026-05-13 · accept · novelty 7.0

LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.

From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.

How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

cs.AI · 2026-02-11 · unverdicted · novelty 5.0

A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

cs.CL · 2024-11-30 · unverdicted · novelty 2.0

This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.

citing papers explorer

Showing 5 of 5 citing papers.

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics cs.CL · 2026-05-13 · accept · none · ref 26
From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives? cs.AI · 2026-04-20 · unverdicted · none · ref 14
How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 9
The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation cs.AI · 2026-02-11 · unverdicted · none · ref 50
Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research cs.CL · 2024-11-30 · unverdicted · none · ref 17
