LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps

· 2026 · cs.CL · arXiv 2604.27345

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Human annotators frequently disagree on emotion labels, yet most evaluations of Large Language Model (LLM) emotion annotation collapse these judgments into a single gold standard, discarding the distributional information that disagreement encodes. We ask whether LLMs capture the structure of this disagreement, not just majority labels, by comparing emotion judgment distributions between human annotators and four zero-shot LLMs, plus a fine-tuned RoBERTa baseline, across two complementary benchmarks: GoEmotions and EmoBank, totaling 640,000 LLM responses. Zero-shot models diverge substantially from human distributions, and in-domain fine-tuning, not model scale, is required to close the gap. We formalize a lexical-grounding gradient through a quantitative transparency score that predicts per-category human--LLM agreement: LLMs reliably capture emotions with explicit lexical markers but systematically fail on pragmatically complex emotions requiring contextual inference, a pattern that replicates across both categorical and continuous emotion frameworks. We further propose three lightweight post-hoc calibration methods that reduce the distributional gap by up to 14\%, and provide actionable guidelines for when LLM emotion annotations can, and cannot, substitute for human labeling.

representative citing papers

Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition

cs.CV · 2026-06-21 · unverdicted · novelty 6.0

Uncertainty decomposition via deep ensembles separates annotator disagreement from distribution shift in FER, enabling a routing mechanism that retains 1.8x more ambiguous faces at matched OOD rejection compared to single-uncertainty baselines.

citing papers explorer

Showing 1 of 1 citing paper.

Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition cs.CV · 2026-06-21 · unverdicted · none · ref 15 · internal anchor
Uncertainty decomposition via deep ensembles separates annotator disagreement from distribution shift in FER, enabling a routing mechanism that retains 1.8x more ambiguous faces at matched OOD rejection compared to single-uncertainty baselines.

LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps

fields

years

verdicts

representative citing papers

citing papers explorer