Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694
4 Pith papers cite this work, all from 2026. Polarity classification is still indexing; all 4 citations are unverdicted.
citing papers explorer
- Jury Duty: Calibration and Orientation Failures in MLLM-as-a-Judge Under Cultural Ambiguity
  VOIR DIRE benchmark shows MLLM-as-a-Judge systems decompose into positivity-floor calibration failure and orientation failure on culturally contested items, with persona prompting recovering only the former.
- Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales
  A framework jointly models annotator-specific NLI labels and explanations using conditioned representations and two explainer architectures, improving predictive performance over baselines.
- Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
  Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance
- Pluralistic-Alignment Urbanism: Operationalizing a Right to AI for Inclusive Public Space
  Introduces PAU as a governance architecture for municipal AI in public spaces, informed by case studies on subgroup-aware scaling (R² = 0.89) and pluralistic preference data that treats neutrality as indeterminacy.