Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694

Ellie Pavlick, Tom Kwiatkowski · 2019 · DOI 10.1162/tacl_a_00293

4 Pith papers cite this work.

representative citing papers

Jury Duty: Calibration and Orientation Failures in MLLM-as-a-Judge Under Cultural Ambiguity

cs.CV · 2026-06-12 · unverdicted · novelty 7.0

VOIR DIRE benchmark shows MLLM-as-a-Judge systems decompose into positivity-floor calibration failure and orientation failure on culturally contested items, with persona prompting recovering only the former.

Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

A framework jointly models annotator-specific NLI labels and explanations using conditioned representations and two explainer architectures, improving predictive performance over baselines.

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33–46.6 pp metric gaps and a Governance …

Pluralistic-Alignment Urbanism: Operationalizing a Right to AI for Inclusive Public Space

cs.CY · 2026-05-15 · unverdicted · novelty 4.0

Introduces PAU as a governance architecture for municipal AI in public spaces, informed by case studies on subgroup-aware scaling (R² = 0.89) and pluralistic preference data that treats neutrality as indeterminacy.