Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance
Selective classification for deep neural networks
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
citing papers explorer
-
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance
- MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support