Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
Citing papers
-
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance
-
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short
Reasoning Arena converts non-diverse reward groups in RLVR into relative rewards via adaptive trace tournaments and Bradley-Terry fitting on anchor comparisons, claiming 7.6% average gains and 27-41% faster training on math/coding benchmarks.
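The summary above mentions Bradley-Terry fitting on pairwise comparisons. As a rough illustration only, and not the Reasoning Arena implementation, the sketch below fits Bradley-Terry strengths to a list of (winner, loser) trace comparisons using the standard MM (minorization-maximization) update and turns them into zero-mean relative rewards. The function names `fit_bradley_terry` and `relative_rewards`, the symmetric pseudo-count `prior`, and the choice of centered log-strengths as rewards are all hypothetical choices for this example; the adaptive tournament scheduling and anchor selection described in the paper are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def fit_bradley_terry(
    n_items: int,
    comparisons: Sequence[Tuple[int, int]],
    prior: float = 0.1,
    max_iters: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Fit strengths p_i under the model P(i beats j) = p_i / (p_i + p_j).

    `comparisons` holds (winner, loser) index pairs. A symmetric pseudo-count
    `prior` (an assumption of this sketch) is added to every ordered pair so
    strengths stay finite even if an item never wins or never loses.
    """
    if n_items < 2:
        return [1.0] * n_items

    # wins[i][j] = smoothed number of times item i beat item j
    wins = [[0.0] * n_items for _ in range(n_items)]
    for winner, loser in comparisons:
        wins[winner][loser] += 1.0
    for i in range(n_items):
        for j in range(n_items):
            if i != j:
                wins[i][j] += prior

    p = [1.0] * n_items
    for _ in range(max_iters):
        new_p = []
        for i in range(n_items):
            total_wins = sum(wins[i])
            # MM update (Hunter, 2004): p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n_items)
                if j != i
            )
            new_p.append(total_wins / denom)
        # Strengths are identified only up to scale: fix geometric mean to 1.
        log_mean = sum(math.log(x) for x in new_p) / n_items
        new_p = [x / math.exp(log_mean) for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def relative_rewards(strengths: Sequence[float]) -> List[float]:
    """Map strengths to zero-mean rewards via centered log-strengths."""
    logs = [math.log(s) for s in strengths]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


# Four traces that all got the same verifiable reward, so verifier-only
# group-relative advantages would be zero; pairwise judgments break the tie.
matches = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
rewards = relative_rewards(fit_bradley_terry(4, matches))
print([round(r, 3) for r in rewards])
```

In this toy round-robin every pair is compared once, so the fitted ranking follows total wins; with sparse or adaptive tournaments the pseudo-count keeps the fit well defined even when some traces are never compared directly.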
-
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
Auto-Rubric as Reward externalizes VLM preferences into structured rubrics and applies Rubric Policy Optimization to create more reliable binary rewards for multimodal generation, outperforming pairwise models on text-to-image and editing benchmarks.