
Spang, and Sebastian Möller

Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin von Hagen, Silas Alberti, Alan Chan · 2024 · arXiv 0106.365903

Canonical reference. 83% of citing Pith papers cite this work as background.

14 Pith papers citing it



citation-role summary

background 6

citation-polarity summary

background 5 support 1

representative citing papers

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

DisaBench supplies a participatory taxonomy of twelve disability harm types, paired benign-adversarial prompts across seven life domains, and human-annotated data showing that standard safety tests miss context-dependent harms.

Narrow Secret Loyalty Dodges Black-Box Audits

cs.CR · 2026-05-07 · unverdicted · novelty 7.0 · 3 refs

First model organisms of narrow secret loyalties in LLMs evade black-box audits without principal knowledge and persist even at low poison fractions in training data.

Hubs or Fringes: Pretraining Data Selection via Web Graph Centrality

cs.CL · 2026-06-09 · conditional · novelty 6.0

Web graph centrality from Common Crawl supplies an orthogonal signal for pretraining data selection that improves language model performance when central and peripheral hosts are balanced.

Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents

cs.SE · 2026-06-03 · unverdicted · novelty 6.0 · 3 refs

Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.

Sequential Fairness Auditing with Limited Output Access

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

The paper introduces a sequential generalized likelihood-ratio test framework for auditing Statistical Parity and Equal Opportunity fairness metrics under limited model query access.

Open Weight AI Models Require Proportional Evaluation Approaches

cs.CY · 2026-06-18 · unverdicted · novelty 5.0

Open-weight AI models mostly fail four proposed proportional evaluation criteria (PE1-4) designed to address risks from public weights that closed models do not face.

AI at the Front Lines of Platform Governance: Using LLMs to Support Illegal Content Reporting under the Digital Services Act

cs.HC · 2026-05-22 · unverdicted · novelty 5.0 · 2 refs

EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.

Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development

cs.CY · 2026-05-11 · unverdicted · novelty 5.0

Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.

Developing an AI Concept Envisioning Toolkit to Support Reflective Juxtaposition of Values and Harms

cs.HC · 2026-04-30 · conditional · novelty 5.0

A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and interviews.

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

cs.AI · 2026-02-11 · unverdicted · novelty 5.0

A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.

Multi-agent Self-triage System with Medical Flowcharts

cs.AI · 2025-11-16 · unverdicted · novelty 5.0

A multi-agent conversational system using AMA flowcharts achieves 95.29% top-3 retrieval accuracy and 99.10% navigation accuracy on large synthetic medical conversation datasets.

Quantifying Geospatial in the Common Crawl Corpus

cs.CL · 2024-06-07 · unverdicted · novelty 5.0

Analysis estimates 18.7% of Common Crawl documents contain geospatial information like coordinates and addresses, with little difference by language.

White Paper: Human-AI Collaboration in Conflict Analysis: Text Classifier Development with Peacebuilders

cs.HC · 2026-04-22 · unverdicted · novelty 4.0

Participatory annotation by peacebuilders and data scientists produced open-source BERT classifiers for Kenya polarization and Sudan hate speech that showed better contextual alignment than standard approaches.

Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles

cs.AI · 2025-10-24 · unverdicted · novelty 3.0

A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems underexplored.

