DisaBench supplies a participatory taxonomy of twelve disability harm types, paired benign-adversarial prompts across seven life domains, and human-annotated data showing that standard safety tests miss context-dependent harms.
hub Canonical reference
Spang, and Sebastian Möller
Canonical reference. 83% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 5representative citing papers
First model organisms of narrow secret loyalties in LLMs evade black-box audits without principal knowledge and persist even at low poison fractions in training data.
Web graph centrality from Common Crawl supplies an orthogonal signal for pretraining data selection that improves language model performance when central and peripheral hosts are balanced.
Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.
The paper introduces a sequential generalized likelihood-ratio test framework for auditing Statistical Parity and Equal Opportunity fairness metrics under limited model query access.
Open-weight AI models mostly fail four proposed proportional evaluation criteria (PE1-4) designed to address risks from public weights that closed models do not face.
EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.
Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.
A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and interviews.
A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.
A multi-agent conversational system using AMA flowcharts achieves 95.29% top-3 retrieval accuracy and 99.10% navigation accuracy on large synthetic medical conversation datasets.
Analysis estimates 18.7% of Common Crawl documents contain geospatial information like coordinates and addresses, with little difference by language.
Participatory annotation by peacebuilders and data scientists produced open-source BERT classifiers for Kenya polarization and Sudan hate speech that showed better contextual alignment than standard approaches.
A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems underexplored.
citing papers explorer
-
Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents
Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.