Spang, and Sebastian Möller
Canonical reference. 83% of citing Pith papers cite this work as background.
roles
background 5
representative citing papers
First model organisms of narrow secret loyalties in LLMs evade black-box audits without principal knowledge and persist even at low poison fractions in training data.
Web graph centrality from Common Crawl supplies an orthogonal signal for pretraining data selection that improves language model performance when central and peripheral hosts are balanced.
Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.
The paper introduces a sequential generalized likelihood-ratio test framework for auditing Statistical Parity and Equal Opportunity fairness metrics under limited model query access.
Open-weight AI models mostly fail four proposed proportional evaluation criteria (PE1-4) designed to address risks from public weights that closed models do not face.
EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.
Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.
A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and interviews.
A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.
A multi-agent conversational system using AMA flowcharts achieves 95.29% top-3 retrieval accuracy and 99.10% navigation accuracy on large synthetic medical conversation datasets.
Analysis estimates 18.7% of Common Crawl documents contain geospatial information like coordinates and addresses, with little difference by language.
Participatory annotation by peacebuilders and data scientists produced open-source BERT classifiers for Kenya polarization and Sudan hate speech that showed better contextual alignment than standard approaches.
A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems underexplored.
citing papers explorer
- DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models
  DisaBench supplies a participatory taxonomy of twelve disability harm types, paired benign-adversarial prompts across seven life domains, and human-annotated data showing that standard safety tests miss context-dependent harms.
- Narrow Secret Loyalty Dodges Black-Box Audits
  First model organisms of narrow secret loyalties in LLMs evade black-box audits without principal knowledge and persist even at low poison fractions in training data.
- Hubs or Fringes: Pretraining Data Selection via Web Graph Centrality
  Web graph centrality from Common Crawl supplies an orthogonal signal for pretraining data selection that improves language model performance when central and peripheral hosts are balanced.
- Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents
  Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.
- Sequential Fairness Auditing with Limited Output Access
  The paper introduces a sequential generalized likelihood-ratio test framework for auditing Statistical Parity and Equal Opportunity fairness metrics under limited model query access.
- Open Weight AI Models Require Proportional Evaluation Approaches
  Open-weight AI models mostly fail four proposed proportional evaluation criteria (PE1-4) designed to address risks from public weights that closed models do not face.
- AI at the Front Lines of Platform Governance: Using LLMs to Support Illegal Content Reporting under the Digital Services Act
  EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.
- Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development
  Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.
- Developing an AI Concept Envisioning Toolkit to Support Reflective Juxtaposition of Values and Harms
  A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and interviews.
- The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation
  A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.
- White Paper: Human-AI Collaboration in Conflict Analysis: Text Classifier Development with Peacebuilders
  Participatory annotation by peacebuilders and data scientists produced open-source BERT classifiers for Kenya polarization and Sudan hate speech that showed better contextual alignment than standard approaches.