ArXiv e-prints , year =

Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo · 2025 · arXiv 2505.10573

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

FairTree: Subgroup Fairness Auditing of Machine Learning Models with Bias-Variance Decomposition

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

FairTree audits ML models for subgroup fairness by decomposing performance disparities into systematic bias and variance using permutation-based and fluctuation tests adapted from psychometric methods.

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

cs.CY · 2026-02-19 · accept · novelty 6.0

The 2025 AI Agent Index catalogs technical and safety details for 30 deployed AI agents and finds low developer transparency on safety, evaluations, and societal impacts.

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

A formalization of benchmarkless LLM safety scoring validated via an instrumental-validity chain of contrast separation, target variance dominance, and rerun stability, demonstrated on Norwegian scenarios.

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

cs.CY · 2026-04-02 · unverdicted · novelty 5.0 · 2 refs

Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image model outputs.

Why Johnny Can't Use Agents: Industry Aspirations vs. User Realities with AI Agents

cs.HC · 2025-09-18 · unverdicted · novelty 5.0

Industry markets AI agents for orchestration, creation, and insight, but a usability study with 31 participants reveals users face challenges from capability misalignment and lack of meta-cognition in tools like Operator and Manus.

Making AI Evaluation Deployment Relevant Through Context Specification

cs.AI · 2026-03-06 · unverdicted · novelty 4.0

Context specification is a process that turns diffuse stakeholder perspectives into explicit definitions of properties, behaviors, and outcomes to guide context-aware AI evaluations.

citing papers explorer

Showing 6 of 6 citing papers.

FairTree: Subgroup Fairness Auditing of Machine Learning Models with Bias-Variance Decomposition

cs.LG · 2026-04-21 · unverdicted · none · ref 15

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

cs.CY · 2026-02-19 · accept · none · ref 113

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

cs.LG · 2026-05-07 · unverdicted · none · ref 27

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

cs.CY · 2026-04-02 · unverdicted · none · ref 99 · 2 links

Why Johnny Can't Use Agents: Industry Aspirations vs. User Realities with AI Agents

cs.HC · 2025-09-18 · unverdicted · none · ref 67

Making AI Evaluation Deployment Relevant Through Context Specification

cs.AI · 2026-03-06 · unverdicted · none · ref 5
