A structured regression approach for evaluating model performance across intersectional subgroups

Alexandra Chouldechova; Christine Herlihy; Kimberly Truong; Miroslav Dudik

arXiv: 2401.14893 · v2 · pith:EDAYGOYNnew · submitted 2024-01-26 · cs.LG · cs.CY · stat.AP · stat.ML



keywords: subgroups, approach, performance, evaluation, intersectional, across, small, data

Abstract

Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately sized evaluation datasets, sample sizes quickly become small once intersectional subgroups are considered, which greatly limits the extent to which intersectional groups are included in the analysis. In this work, we introduce a structured regression approach to disaggregated evaluation that, as we demonstrate, can yield reliable system performance estimates even for very small subgroups. We provide corresponding inference strategies for constructing confidence intervals and explore how goodness-of-fit testing can yield insight into the structure of fairness-related harms experienced by intersectional groups. We evaluate our approach on two publicly available datasets and several variants of semi-synthetic data. The results show that our method is considerably more accurate than the standard approach, especially for small subgroups, and they demonstrate how goodness-of-fit testing helps identify the key factors that drive differences in performance.
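To make the contrast between the two approaches concrete, here is a minimal Python sketch, assuming each evaluation example carries a 0/1 correctness indicator and integer-coded sensitive attributes. The structured estimator below is a stand-in chosen for illustration: a ridge-penalized logistic regression with main effects only, fitted by Newton-Raphson, whose predicted accuracy is read off for every intersectional cell. The paper's actual model, regularization, and confidence-interval procedure may differ, and all function names (`stratified_accuracy`, `structured_accuracy`, `main_effects_design`, `fit_logistic`) are hypothetical.

```python
import itertools

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def main_effects_design(attrs, n_levels):
    """Intercept plus one-hot columns per attribute; level 0 of each attribute is the baseline."""
    cols = [np.ones(len(attrs))]
    for j, n_lev in enumerate(n_levels):
        for level in range(1, n_lev):
            cols.append((attrs[:, j] == level).astype(float))
    return np.column_stack(cols)


def fit_logistic(X, y, ridge=1e-3, iters=50):
    """Newton-Raphson for ridge-penalized logistic regression (illustrative stand-in model)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def stratified_accuracy(attrs, correct):
    """Standard approach: per-subgroup sample mean; only observed subgroups get an estimate."""
    est = {}
    for cell in np.unique(attrs, axis=0):
        mask = np.all(attrs == cell, axis=1)
        est[tuple(int(v) for v in cell)] = float(correct[mask].mean())
    return est


def structured_accuracy(attrs, correct, n_levels, ridge=1e-3):
    """Regression approach: fit one main-effects model, then predict every intersectional cell."""
    beta = fit_logistic(main_effects_design(attrs, n_levels), correct.astype(float), ridge)
    cells = np.array(list(itertools.product(*(range(k) for k in n_levels))))
    probs = sigmoid(main_effects_design(cells, n_levels) @ beta)
    return {tuple(int(v) for v in c): float(p) for c, p in zip(cells, probs)}


# Simulated data whose true accuracy is additive on the logit scale (hypothetical coefficients).
rng = np.random.default_rng(0)
n_levels = (2, 3, 4)
attrs = np.column_stack([rng.integers(0, k, size=2000) for k in n_levels])
true_beta = np.array([1.0, 0.5, -0.8, 0.3, -0.4, 0.2, 0.6])
X_true = main_effects_design(attrs, n_levels)
correct = (rng.random(2000) < sigmoid(X_true @ true_beta)).astype(int)

# Drop one intersectional cell entirely to mimic a subgroup with no evaluation data.
keep = ~np.all(attrs == (1, 2, 3), axis=1)
strat = stratified_accuracy(attrs[keep], correct[keep])
struct = structured_accuracy(attrs[keep], correct[keep], n_levels)
```

The stratified estimator has nothing to say about the dropped cell `(1, 2, 3)`, while the main-effects model borrows strength from other subgroups that share each attribute value and still returns an estimate. That borrowing is only as good as the assumed structure: if attribute interactions matter, a main-effects model will be biased, which is the kind of misfit that goodness-of-fit testing is meant to surface.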


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FairTree: Subgroup Fairness Auditing of Machine Learning Models with Bias-Variance Decomposition
cs.LG 2026-04 unverdicted novelty 7.0

FairTree audits ML models for subgroup fairness by decomposing performance disparities into systematic bias and variance using permutation-based and fluctuation tests adapted from psychometric methods.