Responsible Benchmarking of Fairness for Automatic Speech Recognition

2026 · cs.CL · arXiv 2605.10615

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Many studies have shown that automatic speech recognition (ASR) systems have unequal performance across speaker groups (SG's). However, the manner in which such studies arrive at this conclusion is inconsistent. To pave the way for more reliable results in future studies, we lay out best practices for benchmarking ASR fairness based on literature from machine learning fairness, social sciences, and speech science. We first describe the importance of precisely defining the fairness hypothesis being interrogated, and of tailoring fairness metrics to apply specifically to said hypothesis. We then examine several benchmarks used to rate ASR systems on fairness and discuss how their results can be misconstrued without assiduous oversight into the intersections between SG's. We find that evaluating fairness based on single heterogeneous SG's, such as they are defined in fairness benchmarks, can lead to misidentifying which SG's are actually being mistreated by ASR systems. We advocate for as fine-grained an analysis as possible of the intersectionality of as many demographic variables as are available in the metadata of fairness corpora in order to tease out such spurious correlations.
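The abstract's central point, that a single heterogeneous speaker group can hide which subgroup is actually affected, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the metadata fields `accent` and `gender`, the four toy utterances, and the choice of pooled word error rate (WER) as the fairness metric.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(utterances, keys):
    """Pool errors and reference words per combination of the fields in `keys`."""
    errors, words = defaultdict(int), defaultdict(int)
    for utt in utterances:
        group = tuple(utt[k] for k in keys)
        e, n = word_errors(utt["ref"], utt["hyp"])
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in errors}


# Hypothetical toy data: only the (accent B, gender F) speaker is misrecognized.
REF = "turn the lights on"
UTTERANCES = [
    {"accent": "A", "gender": "F", "ref": REF, "hyp": REF},
    {"accent": "A", "gender": "M", "ref": REF, "hyp": REF},
    {"accent": "B", "gender": "F", "ref": REF, "hyp": "turn lights"},
    {"accent": "B", "gender": "M", "ref": REF, "hyp": REF},
]

by_gender = wer_by_group(UTTERANCES, ["gender"])
by_accent = wer_by_group(UTTERANCES, ["accent"])
by_both = wer_by_group(UTTERANCES, ["accent", "gender"])

print(by_gender)  # gender alone suggests all F speakers are affected
print(by_accent)  # accent alone suggests all accent-B speakers are affected
print(by_both)    # the intersection localizes the errors to (B, F)
```

In this toy set the single-variable views each report a WER of 0.25 for a whole group, while the intersectional view shows a WER of 0.5 for one subgroup and 0 for the rest. Real corpora would also need per-subgroup sample sizes and uncertainty estimates before drawing conclusions.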

representative citing papers

Responsible Benchmarking of Fairness for Automatic Speech Recognition

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Current ASR fairness benchmarks using single heterogeneous speaker groups can misidentify mistreated groups, so evaluations should use fine-grained intersectional analysis of available demographic metadata.
