
Responsible Benchmarking of Fairness for Automatic Speech Recognition

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Many studies have shown that automatic speech recognition (ASR) systems have unequal performance across speaker groups (SG's). However, the manner in which such studies arrive at this conclusion is inconsistent. To pave the way for more reliable results in future studies, we lay out best practices for benchmarking ASR fairness based on literature from machine learning fairness, social sciences, and speech science. We first describe the importance of precisely defining the fairness hypothesis being interrogated, and of tailoring fairness metrics to apply specifically to said hypothesis. We then examine several benchmarks used to rate ASR systems on fairness and discuss how their results can be misconstrued without assiduous oversight into the intersections between SG's. We find that evaluating fairness based on single heterogeneous SG's, as they are defined in fairness benchmarks, can lead to misidentifying which SG's are actually being mistreated by ASR systems. We advocate for as fine-grained an analysis as possible of the intersectionality of as many demographic variables as are available in the metadata of fairness corpora in order to tease out such spurious correlations.
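
The contrast the abstract draws between single-variable and intersectional evaluation can be illustrated with a minimal Python sketch. It is not taken from the paper. The metadata fields `accent` and `gender`, the helper names `edit_distance` and `wer_by_group`, and the four toy utterances are all hypothetical, chosen only to show how pooling by one variable can hide which intersection actually carries the errors. Word error rate (WER) is used here as an assumed fairness metric; the paper may recommend others.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances, keys):
    """Pooled WER per group, where a group is the tuple of values for `keys`."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for u in utterances:
        ref = u["ref"].split()
        hyp = u["hyp"].split()
        group = tuple(u[k] for k in keys)
        errors[group] += edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy, made-up data (hypothetical fields and transcripts, for illustration only).
utterances = [
    {"accent": "A", "gender": "f", "ref": "a b c d", "hyp": "a b c d"},
    {"accent": "A", "gender": "m", "ref": "a b c d", "hyp": "a b c d"},
    {"accent": "B", "gender": "f", "ref": "a b c d", "hyp": "a b c d"},
    {"accent": "B", "gender": "m", "ref": "a b c d", "hyp": "a x c y"},
]

by_accent = wer_by_group(utterances, ["accent"])
by_gender = wer_by_group(utterances, ["gender"])
by_both = wer_by_group(utterances, ["accent", "gender"])

print(by_accent)  # accent B looks worse as a whole
print(by_gender)  # gender m looks worse as a whole
print(by_both)    # only the (B, m) intersection has any errors
```

In this toy case, grouping by accent alone or by gender alone each suggests that an entire group is disadvantaged, while grouping by both variables shows that every error comes from a single intersection. Real corpora would also need enough utterances per intersection for the per-group rates to be meaningful.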

fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.

  • Responsible Benchmarking of Fairness for Automatic Speech Recognition cs.CL · 2026-05-11 · unverdicted · none · ref 2 · internal anchor

    Current ASR fairness benchmarks using single heterogeneous speaker groups can misidentify mistreated groups, so evaluations should use fine-grained intersectional analysis of available demographic metadata.