Rickford and Dan Jurafsky and Sharad Goel , title =

Koenecke, Allison, Nam, Andrew, Lake, Emily, Nudell, Joe, Quartey, Minnie, Mengesha, Zion · 2020 · DOI 10.1073/pnas.1915768117

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Ouvia is a user-centered evaluation framework for speech translation usability in real-world scenarios, showing limited usability rates and the superiority of QA-based metrics.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

ASR bias causes users from underrepresented dialects to internalize failures as personal inadequacy and perform extensive emotional and linguistic labor, revealing harms missed by accuracy-only evaluations.

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

cs.CL · 2026-06-22 · unverdicted · novelty 6.0

Layer-wise probing of wav2vec2-base and Whisper-small shows both models distinguish reduced vs. canonical consonant clusters in AAE with high accuracy and retain cues to underlying stops, encoding CCR as gradient variation.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

cs.CL · 2026-06-25 · unverdicted · novelty 5.0

Audit of multilingual clinical ASR reveals demographic biases; SamaVaani debiasing technique is proposed to jointly boost performance and fairness in Indian languages.

Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When?

cs.SD · 2026-04-30 · unverdicted · novelty 5.0

Random phoneme substitutions recover most ASR gains from synthetic accented speech, with targeted edits and ground-truth prosody providing only marginal additional benefits.

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

citing papers explorer

Showing 9 of 9 citing papers.

Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios

cs.CL · 2026-06-04 · unverdicted · none · ref 33

Ouvia is a user-centered evaluation framework for speech translation usability in real-world scenarios, showing limited usability rates and the superiority of QA-based metrics.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · none · ref 202

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

cs.CL · 2026-04-22 · unverdicted · none · ref 24

ASR bias causes users from underrepresented dialects to internalize failures as personal inadequacy and perform extensive emotional and linguistic labor, revealing harms missed by accuracy-only evaluations.

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

cs.CL · 2026-06-22 · unverdicted · none · ref 18

Layer-wise probing of wav2vec2-base and Whisper-small shows both models distinguish reduced vs. canonical consonant clusters in AAE with high accuracy and retain cues to underlying stops, encoding CCR as gradient variation.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · none · ref 147

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

cs.CL · 2026-06-25 · unverdicted · none · ref 8

Audit of multilingual clinical ASR reveals demographic biases; SamaVaani debiasing technique is proposed to jointly boost performance and fairness in Indian languages.

Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When?

cs.SD · 2026-04-30 · unverdicted · none · ref 9

Random phoneme substitutions recover most ASR gains from synthetic accented speech, with targeted edits and ground-truth prosody providing only marginal additional benefits.

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

cs.CL · 2026-05-11 · unverdicted · none · ref 92

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · none · ref 266 · 2 links

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
