arxiv: 2511.21517 · v2 · submitted 2025-11-26 · 💻 cs.CL · cs.AI

Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation

Lina Conti, Dennis Fucci, Marco Gaido, Matteo Negri, Guillaume Wisniewski, Luisa Bentivogli

Pith reviewed 2026-05-17 04:47 UTC · model grok-4.3

keywords speech translation · gender bias · interpretability · coreference · acoustic features · spectrograms · machine translation · speaker characteristics

The pith

Speech translation models assign gender to speaker-referring terms by linking them to first-person pronouns and using acoustic cues from across the frequency spectrum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies the mechanisms speech translation models use to assign grammatical gender to terms that refer to the speaker when translating from English to Spanish, French or Italian. The models learn a general masculine bias from training patterns rather than specific term associations. Acoustic features in the input speech allow the models to override biases from their internal language model. The model with better gender accuracy does this by using first-person pronouns to connect the gendered terms back to the speaker and by accessing gender information spread throughout the frequency spectrum instead of focusing on pitch.

Core claim

The study shows that models do not simply replicate term-specific gender associations from training data but learn broader patterns of masculine prevalence. While the internal language model exhibits strong masculine bias, models can override these preferences based on acoustic input. Using contrastive feature attribution on spectrograms reveals that the model with higher gender accuracy relies on first-person pronouns to link gendered terms back to the speaker, accessing gender information distributed across the frequency spectrum rather than concentrated in pitch.
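The two quantities in this claim, the ILM's masculine preference and how often the full model departs from it, can be sketched as simple ratios. This is an illustrative sketch only: the function names, the `"M"`/`"F"` labels, and the four toy pairs are assumptions made up for the example, not the paper's metrics or data.

```python
from typing import List, Tuple

# Each pair is (form preferred by the ILM alone, form produced by the full model).
# The labels and the toy pairs below are made up for illustration.
Pair = Tuple[str, str]


def masculine_share(pairs: List[Pair]) -> float:
    """Fraction of examples where the ILM alone prefers the masculine form."""
    return sum(1 for ilm, _ in pairs if ilm == "M") / len(pairs)


def override_rate(pairs: List[Pair]) -> float:
    """Fraction of examples where the full model's output differs from the ILM preference."""
    return sum(1 for ilm, out in pairs if ilm != out) / len(pairs)


toy_pairs = [("M", "F"), ("M", "M"), ("M", "F"), ("F", "F")]
print(masculine_share(toy_pairs))  # 0.75
print(override_rate(toy_pairs))    # 0.5
```

A high masculine share together with a nonzero override rate is the pattern the claim describes: a biased prior that the audio-conditioned model does not always follow.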

What carries the argument

First-person pronoun coreference linking combined with distributed frequency spectrum access for gender cues, identified via contrastive feature attribution on spectrograms.
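To make the attribution idea concrete, here is a minimal sketch of a contrastive, occlusion-based saliency map, followed by an average over time that gives one score per frequency bin. It is not the paper's implementation. The scoring function, its weights, and all names (`occlusion_saliency`, `frequency_profile`, `toy_score`) are hypothetical, and single-cell occlusion with a zero fill value is an assumption chosen for simplicity.

```python
from typing import Callable, List, Tuple

Spectrogram = List[List[float]]  # indexed as [time_frame][frequency_bin]
ScoreFn = Callable[[Spectrogram], Tuple[float, float]]  # -> (feminine, masculine) scores


def contrast(score_fn: ScoreFn, spec: Spectrogram) -> float:
    """Contrastive target: score of the feminine form minus score of the masculine form."""
    fem, masc = score_fn(spec)
    return fem - masc


def occlusion_saliency(score_fn: ScoreFn, spec: Spectrogram, fill: float = 0.0) -> Spectrogram:
    """Saliency of a cell = drop in the contrastive score when that cell is occluded."""
    base = contrast(score_fn, spec)
    saliency = []
    for t, frame in enumerate(spec):
        row = []
        for f in range(len(frame)):
            occluded = [list(fr) for fr in spec]  # copy so the input stays untouched
            occluded[t][f] = fill
            row.append(base - contrast(score_fn, occluded))
        saliency.append(row)
    return saliency


def frequency_profile(saliency: Spectrogram) -> List[float]:
    """Average saliency over time, giving one value per frequency bin."""
    n_frames = len(saliency)
    return [sum(row[f] for row in saliency) / n_frames for f in range(len(saliency[0]))]


def toy_score(spec: Spectrogram) -> Tuple[float, float]:
    """Hypothetical linear scorer with arbitrary per-bin weights (stand-in for a real model)."""
    w_fem = [0.5, 1.0, 0.25]
    w_masc = [1.0, 0.0, 0.25]
    fem = sum(w * x for frame in spec for w, x in zip(w_fem, frame))
    masc = sum(w * x for frame in spec for w, x in zip(w_masc, frame))
    return fem, masc


spec = [[1.0, 2.0, 4.0], [3.0, 0.0, 4.0]]  # 2 frames x 3 frequency bins, toy values
saliency = occlusion_saliency(toy_score, spec)
profile = frequency_profile(saliency)
print(profile)  # [-1.0, 1.0, 0.0]
```

In the toy scorer the third bin carries equal weight for both forms, so it contributes nothing to the contrast and receives zero saliency. That is the point of the contrastive target: it highlights features that separate the two candidate outputs rather than features that matter for both.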

If this is right

Acoustic input overrides internal language model masculine bias in gender decisions.
Models rely on coreference mechanisms rather than direct term-gender associations from data.
Gender information is distributed across the full frequency spectrum, not limited to pitch.
This mechanism contributes to higher gender accuracy in the better-performing model.
The pattern holds for translations into Spanish, French, and Italian.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Masking or altering first-person pronouns in the speech input should reduce gender assignment accuracy if the mechanism is causal.
The distributed frequency access could make models more robust to variations in speaker voice pitch or quality.
This finding may inform design of future speech models to explicitly model speaker coreference for reduced bias.
Similar interpretability techniques could be applied to other attributes like speaker age or emotion in translation tasks.
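The pronoun-masking extension above could be prototyped roughly as follows. This is a sketch under stated assumptions: word-level timestamps for the pronoun are taken as given (for example from a forced aligner, which is not part of this page), the 10 ms hop is a common default rather than a value from the paper, frames are treated as non-overlapping hop-sized slices, and the helper names are hypothetical.

```python
from typing import Iterable, List

Spectrogram = List[List[float]]  # indexed as [time_frame][frequency_bin]


def span_to_frames(start_ms: int, end_ms: int, hop_ms: int = 10) -> range:
    """Indices of frames overlapping [start_ms, end_ms), assuming hop-sized frames."""
    first = start_ms // hop_ms
    last = -(-end_ms // hop_ms)  # ceiling division on integers
    return range(first, last)


def mask_frames(spec: Spectrogram, frames: Iterable[int], fill: float = 0.0) -> Spectrogram:
    """Return a copy of the spectrogram with the given time frames replaced by `fill`."""
    masked = [list(frame) for frame in spec]
    for t in frames:
        if 0 <= t < len(masked):
            masked[t] = [fill] * len(masked[t])
    return masked


# Toy spectrogram: 6 frames x 2 bins; a hypothetical pronoun spans 20 ms to 45 ms.
spec = [[float(t + 1), float(t + 1)] for t in range(6)]
pronoun_frames = span_to_frames(20, 45)
masked = mask_frames(spec, pronoun_frames)
print(list(pronoun_frames))  # [2, 3, 4]
```

Comparing the model's gender choice on `spec` and on `masked` over many utterances would give the accuracy change the first bullet asks about. Timestamps are kept as integer milliseconds so the frame arithmetic avoids floating-point rounding.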

Load-bearing premise

Contrastive feature attribution on spectrograms accurately captures the causal acoustic features and coreference mechanisms without method-induced artifacts or spurious correlations.

What would settle it

An experiment showing that blocking access to first-person pronouns in the audio input causes the high-accuracy model to lose its gender assignment advantage, or demonstrating that gender cues are localized to pitch frequencies rather than spread across the spectrum.
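The frequency half of that experiment can be sketched as a band ablation with a flip-rate readout. The sketch below is illustrative only: `toy_predict` is a made-up stand-in for a real model's gender decision, its weights and the three toy spectrograms are arbitrary, and zero-filling the ablated band is an assumption (a real study would need to choose and justify the fill value).

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # indexed as [time_frame][frequency_bin]


def ablate_band(spec: Spectrogram, lo: int, hi: int, fill: float = 0.0) -> Spectrogram:
    """Return a copy with frequency bins lo <= f < hi replaced by `fill` in every frame."""
    return [[fill if lo <= f < hi else x for f, x in enumerate(frame)] for frame in spec]


def flip_rate(predict: Callable[[Spectrogram], str], specs: List[Spectrogram], lo: int, hi: int) -> float:
    """Fraction of examples whose predicted gender changes when the band is ablated."""
    flips = sum(1 for s in specs if predict(ablate_band(s, lo, hi)) != predict(s))
    return flips / len(specs)


def toy_predict(spec: Spectrogram) -> str:
    """Hypothetical decision rule with arbitrary per-bin weights (not a real model)."""
    weights = [1.0, -1.0, 1.0, -1.0]
    score = sum(w * x for frame in spec for w, x in zip(weights, frame))
    return "F" if score > 0 else "M"


# Three toy one-frame spectrograms with 4 frequency bins each.
specs = [[[1.0, 0.0, 2.0, 1.0]], [[3.0, 1.0, 0.0, 1.0]], [[0.0, 2.0, 1.0, 0.0]]]
low_band = flip_rate(toy_predict, specs, 0, 2)   # ablate the two lowest bins
high_band = flip_rate(toy_predict, specs, 2, 4)  # ablate the two highest bins
print(low_band, high_band)
```

With a real model, comparable flip rates across several bands would be consistent with distributed gender cues, while flips confined to the band containing the fundamental frequency would point to a pitch-localized cue.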

Figures

Figures reproduced from arXiv: 2511.21517 by Dennis Fucci, Guillaume Wisniewski, Lina Conti, Luisa Bentivogli, Marco Gaido, Matteo Negri.

**Figure 2.** Average saliency scores across the frequency dimension for examples that flip, for the Trans...

Original abstract

Unlike text, speech conveys information about the speaker, such as gender, through acoustic cues like pitch. This gives rise to modality-specific bias concerns. For example, in speech translation (ST), when translating from languages with notional gender, such as English, into languages where gender-ambiguous terms referring to the speaker are assigned grammatical gender, the speaker's vocal characteristics may play a role in gender assignment. This risks misgendering speakers, whether through masculine defaults or vocal-based assumptions. Yet, how ST models make these decisions remains poorly understood. We investigate the mechanisms ST models use to assign gender to speaker-referring terms across three language pairs (en-es/fr/it). To do so, we examine how training data patterns, internal language model (ILM) biases, and acoustic information interact. We find that models do not simply replicate term-specific gender associations from training data, but learn broader patterns of masculine prevalence. While the ILM exhibits strong masculine bias, models can override these preferences based on acoustic input. Using contrastive feature attribution on spectrograms, we reveal that the model with higher gender accuracy relies on a previously unknown mechanism: using first-person pronouns to link gendered terms back to the speaker, accessing gender information distributed across the frequency spectrum rather than concentrated in pitch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows ST models override ILM masculine bias with acoustics by linking via first-person pronouns and using gender cues across the full frequency spectrum rather than pitch alone.

The main thing to know is that this work identifies a specific mechanism in speech translation models: higher-accuracy ones use first-person pronouns to tie gendered terms back to the speaker while pulling gender information from across the frequency spectrum instead of concentrating on pitch. They back this with comparisons to training data patterns, ILM biases, and acoustic overrides across en-es, en-fr, and en-it pairs, using contrastive attribution on spectrograms to surface the details. That combination moves past routine bias audits and gives a mechanistic angle on how the models actually decide gender for speaker-referring terms.

The empirical grounding in data statistics and attribution outputs looks solid for an observational study, with no signs of circularity or obvious fitting problems. The abstract and described results support the claim that models learn broader masculine prevalence rather than term-specific associations from training data.

One soft spot is the reliance on contrastive attribution without reported interventions such as pronoun masking or frequency-band ablation. Attribution maps can reflect correlations or baseline choices rather than strict causal necessity, especially when acoustic and linguistic features overlap in spectrograms. That leaves some uncertainty about whether the attributed regions are truly required for the observed behavior.

This paper is aimed at researchers working on interpretability and bias in speech translation or multimodal models. It offers targeted, actionable insights that could inform mitigation work without claiming to reshape the broader field. A serious referee should review it to verify the experimental controls, statistical details, and robustness of the attribution findings.

Referee Report

1 major / 2 minor

Summary. The manuscript investigates how speech translation models assign grammatical gender to speaker-referring terms when translating from English into Spanish, French, and Italian. It examines interactions among training data gender patterns, internal language model (ILM) masculine biases, and acoustic cues in spectrogram inputs. The central claims are that models learn broad masculine prevalence rather than term-specific associations, can override ILM biases using acoustic information, and that higher-accuracy models employ a coreference mechanism via first-person pronouns while accessing gender information distributed across the frequency spectrum (rather than concentrated in pitch), as identified through contrastive feature attribution.

Significance. If the attribution-based mechanistic findings hold, the work provides novel empirical insights into modality-specific gender handling in ST systems beyond simple data replication, with potential implications for reducing misgendering risks. The grounding in direct comparisons of training data statistics, ILM outputs, and attribution maps across three language pairs supplies mechanistic evidence that strengthens the observational claims.

major comments (1)

[contrastive feature attribution analysis] The central claim regarding the previously unknown mechanism (first-person pronoun coreference linking gendered terms to the speaker and distributed frequency access) rests on contrastive feature attribution applied to spectrograms. No interventions such as pronoun masking, frequency-band ablation, or counterfactual audio synthesis are described to establish that the attributed regions are causally necessary rather than correlational, which is load-bearing for the interpretability conclusion in the abstract and results.

minor comments (2)

[Abstract] The abstract would benefit from brief inclusion of the specific ST model architectures, training details, and any statistical controls or significance testing used for the gender accuracy and bias comparisons.
[Methods] Clarify the choice of baseline in the contrastive attribution method and any preprocessing applied to spectrogram inputs to aid reproducibility of the saliency maps.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address the major comment below with clarifications on our methodology and planned revisions to strengthen the manuscript.

Referee: [contrastive feature attribution analysis] The central claim regarding the previously unknown mechanism (first-person pronoun coreference linking gendered terms to the speaker and distributed frequency access) rests on contrastive feature attribution applied to spectrograms. No interventions such as pronoun masking, frequency-band ablation, or counterfactual audio synthesis are described to establish that the attributed regions are causally necessary rather than correlational, which is load-bearing for the interpretability conclusion in the abstract and results.

Authors: We acknowledge that contrastive feature attribution identifies regions of the spectrogram that correlate with the model's gender predictions rather than establishing strict causality via interventions such as pronoun masking, frequency-band ablation, or counterfactual synthesis. The manuscript does not include such experiments, which would indeed provide stronger causal evidence for the proposed coreference mechanism and distributed frequency access. At the same time, the contrastive setup (comparing attributions across models with differing gender accuracies, against ILM baselines, and across three language pairs) generates consistent mechanistic hypotheses that go beyond simple data replication. These patterns align with the observed override of masculine ILM biases by acoustic input. To address the concern directly, we will revise the manuscript to add an explicit discussion of the correlational character of attribution methods, a limitations subsection on the absence of causal interventions, and suggestions for future work involving targeted ablations. This partial revision will qualify the interpretability claims appropriately while preserving the empirical observations.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical interpretability analysis grounded in data and attribution outputs

The paper conducts an observational study examining training data patterns, ILM biases, and acoustic inputs via contrastive feature attribution on spectrograms. The central claim about first-person pronoun coreference and distributed frequency access is presented as a direct finding from applying the attribution method to model behavior and comparing against data statistics. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are used in a load-bearing way that reduces the result to its own inputs by construction. The derivation remains self-contained against external benchmarks such as training data and attribution maps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard empirical ML practices and interpretability assumptions rather than new free parameters or invented entities. No numbers are fitted ad hoc to support the central claim.

axioms (1)

domain assumption: Contrastive feature attribution accurately reveals the model's internal decision process for gender assignment from spectrograms. Invoked when applying the technique to identify pronoun linking and frequency distribution.

