What Does Debiasing Really Remove? A Geometric Study of PCA-Based Gender Debiasing in Word Embeddings

Alexey Kresin; Tchifou M. Dieffi; Tomer Caspi

arXiv: 2606.07964 · v1 · pith:QOY4KXAD · submitted 2026-06-06 · 💻 cs.CL



Pith reviewed 2026-06-27 20:13 UTC · model grok-4.3

classification 💻 cs.CL

keywords: gender bias · word embeddings · PCA debiasing · geometric analysis · principal components · WEAT · associative bias · direct bias


The pith

PCA gender debiasing removes direct bias from the first principal component but leaves associative bias distributed across dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a geometric analysis of PCA-based methods for removing gender bias from word embeddings. It shows that direct bias concentrates in the first principal component while associative bias measured by WEAT spreads across many dimensions without aligning to those components. Removing additional components reduces the targeted bias yet steadily distorts vector relationships and semantic structure. The results indicate that bias is not confined to a low-rank subspace, so simple removal trades one form of bias reduction for geometric damage with no single optimal cutoff.

Core claim

Direct gender bias in word embeddings is captured primarily by the first principal component, allowing its removal to reduce that bias, whereas associative bias does not align with the principal directions and remains after subspace removal; at the same time, excising any number of components degrades the embedding geometry in a measurable way.

What carries the argument

Principal components of the embedding matrix, which identify a candidate gender subspace whose successive removal is tracked for effects on direct bias, WEAT scores, and geometric fidelity.
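To make this machinery concrete, here is a minimal NumPy sketch of one common way to build such a subspace and remove it. It assumes the Bolukbasi et al. [6] recipe (PCA over definitional pairs such as he/she, each centred on its own pair mean) and implements only the projection step of hard debiasing. The paper's exact construction may differ, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def gender_subspace(pairs, k=5):
    """PCA over definitional pairs, each centred on its own pair mean.

    pairs: list of (u, v) vectors, e.g. for ("he", "she"), ("man", "woman").
    Returns (P, ratio): P is (k, d) with orthonormal rows (PC1 first);
    ratio is the explained-variance ratio of each returned component.
    """
    rows = []
    for u, v in pairs:
        centre = (u + v) / 2.0
        rows.extend([u - centre, v - centre])  # keeps only the within-pair difference
    X = np.stack(rows)  # already zero-mean, since each pair contributes +r and -r
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return Vt[:k], ratio[:k]

def remove_components(E, P):
    """Project every row of E onto the orthogonal complement of span(P)."""
    return E - (E @ P.T) @ P

# Synthetic demo (hypothetical data): pairs differ mainly along a hidden direction g.
rng = np.random.default_rng(0)
d = 50
g = rng.normal(size=d)
g /= np.linalg.norm(g)
pairs = []
for _ in range(10):
    base = rng.normal(size=d)
    pairs.append((base + 2 * g + 0.1 * rng.normal(size=d),
                  base - 2 * g + 0.1 * rng.normal(size=d)))

P, ratio = gender_subspace(pairs, k=5)
E = rng.normal(size=(100, d))               # stand-in vocabulary
E_debiased = remove_components(E, P[:1])    # remove PC1 only
print(ratio.round(3), abs(P[0] @ g).round(3))
```

Removing more components is just a matter of passing `P[:k]` for larger `k`. Because the projection also shortens vectors, some pipelines re-normalise afterwards; this sketch deliberately does not.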

If this is right

Direct bias decreases when the first principal component is subtracted.
Associative bias persists across multiple dimensions after principal-component removal.
Semantic relationships and vector norms degrade as more principal components are removed.
The optimal number of components to remove varies with the chosen bias metric and the source embedding.
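The first consequence can be checked with the DirectBias measure of Bolukbasi et al. [6], which is the mean of |cos(w, g)|^c over gender-neutral words w. The sketch below makes several assumptions: g is taken to be the first principal component of the gender subspace, the vectors are synthetic, and the function name is hypothetical. The paper's word lists and value of c are not given here.

```python
import numpy as np

def direct_bias(N, g, c=1.0):
    """DirectBias_c of Bolukbasi et al. (2016): mean of |cos(w, g)|**c over
    gender-neutral word vectors w (rows of N) and a gender direction g."""
    g = g / np.linalg.norm(g)
    cos = (N @ g) / np.linalg.norm(N, axis=1)
    return float(np.mean(np.abs(cos) ** c))

# Hypothetical demo: "neutral" words that leak along a direction g.
rng = np.random.default_rng(1)
d = 50
g = rng.normal(size=d)
g /= np.linalg.norm(g)
N = rng.normal(size=(200, d)) + 1.5 * np.outer(rng.choice([-1.0, 1.0], size=200), g)

N_removed = N - np.outer(N @ g, g)   # subtract each row's projection onto g (the "PC1")
before = direct_bias(N, g)
after = direct_bias(N_removed, g)
print(round(before, 3), round(after, 3))
```

When direct bias is measured against the same direction that was removed, it drops to numerical zero by construction, as the demo shows.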

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Methods that target only linear subspaces are unlikely to eliminate all measurable gender associations.
Debiasing evaluations should separately track direct bias, distributed associations, and geometric integrity rather than rely on one score.
Embeddings may require nonlinear or data-driven corrections once low-rank removal reaches its limit.

Load-bearing premise

WEAT scores give a reliable, independent measure of associative gender bias that is separate from the direct bias already captured by the first principal component.
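For reference, the WEAT effect size of Caliskan et al. [7] measures how strongly two target sets X and Y associate with two attribute sets A and B. Each word gets an association s(w, A, B) = mean cos(w, a) - mean cos(w, b). The effect size is the difference in mean association between X and Y, divided by the standard deviation of s over X ∪ Y. The minimal sketch below assumes plain cosine similarity and a sample (ddof=1) standard deviation. The helper names and synthetic vectors are hypothetical.

```python
import numpy as np

def _cos(W, V):
    """Pairwise cosine similarities between rows of W and rows of V."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    return Wn @ Vn.T

def association(W, A, B):
    """s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b), for each row w of W."""
    return _cos(W, A).mean(axis=1) - _cos(W, B).mean(axis=1)

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations over the pooled standard deviation."""
    sx, sy = association(X, A, B), association(Y, A, B)
    pooled = np.concatenate([sx, sy])
    return float((sx.mean() - sy.mean()) / pooled.std(ddof=1))

# Synthetic demo: X is built to sit near A, Y near B.
rng = np.random.default_rng(2)
d = 20
a, b = rng.normal(size=d), rng.normal(size=d)
A = a + 0.3 * rng.normal(size=(8, d))   # attribute set A (hypothetical)
B = b + 0.3 * rng.normal(size=(8, d))   # attribute set B (hypothetical)
X = a + 0.5 * rng.normal(size=(8, d))   # target set X
Y = b + 0.5 * rng.normal(size=(8, d))   # target set Y
print(round(weat_effect_size(X, Y, A, B), 3))
```

Unlike DirectBias, this score never refers to a gender direction. That is why it can stay high after PC1 is projected out.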

What would settle it

A dataset in which WEAT scores drop sharply after removal of only the first principal component, or in which vector cosine similarities and downstream task performance stay stable after removal of several components.
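One way to operationalise "cosine similarities stay stable" is to compare all pairwise cosines before and after projection, and to track how much of each vector's norm survives. The sketch below is an assumption, not the paper's distortion measure. A random orthonormal basis stands in for the gender principal components, and the function names are hypothetical.

```python
import numpy as np

def cosine_matrix(E):
    """All pairwise cosine similarities between rows of E."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    return En @ En.T

def geometry_drift(E, P):
    """Distortion from projecting out span(P) (P has orthonormal rows).

    Returns (mean |change in cosine| over distinct word pairs,
             mean ratio of vector norm after/before removal).
    """
    E2 = E - (E @ P.T) @ P
    C1, C2 = cosine_matrix(E), cosine_matrix(E2)
    iu = np.triu_indices(len(E), k=1)            # each unordered pair once
    dcos = float(np.mean(np.abs(C1[iu] - C2[iu])))
    norm_ratio = float(np.mean(np.linalg.norm(E2, axis=1) / np.linalg.norm(E, axis=1)))
    return dcos, norm_ratio

rng = np.random.default_rng(3)
E = rng.normal(size=(200, 50))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthonormal basis (columns)
results = {k: geometry_drift(E, Q[:, :k].T) for k in (0, 1, 5, 20)}
for k, (dcos, ratio) in results.items():
    print(k, round(dcos, 4), round(ratio, 3))
```

Because the removed subspaces are nested, the surviving norm can only shrink as k grows. Whether the cosine drift matters in practice still has to be judged against downstream tasks.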

Figures

Figures reproduced from arXiv: 2606.07964 by Alexey Kresin, Tchifou M. Dieffi, Tomer Caspi.

**Figure 1.** Explained variance spectrum of the gender subspace. (a) The explained variance ratio of individual principal …

**Figure 2.** Cross-embedding comparison of cumulative PCA-based component removal on GloVe, FastText, and …

**Figure 3.** Single-PC ablation analysis with normalized metrics. (a) Bias metrics (direct bias and WEAT) and …
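Figures 2 and 3 contrast cumulative removal with single-PC ablation. The hedged sketch below shows the two operations. It assumes orthonormal components ordered with PC1 first. The basis is random and the function names are hypothetical.

```python
import numpy as np

def ablate_single(E, P, i):
    """Remove only component i (one row of P) from every embedding."""
    p = P[i]
    return E - np.outer(E @ p, p)

def ablate_cumulative(E, P, k):
    """Remove components 0..k-1 of P together."""
    Pk = P[:k]
    return E - (E @ Pk.T) @ Pk

rng = np.random.default_rng(4)
E = rng.normal(size=(30, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
P = Q.T                                   # hypothetical orthonormal components, PC1 first

# With orthonormal rows, cumulative removal equals applying single removals in sequence.
seq = E
for i in range(3):
    seq = ablate_single(seq, P, i)
print(np.allclose(seq, ablate_cumulative(E, P, 3)))
```

The projections compose additively. Metrics such as WEAT, however, are built from normalised cosines, so the per-component effects in a single-PC ablation need not add up to the effect of removing the same components together.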

Original abstract

Debiasing methods based on principal component analysis (PCA) are broadly used to reduce gender bias in word embeddings used in LLMs, yet it remains unclear what aspects of bias they actually remove and how destructive this process is. These methods are based on the understanding that bias resides in a low-dimensional subspace, with the assumption that most of it can be captured by a few principal components. In this work, we conduct a systematic geometric analysis of PCA-based gender debiasing and investigate what is actually removed from the embedding space. Our experiments across multiple embeddings show that direct gender bias is primarily concentrated in the first principal component, supporting the low-rank bias hypothesis. However, associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions. Furthermore, as expected, we demonstrate that removing an increasing number of principal components leads to a consistent degradation of the embedding geometry, affecting semantic structure and vector relationships. These results reveal that PCA-based debiasing operates as a trade-off: while it effectively reduces certain forms of direct bias, it fails to eliminate distributed associations and introduces geometric distortion. Moreover, there is no universal optimal level of debiasing, as the balance between bias reduction and semantic preservation depends on the chosen metric and embedding. Overall, our findings suggest that bias in word embeddings is not purely low-rank and that simple subspace removal methods may be insufficient for comprehensive debiasing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PCA removal hits direct bias in PC1 but leaves WEAT bias spread out and always costs geometry; the non-low-rank claim needs a direct check of WEAT after removal.


The main thing to know is that this paper separates direct gender bias (which loads on the first principal component) from associative bias measured by WEAT (which does not), while showing that removing any number of components steadily damages embedding geometry.

The geometric analysis across several embeddings is the useful part. It gives concrete evidence that direct bias is low-rank in the sense the authors define, and it quantifies the expected degradation in vector relationships as more components are dropped. That trade-off description is clearer than in most prior debiasing papers.

The softer spot is the jump to “bias is not purely low-rank.” The abstract states that WEAT does not align with the principal directions, but it does not report whether removing the first component actually changes WEAT scores, nor any correlation between WEAT pair loadings and PC1. Without that link the distributed-bias conclusion rests on the spread observation alone. The degradation trends also lack error bars or tests, so the consistency claim is harder to weigh.

The work is aimed at people who design or evaluate subspace debiasing for embeddings. Readers who care about the practical limits of PCA methods will get something concrete from the geometry results. It is worth sending to referees because the question is practical and the experiments are reproducible in principle, even if the current version needs tighter controls on the WEAT side.

Referee Report

2 major / 1 minor

Summary. The manuscript conducts a geometric analysis of PCA-based gender debiasing on word embeddings. It reports that direct gender bias concentrates primarily in the first principal component while associative bias (measured via WEAT) is distributed across multiple dimensions and does not align with the leading PCs. Experiments across embeddings show that removing increasing numbers of principal components reduces certain bias measures but consistently degrades semantic structure and vector relationships, leading to the conclusion that gender bias is not purely low-rank and that simple subspace removal is insufficient for comprehensive debiasing.

Significance. If the geometric separation between direct bias and WEAT holds, the work supplies concrete empirical evidence that PCA debiasing involves an unavoidable trade-off between bias reduction and preservation of embedding utility, with no universal optimum. The multi-embedding scope and explicit focus on what is removed versus what remains are positive contributions to the debiasing literature.

major comments (2)

[Abstract] The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings or WEAT scores computed after explicit removal of the first component.
[Experiments] Experiments section (implied by the abstract's description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends'. This makes it hard to judge whether the observed spread of WEAT bias is robust or an artifact of metric choice.
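The check requested in the first major comment could be sketched as follows. The sketch assumes that WEAT "word-pair difference vectors" are formed as x_i - y_i for matched target words, which is an illustrative choice. For each vector it measures the |cosine| with PC1 and the share of the vector's squared norm that falls inside the top-k subspace. For comparison, a random isotropic vector in d dimensions puts about k/d of its squared norm there. The names and data are hypothetical.

```python
import numpy as np

def subspace_alignment(diffs, P, k=1):
    """Per difference vector: |cos| with PC1, and the fraction of its squared
    norm lying in span(P[:k]). P must have orthonormal rows, PC1 first."""
    D = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    cos_pc1 = np.abs(D @ P[0])
    share_topk = np.sum((D @ P[:k].T) ** 2, axis=1)
    return cos_pc1, share_topk

rng = np.random.default_rng(5)
d = 100
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
P = Q.T[:10]                                          # hypothetical gender PCs
aligned = 3 * P[0] + 0.1 * rng.normal(size=(20, d))   # concentrated on PC1
diffuse = rng.normal(size=(20, d))                    # spread over all dimensions
for name, diffs in (("aligned", aligned), ("diffuse", diffuse)):
    cos1, share = subspace_alignment(diffs, P, k=10)
    print(name, cos1.mean().round(3), share.mean().round(3))
```

A low PC1 cosine and a top-k share close to k/d would support the "spread across dimensions" reading. A complementary check is to recompute the WEAT effect size after projecting out PC1.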

minor comments (1)

[Abstract] The abstract invokes 'direct gender bias' and 'associative bias' without a brief parenthetical definition or reference to the precise operationalization used in the experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, agreeing where additional verification or reporting is warranted and outlining the planned revisions.


Referee: [Abstract] The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings or WEAT scores computed after explicit removal of the first component.

Authors: We agree that explicit verification would strengthen the central claim. The main text already includes geometric projections and quantitative comparisons showing that WEAT bias does not concentrate in the leading PCs (unlike direct bias), but we will add the suggested analyses: Pearson correlations between WEAT word-pair difference vectors and PC1 loadings, plus WEAT effect sizes recomputed after ablating the first component. These will be incorporated into Section 4 with a new table or figure for clarity. revision: yes
Referee: [Experiments] Experiments section (implied by the abstract's description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends'. This makes it hard to judge whether the observed spread of WEAT bias is robust or an artifact of metric choice.

Authors: We acknowledge this reporting gap. The experiments were run on standard WEAT test sets (e.g., the original Caliskan et al. stimuli) across multiple embeddings, but we will revise the Experiments section to report exact dataset sizes, add error bars (standard deviation across embeddings or bootstrap) to all degradation plots, and include statistical significance tests (paired t-tests or Wilcoxon tests) for the observed trends in bias reduction versus semantic degradation. revision: yes
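As a reference point for the promised significance tests, note that the original WEAT test of Caliskan et al. [7] is itself a one-sided permutation test over re-partitions of X ∪ Y. A sampled (Monte Carlo) version is sketched below. The +1 correction, the number of permutations, and the function names are assumptions, not the authors' protocol.

```python
import numpy as np

def _assoc(W, A, B):
    """Per-word association s(w, A, B) = mean cos(w, a) - mean cos(w, b)."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (Wn @ An.T).mean(axis=1) - (Wn @ Bn.T).mean(axis=1)

def weat_permutation_p(X, Y, A, B, n_perm=10_000, seed=0):
    """One-sided p-value for s(X, Y, A, B) = sum_x s(x) - sum_y s(y),
    estimated from random re-partitions of X ∪ Y into sets of sizes |X|, |Y|."""
    s = _assoc(np.vstack([X, Y]), A, B)   # per-word associations, computed once
    n = len(X)
    observed = s[:n].sum() - s[n:].sum()
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(s))
        stat = s[idx[:n]].sum() - s[idx[n:]].sum()
        hits += int(stat > observed)
    return (hits + 1) / (n_perm + 1)      # +1 keeps the estimate away from exactly 0

# Synthetic demo with a strong built-in association (hypothetical data).
rng = np.random.default_rng(6)
d = 20
a, b = rng.normal(size=d), rng.normal(size=d)
A = a + 0.3 * rng.normal(size=(8, d))
B = b + 0.3 * rng.normal(size=(8, d))
X = a + 0.5 * rng.normal(size=(8, d))
Y = b + 0.5 * rng.normal(size=(8, d))
p = weat_permutation_p(X, Y, A, B, n_perm=2000)
print(p)
```

Error bars for the degradation curves would still need a separate resampling scheme, for example bootstrapping over words or repeating the analysis across embeddings.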

Circularity Check

0 steps flagged

No circularity: empirical geometric analysis with independent metrics


The paper conducts experiments comparing PCA components to direct bias and WEAT associative bias across embeddings, reporting observed distributions without any derivation chain. No equations define a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no self-citation is invoked as a uniqueness theorem or load-bearing premise. Claims rest on external benchmarks (standard word embeddings, WEAT scores, geometric distortion measures) that are not constructed from the paper's own fitted values, making the analysis self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the low-rank bias hypothesis being testable via PCA and on WEAT being an appropriate proxy for associative bias; no free parameters or invented entities are described in the abstract.

axioms (2)

Domain assumption: bias resides in a low-dimensional subspace that can be captured by principal components.
Stated explicitly in the abstract as the basis for PCA debiasing methods.
Domain assumption: WEAT scores measure associative bias independently of the direct bias subspace.
Used to contrast the two forms of bias when claiming that associative bias is spread across dimensions.

pith-pipeline@v0.9.1-grok · 5795 in / 1329 out tokens · 16840 ms · 2026-06-27T20:13:39.244133+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 1 canonical work pages · 1 internal anchor

[1] Real-time segmentation of on-line handwritten Arabic script. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, 2014.

[2] Fast classification of handwritten on-line Arabic characters. In Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of, 2014.

[3] Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications. arXiv preprint arXiv:1804.09028.

[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In ICLR, 2013.

[5] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. In EMNLP, 2014.

[6] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2016.

[7] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics Derived Automatically from Language Corpora Contain Human-like Biases. Science, 356(6334):183-186, 2017.

[8] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Learning Gender-Neutral Word Embeddings. In Proceedings of EMNLP, 2018.

[9] Hila Gonen and Yoav Goldberg. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them. In Proceedings of NAACL, 2019.

[10] Kawin Ethayarajh. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. In Proceedings of EMNLP-IJCNLP, 2019.

[11] Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, and Mahzarin R. Banaji. Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022.

[12] Eva Gerritse. The Impact of Debiasing Word Embeddings on Information Retrieval. 2019.
