What Does Debiasing Really Remove? A Geometric Study of PCA-Based Gender Debiasing in Word Embeddings
Pith reviewed 2026-06-27 20:13 UTC · model grok-4.3
The pith
PCA gender debiasing removes direct bias from the first principal component but leaves associative bias distributed across dimensions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Direct gender bias in word embeddings is captured primarily by the first principal component, allowing its removal to reduce that bias, whereas associative bias does not align with the principal directions and remains after subspace removal; at the same time, excising any number of components degrades the embedding geometry in a measurable way.
What carries the argument
Principal components of the embedding matrix, which identify a candidate gender subspace whose successive removal is tracked for effects on direct bias, WEAT scores, and geometric fidelity.
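As a rough illustration of the mechanism described here, the following Python sketch computes principal directions with an SVD and subtracts the projection onto the leading ones. The function names `top_components` and `remove_subspace`, the toy data, and the choice to run PCA on a mean-centered matrix are assumptions made for illustration. They are not taken from the paper's implementation.

```python
import numpy as np

def top_components(X, k):
    """Return the top-k principal directions (as rows) of X after mean-centering."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def remove_subspace(E, components):
    """Subtract from each row of E its projection onto the span of `components`."""
    return E - (E @ components.T) @ components

# Toy data (hypothetical): 200 vectors in 10 dimensions, extra variance on axis 0.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 10))
E[:, 0] *= 5.0

pcs = top_components(E, k=1)
E_debiased = remove_subspace(E, pcs)

# After removal, no row has any remaining component along the removed direction.
print(np.abs(E_debiased @ pcs.T).max())
```

In a real setting the matrix passed to `top_components` could be the full embedding matrix or a set of gendered pair differences. Which one the paper uses is not stated in this summary.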
If this is right
- Direct bias decreases when the first principal component is subtracted.
- Associative bias persists across multiple dimensions after principal-component removal.
- Semantic relationships and vector norms degrade as more principal components are removed.
- The optimal number of components to remove varies with the chosen bias metric and the source embedding.
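The trade-off in the points above can be sketched by tracking one bias measure and one geometry measure as more components are removed. The sketch below uses the mean absolute cosine with a fixed direction (in the spirit of the direct bias measure of Bolukbasi et al. [6]) and the mean change in pairwise cosine similarity. Both metric choices, the helper names, and the toy data are assumptions, not the paper's exact operationalization.

```python
import numpy as np

def unit_rows(M):
    """Scale every row of M to unit length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def direct_bias(E, g):
    """Mean absolute cosine between each row of E and the direction g."""
    g = g / np.linalg.norm(g)
    return float(np.mean(np.abs(unit_rows(E) @ g)))

def cosine_drift(E_before, E_after):
    """Mean absolute change in pairwise cosine similarity between rows."""
    S0 = unit_rows(E_before) @ unit_rows(E_before).T
    S1 = unit_rows(E_after) @ unit_rows(E_after).T
    iu = np.triu_indices(len(E_before), k=1)  # count each unordered pair once
    return float(np.mean(np.abs(S0[iu] - S1[iu])))

# Toy data (hypothetical): axis 0 plays the role of a gender direction.
rng = np.random.default_rng(1)
E = rng.normal(size=(100, 8))
E[:, 0] *= 4.0
g = np.eye(8)[0]

_, _, Vt = np.linalg.svd(E - E.mean(axis=0), full_matrices=False)
rows = []
for k in range(4):
    P = Vt[:k]                   # first k principal directions
    E_k = E - (E @ P.T) @ P      # remove their span
    rows.append((k, direct_bias(E_k, g), cosine_drift(E, E_k)))
    print(rows[-1])
```

On this toy data the bias measure falls once the first component is removed, while the drift measure is zero at k = 0 and positive afterwards. Real embeddings need not behave as cleanly.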
Where Pith is reading between the lines
- Methods that target only linear subspaces are unlikely to eliminate all measurable gender associations.
- Debiasing evaluations should separately track direct bias, distributed associations, and geometric integrity rather than rely on one score.
- Embeddings may require nonlinear or data-driven corrections once low-rank removal reaches its limit.
Load-bearing premise
WEAT scores give a reliable, independent measure of associative gender bias that is separate from the direct bias already captured by the first principal component.
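For readers unfamiliar with the metric this premise depends on, here is a minimal sketch of the WEAT effect size as defined by Caliskan et al. [7]: the difference in mean association between two target sets, divided by the standard deviation of the association over all target words. The use of the sample standard deviation (`ddof=1`), the function names, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def cos_matrix(U, V):
    """Cosine similarity between every row of U and every row of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T

def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B (rows are vectors)."""
    # s(w, A, B): mean cosine to A minus mean cosine to B, for each target word.
    s_X = cos_matrix(X, A).mean(axis=1) - cos_matrix(X, B).mean(axis=1)
    s_Y = cos_matrix(Y, A).mean(axis=1) - cos_matrix(Y, B).mean(axis=1)
    pooled = np.concatenate([s_X, s_Y])
    return float((s_X.mean() - s_Y.mean()) / pooled.std(ddof=1))

# Toy data (hypothetical): X and A cluster near +e0, Y and B near -e0.
rng = np.random.default_rng(2)
e0 = np.eye(5)[0]
A = e0 + 0.3 * rng.normal(size=(8, 5))
X = e0 + 0.3 * rng.normal(size=(8, 5))
B = -e0 + 0.3 * rng.normal(size=(8, 5))
Y = -e0 + 0.3 * rng.normal(size=(8, 5))

d = weat_effect_size(X, Y, A, B)
print(d)
```

Because the score aggregates cosines over whole word sets, it can in principle pick up structure outside any single direction, which is why its independence from the first principal component is an empirical question rather than a given.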
What would settle it
A dataset in which WEAT scores drop sharply after removal of only the first principal component, or in which vector cosine similarities and downstream task performance stay stable after removal of several components.
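One cheap, hedged way to probe the first criterion before running full WEAT ablations is to measure how much of an association-carrying difference vector lies in the span of the leading principal components. The helper name `energy_in_top_k`, the vector `v`, and the toy embedding matrix below are hypothetical and only illustrate the calculation.

```python
import numpy as np

def energy_in_top_k(v, Vt, k):
    """Fraction of the squared norm of v lying in the span of the first k rows of Vt."""
    coeffs = Vt[:k] @ v
    return float(coeffs @ coeffs / (v @ v))

# Toy embedding matrix (hypothetical) whose dominant variance is along axis 0.
rng = np.random.default_rng(3)
E = rng.normal(size=(300, 6))
E[:, 0] *= 5.0
_, _, Vt = np.linalg.svd(E - E.mean(axis=0), full_matrices=False)

# Hypothetical difference vector that is mostly NOT aligned with axis 0.
v = np.array([0.2, 1.0, -1.0, 0.5, 0.0, 0.3])

fractions = [energy_in_top_k(v, Vt, k) for k in range(1, 7)]
print(fractions)
```

A small fraction at k = 1 that grows only gradually with k would be consistent with bias spread across dimensions. A fraction near 1 at k = 1 would point toward the sharp WEAT drop described above.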
Original abstract
Debiasing methods based on principal component analysis (PCA) are broadly used to reduce gender bias in word embeddings used in LLMs, yet it remains unclear what aspects of bias they actually remove and how destructive this process is. These methods are based on the understanding that bias resides in a low-dimensional subspace, with the assumption that most of it can be captured by a few principal components. In this work, we conduct a systematic geometric analysis of PCA-based gender debiasing and investigate what is actually removed from the embedding space. Our experiments across multiple embeddings show that direct gender bias is primarily concentrated in the first principal component, supporting the low-rank bias hypothesis. However, associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions. Furthermore, as expected, we demonstrate that removing an increasing number of principal components leads to a consistent degradation of the embedding geometry, affecting semantic structure and vector relationships. These results reveal that PCA-based debiasing operates as a trade-off: while it effectively reduces certain forms of direct bias, it fails to eliminate distributed associations and introduces geometric distortion. Moreover, there is no universal optimal level of debiasing, as the balance between bias reduction and semantic preservation depends on the chosen metric and embedding. Overall, our findings suggest that bias in word embeddings is not purely low-rank and that simple subspace removal methods may be insufficient for comprehensive debiasing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a geometric analysis of PCA-based gender debiasing on word embeddings. It reports that direct gender bias concentrates primarily in the first principal component while associative bias (measured via WEAT) is distributed across multiple dimensions and does not align with the leading PCs. Experiments across embeddings show that removing increasing numbers of principal components reduces certain bias measures but consistently degrades semantic structure and vector relationships, leading to the conclusion that gender bias is not purely low-rank and that simple subspace removal is insufficient for comprehensive debiasing.
Significance. If the geometric separation between direct bias and WEAT holds, the work supplies concrete empirical evidence that PCA debiasing involves an unavoidable trade-off between bias reduction and preservation of embedding utility, with no universal optimum. The multi-embedding scope and explicit focus on what is removed versus what remains are positive contributions to the debiasing literature.
major comments (2)
- [Abstract] Abstract: The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings or WEAT scores computed after explicit removal of the first component.
- [Experiments] Experiments section (implied by abstract description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends,' which weakens the ability to evaluate whether the observed spread of WEAT bias is robust or an artifact of metric choice.
minor comments (1)
- [Abstract] The abstract invokes 'direct gender bias' and 'associative bias' without a brief parenthetical definition or reference to the precise operationalization used in the experiments.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, agreeing where additional verification or reporting is warranted and outlining the planned revisions.
- Referee: [Abstract] Abstract: The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings or WEAT scores computed after explicit removal of the first component.
Authors: We agree that explicit verification would strengthen the central claim. The main text already includes geometric projections and quantitative comparisons showing that WEAT bias does not concentrate in the leading PCs (unlike direct bias), but we will add the suggested analyses: Pearson correlations between WEAT word-pair difference vectors and PC1 loadings, plus WEAT effect sizes recomputed after ablating the first component. These will be incorporated into Section 4 with a new table or figure for clarity.
Revision: yes
- Referee: [Experiments] Experiments section (implied by abstract description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends,' which weakens the ability to evaluate whether the observed spread of WEAT bias is robust or an artifact of metric choice.
Authors: We acknowledge this reporting gap. The experiments were run on standard WEAT test sets (e.g., the original Caliskan et al. stimuli) across multiple embeddings, but we will revise the Experiments section to report exact dataset sizes, add error bars (standard deviation across embeddings or bootstrap) to all degradation plots, and include statistical significance tests (paired t-tests or Wilcoxon tests) for the observed trends in bias reduction versus semantic degradation.
Revision: yes
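The bootstrap error bars mentioned in this response could be computed along the following lines. This is a minimal percentile-bootstrap sketch. The function name `bootstrap_ci`, its defaults, and the per-embedding scores are hypothetical and do not come from the paper.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row holds indices for one resample drawn with replacement.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    stats = np.array([stat(values[row]) for row in idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Hypothetical WEAT effect sizes, one per embedding, after removing k components.
scores = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2]
lo, hi = bootstrap_ci(scores)
print(lo, np.mean(scores), hi)
```

With only a handful of embeddings the interval will be wide. Resampling the WEAT word lists themselves is another option, though it answers a different question (sensitivity to stimulus choice rather than to embedding choice).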
Circularity Check
No circularity: empirical geometric analysis with independent metrics
The paper conducts experiments comparing PCA components to direct bias and WEAT associative bias across embeddings, reporting observed distributions without any derivation chain. No equations define a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no self-citation is invoked as a uniqueness theorem or load-bearing premise. Claims rest on external benchmarks (standard word embeddings, WEAT scores, geometric distortion measures) that are not constructed from the paper's own fitted values, making the analysis self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Bias resides in a low-dimensional subspace that can be captured by principal components.
- Domain assumption: WEAT scores measure associative bias independently of the direct bias subspace.
Reference graph
Works this paper leans on
- [1] Real-time segmentation of on-line handwritten Arabic script. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, 2014.
- [2] Fast classification of handwritten on-line Arabic characters. In Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of, 2014.
- [3] Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications. arXiv preprint arXiv:1804.09028.
- [4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In ICLR, 2013.
- [5] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. In EMNLP, 2014.
- [6] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
- [7] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics Derived Automatically from Language Corpora Contain Human-like Biases. Science, 356(6334):183-186, 2017.
- [8] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Learning Gender-Neutral Word Embeddings. In Proceedings of EMNLP, 2018.
- [9] Hila Gonen and Yoav Goldberg. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them. In Proceedings of NAACL, 2019.
- [10] Kawin Ethayarajh. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. In Proceedings of EMNLP-IJCNLP, 2019.
- [11] Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, and Mahzarin R. Banaji. Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022.
- [12] Eva Gerritse. The Impact of Debiasing Word Embeddings on Information Retrieval. 2019.