
arxiv: 2606.07964 · v1 · pith:QOY4KXAD · submitted 2026-06-06 · 💻 cs.CL

What Does Debiasing Really Remove? A Geometric Study of PCA-Based Gender Debiasing in Word Embeddings

Pith reviewed 2026-06-27 20:13 UTC · model grok-4.3

classification 💻 cs.CL
keywords: gender bias, word embeddings, PCA debiasing, geometric analysis, principal components, WEAT, associative bias, direct bias

The pith

PCA gender debiasing removes direct bias from the first principal component but leaves associative bias distributed across dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a geometric analysis of PCA-based methods for removing gender bias from word embeddings. It shows that direct bias concentrates in the first principal component while associative bias measured by WEAT spreads across many dimensions without aligning to those components. Removing additional components reduces the targeted bias yet steadily distorts vector relationships and semantic structure. The results indicate that bias is not confined to a low-rank subspace, so simple removal trades one form of bias reduction for geometric damage with no single optimal cutoff.

Core claim

Direct gender bias in word embeddings is captured primarily by the first principal component, allowing its removal to reduce that bias, whereas associative bias does not align with the principal directions and remains after subspace removal; at the same time, excising any number of components degrades the embedding geometry in a measurable way.

What carries the argument

Principal components of the embedding matrix, which identify a candidate gender subspace whose successive removal is tracked for effects on direct bias, WEAT scores, and geometric fidelity.
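The PCA step that identifies a candidate gender subspace can be sketched in a few lines. This is a minimal sketch in the style of Bolukbasi et al. (2016), not the authors' code. It assumes a dict mapping words to NumPy vectors and a hypothetical list of definitional pairs. Each pair is centered on its midpoint, principal directions come from an SVD, and a vector is neutralized by projecting out the top-k directions. The toy vectors are synthetic, with a "gender" offset planted by hand.

```python
import numpy as np

def gender_subspace(emb, pairs, k=1):
    """Top-k principal directions of midpoint-centered definitional pairs.

    Returns (basis, explained): basis has shape (k, d) with orthonormal rows,
    explained is the explained-variance ratio of every component.
    """
    rows = []
    for a, b in pairs:
        mid = (emb[a] + emb[b]) / 2.0
        rows.extend([emb[a] - mid, emb[b] - mid])
    X = np.stack(rows)  # rows come in +u/-u pairs, so the mean is zero and SVD equals PCA
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:k], explained

def remove_subspace(v, basis):
    """Project v onto the orthogonal complement of the span of the basis rows."""
    return v - basis.T @ (basis @ v)

# Toy usage on synthetic vectors (not real embeddings).
rng = np.random.default_rng(0)
d = 50
g_true = rng.normal(size=d)
g_true /= np.linalg.norm(g_true)
emb = {w: rng.normal(size=d) for w in ["he", "she", "man", "woman", "king", "queen", "nurse"]}
pairs = [("he", "she"), ("man", "woman"), ("king", "queen")]
for m, f in pairs:
    emb[m] = emb[m] + 10 * g_true  # plant a shared offset along g_true
    emb[f] = emb[f] - 10 * g_true

basis, explained = gender_subspace(emb, pairs, k=1)
nurse_neutral = remove_subspace(emb["nurse"], basis)
print(round(float(explained[0]), 3), abs(float(basis[0] @ g_true)))
```

Passing k > 1 removes a larger subspace, which is the cumulative-removal knob the paper varies.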

If this is right

  • Direct bias decreases when the first principal component is subtracted.
  • Associative bias persists across multiple dimensions after principal-component removal.
  • Semantic relationships and vector norms degrade as more principal components are removed.
  • The optimal number of components to remove varies with the chosen bias metric and the source embedding.
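The first bullet can be made concrete with the DirectBias measure of Bolukbasi et al. (2016): DirectBias_c = (1/|N|) Σ_{w∈N} |cos(w, g)|^c over a set N of gender-neutral words and a gender direction g. The sketch below assumes that definition and is not the paper's evaluation code. It uses synthetic "neutral" vectors that lean along a hand-chosen g and shows that projecting g out drives this particular score to zero by construction.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def direct_bias(neutral_vecs, g, c=1.0):
    """DirectBias_c: mean of |cos(w, g)|**c over gender-neutral word vectors."""
    return float(np.mean([abs(cos(w, g)) ** c for w in neutral_vecs]))

def drop_direction(v, g):
    """Remove the component of v along g."""
    g_hat = g / np.linalg.norm(g)
    return v - (v @ g_hat) * g_hat

rng = np.random.default_rng(1)
g = rng.normal(size=50)
# Synthetic "neutral" words that all lean along g.
neutral = [rng.normal(size=50) + 0.5 * g for _ in range(20)]
before = direct_bias(neutral, g)
after = direct_bias([drop_direction(w, g) for w in neutral], g)
print(f"DirectBias before={before:.3f} after={after:.2e}")
```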

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Methods that target only linear subspaces are unlikely to eliminate all measurable gender associations.
  • Debiasing evaluations should separately track direct bias, distributed associations, and geometric integrity rather than rely on one score.
  • Embeddings may require nonlinear or data-driven corrections once low-rank removal reaches its limit.

Load-bearing premise

WEAT scores give a reliable, independent measure of associative gender bias that is separate from the direct bias already captured by the first principal component.
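Because the argument leans on WEAT, it helps to see what the score computes. Following Caliskan et al. (2017), s(w, A, B) is the mean cosine of w to attribute set A minus its mean cosine to attribute set B. The effect size is the difference of the mean of s over target sets X and Y, divided by the standard deviation of s over X ∪ Y. The sketch assumes NumPy vectors and a sample standard deviation (ddof=1); implementations differ on that detail. The toy sets are synthetic.

```python
import numpy as np

def _cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def s_assoc(w, A, B):
    """s(w, A, B): mean cosine to attributes A minus mean cosine to attributes B."""
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size; the std uses ddof=1 (an assumption, implementations vary)."""
    sx = [s_assoc(x, A, B) for x in X]
    sy = [s_assoc(y, A, B) for y in Y]
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1))

rng = np.random.default_rng(2)
d = 50
g = rng.normal(size=d)
g /= np.linalg.norm(g)

def noisy(shift, n):
    """n random vectors shifted along the unit direction g."""
    return [rng.normal(size=d) + shift * g for _ in range(n)]

A, B = noisy(+5, 8), noisy(-5, 8)  # stand-ins for male vs. female attribute words
X, Y = noisy(+2, 8), noisy(-2, 8)  # stand-ins for career vs. family target words
print(round(weat_effect_size(X, Y, A, B), 3))
```

Swapping X with Y, or A with B, flips the sign of the effect size, which is a quick sanity check for any implementation.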

What would settle it

A dataset in which WEAT scores drop sharply after removal of only the first principal component, or in which vector cosine similarities and downstream task performance stay stable after removal of several components.
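The stability half of that criterion can be operationalized in several ways. One hedged option, assumed here rather than taken from the paper, compares all pairwise cosine similarities before and after removing the top-k principal directions (Pearson r over the upper triangle) and tracks how much vector norm survives. A random matrix stands in for a real embedding table.

```python
import numpy as np

def cosine_matrix(E):
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    return U @ U.T

def remove_top_k(E, directions, k):
    """Project every row of E off the first k orthonormal directions."""
    D = directions[:k]
    return E - (E @ D.T) @ D

def geometry_fidelity(E, directions, k):
    """(Pearson r of pairwise cosines before vs. after, mean norm ratio after/before)."""
    iu = np.triu_indices(E.shape[0], k=1)
    E_k = remove_top_k(E, directions, k)
    r = np.corrcoef(cosine_matrix(E)[iu], cosine_matrix(E_k)[iu])[0, 1]
    ratio = np.mean(np.linalg.norm(E_k, axis=1) / np.linalg.norm(E, axis=1))
    return float(r), float(ratio)

rng = np.random.default_rng(3)
E = rng.normal(size=(200, 50))  # stand-in for an embedding table
_, _, vt = np.linalg.svd(E - E.mean(axis=0), full_matrices=False)  # PCA directions
sweep = [geometry_fidelity(E, vt, k) for k in range(0, 11)]
for k, (r, ratio) in enumerate(sweep):
    print(k, round(r, 3), round(ratio, 3))
```

Because each extra removed direction is orthogonal to the previous ones, the norm ratio can only fall as k grows; the cosine correlation shows how much relational structure goes with it.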

Figures

Figures reproduced from arXiv: 2606.07964 by Alexey Kresin, Tchifou M. Dieffi, Tomer Caspi.

Figure 1. Explained variance spectrum of the gender subspace. (a) The explained variance ratio of individual principal …
Figure 2. Cross-embedding comparison of cumulative PCA-based component removal on GloVe, FastText, and …
Figure 3. Single-PC ablation analysis with normalized metrics. (a) Bias metrics (direct bias and WEAT) and …
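Single-PC ablation differs from cumulative removal: each principal direction is removed on its own and the metric is compared with the unablated baseline. A minimal sketch follows. The metric callback, the ratio-to-baseline normalization, and the toy data are all assumptions, since the truncated captions do not say how the paper normalizes its metrics.

```python
import numpy as np

def single_pc_ablation(E, directions, metric):
    """Remove each direction on its own; return metric(E_ablated) / metric(E)."""
    base = metric(E)
    ratios = []
    for direction in directions:
        u = direction / np.linalg.norm(direction)
        E_i = E - np.outer(E @ u, u)  # drop only this one direction
        ratios.append(metric(E_i) / base)
    return np.array(ratios)

rng = np.random.default_rng(4)
g = rng.normal(size=50)
g /= np.linalg.norm(g)
E = rng.normal(size=(100, 50)) + 2.0 * g  # synthetic rows leaning along g

def mean_abs_cos_to_g(M):
    """Toy direct-bias-style metric (hypothetical): mean |cos(row, g)|."""
    return float(np.mean(np.abs(M @ g) / np.linalg.norm(M, axis=1)))

other = rng.normal(size=50)
other -= (other @ g) * g  # a direction orthogonal to g
ratios = single_pc_ablation(E, [g, other], mean_abs_cos_to_g)
print(ratios)
```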
Original abstract

Debiasing methods based on principal component analysis (PCA) are broadly used to reduce gender bias in word embeddings used in LLMs, yet it remains unclear what aspects of bias they actually remove and how destructive this process is. These methods are based on the understanding that bias resides in a low-dimensional subspace, with the assumption that most of it can be captured by a few principal components. In this work, we conduct a systematic geometric analysis of PCA-based gender debiasing and investigate what is actually removed from the embedding space. Our experiments across multiple embeddings show that direct gender bias is primarily concentrated in the first principal component, supporting the low-rank bias hypothesis. However, associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions. Furthermore, as expected, we demonstrate that removing an increasing number of principal components leads to a consistent degradation of the embedding geometry, affecting semantic structure and vector relationships. These results reveal that PCA-based debiasing operates as a trade-off: while it effectively reduces certain forms of direct bias, it fails to eliminate distributed associations and introduces geometric distortion. Moreover, there is no universal optimal level of debiasing, as the balance between bias reduction and semantic preservation depends on the chosen metric and embedding. Overall, our findings suggest that bias in word embeddings is not purely low-rank and that simple subspace removal methods may be insufficient for comprehensive debiasing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript conducts a geometric analysis of PCA-based gender debiasing on word embeddings. It reports that direct gender bias concentrates primarily in the first principal component while associative bias (measured via WEAT) is distributed across multiple dimensions and does not align with the leading PCs. Experiments across embeddings show that removing increasing numbers of principal components reduces certain bias measures but consistently degrades semantic structure and vector relationships, leading to the conclusion that gender bias is not purely low-rank and that simple subspace removal is insufficient for comprehensive debiasing.

Significance. If the geometric separation between direct bias and WEAT holds, the work supplies concrete empirical evidence that PCA debiasing involves an unavoidable trade-off between bias reduction and preservation of embedding utility, with no universal optimum. The multi-embedding scope and explicit focus on what is removed versus what remains are positive contributions to the debiasing literature.

major comments (2)
  1. [Abstract] The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings, or WEAT scores computed after explicit removal of the first component.
  2. [Experiments] Experiments section (implied by abstract description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends,' which weakens the ability to evaluate whether the observed spread of WEAT bias is robust or an artifact of metric choice.
minor comments (1)
  1. [Abstract] The abstract invokes 'direct gender bias' and 'associative bias' without a brief parenthetical definition or reference to the precise operationalization used in the experiments.
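The first major comment asks whether WEAT bias lines up with PC1. One simple check, offered as an illustration rather than the analysis the authors commit to, is the absolute cosine between each target-pair difference vector and PC1, compared with the roughly sqrt(2/(pi*d)) expected for a random direction in d dimensions. WEAT does not itself pair X with Y, so the index-wise pairing here is an assumption; vectors are synthetic.

```python
import numpy as np

def pc1_alignment(X, Y, pc1):
    """|cos(x_i - y_i, pc1)| for index-paired target vectors (pairing is an assumption)."""
    pc1 = pc1 / np.linalg.norm(pc1)
    diffs = np.asarray(X) - np.asarray(Y)
    return np.abs(diffs @ pc1) / np.linalg.norm(diffs, axis=1)

rng = np.random.default_rng(5)
d = 300
pc1 = rng.normal(size=d)
pc1 /= np.linalg.norm(pc1)
base = rng.normal(size=(8, d))
aligned = pc1_alignment(base + pc1, base - pc1, pc1)  # differences lie along pc1
unrelated = pc1_alignment(rng.normal(size=(8, d)), rng.normal(size=(8, d)), pc1)
chance = np.sqrt(2 / (np.pi * d))  # approx. E|cos| for a random direction
print(aligned.mean(), unrelated.mean(), chance)
```

Values near the chance level would support the paper's claim; values near 1 would indicate that WEAT bias is carried by PC1 after all.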

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, agreeing where additional verification or reporting is warranted and outlining the planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions' (and thus that bias is not purely low-rank) is load-bearing, yet the text supplies no verification that WEAT effect sizes are independent of PC1, such as the correlation between WEAT word-pair difference vectors and PC1 loadings, or WEAT scores computed after explicit removal of the first component.

    Authors: We agree that explicit verification would strengthen the central claim. The main text already includes geometric projections and quantitative comparisons showing that WEAT bias does not concentrate in the leading PCs (unlike direct bias), but we will add the suggested analyses: Pearson correlations between WEAT word-pair difference vectors and PC1 loadings, plus WEAT effect sizes recomputed after ablating the first component. These will be incorporated into Section 4 with a new table or figure for clarity. revision: yes

  2. Referee: [Experiments] Experiments section (implied by abstract description of 'experiments across multiple embeddings'): No error bars, dataset sizes for the WEAT tests, or statistical significance tests are reported for the 'consistent degradation trends,' which weakens the ability to evaluate whether the observed spread of WEAT bias is robust or an artifact of metric choice.

    Authors: We acknowledge this reporting gap. The experiments were run on standard WEAT test sets (e.g., the original Caliskan et al. stimuli) across multiple embeddings, but we will revise the Experiments section to report exact dataset sizes, add error bars (standard deviation across embeddings or bootstrap) to all degradation plots, and include statistical significance tests (paired t-tests or Wilcoxon tests) for the observed trends in bias reduction versus semantic degradation. revision: yes
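For the promised error bars, one common option is a percentile bootstrap over target words. The sketch below is an assumption, not the authors' protocol. It takes precomputed per-word associations s(w, A, B) for the two target sets, resamples each set with replacement, and recomputes the WEAT effect size (sample standard deviation, ddof=1). The association values are synthetic.

```python
import numpy as np

def effect_from_assoc(sx, sy):
    """WEAT effect size from precomputed associations s(w, A, B)."""
    return float((np.mean(sx) - np.mean(sy)) / np.std(np.concatenate([sx, sy]), ddof=1))

def bootstrap_weat_ci(sx, sy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; X and Y words are resampled independently."""
    rng = np.random.default_rng(seed)
    sx, sy = np.asarray(sx, dtype=float), np.asarray(sy, dtype=float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        bx = rng.choice(sx, size=sx.size, replace=True)
        by = rng.choice(sy, size=sy.size, replace=True)
        stats[i] = effect_from_assoc(bx, by)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

rng = np.random.default_rng(6)
sx = rng.normal(0.10, 0.05, size=8)   # synthetic associations for target set X
sy = rng.normal(-0.05, 0.05, size=8)  # synthetic associations for target set Y
point = effect_from_assoc(sx, sy)
lo, hi = bootstrap_weat_ci(sx, sy)
print(f"d = {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With typical WEAT sets of only a handful of words per category, these intervals tend to be wide, which is exactly what the referee's robustness concern is about.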

Circularity Check

0 steps flagged

No circularity: empirical geometric analysis with independent metrics

full rationale

The paper conducts experiments comparing PCA components to direct bias and WEAT associative bias across embeddings, reporting observed distributions without any derivation chain. No equations define a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no self-citation is invoked as a uniqueness theorem or load-bearing premise. Claims rest on external benchmarks (standard word embeddings, WEAT scores, geometric distortion measures) that are not constructed from the paper's own fitted values, making the analysis self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the low-rank bias hypothesis being testable via PCA and on WEAT being an appropriate proxy for associative bias; no free parameters or invented entities are described in the abstract.

axioms (2)
  • domain assumption: Bias resides in a low-dimensional subspace that can be captured by principal components
    Stated explicitly in the abstract as the basis for PCA debiasing methods.
  • domain assumption: WEAT scores measure associative bias independently of the direct bias subspace
    Used to contrast the two forms of bias when claiming that associative bias is spread across dimensions.

pith-pipeline@v0.9.1-grok · 5795 in / 1329 out tokens · 16840 ms · 2026-06-27T20:13:39.244133+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references

  1. [1]

    Real-time segmentation of on-line handwritten Arabic script

    Real-time segmentation of on-line handwritten Arabic script. In 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014

  2. [2]

    Fast classification of handwritten on-line Arabic characters

    Fast classification of handwritten on-line Arabic characters. In 2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR), 2014

  3. [3]

    Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications

    Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications. arXiv preprint arXiv:1804.09028

  4. [4]

    Efficient Estimation of Word Representations in Vector Space

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In ICLR, 2013

  5. [5]

    GloVe: Global Vectors for Word Representation

    Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. In EMNLP, 2014

  6. [6]

    Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

    Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2016

  7. [7]

    Semantics Derived Automatically from Language Corpora Contain Human-like Biases

    Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics Derived Automatically from Language Corpora Contain Human-like Biases. Science, 356(6334):183--186, 2017

  8. [8]

    Learning Gender-Neutral Word Embeddings

    Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Learning Gender-Neutral Word Embeddings. In Proceedings of EMNLP, 2018

  9. [9]

    Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them

    Hila Gonen and Yoav Goldberg. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them. In Proceedings of NAACL, 2019

  10. [10]

    How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

    Kawin Ethayarajh. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. In Proceedings of EMNLP-IJCNLP, 2019

  11. [11]

    Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

    Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, and Mahzarin R. Banaji. Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022

  12. [12]

    The Impact of Debiasing Word Embeddings on Information Retrieval

    Eva Gerritse. The Impact of Debiasing Word Embeddings on Information Retrieval. 2019