Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
arXiv preprint arXiv:1809.01496 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
fields
cs.CL 2verdicts
UNVERDICTED 2representative citing papers
Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.
citing papers explorer
-
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
-
Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis
Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.