Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis

Isha Bhallamudi; Jayadev Bhaskaran

REVIEW · 1 major objection · 1 minor · 37 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.


T0 review · grok-4.3

Sentiment analysis models assign different scores to the same occupation depending on whether the subject is described as male or female.

2026-05-25 17:09 UTC pith:TECGLZJW

Load-bearing objection: The paper ships a new 800-sentence benchmark for occupational gender stereotypes in sentiment models and tests it on three systems, but the sentence construction details are thin enough that confounds remain possible.

arxiv 1906.10256 v2 pith:TECGLZJW submitted 2019-06-24 cs.CL

Jayadev Bhaskaran, Isha Bhallamudi

classification cs.CL

keywords occupational gender stereotypes · sentiment analysis · gender bias · NLP evaluation · profession datasets · model bias testing

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a dataset of 800 gender-balanced sentences about specific professions and uses it to test whether sentiment models produce systematically different outputs based on the gender paired with each job. It runs the test on three models and checks whether the observed differences match broader societal views of which occupations are seen as masculine or feminine. A reader would care because these models are increasingly used in applications that evaluate text about people and work. If the differences are real and consistent, the work supplies a repeatable method for detecting and tracking such biases.

Core claim

The authors establish that occupational gender stereotypes appear in sentiment analysis models through measurable differences in sentiment scores on gender-swapped sentences about the same professions, and that the pattern of these differences corresponds to societal perceptions of occupational gender roles.

What carries the argument

A released gender-balanced dataset of 800 sentences about professions, employed as a test bench to isolate sentiment differences attributable to gender-occupation pairings.

Load-bearing premise

Differences in model scores on the 800 sentences arise solely from the gender-occupation link rather than from other variations in sentence wording or structure.

What would settle it

Applying the three models to the dataset and finding no consistent, statistically significant difference in average sentiment between the male-subject and female-subject versions of the same profession sentences.
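
One way to run the check described above is a paired significance test over the male-subject and female-subject scores for the same sentences. The sketch below is a minimal illustration, not the authors' procedure: it uses a sign-flip permutation test (chosen here for simplicity), and the function name `paired_sign_flip_test` and the toy scores are hypothetical.

```python
import random
from statistics import fmean


def paired_sign_flip_test(male_scores, female_scores, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    Returns (mean female-minus-male difference, p-value).
    Assumption: index i in both lists refers to the same sentence template,
    differing only in the gender of the subject.
    """
    if len(male_scores) != len(female_scores):
        raise ValueError("scores must be paired one-to-one")
    diffs = [f - m for m, f in zip(male_scores, female_scores)]
    observed = fmean(diffs)

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(fmean(flipped)) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical positive-class probabilities for ten sentence pairs (not real results).
toy_male = [0.0] * 10
toy_female = [0.1] * 10
gap, p = paired_sign_flip_test(toy_male, toy_female)
print(f"mean gap = {gap:.3f}, p = {p:.4f}")
```

A permutation test avoids assuming that score differences are normally distributed; a paired t-test would be a reasonable alternative if that assumption holds. When many professions are tested separately, a multiple-comparison correction would also be needed.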


If this is right

Sentiment models will carry forward and potentially reinforce societal occupational gender stereotypes in any downstream task that processes text about jobs.
The degree of bias in a model can be quantified and compared against independent measures of societal occupational stereotypes.
Applications that rely on sentiment scores for hiring-related text, reviews, or social media will inherit these occupation-by-gender patterns unless corrected.
The released dataset provides a public benchmark that other researchers can apply to additional models or updated versions of existing models.
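
The second consequence above, comparing a model's bias against an independent measure, could be sketched as a rank correlation between per-profession score gaps and an external per-profession statistic. This is an illustrative assumption rather than the paper's analysis: Spearman correlation is chosen here for convenience, the helper names are hypothetical, and the numbers are made-up placeholders, not results.

```python
from statistics import fmean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder data: female-minus-male score gap and an external measure per profession.
professions = ["prof_a", "prof_b", "prof_c", "prof_d"]
score_gap = {"prof_a": 0.04, "prof_b": -0.03, "prof_c": 0.06, "prof_d": 0.01}
external = {"prof_a": 0.70, "prof_b": 0.10, "prof_c": 0.55, "prof_d": 0.40}

rho = spearman([score_gap[p] for p in professions],
               [external[p] for p in professions])
print(f"Spearman rho = {rho:.2f}")
```

Rank correlation only asks whether the ordering of professions agrees between the two measures, so it is insensitive to the different scales of sentiment probabilities and survey statistics.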

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sentence-construction approach could be adapted to measure other embedded stereotypes such as those involving race or age in sentiment outputs.
Retraining the tested models on data that deliberately balances gender across professions might reduce the measured score differences.
The method offers a template for auditing bias in other NLP tasks that assign numerical scores to text involving people and roles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

The paper ships a new 800-sentence benchmark for occupational gender stereotypes in sentiment models and tests it on three systems, but the sentence construction details are thin enough that confounds remain possible.


The main takeaway is a released dataset of 800 gender-balanced sentences about specific professions, paired with a test method to measure whether sentiment models assign different scores based on the gender of the person in the job. The authors run this test on three models and compare the outputs to external perceptions of those occupations. Releasing the data is the concrete step forward here; anyone auditing deployed sentiment systems now has a narrow, targeted probe they can apply directly.

Referee Report

1 major / 1 minor

Summary. The paper claims that occupational gender stereotypes are present in sentiment analysis models and can be measured using a newly released gender-balanced dataset of 800 profession-related sentences. It proposes a methodology to use this dataset as a test bench, evaluates three models, and relates the observed biases to societal perceptions of occupations.

Significance. If the dataset construction isolates gender-occupation effects without lexical confounds, the work supplies a concrete, reproducible benchmark for quantifying and mitigating implicit biases in sentiment models, which are widely deployed in downstream applications.

major comments (1)

§4 (Dataset): the claim that the 800-sentence set isolates occupational gender stereotypes requires evidence that male/female sentence pairs differ only in gender markers; no validation is provided that verb choice, objects, sentence length, or profession-specific phrasing are balanced, so sentiment gaps could arise from template artifacts rather than stereotypes.

minor comments (1)

The abstract does not name the three evaluated models or the exact societal-perception data source used for correlation analysis.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment on dataset validation below.


Referee: §4 (Dataset): the claim that the 800-sentence set isolates occupational gender stereotypes requires evidence that male/female sentence pairs differ only in gender markers; no validation is provided that verb choice, objects, sentence length, or profession-specific phrasing are balanced, so sentiment gaps could arise from template artifacts rather than stereotypes.

Authors: We agree that the manuscript does not include explicit quantitative validation of balance across non-gender features. The 800 sentences were generated from a small number of fixed templates per profession, with only the gendered pronoun and the profession noun varied while holding verbs, objects, and overall structure constant within each profession pair. We will revise §4 to describe the template design in detail and add balance statistics (identical sentence lengths within pairs, identical verbs/objects for matched sentences) to demonstrate that sentiment differences arise from gender-occupation associations rather than template artifacts.

Revision: yes
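
The balance statistics promised in this response could be produced by a mechanical check that each male/female pair has the same length and differs only in known gender markers. The following is a minimal sketch under stated assumptions: the `GENDER_SWAPS` table, the whitespace tokenizer, and the example sentences are hypothetical and are not taken from the released dataset.

```python
# Hypothetical mapping from male-form tokens to their female-form counterparts.
GENDER_SWAPS = {
    "he": "she",
    "his": "her",
    "him": "her",
    "man": "woman",
    "boy": "girl",
    "father": "mother",
}


def tokenize(sentence):
    """Lowercase and split on whitespace after dropping basic punctuation."""
    return sentence.lower().replace(".", " ").replace(",", " ").split()


def pair_violations(male_sentence, female_sentence):
    """List the ways a sentence pair differs beyond an allowed gender swap.

    An empty list means the pair has equal token length and every differing
    token position is a known male-to-female marker substitution.
    """
    m, f = tokenize(male_sentence), tokenize(female_sentence)
    if len(m) != len(f):
        return [("length", len(m), len(f))]
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(m, f))
        if a != b and GENDER_SWAPS.get(a) != b
    ]


# Illustrative pairs only; the real templates are described in the paper's §4.
pairs = [
    ("This man is a truck driver.", "This woman is a truck driver."),
    ("He drives a truck.", "She loves a truck."),
]
for male, female in pairs:
    print(pair_violations(male, female))
```

Reporting the fraction of pairs with an empty violation list would give a single balance statistic; any non-empty result points to the exact token position where a template confound may enter.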

Circularity Check

0 steps flagged

Empirical measurement study with newly introduced dataset exhibits no circularity


The paper releases a new gender-balanced dataset of 800 sentences and applies it to measure occupational gender stereotypes in three sentiment models, relating results to external societal perceptions. No derivation chain, equations, or fitted parameters are present; the central claim rests on the independent construction and evaluation of this dataset rather than reducing to prior inputs, self-citations, or ansatzes. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are stated or required by the described contribution.

pith-pipeline@v0.9.0 · 5608 in / 979 out tokens · 29660 ms · 2026-05-25T17:09:42.026405+00:00 · methodology


Original abstract

In this work, we investigate the presence of occupational gender stereotypes in sentiment analysis models. Such a task has implications for reducing implicit biases in these models, which are being applied to an increasingly wide variety of downstream tasks. We release a new gender-balanced dataset of 800 sentences pertaining to specific professions and propose a methodology for using it as a test bench to evaluate sentiment analysis models. We evaluate the presence of occupational gender stereotypes in 3 different models using our approach, and explore their relationship with societal perceptions of occupations.

Figures

Figures reproduced from arXiv: 1906.10256 by Isha Bhallamudi, Jayadev Bhaskaran.

**Figure 1.** Simple diagram of our task definition.

**Figure 2.** Median weekly earnings (Current Population Survey, 2018) vs. mean predicted positive probability using M.3 (BERT), per profession.


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 8 internal anchors

[3] Christine Basta, Marta R. Costa-jussà, and Noe Casas. 2019. Evaluating the Underlying Gender Bias in Contextualized Word Embeddings. arXiv e-prints. http://arxiv.org/abs/1904.08783

[4] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to Computer Programmer As Woman is to Homemaker? Debiasing Word Embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 4356–4364, USA. Cur... http://dl.acm.org/citation.cfm?id=3157382.3157584

[5] Carlo Bonferroni. 1936. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8:3–62.

[6] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186. https://doi.org/10.1126/science.aal4230

[7] Mary Ann Cejka and Alice H. Eagly. 1999. Gender-Stereotypic Images of Occupations Correspond to the Sex Segregation of Employment. Personality and Social Psychology Bulletin, 25(4):413–423. https://doi.org/10.1177/0146167299025004002

[8] François Chollet et al. 2015. Keras. https://keras.io

[9] Andre Costa and Adriano Veloso. 2015. Employee analytics through sentiment analysis. In Brazilian Symposium on Databases. https://doi.org/10.13140/RG.2.1.1623.3688

[10] Current Population Survey. 2018. 39. Median weekly earnings of full-time wage and salary workers by detailed occupation and sex. Bureau of Labor Statistics, United States Department of Labor. https://www.bls.gov/cps/cpsaat39.htm

[11] Eva Derous and Ann Marie Ryan. 2018. When your resume is (not) turning you down: Modelling ethnic bias in resume screening. Human Resource Management Journal, 29(2):113–130. https://doi.org/10.1111/1748-8583.12217

[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs/1810.04805. http://arxiv.org/abs/1810.04805

[13] Alice H Eagly and Valerie J Steffen. 1984. Gender stereotypes stem from the distribution of women and men into social roles. Journal of Personality and Social Psychology, 46(4):735.

[14] Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644. https://doi.org/10.1073/pnas.1720347115

[15] Peter Glick, Korin Wilk, and Michele Perreault. 1995. Images of occupations: Components of gender and status in occupational stereotypes. Sex Roles, 32(9):565–582. https://doi.org/10.1007/BF01544212

[16] Hila Gonen and Yoav Goldberg. 2019. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv e-prints. http://arxiv.org/abs/1903.03862

[17] Elizabeth L. Haines, Kay Deaux, and Nicole Lofaro. 2016. The Times They Are a-Changing... or Are They Not? A Comparison of Gender Stereotypes, 1983–2014. Psychology of Women Quarterly, 40(3):353–363. https://doi.org/10.1177/0361684316634081

[18] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput., 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

[19] Jeremy Howard and Sebastian Ruder. 2018. Fine-tuned Language Models for Text Classification. CoRR, abs/1801.06146. http://arxiv.org/abs/1801.06146

[20] Matthew Kay, Cynthia Matuszek, and Sean A. Munson. 2015. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pages 3819–3828, New York, NY, USA. ACM. https://doi.org/10.1145/2702123.2702520

[21] Svetlana Kiritchenko and Saif M. Mohammad. 2018. Examining gender and race bias in two hundred sentiment analysis systems. CoRR, abs/1805.04508. http://arxiv.org/abs/1805.04508

[22] Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. http://arxiv.org/abs/1903.10561

[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, edit... http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

[24] Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA.

[25] Astrid Nieuwets. 2015. Fallen Females: On the Semantic Pejoration of Mistress and Spinster. Bachelor's thesis, Utrecht University.

[26] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

[27] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543. http://www.aclweb.org/anthology/D14-1162

[28] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. CoRR, abs/1802.05365. http://arxiv.org/abs/1802.05365

[29] Alec Radford. 2018. Improving Language Understanding by Generative Pre-Training.

[30] Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender bias in coreference resolution. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). https://doi.org/10.18653/v1/n18-2002

[31] Laurie A. Rudman and Julie E. Phelan. 2008. Backlash effects for disconfirming gender stereotypes in organizations. Research in Organizational Behavior, 28:61–79. https://doi.org/10.1016/j.riob.2008.04.003

[32] Eva H Shinar. 1975. Sexual stereotypes of occupations. Journal of Vocational Behavior, 7(1):99–111. https://doi.org/10.1016/0001-8791(75)90037-8

[33] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1631–1642.

[34] Dries Vervecken, Bettina Hannover, and Ilka Wolter. 2013. Changing (S)expectations: How gender fair job descriptions impact children's perceptions and interest regarding traditionally male occupations. Journal of Vocational Behavior, 82(3):208–220. https://doi.org/10.1016/j.jvb.2013.01.008

[35] Kellie Webster, Marta Recasens, Vera Axelrod, and Jason Baldridge. 2018. Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns. Transactions of the Association for Computational Linguistics, 6:605–617. https://doi.org/10.1162/tacl_a_00240

[36] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender Bias in Contextualized Word Embeddings. CoRR, abs/1904.03310.

[37] Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018. Learning Gender-Neutral Word Embeddings. CoRR, abs/1809.01496. http://arxiv.org/abs/1809.01496
