Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Barbara Plank; Ian Matroos; Malvina Nissim; Nikola Ljubešić; Rob van der Goot

arXiv: 1805.03122 · v1 · pith:HW42DZ3Enew · submitted 2018-05-08 · cs.CL

keywords: features, cross-lingual, gender, lexical, prediction, abstract, better, bleaching

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.
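As a rough illustration of what "bleaching" could look like in practice, the sketch below replaces each token with a language-independent string built from its character shape, its length, and its vowel/consonant pattern. The abstract does not list the paper's actual feature set, so these three features, the function names (`shape`, `vowel_pattern`, `bleach`), the whitespace tokenization, and the ASCII vowel list are all assumptions made for the example, not the authors' implementation.

```python
# Illustrative sketch only: the features below are assumed for the example,
# not taken from the paper.

VOWELS = set("aeiouAEIOU")  # assumption: a simple ASCII vowel inventory


def shape(token: str) -> str:
    """Map characters to class symbols (U, L, D, X) and collapse repeats."""
    out = []
    for ch in token:
        if ch.isupper():
            sym = "U"
        elif ch.islower():
            sym = "L"
        elif ch.isdigit():
            sym = "D"
        else:
            sym = "X"
        # Collapse runs of the same class, e.g. "Hello" -> "UL"
        if not out or out[-1] != sym:
            out.append(sym)
    return "".join(out)


def vowel_pattern(token: str) -> str:
    """Replace vowels with V, other letters with C, everything else with O."""
    return "".join(
        "V" if ch in VOWELS else "C" if ch.isalpha() else "O"
        for ch in token
    )


def bleach(text: str) -> list[str]:
    """Turn whitespace-separated tokens into abstract feature strings."""
    return [
        f"{shape(tok)}_{len(tok)}_{vowel_pattern(tok)}"
        for tok in text.split()
    ]


print(bleach("Hello world 42!"))
# ['UL_5_CVCCV', 'L_5_CVCCC', 'DX_3_OOO']
```

Because the resulting strings carry no lexical content, a classifier trained on them in one language can in principle be applied to text in another, which is the kind of transfer the abstract describes.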
