SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering

Qiaozhu Mei; Xuan Lu; Xuanzhe Liu; Yanbin Cao; Zhenpeng Chen

arxiv: 1907.02202 · v1 · pith:VYOSUFQW · submitted 2019-07-04 · 💻 cs.SE · cs.CL


Pith reviewed 2026-05-25 09:34 UTC · model grok-4.3

classification 💻 cs.SE cs.CL

keywords: sentiment analysis · software engineering · emoji · noisy labels · representation learning · tweets · github · technical jargon


The pith

Emotional emojis in tweets and GitHub posts, used as noisy sentiment labels, train better sentiment classifiers for software engineering texts than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that off-the-shelf sentiment tools misread SE text because of technical jargon, and that the scarce labeled SE data used to customize them covers too few expressions to guarantee analysis quality. The authors therefore use emotional emojis as noisy labels to learn representations from abundant tweets and GitHub posts. These representations capture both domain jargon and cross-domain sentiment patterns. The resulting classifier, trained on the learned representations plus the available labeled SE data, shows significant gains on benchmark SE datasets. Contrast experiments indicate that tweets make the key contribution to that gain, suggesting that general-domain signals reached through emojis can matter more than purely domain-specific resources.

Core claim

We employ emotional emojis as noisy labels of sentiments and propose a representation learning approach that uses both Tweets and GitHub posts containing emojis to learn sentiment-aware representations for SE-related texts. These emoji-labeled posts can not only supply the technical jargon, but also incorporate more general sentiment patterns shared across domains. They as well as labeled data are used to learn the final sentiment classifier.
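To make "emotional emojis as noisy labels" concrete, the sketch below turns raw posts into (text, label) training pairs: a post that contains exactly one emoji from a chosen set keeps its text, and that emoji becomes the prediction target. This follows the general DeepMoji-style recipe of predicting which emoji a text was posted with. The six-emoji label set, the function names, and the rule of discarding posts with mixed emojis are illustrative assumptions, not the authors' preprocessing (DeepMoji itself predicts 64 emoji classes).

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical label set, for illustration only. DeepMoji predicts 64 emoji
# classes; the set actually used for SE posts would be chosen from data.
EMOJI_LABELS = ["😂", "😍", "😊", "😢", "😡", "😭"]

def emoji_label(post: str) -> Optional[Tuple[str, int]]:
    """Turn one post into a (text, label index) pair, or None.

    A post qualifies only if it contains exactly one distinct emoji from
    EMOJI_LABELS and some text remains once the emojis are stripped.
    """
    found = {e for e in EMOJI_LABELS if e in post}
    if len(found) != 1:
        return None  # no emoji signal, or mixed (ambiguous) signals
    label = EMOJI_LABELS.index(found.pop())
    text = post
    for e in EMOJI_LABELS:
        text = text.replace(e, "")  # the model must not see its own label
    text = re.sub(r"\s+", " ", text).strip()
    return (text, label) if text else None

def build_pretraining_set(posts: Iterable[str]) -> List[Tuple[str, int]]:
    """Keep only the posts that yield a usable noisy label."""
    pairs = []
    for post in posts:
        pair = emoji_label(post)
        if pair is not None:
            pairs.append(pair)
    return pairs

posts = [
    "This API is so clean 😍",
    "merge conflict again 😡😡",
    "😂",                        # emoji only: no text to learn from
    "works 😊 but crashes 😢",   # mixed signals: discarded
    "no emoji here",
]
print(build_pretraining_set(posts))
# → [('This API is so clean', 1), ('merge conflict again', 4)]
```

Stripping the emoji from the text matters: if it stayed in, the pretraining task would reduce to copying the label rather than learning which wording tends to accompany it.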

What carries the argument

Representation learning supervised by emotional emojis in tweets and GitHub posts to produce sentiment-aware embeddings for SE texts.

If this is right

The method achieves significant improvement on representative benchmark datasets for SE sentiment analysis.
Tweets make the key contribution to the performance gain.
Future SE sentiment work should draw on open-domain data through signals such as emojis instead of relying solely on limited domain-specific labeled resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same emoji-supervision approach could transfer to other technical domains where jargon blocks off-the-shelf NLP tools.
Replacing the current representation learner with more recent embedding models might further boost results without changing the core idea.
The finding that general-domain data helps suggests testing emoji labels on additional SE tasks such as opinion mining in code reviews.

Load-bearing premise

Emotional emojis accurately reflect the sentiment of the surrounding text even when the text contains technical jargon.

What would settle it

An ablation that trains the same classifier without the emoji-based pretraining step and finds equal or better accuracy on the SE benchmark datasets would falsify the central claim.
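That ablation compares two classifiers (with and without emoji pretraining) on the same test items, so a paired test is the natural check. One standard option, discussed by Dietterich (1998), which appears in the paper's reference list, is McNemar's test on the items where exactly one model is correct. The sketch below assumes that test with the usual continuity correction; it is not necessarily the test the authors ran, and the toy predictions are made up.

```python
from math import erf, sqrt
from typing import Sequence, Tuple

def mcnemar(y_true: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> Tuple[float, float]:
    """McNemar's test (with continuity correction) for two classifiers
    evaluated on the same test items.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")
    # b: model A right, model B wrong; c: model A wrong, model B right.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # With 1 degree of freedom, chi2 = z**2, so P(X >= chi2) = 2 * (1 - Phi(z)).
    z = sqrt(chi2)
    phi = 0.5 * (1 + erf(z / sqrt(2)))
    return chi2, 2 * (1 - phi)

# Hypothetical predictions: model A = with emoji pretraining, B = without.
y = [1] * 20
with_pre = [1] * 20
without_pre = [0] * 10 + [1] * 10
chi2, p = mcnemar(y, with_pre, without_pre)
print(round(chi2, 2), round(p, 4))
```

Only the discordant items (b and c) enter the statistic, which is why the test suits the ablation: items both models get right or wrong say nothing about whether pretraining helped.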

Figures

Figures reproduced from arXiv: 1907.02202 by Qiaozhu Mei, Xuan Lu, Xuanzhe Liu, Yanbin Cao, Zhenpeng Chen.

**Figure 1.** The architecture of DeepMoji.

Original abstract

Sentiment analysis has various application scenarios in software engineering (SE), such as detecting developers' emotions in commit messages and identifying their opinions on Q&A forums. However, commonly used out-of-the-box sentiment analysis tools cannot obtain reliable results on SE tasks and the misunderstanding of technical jargon is demonstrated to be the main reason. Then, researchers have to utilize labeled SE-related texts to customize sentiment analysis for SE tasks via a variety of algorithms. However, the scarce labeled data can cover only very limited expressions and thus cannot guarantee the analysis quality. To address such a problem, we turn to the easily available emoji usage data for help. More specifically, we employ emotional emojis as noisy labels of sentiments and propose a representation learning approach that uses both Tweets and GitHub posts containing emojis to learn sentiment-aware representations for SE-related texts. These emoji-labeled posts can not only supply the technical jargon, but also incorporate more general sentiment patterns shared across domains. They as well as labeled data are used to learn the final sentiment classifier. Compared to the existing sentiment analysis methods used in SE, the proposed approach can achieve significant improvement on representative benchmark datasets. By further contrast experiments, we find that the Tweets make a key contribution to the power of our approach. This finding informs future research not to unilaterally pursue the domain-specific resource, but try to transform knowledge from the open domain through ubiquitous signals such as emojis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Combining emoji signals from tweets and GitHub helps SE sentiment analysis more than expected, mainly via the tweets, though the emoji-to-sentiment mapping in jargon-heavy text lacks direct validation.


The paper shows that emoji data from tweets and GitHub can be combined to learn sentiment representations that improve performance on SE benchmarks over standard tools. The key new piece is the joint use of those two sources and the finding that tweets contribute the most. It does a solid job of tackling scarce labeled data in SE by leveraging noisy emoji labels for pretraining. The contrast experiments are a nice touch and give a practical takeaway about not ignoring general-domain data. The approach is representation learning plus fine-tuning, which is appropriate here.

The main soft spot is that there's no reported check on whether the emoji labels actually align with sentiment in SE texts. If emojis don't capture the right signals when technical jargon is present, the transfer could fall short. The abstract also doesn't give the numbers or baselines, so the size of the improvement is unclear from the summary alone. The math is simple and the data sources are external, so there are no circularity issues.

This is for SE practitioners or researchers who work on sentiment tools for code-related text, and it would be useful for anyone looking at noisy supervision from social media. It should go to peer review: the method is concrete and the source comparison is worth referee scrutiny.

Referee Report

2 major / 1 minor

Summary. The paper proposes SEntiMoji, a representation learning approach that treats emotional emojis in Tweets and GitHub posts as noisy sentiment labels to pre-train sentiment-aware embeddings, then fine-tunes the resulting classifier on scarce labeled SE data. It claims this yields significant gains over prior SE sentiment tools on benchmark datasets and that Tweets contribute more than GitHub posts to the gains.
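The summary describes a two-stage pipeline. Its simplest variant, sketched below under stated assumptions, freezes the pretrained encoder, turns each labeled SE text into a fixed-length vector, and fits a light classifier on those vectors. Logistic regression trained by batch gradient descent is a stand-in chosen for brevity; the labels are binary rather than the positive/neutral/negative scheme common in SE benchmarks; the encoder is not fine-tuned; and the toy vectors replace real encoder output. None of this is claimed to be the authors' classifier.

```python
import math
from typing import List, Sequence

def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit binary logistic regression on fixed feature vectors with batch
    gradient descent. Returns the weights, with the bias as the last entry."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if the decision score is non-negative, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if z >= 0 else 0

# Toy 2-D vectors standing in for the frozen encoder's output
# (a real encoder would produce much longer vectors).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, 0, 0]  # 1 = positive, 0 = negative
w = train_logreg(X, y)
print([predict(w, x) for x in X])  # → [1, 1, 0, 0]
```

Keeping the encoder frozen makes the labeled SE data do less work, which is the point when that data is scarce; fine-tuning the encoder as well trades that economy for more capacity to adapt.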

Significance. If the reported gains hold after proper validation, the work would be useful because it demonstrates a practical way to leverage abundant emoji-labeled social-media data to mitigate the labeled-data bottleneck in SE sentiment analysis and supplies an empirical argument against purely domain-specific resource collection.

major comments (2)

[Abstract] The central claim of 'significant improvement' on representative benchmark datasets is asserted without any quantitative results, baseline details, statistical tests, or ablation numbers, preventing evaluation of the performance claim.
[Abstract] The method relies on the untested assumption that emoji-derived labels align sufficiently with human sentiment judgments on SE texts containing technical jargon; no agreement rate, confusion matrix, or cross-domain label-consistency experiment is described to support the transfer step.
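The consistency check requested here can be sketched as follows. The sketch assumes a sample of SE texts that carry both an emoji and a human sentiment annotation, and it assumes each emoji has already been mapped to positive, neutral, or negative; that mapping is hypothetical and would itself need justification. The function reports the raw agreement rate, a confusion table of (emoji label, human label) counts, and Cohen's kappa, which discounts the agreement expected by chance. The example labels are made up.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def agreement_report(emoji_labels: Sequence[str], human_labels: Sequence[str]
                     ) -> Tuple[float, Dict[Tuple[str, str], int], float]:
    """Compare emoji-derived labels with human labels on the same texts.

    Returns (raw agreement rate, confusion counts keyed by
    (emoji label, human label), Cohen's kappa).
    """
    if len(emoji_labels) != len(human_labels) or not emoji_labels:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(emoji_labels)
    confusion = Counter(zip(emoji_labels, human_labels))
    observed = sum(c for (e, h), c in confusion.items() if e == h) / n
    # Agreement expected by chance, from each source's label frequencies.
    e_freq, h_freq = Counter(emoji_labels), Counter(human_labels)
    expected = sum(e_freq[k] * h_freq[k] for k in e_freq) / (n * n)
    # If both sources use one identical label everywhere, kappa is undefined;
    # returning 1.0 is a convention, since agreement is then perfect.
    kappa = 1.0 if expected == 1.0 else (observed - expected) / (1 - expected)
    return observed, dict(confusion), kappa

emoji = ["pos", "pos", "neg", "neg", "neu", "pos"]
human = ["pos", "neu", "neg", "neg", "neu", "neg"]
rate, table, kappa = agreement_report(emoji, human)
print(round(rate, 3), round(kappa, 3))  # → 0.667 0.52
```

Kappa is the more informative number when one sentiment class dominates, as neutral often does in SE text: a labeler that always says "neutral" can post a high raw agreement rate while its kappa stays near zero.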

minor comments (1)

[Abstract] The phrase 'contrast experiments' is used without indicating which datasets, models, or metrics were contrasted.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and agree that revisions to the abstract are warranted to strengthen the presentation of our claims.


Referee: [Abstract] The central claim of 'significant improvement' on representative benchmark datasets is asserted without any quantitative results, baseline details, statistical tests, or ablation numbers, preventing evaluation of the performance claim.

Authors: We agree that the abstract would be improved by including concrete quantitative support for the performance claims. The full manuscript reports these details in the experimental evaluation, including accuracy/F1 improvements over baselines such as the general-purpose SentiStrength and SE-specific tools, along with statistical significance tests and ablation results on the relative contribution of Tweets versus GitHub posts. We will revise the abstract to summarize the key quantitative findings and ablation outcomes. (revision: yes)
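Significance tests run across several benchmark datasets yield several p-values at once, so some multiple-testing correction is needed. The paper's reference list includes Benjamini and Yekutieli (2001); the sketch below implements that step-up procedure, which controls the false discovery rate under arbitrary dependence between tests. It illustrates the procedure only, is not the authors' analysis code, and uses made-up p-values.

```python
from typing import List, Sequence

def benjamini_yekutieli(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Yekutieli step-up procedure.

    Returns one reject (True) / keep (False) decision per p-value, in the
    input order, controlling the false discovery rate at `alpha` under
    arbitrary dependence between the tests.
    """
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-sum penalty
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    # Find the largest rank k with p_(k) <= k * alpha / (m * c_m).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject

print(benjamini_yekutieli([0.001, 0.02, 0.04, 0.3]))  # → [True, False, False, False]
```

Plain Benjamini-Hochberg (threshold k·α/m) would also reject the second hypothesis here, since 0.02 ≤ 2 × 0.05 / 4 = 0.025; the harmonic factor c_m is the price of staying valid when the tests are dependent.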
Referee: [Abstract] The method relies on the untested assumption that emoji-derived labels align sufficiently with human sentiment judgments on SE texts containing technical jargon; no agreement rate, confusion matrix, or cross-domain label-consistency experiment is described to support the transfer step.

Authors: The approach intentionally treats emojis as noisy labels to capture both domain-specific jargon and cross-domain sentiment patterns, with effectiveness shown through downstream gains on SE benchmarks. We acknowledge that an explicit cross-domain label-consistency analysis (e.g., agreement rates or confusion matrices between emoji labels and human judgments on SE text) is not described. To address this, we will add such an analysis or discussion in the revised manuscript. (revision: yes)

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper presents an empirical ML pipeline that pre-trains sentiment representations on external emoji-labeled Tweets and GitHub posts, then fine-tunes on scarce labeled SE data. No equations, fitted parameters, or derivations appear that reduce any claimed prediction to the target labels by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The method is evaluated on external benchmarks and does not rename known results or smuggle in assumptions via the authors' prior work.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review; the full paper would likely list model hyperparameters and training details. The central premise rests on one domain assumption about emoji labels.

free parameters (1)

model hyperparameters and representation dimensions
Standard in any neural representation learning approach; not enumerated in abstract.

axioms (1)

domain assumption: Emotional emojis provide reliable noisy sentiment labels transferable across general and SE domains
Invoked when the abstract states that emoji usage data supplies both technical jargon and general sentiment patterns.

pith-pipeline@v0.9.0 · 5792 in / 1230 out tokens · 42636 ms · 2026-05-25T09:34:58.955934+00:00 · methodology


Reference graph

Works this paper leans on (SEntiMoji, ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia)

76 extracted references · 76 canonical work pages · 1 internal anchor

[1] 2010. SentiStrength. http://sentistrength.wlv.ac.uk/. Retrieved in November 2018.
[2] 2016. JIRA Dataset. http://ansymore.uantwerpen.be/system/files/uploads/artefacts/alessandro/MSR16/archive3.zip. Retrieved in November 2018.
[3] 2017. DeepMoji. https://github.com/bfelbo/deepmoji. Retrieved in November 2018.
[4] 2017. SentiCR. https://github.com/senticr/SentiCR/. Retrieved in November 2018.
[5] 2017. SentiStrength-SE. http://laser.cs.uno.edu/Projects/Projects.html. Retrieved in November 2018.
[6] 2018. Java Library Dataset. https://sentiment-se.github.io/replication.zip. Retrieved in November 2018.
[7] 2018. Senti4SD. https://github.com/collab-uniba/Senti4SD. Retrieved in November 2018.
[8] Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: a customized sentiment analysis tool for code review interactions. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017. 106–111.
[9] Wei Ai, Xuan Lu, Xuanzhe Liu, Ning Wang, Gang Huang, and Qiaozhu Mei. 2017. Untangling emoji popularity through semantic embeddings. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017. 2–11.
[10] Akiko Aizawa. 2003. An information-theoretic perspective of TF-IDF measures. Information Processing & Management 39, 1 (2003), 45–65.
[11] Yoav Benjamini and Daniel Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29, 4 (2001), 1165–1188.
[12] Steven Bird and Edward Loper. 2004. NLTK: the natural language toolkit. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, ACL 2004. 31.
[13] Cássio Castaldi Araujo Blaz and Karin Becker. 2016. Sentiment analysis in tickets for IT support. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 235–246.
[14] Fabio Calefato, Filippo Lanubile, Federico Maiorano, and Nicole Novielli. 2018. Sentiment polarity detection for software development. Empirical Software Engineering 23, 3 (2018), 1352–1382.
[15] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002), 321–357.
[16] Zhenpeng Chen, Xuan Lu, Wei Ai, Huoran Li, Qiaozhu Mei, and Xuanzhe Liu. 2018. Through a gender lens: learning usage patterns of emojis from large-scale Android users. In Proceedings of the 2018 World Wide Web Conference, WWW 2018.
[17] Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, and Xuanzhe Liu. 2019. Emoji-powered representation learning for cross-lingual sentiment classification. In Proceedings of the 2019 World Wide Web Conference on World Wide Web, WWW 2019. 251–262.
[18] Shaiful Alam Chowdhury and Abram Hindle. 2016. Characterizing energy-aware software projects: are they different?. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 508–511.
[19] Maëlick Claes, Mika Mäntylä, and Umar Farooq. 2018. On the use of emoticons in open source software development. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018.
[20] Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Enhanced sentiment learning using Twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING 2010. 241–249.
[21] Thomas G. Dietterich. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 7 (1998), 1895–1923.
[22] Jin Ding, Hailong Sun, Xu Wang, and Xudong Liu. 2018. Entity-level sentiment analysis of issue comments. In Proceedings of the 3rd International Workshop on Emotion Awareness in Software Engineering, SEmotion@ICSE 2018. 7–13.
[23] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. 1615–1625.
[24] Jerome H. Friedman. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002), 367–378.
[25] Daviti Gachechiladze, Filippo Lanubile, Nicole Novielli, and Alexander Serebrenik. 2017. Anger and its direction in collaborative software development. In Proceedings of the 39th IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track, ICSE-NIER 2017. 11–14.
[26] David García, Marcelo Serrano Zanetti, and Frank Schweitzer. 2013. The role of emotions in contributors activity: a case study on the GENTOO community. In 2013 International Conference on Cloud and Green Computing, CGC 2013. 410–417.
[27] Anastasia Giachanou and Fabio Crestani. 2016. Like it or not: a survey of Twitter sentiment analysis methods. Comput. Surveys 49, 2 (2016), 28:1–28:41.
[28] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 352–355.
[29] Emitza Guzman and Bernd Bruegge. 2013. Towards emotional awareness in software development teams. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013. 671–674.
[30] Emitza Guzman and Walid Maalej. 2014. How do users like this feature? A fine grained sentiment analysis of app reviews. In IEEE 22nd International Requirements Engineering Conference, RE 2014. 153–162.
[31] M. Hermans and B. Schrauwen. 2013. Training and analysing deep recurrent neural networks. Proceedings of advances in Neural Information Processing Systems (2013), 190–198.
[32] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[33] Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, and Jiebo Luo. 2017. Spice up your chat: the intentions and sentiment effects of using emojis. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017. 102–111.
[34] Nasif Imtiaz, Justin Middleton, Joymallya Chakraborty, Neill Robson, Gina Bai, and Emerson R. Murphy-Hill. 2019. Investigating the effects of gender bias on GitHub. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019. 700–711.
[35] Md Rakibul Islam and Minhaz F. Zibran. 2017. Leveraging automated sentiment analysis in software engineering. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017. 203–214.
[36] Md Rakibul Islam and Minhaz F. Zibran. 2018. SentiStrength-SE: exploiting domain specificity for improved sentiment analysis in software engineering text. Journal of Systems and Software 145 (2018), 125–146.
[37] Robbert Jongeling, Subhajit Datta, and Alexander Serebrenik. 2015. Choosing your weapons: on sentiment analysis tools for software engineering research. In 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015.
[38] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. Empirical Software Engineering 22, 5 (2017), 2543–2584.
[39] Francisco Jurado and Pilar Rodríguez Marín. 2015. Sentiment analysis in monitoring software development processes: an exploratory case study on GitHub’s project issues. Journal of Systems and Software 104 (2015), 82–89.
[40] Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, and Michele Lanza. 2019. Pattern-based mining of opinions in Q&A websites. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019. 548–559.
[41] Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, Michele Lanza, and Rocco Oliveto. 2018. Sentiment analysis for software engineering: how far can we go?. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018. 94–104.
[42] Kun-Lin Liu, Wu-Jun Li, and Minyi Guo. 2012. Emoticon smoothed language models for Twitter sentiment analysis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2012.
[43] Xuan Lu, Wei Ai, Xuanzhe Liu, Qian Li, Ning Wang, Gang Huang, and Qiaozhu Mei. 2016. Learning from the ubiquitous language: an empirical analysis of emoji usage of smartphone users. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp 2016. 770–780.
[44] Xuan Lu, Yanbin Cao, Zhenpeng Chen, and Xuanzhe Liu. 2018. A first look at emoji usage on GitHub: an empirical study. CoRR abs/1812.04863 (2018). arXiv:1812.04863 http://arxiv.org/abs/1812.04863
[45] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014. 55–60.
[46] Mika Mäntylä, Bram Adams, Giuseppe Destefanis, Daniel Graziotin, and Marco Ortu. 2016. Mining valence, arousal, and dominance: possibilities for detecting burnout and productivity?. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 247–258.
[47] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Computer Science (2013).
[48] Alessandro Murgia, Parastou Tourani, Bram Adams, and Marco Ortu. 2014. Do developers feel emotions? An exploratory analysis of emotions in software artifacts. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 262–271.
[49] Nicole Novielli, Fabio Calefato, and Filippo Lanubile. 2015. The challenges of sentiment detection in the social programmer ecosystem. In Proceedings of the 7th International Workshop on Social Software Engineering, SSE 2015. 33–40.
[50] Nicole Novielli, Daniela Girardi, and Filippo Lanubile. 2018. A benchmark study on sentiment analysis for software engineering research. In Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018. 364–375.
[51] Marco Ortu, Bram Adams, Giuseppe Destefanis, Parastou Tourani, Michele Marchesi, and Roberto Tonelli. 2015. Are bullies more productive? Empirical study of affectiveness vs. issue fixing time. In 12th IEEE/ACM Working Conference on Mining Software Repositories, MSR 2015. 303–313.
[52] Marco Ortu, Giuseppe Destefanis, Steve Counsell, Stephen Swift, Roberto Tonelli, and Michele Marchesi. 2016. Arsonists or firefighters? Affectiveness in agile software development. In Proceedings of the 2016 International Conference on Agile Software Development, XP 2016. Springer, 144–155.
[53] Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, Roberto Tonelli, Michele Marchesi, and Bram Adams. 2016. The emotional side of software developers in JIRA. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 480–483.
[54] Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado Aaron Visaggio, Gerardo Canfora, and Harald C. Gall. 2015. How can I improve my app? Classifying user reviews for software maintenance and evolution. In 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015. 281–290.
[55] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014. 1532–1543.
[56] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and emotion: sentiment analysis of security discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 348–351.
[57] Mohammad Masudur Rahman, Chanchal K. Roy, and Iman Keivanloo. 2015. Recommending insightful comments for source code using crowdsourced knowledge. In Proceedings of the 15th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2015. 81–90.
[58] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1988. Learning representations by back-propagating errors. Nature (1988), 533–536.
[59] Vinayak Sinha, Alina Lazar, and Bonita Sharif. 2016. Analyzing developer sentiment in commit logs. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 520–523.
[60] Vibha Singhal Sinha, Senthil Mani, and Monika Gupta. 2013. Exploring activeness of users in QA forums. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR 2013. 77–80.
[61] Rodrigo Souza and Bruno Silva. 2017. Sentiment analysis of Travis CI builds. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017. 459–462.
[62] Johan A. K. Suykens and Joos Vandewalle. 1999. Least squares support vector machine classifiers. Neural Processing Letters 9, 3 (1999), 293–300.
[63] Mike Thelwall, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas. 2010. Sentiment strength detection in short informal text. JASIST 61, 12 (2010), 2544–2558.
[64] Parastou Tourani and Bram Adams. 2016. The impact of human discussions on just-in-time quality assurance: an empirical study on OpenStack and Eclipse. In IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016. 189–200.
[65] Gias Uddin and Foutse Khomh. 2017. Opiner: an opinion search and summarization engine for APIs. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017. 978–983.
[66] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Annual Conference on Neural Information Processing Systems 2017, NIPS 2017. 6000–6010.
[67] Honghao Wei, Fuzheng Zhang, Nicholas Jing Yuan, Chuan Cao, Hao Fu, Xing Xie, Yong Rui, and Wei-Ying Ma. 2017. Beyond the words: predicting user personality from heterogeneous information. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017. 305–314.
[68] Michal R. Wróbel. 2013. Emotions in the software development process. In Proceedings of the 6th International Conference on Human System Interactions, HSI 2013.
[69] Michal R. Wrobel. 2016. Towards the participant observation of emotions in software development teams. In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016. 1545–1548.
[70] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, NIPS 2014. 3320–3328.

[1] [1]

SentiStrength

2010. SentiStrength. http://sentistrength.wlv.ac.uk/. Retrieved in November 2018. SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia

work page 2010

[2] [2]

JIRA Dataset

2016. JIRA Dataset. http://ansymore.uantwerpen.be/system/files/uploads/ artefacts/alessandro/MSR16/archive3.zip. Retrieved in November 2018

work page 2016

[3] [3]

DeepMoji

2017. DeepMoji. https://github.com/bfelbo/deepmoji. Retrieved in November 2018

work page 2017

[4] [4]

2017. SentiCR. https://github.com/senticr/SentiCR/. Retrieved in November 2018

work page 2017

[5] [5]

SentiStrength-SE

2017. SentiStrength-SE. http://laser.cs.uno.edu/Projects/Projects.html. Retrieved in November 2018

work page 2017

[6] [6]

Java Library Dataset

2018. Java Library Dataset. https://sentiment-se.github.io/replication.zip. Re- trieved in November 2018

work page 2018

[7] [7]

Senti4SD

2018. Senti4SD. https://github.com/collab-uniba/Senti4SD. Retrieved in Novem- ber 2018

work page 2018

[8] [8]

Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: a customized sentiment analysis tool for code review interactions. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017. 106–111

work page 2017

[9] [9]

Wei Ai, Xuan Lu, Xuanzhe Liu, Ning Wang, Gang Huang, and Qiaozhu Mei. 2017. Untangling emoji popularity through semantic embeddings. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017 . 2–11

work page 2017

[10] [10]

Akiko Aizawa. 2003. An information-theoretic perspective of TF-IDF measures. Information Processing & Management 39, 1 (2003), 45–65

work page 2003

[11] [11]

Yoav Benjamini and Daniel Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29, 4 (2001), 1165–1188

work page 2001

[12] [12]

Steven Bird and Edward Loper. 2004. NLTK: the natural language toolkit. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, ACL 2004. 31

work page 2004

[13] Cássio Castaldi Araujo Blaz and Karin Becker. 2016. Sentiment analysis in tickets for IT support. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 235–246.

[14] Fabio Calefato, Filippo Lanubile, Federico Maiorano, and Nicole Novielli. 2018. Sentiment polarity detection for software development. Empirical Software Engineering 23, 3 (2018), 1352–1382.

[15] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002), 321–357.

[17] Zhenpeng Chen, Xuan Lu, Wei Ai, Huoran Li, Qiaozhu Mei, and Xuanzhe Liu. 2018. Through a gender lens: learning usage patterns of emojis from large-scale Android users. In Proceedings of the 2018 World Wide Web Conference, WWW 2018.

[19] Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, and Xuanzhe Liu. 2019. Emoji-powered representation learning for cross-lingual sentiment classification. In Proceedings of the 2019 World Wide Web Conference on World Wide Web, WWW 2019. 251–262.

[20] Shaiful Alam Chowdhury and Abram Hindle. 2016. Characterizing energy-aware software projects: are they different? In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 508–511.

[21] Maëlick Claes, Mika Mäntylä, and Umar Farooq. 2018. On the use of emoticons in open source software development. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018.

[22] Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Enhanced sentiment learning using Twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING 2010. 241–249.

[23] Thomas G. Dietterich. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 7 (1998), 1895–1923.

[24] Jin Ding, Hailong Sun, Xu Wang, and Xudong Liu. 2018. Entity-level sentiment analysis of issue comments. In Proceedings of the 3rd International Workshop on Emotion Awareness in Software Engineering, SEmotion@ICSE 2018. 7–13.

[25] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. 1615–1625.

[27] Jerome H. Friedman. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002), 367–378.

[28] Daviti Gachechiladze, Filippo Lanubile, Nicole Novielli, and Alexander Serebrenik. 2017. Anger and its direction in collaborative software development. In Proceedings of the 39th IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track, ICSE-NIER 2017. 11–14.

[30] David García, Marcelo Serrano Zanetti, and Frank Schweitzer. 2013. The role of emotions in contributors activity: a case study on the GENTOO community. In 2013 International Conference on Cloud and Green Computing, CGC 2013. 410–417.

[31] Anastasia Giachanou and Fabio Crestani. 2016. Like it or not: a survey of Twitter sentiment analysis methods. Comput. Surveys 49, 2 (2016), 28:1–28:41.

[32] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 352–355.

[33] Emitza Guzman and Bernd Bruegge. 2013. Towards emotional awareness in software development teams. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013. 671–674.

[34] Emitza Guzman and Walid Maalej. 2014. How do users like this feature? A fine grained sentiment analysis of app reviews. In IEEE 22nd International Requirements Engineering Conference, RE 2014. 153–162.

[35] M. Hermans and B. Schrauwen. 2013. Training and analysing deep recurrent neural networks. In Advances in Neural Information Processing Systems (2013), 190–198.

[36] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.

[37] Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, and Jiebo Luo. 2017. Spice up your chat: the intentions and sentiment effects of using emojis. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017. 102–111.

[38] Nasif Imtiaz, Justin Middleton, Joymallya Chakraborty, Neill Robson, Gina Bai, and Emerson R. Murphy-Hill. 2019. Investigating the effects of gender bias on GitHub. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019. 700–711.

[39] Md Rakibul Islam and Minhaz F. Zibran. 2017. Leveraging automated sentiment analysis in software engineering. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017. 203–214.

[40] Md Rakibul Islam and Minhaz F. Zibran. 2018. SentiStrength-SE: exploiting domain specificity for improved sentiment analysis in software engineering text. Journal of Systems and Software 145 (2018), 125–146.

[41] Robbert Jongeling, Subhajit Datta, and Alexander Serebrenik. 2015. Choosing your weapons: on sentiment analysis tools for software engineering research. In 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015.

[42] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. Empirical Software Engineering 22, 5 (2017), 2543–2584.

[44] Francisco Jurado and Pilar Rodríguez Marín. 2015. Sentiment analysis in monitoring software development processes: an exploratory case study on GitHub’s project issues. Journal of Systems and Software 104 (2015), 82–89.

[45] Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, and Michele Lanza. 2019. Pattern-based mining of opinions in Q&A websites. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019. 548–559.

[46] Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, Michele Lanza, and Rocco Oliveto. 2018. Sentiment analysis for software engineering: how far can we go? In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018. 94–104.

[47] Kun-Lin Liu, Wu-Jun Li, and Minyi Guo. 2012. Emoticon smoothed language models for Twitter sentiment analysis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2012.

[48] Xuan Lu, Wei Ai, Xuanzhe Liu, Qian Li, Ning Wang, Gang Huang, and Qiaozhu Mei. 2016. Learning from the ubiquitous language: an empirical analysis of emoji usage of smartphone users. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp 2016. 770–780.

[49] Xuan Lu, Yanbin Cao, Zhenpeng Chen, and Xuanzhe Liu. 2018. A first look at emoji usage on GitHub: an empirical study. CoRR abs/1812.04863 (2018). arXiv:1812.04863 http://arxiv.org/abs/1812.04863

[50] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014. 55–60.

[51] Mika Mäntylä, Bram Adams, Giuseppe Destefanis, Daniel Graziotin, and Marco Ortu. 2016. Mining valence, arousal, and dominance: possibilities for detecting burnout and productivity? In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 247–258.

[52] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Computer Science (2013).

[53] Alessandro Murgia, Parastou Tourani, Bram Adams, and Marco Ortu. 2014. Do developers feel emotions? An exploratory analysis of emotions in software artifacts. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 262–271.

[54] Nicole Novielli, Fabio Calefato, and Filippo Lanubile. 2015. The challenges of sentiment detection in the social programmer ecosystem. In Proceedings of the 7th International Workshop on Social Software Engineering, SSE 2015. 33–40.

[55] Nicole Novielli, Daniela Girardi, and Filippo Lanubile. 2018. A benchmark study on sentiment analysis for software engineering research. In Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018. 364–375.

[56] Marco Ortu, Bram Adams, Giuseppe Destefanis, Parastou Tourani, Michele Marchesi, and Roberto Tonelli. 2015. Are bullies more productive? Empirical study of affectiveness vs. issue fixing time. In 12th IEEE/ACM Working Conference on Mining Software Repositories, MSR 2015. 303–313.

ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia Zhenpeng Chen, Yanbin ...

[57] Marco Ortu, Giuseppe Destefanis, Steve Counsell, Stephen Swift, Roberto Tonelli, and Michele Marchesi. 2016. Arsonists or firefighters? Affectiveness in agile software development. In Proceedings of the 2016 International Conference on Agile Software Development, XP 2016. Springer, 144–155.

[58] Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, Roberto Tonelli, Michele Marchesi, and Bram Adams. 2016. The emotional side of software developers in JIRA. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 480–483.

[59] Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado Aaron Visaggio, Gerardo Canfora, and Harald C. Gall. 2015. How can I improve my app? Classifying user reviews for software maintenance and evolution. In 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015. 281–290.

[60] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014. 1532–1543.

[61] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and emotion: sentiment analysis of security discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. 348–351.

[62] Mohammad Masudur Rahman, Chanchal K. Roy, and Iman Keivanloo. 2015. Recommending insightful comments for source code using crowdsourced knowledge. In Proceedings of the 15th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2015. 81–90.

[63] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1988. Learning representations by back-propagating errors. Nature (1988), 533–536.

[64] Vinayak Sinha, Alina Lazar, and Bonita Sharif. 2016. Analyzing developer sentiment in commit logs. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016. 520–523.

[65] Vibha Singhal Sinha, Senthil Mani, and Monika Gupta. 2013. Exploring activeness of users in QA forums. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR 2013. 77–80.

[66] Rodrigo Souza and Bruno Silva. 2017. Sentiment analysis of Travis CI builds. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017. 459–462.

[67] Johan A. K. Suykens and Joos Vandewalle. 1999. Least squares support vector machine classifiers. Neural Processing Letters 9, 3 (1999), 293–300.

[68] Mike Thelwall, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas. 2010. Sentiment strength detection in short informal text. JASIST 61, 12 (2010), 2544–2558.

[70] Parastou Tourani and Bram Adams. 2016. The impact of human discussions on just-in-time quality assurance: An empirical study on OpenStack and Eclipse. In IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016. 189–200.

[71] Gias Uddin and Foutse Khomh. 2017. Opiner: an opinion search and summarization engine for APIs. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017. 978–983.

[72] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Annual Conference on Neural Information Processing Systems 2017, NIPS 2017. 6000–6010.

[73] Honghao Wei, Fuzheng Zhang, Nicholas Jing Yuan, Chuan Cao, Hao Fu, Xing Xie, Yong Rui, and Wei-Ying Ma. 2017. Beyond the words: predicting user personality from heterogeneous information. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017. 305–314.

[74] Michal R. Wróbel. 2013. Emotions in the software development process. In Proceedings of the 6th International Conference on Human System Interactions, HSI 2013.

[75] Michal R. Wróbel. 2016. Towards the participant observation of emotions in software development teams. In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016. 1545–1548.

[76] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, NIPS 2014. 3320–3328.