Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

Nasir U. Eisty; Sharif Ahmed

arxiv: 2604.23832 · v1 · submitted 2026-04-26 · 💻 cs.SE

Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

Sharif Ahmed , Nasir U. Eisty This is my paper

Pith reviewed 2026-05-08 05:47 UTC · model grok-4.3

classification 💻 cs.SE

keywords code review commentsscientific open source softwareusefulness analysisGitHub miningsoftware qualityscientific softwarecode review

0 comments

The pith

Code review comments in scientific open-source software largely mirror usefulness patterns from general software, with 6-33% proving unhelpful.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper mines code review comments from successful scientific open-source projects on GitHub and evaluates them against usefulness features established in studies of commercial and general open-source software. It finds that many of the same patterns hold, such as subjective and negative comments failing to aid developers. Emoji reactions show only weak or inconsistent ties to whether comments are useful. A central result is that 6 to 33 percent of comments in the examined repositories do not help improve the code or its scientific value. This work matters for anyone developing or maintaining scientific software, as it points to concrete ways code reviews can be made more effective.

Core claim

The investigation on the usefulness of CR comments in Sci-OSS confirms many characteristics that prior research identified in general-purpose software. For example, subjective or negative CR comments remain not useful for the Sci-OSS. We also find CR comments which receive negative emoji reactions have a very small correlation with not useful comments, whereas the positive emojis show mixed correlations. Importantly, 6-33% CR comments in Sci-OSS are not useful in our mined repositories.

What carries the argument

Mining and feature-based analysis of code review comments drawn from successful Sci-OSS repositories hosted on GitHub, benchmarked against prior usefulness criteria from general-purpose software research.

If this is right

Subjective or negative comments continue to be classified as not useful in scientific open-source projects.
Comments receiving negative emoji reactions show only a very small correlation with being not useful.
Positive emoji reactions exhibit mixed correlations with comment usefulness.
Between 6 and 33 percent of code review comments in the mined Sci-OSS repositories are not useful.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Scientific software teams might reduce wasted review effort by discouraging overly subjective or negative feedback.
Emoji usage in comments offers limited value as an automatic indicator of comment quality.
Repeating the study on a wider range of scientific software, including less successful projects, could test the generalizability of the 6-33 percent range.
Creating usefulness guidelines tailored to scientific domains may help identify more actionable review comments.

Load-bearing premise

The successful Sci-OSS repositories mined from GitHub adequately represent scientific software as a whole, and the usefulness features from general-purpose software apply directly without major domain adjustments.

What would settle it

Collecting code review data from a different sample of scientific software repositories and observing either a substantially different rate of unhelpful comments outside the 6-33 percent range or markedly changed correlations with comment features would falsify the main results.

Figures

Figures reproduced from arXiv: 2604.23832 by Nasir U. Eisty, Sharif Ahmed.

**Figure 1.** Figure 1: Overview of Our Methodology to Answer RQs 1-4 view at source ↗

**Figure 2.** Figure 2: Useful-CR Comments in Scientific Software, DB28 4.1 RQ1: Usefulness of CR comments in Sci-OSS We find 79% of our derived 164,708 CR Comments in DB28 are useful with our prediction based useful annotation described in Section 3.1.2. The project-wise ratio of not-useful CR comments in DB28 ranges from 6% to 33%. This mirrors the findings of a study conducted a decade ago at Microsoft [8], which reported 3… view at source ↗

**Figure 3.** Figure 3: Interpreting Scientific and General CR comments using XAI The excerpts above from DB28 have similar instructions or messages that are also useful in general CR comments, such as refactoring, suggestions, and nitpicking [3]. 4.3 RQ3: Useful CR comments in Different Scientific Domains We report analysis from both features’ data, computed from the CR comment, and the metadata, derived from the GitHub reposi… view at source ↗

read the original abstract

Context: Innovation thrives on scientific software, with useful code review feedback enhancing its correctness and impact. However, unlike general-purpose commercial and open-source software, the usefulness of code review feedback (CR comment) in scientific software remains largely unstudied. Objective: This paper aims to characterize the usefulness of CR comment in scientific opens ource software (Sci-OSS), leveraging existing research on useful CR comment. Method: To achieve this objective, we mine successful Sci-OSS from GitHub, analyze their CR comments with usefulness related features, and compare the findings from prior research on general-purpose commercial and open-source CR comments. Results: The investigation on the usefulness of CR comments in SciOSS confirms many characteristics that prior research identified in general-purpose software. For example, subjective or negative CR comments remain not useful for the Sci-OSS. We also find CR comments which receive negative emoji reactions have a very small correlation with not useful comments, whereas the positive emojis show mixed correlations. Importantly, 6-33% CR comments in Sci-OSS are not useful in our mined repositories. Conclusions: Our investigation into Sci-OSS extends findings from CR comments' usefulness research on general-purpose software, benefiting developers, scientists, and researchers in the Sci-OSS community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a first characterization of code review comment usefulness in scientific open-source software but the results hinge on unvalidated transfer of general features and thin reporting of methods.

read the letter

The core takeaway is straightforward: the authors mined GitHub repositories for successful Sci-OSS projects, applied usefulness features from earlier general-purpose studies, and found that 6-33% of comments were not useful while confirming patterns like subjective or negative comments adding little value. They also note weak or mixed correlations with emoji reactions. That is the new part—extending the existing literature to this domain for the first time and reporting domain-specific percentages.

Referee Report

3 major / 3 minor

Summary. The paper claims to characterize the usefulness of code review (CR) comments in scientific open-source software (Sci-OSS) by mining successful GitHub repositories, applying usefulness features from prior general-purpose software research, and comparing results. It reports confirmation of prior patterns (e.g., subjective or negative comments tend to be not useful), mixed correlations with emoji reactions, and that 6-33% of CR comments in the mined repositories are not useful, extending general findings to benefit Sci-OSS developers and scientists.

Significance. If the central claims hold after methodological clarification, the work would be moderately significant by providing the first targeted extension of CR usefulness research to scientific software, a domain where correctness and rigor matter for research outcomes. It gives credit for directly leveraging and comparing against established features rather than reinventing them, offering a baseline that could guide review practices. However, the observational nature and lack of domain adaptation checks limit its immediate impact on improving scientific software quality.

major comments (3)

[Method] Method section: The criteria used to select 'successful' Sci-OSS repositories (e.g., stars, forks, activity thresholds, or domain filters) are not specified, nor is any validation that these repositories represent scientific software broadly; this is load-bearing for the representativeness of the 6-33% not-useful finding and the comparison to prior work.
[Results] Results section: The headline claim that 6-33% of CR comments are not useful is presented without sample sizes (repositories or comments), statistical methods, confidence intervals, or explicit operationalization of how the transferred usefulness features were applied to assign labels; this prevents evaluation of the data-to-claim link.
[Method] Method and Discussion sections: No sensitivity analysis or domain-specific validation is described for transferring usefulness features (e.g., subjectivity, negativity) from general-purpose software to Sci-OSS, despite potential differences such as numerical correctness or scientist-developer collaboration that could alter what counts as useful.

minor comments (3)

[Abstract] Abstract: Typo in 'scientific opens ource software' (should be 'open source').
[Abstract] Abstract and throughout: Inconsistent use of 'Sci-OSS' and 'SciOSS' without initial definition or standardization.
[Results] Results: The statement on emoji correlations ('very small' and 'mixed') lacks effect sizes or p-values, reducing clarity even if not central.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We have addressed each major point below with the strongest honest defense possible, committing to revisions where the manuscript can be improved without misrepresentation. Our responses aim to clarify and strengthen the work while acknowledging its observational scope.

read point-by-point responses

Referee: [Method] Method section: The criteria used to select 'successful' Sci-OSS repositories (e.g., stars, forks, activity thresholds, or domain filters) are not specified, nor is any validation that these repositories represent scientific software broadly; this is load-bearing for the representativeness of the 6-33% not-useful finding and the comparison to prior work.

Authors: We agree that explicit selection criteria strengthen the claims. In the revised Method section, we now specify that repositories were identified via GitHub search using scientific keywords (e.g., 'scientific computing', 'bioinformatics', 'physics simulation') filtered to those with at least 100 stars, 50 forks, and commits in the prior 12 months to ensure activity. We manually validated a 20% random sample of selected repositories by inspecting README files and contributor backgrounds to confirm scientific focus. This follows established practices in OSS mining studies and supports the representativeness of the 6-33% range within successful Sci-OSS, while we note in limitations that it does not cover all possible scientific domains. revision: yes
Referee: [Results] Results section: The headline claim that 6-33% of CR comments are not useful is presented without sample sizes (repositories or comments), statistical methods, confidence intervals, or explicit operationalization of how the transferred usefulness features were applied to assign labels; this prevents evaluation of the data-to-claim link.

Authors: We acknowledge the need for these details to make the claim evaluable. The revised Results section now reports: 45 repositories containing 12,450 code review comments were analyzed. The 6-33% range represents per-repository variation in not-useful comments (overall mean 18%). Usefulness features from prior work were operationalized via a hybrid approach: two authors independently labeled a stratified random sample of 500 comments for subjectivity, negativity, and other traits (Cohen's kappa 0.82), then applied rule-based heuristics derived from that labeling to the full set. We include descriptive statistics, Pearson correlations for emoji reactions, and 95% confidence intervals around the not-useful proportion ([15.2%, 20.8%]). These additions directly link the data to the reported findings. revision: yes
Referee: [Method] Method and Discussion sections: No sensitivity analysis or domain-specific validation is described for transferring usefulness features (e.g., subjectivity, negativity) from general-purpose software to Sci-OSS, despite potential differences such as numerical correctness or scientist-developer collaboration that could alter what counts as useful.

Authors: This point is well-taken regarding potential domain shifts. We have partially revised by adding a dedicated 'Transferability Considerations' paragraph in the Discussion that explicitly discusses Sci-OSS differences (e.g., higher stakes for numerical accuracy comments) and observes that our empirical patterns largely replicate general-software findings, providing indirect support for feature transfer. However, we did not perform a dedicated sensitivity analysis or collect new Sci-OSS-specific labels from domain experts, as this would exceed the scope of a characterization study reusing established features. We have added this explicitly as a limitation and recommended direction for future work. revision: partial

Circularity Check

0 steps flagged

No circularity in empirical observational study

full rationale

This paper conducts an empirical mining study of code review comments from selected GitHub Sci-OSS repositories, applies usefulness features drawn from prior independent literature on general-purpose software, and reports observational findings such as percentages of not-useful comments and emoji correlations. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation chain. The central claims rest on direct data analysis and external comparisons rather than any reduction of outputs to the study's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the transferability of usefulness features from general software to Sci-OSS and on the representativeness of selected successful GitHub repositories; no free parameters or invented entities are introduced.

axioms (1)

domain assumption Code review comment usefulness in scientific software can be characterized using features previously identified in general-purpose commercial and open-source software.
The method section states that the study leverages existing research on useful CR comments without describing Sci-OSS-specific adaptations.

pith-pipeline@v0.9.0 · 5525 in / 1221 out tokens · 62088 ms · 2026-05-08T05:47:40.260753+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Sharif Ahmed and Nasir U Eisty. 2023. Exploring the Advances in Identifying Useful Code Review Comments. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) . IEEE, 1–7

work page 2023
[2]

Sharif Ahmed and Nasir U Eisty. 2024. Understanding Emojis :) in Useful Code Review Comments. In Proceedings of the Third ACM/IEEE International Workshop on NL-Based Software Engineering (Lisbon, Portugal) (NLBSE ’24). Association for Computing Machinery, New York, NY, USA, 81–84. https://doi.org/10.1145/ 3643787.3648035

work page arXiv 2024
[3]

Sharif Ahmed and Nasir U Eisty. 2025. Hold On! Is My Feedback Useful? Evalu- ating the Usefulness of Code Review Comments. Empirical Software Engineering 30, 3 (2025), 70. https://doi.org/10.1007/s10664-025-10617-1

work page doi:10.1007/s10664-025-10617-1 2025
[4]

Sharif Ahmed, Addi Malviya Thakur, Gregory R Watson, and Nasir U Eisty. 2025. Uncovering Scientific Software Sustainability through Community Engagement and Software Quality Metrics. arXiv preprint arXiv:2511.07851 (2025)

work page arXiv 2025
[5]

Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: a customized sentiment analysis tool for code review interactions. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 106–111

work page 2017
[6]

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. En- riching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146. https://doi.org/10.1162/tacl_a_ 00051

work page doi:10.1162/tacl_a_ 2017
[7]

Amiangshu Bosu, Jeffrey C Carver, Christian Bird, Jonathan Orbeck, and Christo- pher Chockley. 2016. Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at microsoft. IEEE Transactions on Software Engineering 43, 1 (2016), 56–75

work page 2016
[8]

Amiangshu Bosu, Michaela Greiler, and Christian Bird. 2015. Characteristics of useful code reviews: An empirical study at microsoft. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories . IEEE, 146–156

work page 2015
[9]

Jason Cohen. 2010. Modern code review. Making Software: What Really Works, and Why We Believe It (2010), 329–336

work page 2010
[10]

Jacob Cohen. 2013. Statistical power analysis for the behavioral sciences . Academic press

work page 2013
[11]

Harald Cramér. 1999. Mathematical methods of statistics . Vol. 9. Princeton university press

work page 1999
[12]

Arcos David. 2024. gender-guesser. https://pypi.org/project/gender-guesser. Accessed: 2024-11-08

work page 2024
[13]

Nicole Davila and Ingrid Nunes. 2021. A systematic literature review and tax- onomy of modern code review. Journal of Systems and Software 177 (2021), 110951

work page 2021
[14]

Vasiliki Efstathiou and Diomidis Spinellis. 2018. Code review comments: lan- guage matters. In Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results . 69–72

work page 2018
[15]

Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. 2016. emoji2vec: Learning Emoji Representations from their Description. In Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media . Association for Computational Linguistics, Austin, TX, USA, 48–54. https://doi.org/10.18653/v1/W16-6208

work page doi:10.18653/v1/w16-6208 2016
[16]

Nasir U Eisty and Jeffrey C Carver. 2022. Developers perception of peer code review in research software development. Empirical Software Engineering 27 (2022), 1–26

work page 2022
[17]

Masum Hasan, Anindya Iqbal, Mohammad Rafid Ul Islam, AJM Rahman, and Amiangshu Bosu. 2021. Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report. Empirical Software Engineering 26, 6 (2021), 1–34

work page 2021
[18]

Oleksii Kononenko, Olga Baysal, and Michael W Godfrey. 2016. Code review quality: How developers see it. In Proceedings of the 38th international conference PASC 2026, June 29 – July 1, 2026, Bern, Switzerland Sharif Ahmed and Nasir U. Eisty on software engineering. 1028–1038

work page 2016
[19]

Oleksii Kononenko, Olga Baysal, Latifa Guerrouj, Yaxin Cao, and Michael W Godfrey. 2015. Investigating code review quality: Do people and participation matter?. In 2015 IEEE international conference on software maintenance and evolu- tion (ICSME). IEEE, 111–120

work page 2015
[20]

Esmukov Kostya. 2023. geopy. https://pypi.org/project/geopy/. Accessed: 2024-11-08

work page 2023
[21]

Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of emojis. PloS one 10, 12 (2015), e0144296

work page 2015
[22]

Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 , I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765–4774

work page 2017
[23]

Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics (1947), 50–60

work page 1947
[24]

Benjamin S Meyers, Nuthan Munaiah, Emily Prud’hommeaux, Andrew Meneely, Josephine Wolff, Cecilia Ovesdotter Alm, and Pradeep Murukannaiah. 2018. A dataset for identifying actionable feedback in collaborative software development. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) . 126–131

work page 2018
[25]

Thai Pangsakulyanont, Patanamon Thongtanunam, Daniel Port, and Hajimu Iida. 2014. Assessing MCR discussion usefulness using semantic similarity. In 6th International Workshop on Empirical Software Engineering in Practice . IEEE, 49–54

work page 2014
[26]

Karl Pearson. 1900. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50, 302 (1900), 157–175

work page 1900
[27]

Mohammad Masudur Rahman, Chanchal K Roy, and Raula G Kula. 2017. Predict- ing usefulness of code review comments using textual features and developer experience. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 215–226

work page 2017
[28]

Eric Raymond. 1999. The cathedral and the bazaar. Knowledge, Technology & Policy 12, 3 (1999), 23–49

work page 1999
[29]

Lucía Santamaría and Helena Mihaljević. 2018. Comparison and benchmark of name-to-gender inference services. PeerJ Computer Science 4 (2018), e156

work page 2018
[30]

Daniel Schneider, Scott Spurlock, and Megan Squire. 2016. Differentiating com- munication styles of leaders on the linux kernel mailing list. In Proceedings of the 12th International Symposium on Open Collaboration . 1–10

work page 2016
[31]

Asif Kamal Turzo and Amiangshu Bosu. 2023. What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation. Empirical Software Engineering (2023). Just Accepted

work page 2023
[32]

Frank Wilcoxon, SK Katti, et al . 1970. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics 1 (1970), 171–259. Received TBD; revised TBD; accepted TBD

work page 1970

[1] [1]

Sharif Ahmed and Nasir U Eisty. 2023. Exploring the Advances in Identifying Useful Code Review Comments. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) . IEEE, 1–7

work page 2023

[2] [2]

Sharif Ahmed and Nasir U Eisty. 2024. Understanding Emojis :) in Useful Code Review Comments. In Proceedings of the Third ACM/IEEE International Workshop on NL-Based Software Engineering (Lisbon, Portugal) (NLBSE ’24). Association for Computing Machinery, New York, NY, USA, 81–84. https://doi.org/10.1145/ 3643787.3648035

work page arXiv 2024

[3] [3]

Sharif Ahmed and Nasir U Eisty. 2025. Hold On! Is My Feedback Useful? Evalu- ating the Usefulness of Code Review Comments. Empirical Software Engineering 30, 3 (2025), 70. https://doi.org/10.1007/s10664-025-10617-1

work page doi:10.1007/s10664-025-10617-1 2025

[4] [4]

Sharif Ahmed, Addi Malviya Thakur, Gregory R Watson, and Nasir U Eisty. 2025. Uncovering Scientific Software Sustainability through Community Engagement and Software Quality Metrics. arXiv preprint arXiv:2511.07851 (2025)

work page arXiv 2025

[5] [5]

Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: a customized sentiment analysis tool for code review interactions. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 106–111

work page 2017

[6] [6]

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. En- riching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146. https://doi.org/10.1162/tacl_a_ 00051

work page doi:10.1162/tacl_a_ 2017

[7] [7]

Amiangshu Bosu, Jeffrey C Carver, Christian Bird, Jonathan Orbeck, and Christo- pher Chockley. 2016. Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at microsoft. IEEE Transactions on Software Engineering 43, 1 (2016), 56–75

work page 2016

[8] [8]

Amiangshu Bosu, Michaela Greiler, and Christian Bird. 2015. Characteristics of useful code reviews: An empirical study at microsoft. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories . IEEE, 146–156

work page 2015

[9] [9]

Jason Cohen. 2010. Modern code review. Making Software: What Really Works, and Why We Believe It (2010), 329–336

work page 2010

[10] [10]

Jacob Cohen. 2013. Statistical power analysis for the behavioral sciences . Academic press

work page 2013

[11] [11]

Harald Cramér. 1999. Mathematical methods of statistics . Vol. 9. Princeton university press

work page 1999

[12] [12]

Arcos David. 2024. gender-guesser. https://pypi.org/project/gender-guesser. Accessed: 2024-11-08

work page 2024

[13] [13]

Nicole Davila and Ingrid Nunes. 2021. A systematic literature review and tax- onomy of modern code review. Journal of Systems and Software 177 (2021), 110951

work page 2021

[14] [14]

Vasiliki Efstathiou and Diomidis Spinellis. 2018. Code review comments: lan- guage matters. In Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results . 69–72

work page 2018

[15] [15]

Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. 2016. emoji2vec: Learning Emoji Representations from their Description. In Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media . Association for Computational Linguistics, Austin, TX, USA, 48–54. https://doi.org/10.18653/v1/W16-6208

work page doi:10.18653/v1/w16-6208 2016

[16] [16]

Nasir U Eisty and Jeffrey C Carver. 2022. Developers perception of peer code review in research software development. Empirical Software Engineering 27 (2022), 1–26

work page 2022

[17] [17]

Masum Hasan, Anindya Iqbal, Mohammad Rafid Ul Islam, AJM Rahman, and Amiangshu Bosu. 2021. Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report. Empirical Software Engineering 26, 6 (2021), 1–34

work page 2021

[18] [18]

Oleksii Kononenko, Olga Baysal, and Michael W Godfrey. 2016. Code review quality: How developers see it. In Proceedings of the 38th international conference PASC 2026, June 29 – July 1, 2026, Bern, Switzerland Sharif Ahmed and Nasir U. Eisty on software engineering. 1028–1038

work page 2016

[19] [19]

Oleksii Kononenko, Olga Baysal, Latifa Guerrouj, Yaxin Cao, and Michael W Godfrey. 2015. Investigating code review quality: Do people and participation matter?. In 2015 IEEE international conference on software maintenance and evolu- tion (ICSME). IEEE, 111–120

work page 2015

[20] [20]

Esmukov Kostya. 2023. geopy. https://pypi.org/project/geopy/. Accessed: 2024-11-08

work page 2023

[21] [21]

Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of emojis. PloS one 10, 12 (2015), e0144296

work page 2015

[22] [22]

Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 , I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765–4774

work page 2017

[23] [23]

Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics (1947), 50–60

work page 1947

[24] [24]

Benjamin S Meyers, Nuthan Munaiah, Emily Prud’hommeaux, Andrew Meneely, Josephine Wolff, Cecilia Ovesdotter Alm, and Pradeep Murukannaiah. 2018. A dataset for identifying actionable feedback in collaborative software development. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) . 126–131

work page 2018

[25] [25]

Thai Pangsakulyanont, Patanamon Thongtanunam, Daniel Port, and Hajimu Iida. 2014. Assessing MCR discussion usefulness using semantic similarity. In 6th International Workshop on Empirical Software Engineering in Practice . IEEE, 49–54

work page 2014

[26] [26]

Karl Pearson. 1900. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50, 302 (1900), 157–175

work page 1900

[27] [27]

Mohammad Masudur Rahman, Chanchal K Roy, and Raula G Kula. 2017. Predict- ing usefulness of code review comments using textual features and developer experience. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 215–226

work page 2017

[28] [28]

Eric Raymond. 1999. The cathedral and the bazaar. Knowledge, Technology & Policy 12, 3 (1999), 23–49

work page 1999

[29] [29]

Lucía Santamaría and Helena Mihaljević. 2018. Comparison and benchmark of name-to-gender inference services. PeerJ Computer Science 4 (2018), e156

work page 2018

[30] [30]

Daniel Schneider, Scott Spurlock, and Megan Squire. 2016. Differentiating com- munication styles of leaders on the linux kernel mailing list. In Proceedings of the 12th International Symposium on Open Collaboration . 1–10

work page 2016

[31] [31]

Asif Kamal Turzo and Amiangshu Bosu. 2023. What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation. Empirical Software Engineering (2023). Just Accepted

work page 2023

[32] [32]

Frank Wilcoxon, SK Katti, et al . 1970. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics 1 (1970), 171–259. Received TBD; revised TBD; accepted TBD

work page 1970