
arxiv: 2604.20461 · v1 · submitted 2026-04-22 · 💻 cs.SE

On the Informativeness of Security Commit Messages: A Large-scale Replication Study

Pith reviewed 2026-05-10 00:05 UTC · model grok-4.3

classification 💻 cs.SE
keywords security commit messages · commit message informativeness · replication study · patch triage · software security · Conventional Commits Specification · software ecosystems · GitHub commits

The pith

Security-related commit messages are generally not informative enough for security purposes, as confirmed by an independent large-scale replication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper replicates a 2023 study on how well security commit messages support patch triage and rapid fixes. Using only information from the original paper, the authors independently retrieve 50,673 security commits from GitHub and re-implement the prior measurement techniques. Their analysis confirms in a statistically significant way that these messages are typically too uninformative for security needs. The work extends the replication by covering a longer period through 2025, where informativeness appears to be declining, and by comparing results across ecosystems such as the Linux kernel, Ubuntu, Go, and PyPI. It also tests the Conventional Commits Specification and finds that compliant messages score lower on informativeness than non-compliant ones.

Core claim

The paper establishes that an independent re-implementation of the original informativeness assessment, applied to 50,673 security-related commits from GitHub between June 1999 and August 2022, reproduces the finding that commit messages are in general not informative enough for security-focused purposes. Extending the dataset to October 2025 shows that informativeness has worsened over time. Breaking the results down by software ecosystem reveals statistically significant differences, while CCS-compliant commits prove less informative than non-compliant ones.

What carries the argument

Independent re-implementation of the original informativeness scoring techniques applied to newly retrieved security commits without reuse of prior artifacts or data.

If this is right

  • Informativeness of security commit messages has declined from 1999 through 2025.
  • Significant differences in message quality exist across major software ecosystems.
  • Commits following the Conventional Commits Specification are less informative than those that do not.
  • Cross-ecosystem studies are needed to develop more effective guidelines for security commit messages.
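
As a rough illustration of what "CCS-compliant" can mean in practice, the sketch below checks whether the first line of a commit message matches the header shape `<type>[optional scope][!]: <description>` from Conventional Commits 1.0.0. The regular expression and the function name `is_ccs_compliant` are assumptions made for this example; the compliance check actually used in the paper may differ.

```python
import re

# Illustrative approximation of the Conventional Commits 1.0.0 header:
#   <type>[optional scope][!]: <description>
# This is an assumption for the example, not the paper's own check.
CCS_HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"        # type, e.g. fix or feat
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>\S.*)$"    # colon, space, non-empty description
)


def is_ccs_compliant(message: str) -> bool:
    """Return True if the first line of the message has a CCS-style header."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return CCS_HEADER.match(header) is not None


if __name__ == "__main__":
    for msg in ("fix(auth): reject expired tokens",
                "Fix buffer overflow in parser"):
        print(f"{msg!r}: {is_ccs_compliant(msg)}")
```

A header-only check like this says nothing about how informative the message is, which is consistent with the finding that compliance and informativeness can diverge.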

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Project maintainers could adopt automated checks or templates to boost the presence of vulnerability references in security commits.
  • Tool builders might explore ways to generate or suggest more informative messages directly from patch diffs.
  • Community guidelines could be tested for whether they improve triage speed in practice beyond current specifications.
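
The first extension above could start from something as small as the following check, which looks for CVE identifiers in a commit message. It is a minimal sketch: treating a CVE ID as the vulnerability reference of interest is an assumption of this example, and the function names `find_cve_ids` and `has_vulnerability_reference` are hypothetical. The scoring criteria of the original study are not reproduced here.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(message: str) -> list[str]:
    """Return the distinct CVE identifiers in the message, upper-cased and sorted."""
    return sorted({match.upper() for match in CVE_ID.findall(message)})


def has_vulnerability_reference(message: str) -> bool:
    """Hypothetical pre-commit style check: is at least one CVE ID present?"""
    return bool(find_cve_ids(message))


if __name__ == "__main__":
    msg = "Fix JNDI lookup (CVE-2021-44228), see also cve-2021-45046"
    print(find_cve_ids(msg))
    print(has_vulnerability_reference("fix: sanitize input"))
```

A real check would likely accept other identifier schemes as well; this one only shows how little machinery a first version needs.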

Load-bearing premise

The re-implementation of the original measurement techniques scores commit informativeness in a way that matches the prior study's criteria without systematic differences.

What would settle it

A fresh analysis using the same scoring rules on a comparable set of security commits that finds most messages contain clear vulnerability details, affected versions, and fix descriptions would contradict the replication result.

Figures

Figures reproduced from arXiv: 2604.20461 by Stefano Zacchiroli, Syful Islam.

Figure 1: Overview of the methodology of our study.
Original abstract

The informativeness of security-related commit messages is crucial for patch triage: when high, it enables the rapid distribution and deployment of security fixes. Prior research (Reis et al., 2023) reported, however, that commit messages are often too uninformative to support these activities. To assess the robustness of this negative result, we independently replicate the original study using only the information provided in the paper, without reusing any of the original artifacts (data, analysis pipeline, etc.). We retrieve 50,673 security-related commits and analyze their informativeness using an independent re-implementation of the techniques introduced by Reis et al. For the same source (i.e., GitHub) and time period (from June 1999 to August 2022) as the original study, our replication confirms the original findings in a statistically significant way: security-related commit messages are, in general, not informative enough for security-focused purposes. We then extend the original study in several ways. Over a longer time period (from June 1999 to October 2025), we find that commit-message informativeness is worsening. Breaking results down by software ecosystem (Linux kernel, Ubuntu, Go, PyPI, etc.), we observe significant differences in informativeness. Finally, we examine emerging best practices for writing commit messages, such as the Conventional Commits Specification (CCS), and again find significant differences in an unexpected direction: CCS-compliant commits are less informative than non-compliant ones. Our findings highlight the need for cross-ecosystem analyses to understand platform- and community-specific commit-message practices, and to inform the development and adoption of universally applicable guidelines for writing informative security-related commit messages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents an independent replication of Reis et al. (2023) on the informativeness of security-related commit messages. Using only information from the original paper, the authors retrieve 50,673 commits from GitHub (June 1999–August 2022) and apply a fresh re-implementation of the informativeness scoring techniques. They statistically confirm the original finding that security commit messages are generally not informative enough for security purposes. Extensions include analysis over an extended period to October 2025 (showing declining informativeness), breakdowns by ecosystem (Linux, Ubuntu, Go, PyPI, etc.) revealing significant differences, and a comparison of Conventional Commits Specification (CCS) compliant vs. non-compliant messages (finding CCS commits less informative).

Significance. If the re-implementation is faithful, the work provides a robust, large-scale independent confirmation of a practically important negative result, supported by independent data retrieval rather than reuse of original artifacts. The extensions add value by documenting temporal decline, ecosystem-specific patterns, and an unexpected CCS effect, which could inform guidelines for security patch communication. The scale (50,673 commits) and claimed statistical significance are strengths for generalizability.

major comments (2)
  1. [Section 3] Section 3 (Replication Methodology and Re-implementation): The manuscript states that an independent re-implementation of Reis et al.'s informativeness scoring was used but supplies neither the exact decision rules (e.g., keyword lists, sentence patterns, or thresholds), nor any quantitative fidelity validation such as inter-rater reliability, Cohen's kappa, or agreement on a held-out sample of commits. This is load-bearing for the central replication claim, because without evidence that the new scoring produces comparable results to the 2023 study, the 'statistically significant confirmation' could reflect differences in operationalization rather than true replication of the original finding.
  2. [Section 5] Section 5 (Extended Analysis, CCS comparison): The claim that CCS-compliant commits are 'less informative' in an 'unexpected direction' is presented without discussion of potential confounds such as differing project maturity, commit volume, or ecosystem-specific adoption rates of CCS. This weakens the interpretation of the ecosystem and best-practice extensions, which are positioned as key contributions beyond the replication.
minor comments (2)
  1. [Abstract] Abstract: The claim of confirmation 'in a statistically significant way' is stated without any mention of the specific statistical tests, p-values, or effect sizes used; adding these details would improve transparency.
  2. The manuscript would benefit from an appendix containing the full re-implementation decision rules or pseudocode to enable future replications, even if the main text remains high-level.
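
Major comment 1 asks for a quantitative fidelity check such as Cohen's kappa. For readers unfamiliar with the statistic, the sketch below computes it for two raters as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The labels and the function name `cohens_kappa` are illustrative assumptions; no data from the paper is used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is formally
        # undefined, and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["informative", "informative", "uninformative", "uninformative"]
    b = ["informative", "uninformative", "uninformative", "uninformative"]
    print(cohens_kappa(a, b))
```

In the toy input the raters agree on 3 of 4 items (p_o = 0.75) while chance agreement is 0.5, which is why raw agreement alone overstates reliability.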

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the replication claims and strengthen the extensions. We address each major point below, proposing targeted revisions for transparency and robustness while maintaining the independence of our re-implementation.

  1. Referee: [Section 3] Section 3 (Replication Methodology and Re-implementation): The manuscript states that an independent re-implementation of Reis et al.'s informativeness scoring was used but supplies neither the exact decision rules (e.g., keyword lists, sentence patterns, or thresholds), nor any quantitative fidelity validation such as inter-rater reliability, Cohen's kappa, or agreement on a held-out sample of commits. This is load-bearing for the central replication claim, because without evidence that the new scoring produces comparable results to the 2023 study, the 'statistically significant confirmation' could reflect differences in operationalization rather than true replication of the original finding.

    Authors: We agree that greater transparency is needed for the re-implementation. Our approach followed the high-level description in Reis et al. (2023) without access to their artifacts or code, which precludes direct inter-rater reliability or Cohen's kappa against the original scoring. In revision we will add an appendix with the complete keyword lists, sentence patterns, and thresholds used. We will also report results from a manual validation on a held-out sample of 200 commits (two raters, agreement rate and Cohen's kappa), confirming that our operationalization aligns with the original study's intent. This addresses potential differences in implementation while preserving the independent nature of the replication. revision: yes

  2. Referee: [Section 5] Section 5 (Extended Analysis, CCS comparison): The claim that CCS-compliant commits are 'less informative' in an 'unexpected direction' is presented without discussion of potential confounds such as differing project maturity, commit volume, or ecosystem-specific adoption rates of CCS. This weakens the interpretation of the ecosystem and best-practice extensions, which are positioned as key contributions beyond the replication.

    Authors: We acknowledge the need to address potential confounds in the CCS analysis. In the revised manuscript we will add controls for project maturity (measured by repository age and number of commits), commit volume per project, and ecosystem-specific CCS adoption rates. We will report adjusted comparisons (e.g., via regression or matched samples) to assess whether the lower informativeness persists after accounting for these factors. The unexpected direction remains a notable observation warranting further study, but the added discussion will clarify its interpretation and strengthen the contribution of the best-practice extension. revision: yes
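
To see why the confound discussion in the second response matters, the toy sketch below compares a pooled CCS vs. non-CCS gap in mean informativeness with the same gap computed within each ecosystem. It uses plain stratification, which is simpler than the regression or matching the simulated authors propose. The ecosystem names "A" and "B", the scores, and the function names are all invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def pooled_gap(records):
    """Mean score of CCS commits minus mean score of non-CCS commits."""
    ccs = [score for _, is_ccs, score in records if is_ccs]
    other = [score for _, is_ccs, score in records if not is_ccs]
    return mean(ccs) - mean(other)


def stratified_gap(records):
    """Same gap computed separately per ecosystem (strata with both groups only)."""
    groups = defaultdict(lambda: {True: [], False: []})
    for ecosystem, is_ccs, score in records:
        groups[ecosystem][bool(is_ccs)].append(score)
    return {
        eco: mean(g[True]) - mean(g[False])
        for eco, g in groups.items()
        if g[True] and g[False]
    }


# Invented data: (ecosystem, is_ccs, informativeness score).
# Ecosystem A scores low and mostly uses CCS; B scores high and mostly does not.
TOY = [
    ("A", True, 1.0), ("A", True, 1.0), ("A", True, 1.0), ("A", False, 1.0),
    ("B", True, 3.0), ("B", False, 3.0), ("B", False, 3.0), ("B", False, 3.0),
]

if __name__ == "__main__":
    print(pooled_gap(TOY))      # negative: CCS looks less informative overall
    print(stratified_gap(TOY))  # zero within each ecosystem
```

In this constructed case the pooled difference comes entirely from which ecosystems adopt CCS, which is the kind of alternative explanation the adjusted comparisons are meant to rule in or out.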

Circularity Check

0 steps flagged

No circularity: independent data retrieval and re-implementation of external method


The paper retrieves 50,673 new security commits directly from GitHub for the original time window and applies an independent re-implementation derived solely from the published description of Reis et al. (2023). No equations, fitted parameters, or predictions are defined in terms of the target result. The cited prior work is by different authors and is treated as an external benchmark rather than a self-citation chain. The confirmation claim rests on applying the described procedure to fresh data, which is externally falsifiable and does not reduce to the paper's own inputs by construction. This is a standard empirical replication with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view; no explicit free parameters, axioms, or invented entities are described. The work relies on standard statistical significance testing and an operational definition of commit informativeness taken from prior literature.



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1]

    [n. d.]. ACM Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-and-badging-current. Last Accessed: February 25, 2026

  2. [2]

    [n. d.]. Conventional Commits. https://www.conventionalcommits.org/en/v1.0.0/. Last Accessed: December 10, 2025

  3. [3]

    [n. d.]. NVD (National Vulnerability Database). https://nvd.nist.gov/. Last Accessed: December 23, 2025

  4. [4]

    [n. d.]. OSV (Open-Source Vulnerability Database). https://osv.dev/. Last Accessed: December 23, 2025

  5. [5]

    [n. d.]. Python package (conventional-pre-commit). https://pypi.org/project/conventional-pre-commit. Last Accessed: February 25, 2026

  6. [6]

    [n. d.]. Python package (langdetect). https://pypi.org/project/langdetect/. Last Accessed: February 25, 2026

  7. [7]

    Ahmad Abdellatif, Mairieli Wessel, Igor Steinmacher, Marco A Gerosa, and Emad Shihab. 2022. BotHunter: An approach to detect software bots in GitHub. In Proceedings of the 19th International Conference on Mining Software Repositories. 6–17

  8. [8]

    Jean-Francois Abramatic, Roberto Di Cosmo, and Stefano Zacchiroli. 2018. Building the Universal Archive of Source Code. Commun. ACM 61, 10 (October 2018), 29–31. doi:10.1145/3183558

  9. [9]

    C Banerjee and SK Pandey. 2010. Research on software security awareness: problems and prospects. ACM SIGSOFT Software Engineering Notes 35, 5 (2010), 1–5

  10. [10]

    Russell Brandom. 2017. Former Equifax CEO blames breach on a single person who failed to deploy patch. https://www.theverge.com/2017/10/3/16410806/equifax-ceo-blame-breach-patch-congress-testimony. Last Accessed: February 12, 2026

  11. [11]

    Natarajan Chidambaram, Alexandre Decan, and Tom Mens. 2023. A dataset of bot and human activities in GitHub. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). IEEE, 465–469

  12. [12]

    Natarajan Chidambaram and Tom Mens. 2025. Observing bots in the wild: A quantitative analysis of a large open source ecosystem. In 2025 IEEE/ACM International Workshop on Bots in Software Engineering (BotSE). IEEE, 1–5

  13. [13]

    Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, and M Ali Babar. 2022. Software security patch management-A systematic literature review of challenges, approaches, tools and practices. Information and Software Technology 144 (2022), 106771

  15. [15]

    Kelsey R Fulton, Daniel Votipka, Desiree Abrokwa, Michelle L Mazurek, Michael Hicks, and James Parker. 2022. Understanding the how and the why: Exploring secure development practices through a course competition. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1141–1155

  16. [16]

    Mehdi Golzadeh, Alexandre Decan, Damien Legay, and Tom Mens. 2021. A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments. Journal of Systems and Software 175 (2021), 110911

  17. [17]

    Dan Goodin. 2017. Failure to patch two-month-old bug led to massive Equifax breach. https://arstechnica.com/information-technology/2017/09/massive-equifax-breach-caused-by-failure-to-patch-two-month-old-bug/. Last Accessed: February 12, 2026

  18. [18]

    Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, Wing Kwong Chan, and Bo Jiang. 2023. A study on the impact of pre-trained model on Just-In-Time defect prediction. In 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS). IEEE, 105–116

  20. [20]

    Syful Islam and Stefano Zacchiroli. 2026. Replication Package (On the Informativeness of Security Commit Messages: A Large-scale Replication Study). https://doi.org/10.5281/zenodo.18757725. Last Accessed: February 26, 2026

  21. [21]

    Frank Li, Lisa Rogers, Arunesh Mathur, Nathan Malkin, and Marshini Chetty. 2019. Keepers of the machines: Examining how system administrators manage software updates for multiple machines. In Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019). 273–288

  23. [23]

    Moritz Mock, Thomas Forrer, and Barbara Russo. 2024. Where do developers admit their security-related concerns?. In International Conference on Agile Software Development. Springer, 189–195

  24. [24]

    Patrick Morrison, Tosin Daniel Oyetoyan, and Laurie Williams. 2018. Identifying security issues in software development: are keywords enough?. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. 426–427

  25. [25]

    Giang Nguyen-Truong, Hong Jin Kang, David Lo, Abhishek Sharma, Andrew E Santosa, Asankhaya Sharma, and Ming Yi Ang. 2022. Hermes: Using commit-issue linking to detect vulnerability-fixing commits. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 51–62

  26. [26]

    Antoine Pietri, Diomidis Spinellis, and Stefano Zacchiroli. 2020. The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History. In MSR 2020: The 17th International Conference on Mining Software Repositories. IEEE, 1–5. doi:10.1145/3379597.3387510

  27. [27]

    Sofia Reis, Rui Abreu, Hakan Erdogmus, and Corina Păsăreanu. 2022. SECOM: Towards a convention for security commit messages. In Proceedings of the 19th International Conference on Mining Software Repositories. 764–765

  28. [28]

    Sofia Reis, Rui Abreu, and Corina Pasareanu. 2023. Are security commit messages informative? Not enough!. In Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. 196–199

  29. [29]

    Sofia Reis, Corina Pasareanu, Rui Abreu, and Hakan Erdogmus. 2023. SECOMlint: A linter for Security Commit Messages. arXiv preprint arXiv:2301.06959 (2023)

  30. [30]

    Antonino Sabetta and Michele Bezzi. 2018. A practical approach to the automatic classification of security-relevant commits. In 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, 579–582

  31. [31]

    Arthur D Sawadogo, Tegawendé F Bissyandé, Naouel Moha, Kevin Allix, Jacques Klein, Li Li, and Yves Le Traon. 2022. SSPCatcher: Learning to catch security patches. Empirical Software Engineering 27, 6 (2022), 151

  32. [32]

    Sho Suzuki, Hirohisa Aman, Sousuke Amasaki, Tomoyuki Yokogawa, and Minoru Kawahara. 2017. An application of the pagerank algorithm to commit evaluation on git repository. In 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 380–383

  33. [33]

    Xin Tan, Yuan Zhang, Chenyuan Mi, Jiajun Cao, Kun Sun, Yifan Lin, and Min Yang. 2021. Locating the security patches for disclosed oss vulnerabilities with vulnerability-commit correlation ranking. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 3282–3299

  34. [34]

    Yingchen Tian, Yuxia Zhang, Klaas-Jan Stol, Lin Jiang, and Hui Liu. 2022. What makes a good commit message?. In Proceedings of the 44th International Conference on Software Engineering. 2389–2401

  35. [35]

    Christian Tiefenau, Maximilian Häring, Katharina Krombholz, and Emanuel Von Zezschwitz. 2020. Security, availability, and multiple information sources: Exploring update behavior of system administrators. In Sixteenth Symposium on Usable Privacy and Security (SOUPS 2020). 239–258

  36. [36]

    Shichao Wang, Yun Zhang, Liagfeng Bao, Xin Xia, and Minghui Wu. 2022. Vcmatch: a ranking-based approach for automatic security patches localization for OSS vulnerabilities. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 589–600

  37. [37]

    Zhendong Wang, Yi Wang, and David Redmiles. 2022. From specialized mechanics to project butlers: The usage of bots in open source software development. IEEE Software 39, 5 (2022), 38–43

  38. [38]

    Zhengran Zeng, Yuqun Zhang, Haotian Zhang, and Lingming Zhang. 2021. Deep just-in-time defect prediction: how far are we?. In Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis. 427–438

  39. [39]

    Yaqin Zhou, Jing Kai Siow, Chenyu Wang, Shangqing Liu, and Yang Liu. 2021. Spi: Automated identification of security patches via commits. ACM Transactions on Software Engineering and Methodology (TOSEM) 31, 1 (2021), 1–27

  40. [40]

    Fei Zuo, Xin Zhang, Yuqi Song, Junghwan Rhee, and Jicheng Fu. 2023. Commit message can help: security patch detection in open source software via transformer. In 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 345–351