On the Informativeness of Security Commit Messages: A Large-scale Replication Study
Pith reviewed 2026-05-10 00:05 UTC · model grok-4.3
The pith
Security-related commit messages are generally not informative enough for security purposes, as confirmed by an independent large-scale replication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that an independent re-implementation of the original informativeness assessment, applied to 50,673 security-related commits from GitHub between June 1999 and August 2022, reproduces the finding that commit messages are in general not informative enough for security-focused purposes. Extending the dataset to October 2025 shows that informativeness has worsened over time. Breaking the results down by software ecosystem reveals statistically significant differences, and commits that comply with the Conventional Commits Specification (CCS) prove less informative than non-compliant ones.
What carries the argument
Independent re-implementation of the original informativeness scoring techniques applied to newly retrieved security commits without reuse of prior artifacts or data.
If this is right
- Informativeness of security commit messages has declined from 1999 through 2025.
- Significant differences in message quality exist across major software ecosystems.
- Commits following the Conventional Commits Specification are less informative than those that do not.
- Cross-ecosystem studies are needed to develop more effective guidelines for security commit messages.
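To make the notion of CCS compliance concrete, here is a minimal sketch of a header check based on the Conventional Commits v1.0.0 header shape (`type(scope)!: description`). The function name `is_ccs_header` and the regular expression are illustrative assumptions; this is a simplified check of the first line only and is not the compliance procedure used in the paper.

```python
import re

# Simplified header shape from Conventional Commits v1.0.0:
#   <type>[optional (scope)][optional !]: <description>
# Assumption: only the first line is inspected; body and footers are ignored.
HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"           # type, e.g. fix, feat, docs
    r"(?:\((?P<scope>[^()\r\n]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<description>\S.*)$"       # colon, space, non-empty description
)


def is_ccs_header(message: str) -> bool:
    """Return True when the first line of the message looks CCS-compliant."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return HEADER.match(first_line) is not None


if __name__ == "__main__":
    for msg in ["fix(parser): handle empty input", "Fixed a crash in the parser"]:
        print(f"{msg!r}: {is_ccs_header(msg)}")
```

A header such as `fix(parser): handle empty input` passes this check while saying nothing about the vulnerability being fixed, which illustrates how format compliance and informativeness can diverge.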
Where Pith is reading between the lines
- Project maintainers could adopt automated checks or templates to boost the presence of vulnerability references in security commits.
- Tool builders might explore ways to generate or suggest more informative messages directly from patch diffs.
- Community guidelines could be tested for whether they improve triage speed in practice beyond current specifications.
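As a rough illustration of the automated check suggested in the first point above, the sketch below flags commit messages that cite no CVE identifier. The helper name `missing_vulnerability_reference` is hypothetical, and restricting the check to CVE identifiers is an assumption made for brevity; other identifier schemes would need their own patterns.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def missing_vulnerability_reference(message: str) -> bool:
    """Return True when the commit message cites no CVE identifier."""
    return CVE_PATTERN.search(message) is None


if __name__ == "__main__":
    for msg in ["Fix buffer overflow in parser (CVE-2021-12345)", "fix bug"]:
        status = "missing reference" if missing_vulnerability_reference(msg) else "ok"
        print(f"{msg!r}: {status}")
```

Such a check could run in a commit hook or a CI job, warning the author before a security fix is merged without a vulnerability reference.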
Load-bearing premise
The re-implementation of the original measurement techniques scores commit informativeness in a way that matches the prior study's criteria without systematic differences.
What would settle it
A fresh analysis using the same scoring rules on a comparable set of security commits that finds most messages contain clear vulnerability details, affected versions, and fix descriptions would contradict the replication result.
Original abstract
The informativeness of security-related commit messages is crucial for patch triage: when high, it enables the rapid distribution and deployment of security fixes. Prior research (Reis et al., 2023) reported, however, that commit messages are often too uninformative to support these activities. To assess the robustness of this negative result, we independently replicate the original study using only the information provided in the paper, without reusing any of the original artifacts (data, analysis pipeline, etc.). We retrieve 50,673 security-related commits and analyze their informativeness using an independent re-implementation of the techniques introduced by Reis et al. For the same source (i.e., GitHub) and time period (from June 1999 to August 2022) as the original study, our replication confirms the original findings in a statistically significant way: security-related commit messages are, in general, not informative enough for security-focused purposes. We then extend the original study in several ways. Over a longer time period (from June 1999 to October 2025), we find that commit-message informativeness is worsening. Breaking results down by software ecosystem (Linux kernel, Ubuntu, Go, PyPI, etc.), we observe significant differences in informativeness. Finally, we examine emerging best practices for writing commit messages, such as the Conventional Commits Specification (CCS), and again find significant differences in an unexpected direction: CCS-compliant commits are less informative than non-compliant ones. Our findings highlight the need for cross-ecosystem analyses to understand platform- and community-specific commit-message practices, and to inform the development and adoption of universally applicable guidelines for writing informative security-related commit messages.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an independent replication of Reis et al. (2023) on the informativeness of security-related commit messages. Using only information from the original paper, the authors retrieve 50,673 commits from GitHub (June 1999–August 2022) and apply a fresh re-implementation of the informativeness scoring techniques. They statistically confirm the original finding that security commit messages are generally not informative enough for security purposes. Extensions include analysis over an extended period to October 2025 (showing declining informativeness), breakdowns by ecosystem (Linux, Ubuntu, Go, PyPI, etc.) revealing significant differences, and a comparison of Conventional Commits Specification (CCS) compliant vs. non-compliant messages (finding CCS commits less informative).
Significance. If the re-implementation is faithful, the work provides a robust, large-scale independent confirmation of a practically important negative result, supported by independent data retrieval rather than reuse of original artifacts. The extensions add value by documenting temporal decline, ecosystem-specific patterns, and an unexpected CCS effect, which could inform guidelines for security patch communication. The scale (50,673 commits) and claimed statistical significance are strengths for generalizability.
major comments (2)
- [Section 3] Section 3 (Replication Methodology and Re-implementation): The manuscript states that an independent re-implementation of Reis et al.'s informativeness scoring was used but supplies neither the exact decision rules (e.g., keyword lists, sentence patterns, or thresholds), nor any quantitative fidelity validation such as inter-rater reliability, Cohen's kappa, or agreement on a held-out sample of commits. This is load-bearing for the central replication claim, because without evidence that the new scoring produces comparable results to the 2023 study, the 'statistically significant confirmation' could reflect differences in operationalization rather than true replication of the original finding.
- [Section 5] Section 5 (Extended Analysis, CCS comparison): The claim that CCS-compliant commits are 'less informative' in an 'unexpected direction' is presented without discussion of potential confounds such as differing project maturity, commit volume, or ecosystem-specific adoption rates of CCS. This weakens the interpretation of the ecosystem and best-practice extensions, which are positioned as key contributions beyond the replication.
minor comments (2)
- [Abstract] Abstract: The claim of confirmation 'in a statistically significant way' is stated without any mention of the specific statistical tests, p-values, or effect sizes used; adding these details would improve transparency.
- The manuscript would benefit from an appendix containing the full re-implementation decision rules or pseudocode to enable future replications, even if the main text remains high-level.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the replication claims and strengthen the extensions. We address each major point below, proposing targeted revisions for transparency and robustness while maintaining the independence of our re-implementation.
-
Referee: [Section 3] Section 3 (Replication Methodology and Re-implementation): The manuscript states that an independent re-implementation of Reis et al.'s informativeness scoring was used but supplies neither the exact decision rules (e.g., keyword lists, sentence patterns, or thresholds), nor any quantitative fidelity validation such as inter-rater reliability, Cohen's kappa, or agreement on a held-out sample of commits. This is load-bearing for the central replication claim, because without evidence that the new scoring produces comparable results to the 2023 study, the 'statistically significant confirmation' could reflect differences in operationalization rather than true replication of the original finding.
Authors: We agree that greater transparency is needed for the re-implementation. Our approach followed the high-level description in Reis et al. (2023) without access to their artifacts or code, which precludes direct inter-rater reliability or Cohen's kappa against the original scoring. In revision we will add an appendix with the complete keyword lists, sentence patterns, and thresholds used. We will also report results from a manual validation on a held-out sample of 200 commits (two raters, agreement rate and Cohen's kappa), confirming that our operationalization aligns with the original study's intent. This addresses potential differences in implementation while preserving the independent nature of the replication.
Revision: yes
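The two-rater validation mentioned in this response rests on Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation; the function name `cohens_kappa` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels (1 = informative, 0 = not informative); purely illustrative.
    a = [1, 1, 0, 0]
    b = [1, 0, 0, 0]
    print(cohens_kappa(a, b))
```

In this toy case the raters agree on three of four items (p_o = 0.75) while chance agreement is 0.5, giving kappa = 0.5.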
-
Referee: [Section 5] Section 5 (Extended Analysis, CCS comparison): The claim that CCS-compliant commits are 'less informative' in an 'unexpected direction' is presented without discussion of potential confounds such as differing project maturity, commit volume, or ecosystem-specific adoption rates of CCS. This weakens the interpretation of the ecosystem and best-practice extensions, which are positioned as key contributions beyond the replication.
Authors: We acknowledge the need to address potential confounds in the CCS analysis. In the revised manuscript we will add controls for project maturity (measured by repository age and number of commits), commit volume per project, and ecosystem-specific CCS adoption rates. We will report adjusted comparisons (e.g., via regression or matched samples) to assess whether the lower informativeness persists after accounting for these factors. The unexpected direction remains a notable observation warranting further study, but the added discussion will clarify its interpretation and strengthen the contribution of the best-practice extension.
Revision: yes
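To see why an adjusted comparison matters, consider a minimal sketch that contrasts a pooled CCS versus non-CCS difference with a stratified one. Stratifying by ecosystem is used here as a simpler stand-in for the regression or matching the authors propose; the function name `stratified_difference`, the record layout, and the toy scores are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def stratified_difference(records):
    """Compare CCS and non-CCS scores, pooled and within strata.

    records: iterable of (stratum, is_ccs, score) tuples.
    Returns (pooled_diff, adjusted_diff), each the mean CCS score minus the
    mean non-CCS score; the adjusted value averages per-stratum differences
    weighted by stratum size.
    """
    records = list(records)
    pooled = (mean(s for _, c, s in records if c)
              - mean(s for _, c, s in records if not c))
    groups = defaultdict(lambda: {True: [], False: []})
    for stratum, is_ccs, score in records:
        groups[stratum][bool(is_ccs)].append(score)
    weighted_sum, total_weight = 0.0, 0
    for group in groups.values():
        # Only strata containing both kinds of commits can be compared.
        if group[True] and group[False]:
            weight = len(group[True]) + len(group[False])
            weighted_sum += weight * (mean(group[True]) - mean(group[False]))
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both CCS and non-CCS commits")
    return pooled, weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up scores: ecosystem "A" scores high and rarely uses CCS,
    # ecosystem "B" scores low and mostly uses CCS.
    toy = [("A", False, 3), ("A", False, 3), ("A", True, 3),
           ("B", True, 1), ("B", True, 1), ("B", False, 1)]
    print(stratified_difference(toy))
```

In this toy data the pooled difference is negative even though CCS and non-CCS commits score identically within each ecosystem, which is exactly the kind of confound the referee raises.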
Circularity Check
No circularity: independent data retrieval and re-implementation of external method
The paper retrieves 50,673 new security commits directly from GitHub for the original time window and applies an independent re-implementation derived solely from the published description of Reis et al. (2023). No equations, fitted parameters, or predictions are defined in terms of the target result. The cited prior work is by different authors and is treated as an external benchmark rather than a self-citation chain. The confirmation claim rests on applying the described procedure to fresh data, which is externally falsifiable and does not reduce to the paper's own inputs by construction. This is a standard empirical replication with no load-bearing self-referential steps.
Reference graph
Works this paper leans on
- [1] [n. d.]. ACM Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-and-badging-current. Last Accessed: February 25, 2026
- [2] [n. d.]. Conventional Commits. https://www.conventionalcommits.org/en/v1.0.0/. Last Accessed: December 10, 2025
- [3] [n. d.]. NVD (National Vulnerability Database). https://nvd.nist.gov/. Last Accessed: December 23, 2025
- [4] [n. d.]. OSV (Open-Source Vulnerability Database). https://osv.dev/. Last Accessed: December 23, 2025
- [5] [n. d.]. Python package (conventional-pre-commit). https://pypi.org/project/conventional-pre-commit. Last Accessed: February 25, 2026
- [6] [n. d.]. Python package (langdetect). https://pypi.org/project/langdetect/. Last Accessed: February 25, 2026
- [7] Ahmad Abdellatif, Mairieli Wessel, Igor Steinmacher, Marco A Gerosa, and Emad Shihab. 2022. BotHunter: An approach to detect software bots in GitHub. In Proceedings of the 19th International Conference on Mining Software Repositories. 6–17
- [8] Jean-Francois Abramatic, Roberto Di Cosmo, and Stefano Zacchiroli. 2018. Building the Universal Archive of Source Code. Commun. ACM 61, 10 (October 2018), 29–31. doi:10.1145/3183558
- [9] C Banerjee and SK Pandey. 2010. Research on software security awareness: problems and prospects. ACM SIGSOFT Software Engineering Notes 35, 5 (2010), 1–5
- [10] Russell Brandom. 2017. Former Equifax CEO blames breach on a single person who failed to deploy patch. https://www.theverge.com/2017/10/3/16410806/equifax-ceo-blame-breach-patch-congress-testimony. Last Accessed: February 12, 2026
- [11] Natarajan Chidambaram, Alexandre Decan, and Tom Mens. 2023. A dataset of bot and human activities in GitHub. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). IEEE, 465–469
- [12] Natarajan Chidambaram and Tom Mens. 2025. Observing bots in the wild: A quantitative analysis of a large open source ecosystem. In 2025 IEEE/ACM International Workshop on Bots in Software Engineering (BotSE). IEEE, 1–5
[13]
Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, and M Ali Babar
-
[14]
Software security patch management-A systematic literature review of challenges, approaches, tools and practices.Information and Software Technology 144 (2022), 106771
work page 2022
-
[15]
Kelsey R Fulton, Daniel Votipka, Desiree Abrokwa, Michelle L Mazurek, Michael Hicks, and James Parker. 2022. Understanding the how and the why: Exploring secure development practices through a course competition. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1141–1155
work page 2022
-
[16]
Mehdi Golzadeh, Alexandre Decan, Damien Legay, and Tom Mens. 2021. A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments.Journal of Systems and Software175 (2021), 110911
work page 2021
-
[17]
Dan Goodin. 2017. Failure to patch two-month-old bug led to massive Equifax breach. https://arstechnica.com/information-technology/2017/09/ massive-equifax-breach-caused-by-failure-to-patch-two-month-old-bug/. Last Accessed: Feburary 12, 2026
work page 2017
-
[18]
Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, Wing Kwong Chan, and Bo Jiang
-
[19]
In2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS)
A study on the impact of pre-trained model on Just-In-Time defect predic- tion. In2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS). IEEE, 105–116
-
[20]
Syful Islam and Stefano Zacchiroli. 2026. Replication Package (On the Infor- mativeness of Security Commit Messages: A Large-scale Replication Study). https://doi.org/10.5281/zenodo.18757725. Last Accessed: February 26, 2026
-
[21]
Frank Li, Lisa Rogers, Arunesh Mathur, Nathan Malkin, and Marshini Chetty
-
[22]
InFifteenth Symposium on Usable Privacy and Security (SOUPS 2019)
Keepers of the machines: Examining how system administrators manage software updates for multiple machines. InFifteenth Symposium on Usable Privacy and Security (SOUPS 2019). 273–288
work page 2019
- [23] Moritz Mock, Thomas Forrer, and Barbara Russo. 2024. Where do developers admit their security-related concerns?. In International Conference on Agile Software Development. Springer, 189–195
- [24] Patrick Morrison, Tosin Daniel Oyetoyan, and Laurie Williams. 2018. Identifying security issues in software development: are keywords enough?. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. 426–427
- [25] Giang Nguyen-Truong, Hong Jin Kang, David Lo, Abhishek Sharma, Andrew E Santosa, Asankhaya Sharma, and Ming Yi Ang. 2022. Hermes: Using commit-issue linking to detect vulnerability-fixing commits. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 51–62
- [26] Antoine Pietri, Diomidis Spinellis, and Stefano Zacchiroli. 2020. The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History. In MSR 2020: The 17th International Conference on Mining Software Repositories. IEEE, 1–5. doi:10.1145/3379597.3387510
- [27] Sofia Reis, Rui Abreu, Hakan Erdogmus, and Corina Păsăreanu. 2022. SECOM: Towards a convention for security commit messages. In Proceedings of the 19th International Conference on Mining Software Repositories. 764–765
- [28] Sofia Reis, Rui Abreu, and Corina Pasareanu. 2023. Are security commit messages informative? Not enough!. In Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. 196–199
- [29]
- [30] Antonino Sabetta and Michele Bezzi. 2018. A practical approach to the automatic classification of security-relevant commits. In 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, 579–582
- [31] Arthur D Sawadogo, Tegawendé F Bissyandé, Naouel Moha, Kevin Allix, Jacques Klein, Li Li, and Yves Le Traon. 2022. SSPCatcher: Learning to catch security patches. Empirical Software Engineering 27, 6 (2022), 151
- [32] Sho Suzuki, Hirohisa Aman, Sousuke Amasaki, Tomoyuki Yokogawa, and Minoru Kawahara. 2017. An application of the pagerank algorithm to commit evaluation on git repository. In 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 380–383
- [33] Xin Tan, Yuan Zhang, Chenyuan Mi, Jiajun Cao, Kun Sun, Yifan Lin, and Min Yang. 2021. Locating the security patches for disclosed oss vulnerabilities with vulnerability-commit correlation ranking. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 3282–3299
- [34] Yingchen Tian, Yuxia Zhang, Klaas-Jan Stol, Lin Jiang, and Hui Liu. 2022. What makes a good commit message?. In Proceedings of the 44th International Conference on Software Engineering. 2389–2401
- [35] Christian Tiefenau, Maximilian Häring, Katharina Krombholz, and Emanuel Von Zezschwitz. 2020. Security, availability, and multiple information sources: Exploring update behavior of system administrators. In Sixteenth Symposium on Usable Privacy and Security (SOUPS 2020). 239–258
- [36] Shichao Wang, Yun Zhang, Liagfeng Bao, Xin Xia, and Minghui Wu. 2022. Vcmatch: a ranking-based approach for automatic security patches localization for OSS vulnerabilities. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 589–600
- [37] Zhendong Wang, Yi Wang, and David Redmiles. 2022. From specialized mechanics to project butlers: The usage of bots in open source software development. IEEE Software 39, 5 (2022), 38–43
- [38] Zhengran Zeng, Yuqun Zhang, Haotian Zhang, and Lingming Zhang. 2021. Deep just-in-time defect prediction: how far are we?. In Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis. 427–438
- [39] Yaqin Zhou, Jing Kai Siow, Chenyu Wang, Shangqing Liu, and Yang Liu. 2021. Spi: Automated identification of security patches via commits. ACM Transactions on Software Engineering and Methodology (TOSEM) 31, 1 (2021), 1–27
- [40] Fei Zuo, Xin Zhang, Yuqi Song, Junghwan Rhee, and Jicheng Fu. 2023. Commit message can help: security patch detection in open source software via transformer. In 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 345–351