Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity
Pith reviewed 2026-05-07 06:47 UTC · model grok-4.3
The pith
Hero developers in open-source projects form largely distinct groups depending on whether contribution is measured by code activity or by discussion activity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 77 Apache Software Foundation projects, developers identified as heroes by commit count, distinct files touched, and code churn differ substantially from those identified by issue-comment count and number of distinct issues commented on. The pooled Jaccard overlap between the technical and social hero sets is 0.10. Technical heroes exhibit strong social activity in 71.4 percent of cases, while only 24.2 percent of social heroes exhibit strong technical activity. Fix-rate and reopen-rate differences across hero categories are modest, but the performance ranking of hero categories changes across bug severity levels.
What carries the argument
Comparison of hero sets defined by five contribution metrics (three technical: commit count, distinct files touched, churn; two social: issue-comment count, distinct issues commented on) and their linkage to fix rates and reopen rates stratified by bug severity.
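To make the set comparison concrete, here is a minimal Python sketch of hero-set construction and the Jaccard overlap, assuming a top-20% cutoff per metric; the cutoff, the per-developer counts, and the function names are illustrative assumptions, since the paper does not publish its threshold or code.

```python
# Illustrative sketch: hero sets from one technical and one social metric,
# and their Jaccard overlap. The top-20% cutoff and the per-developer counts
# are assumptions for demonstration, not the paper's published procedure.

def heroes(scores: dict[str, float], top_frac: float = 0.2) -> set[str]:
    """Developers in the top `top_frac` of one contribution metric."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: max(1, round(len(ranked) * top_frac))])

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical per-developer metrics for a single project.
commit_count  = {"alice": 120, "bob": 15, "carol": 90, "dan": 4, "eve": 60}
comment_count = {"alice": 30, "bob": 210, "carol": 12, "dan": 95, "eve": 8}

technical = heroes(commit_count)
social    = heroes(comment_count)
print(jaccard(technical, social))  # a low value would mirror the pooled 0.10
```

Pooling these per-project overlaps across the 77 projects yields the figure the paper reports as 0.10.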
If this is right
- Hero projects are common under every one of the five metrics examined.
- Technical and social hero sets overlap by only 0.10 on average.
- Technical heroes show strong social activity far more often than social heroes show strong technical activity.
- Fix and reopen rates exhibit only modest differences across hero categories, yet category rankings vary with bug severity.
- A single-metric definition of heroism is insufficient for reliable contributor identification or severity-aware assignment; a sketch of one multifaceted roster follows this list.
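A hedged sketch of what a multifaceted roster could look like, assuming the five metrics named above are available per developer. The metric names and the pooling rule (per-facet union, with dual-facet heroes flagged) are our illustration, not a method the paper specifies.

```python
# Sketch of a multifaceted hero roster built from the paper's five metrics.
# Metric names and the pooling rule (per-facet union, dual-facet
# intersection) are illustrative assumptions, not the paper's method.

TECHNICAL = ("commit_count", "files_touched", "churn")
SOCIAL = ("comment_count", "issues_commented")

def top_set(metric: dict[str, float], frac: float = 0.2) -> set[str]:
    ranked = sorted(metric, key=metric.get, reverse=True)
    return set(ranked[: max(1, round(len(ranked) * frac))])

def multifaceted_heroes(metrics: dict[str, dict[str, float]]) -> dict[str, set[str]]:
    tech = set().union(*(top_set(metrics[m]) for m in TECHNICAL))
    soc = set().union(*(top_set(metrics[m]) for m in SOCIAL))
    # Dual heroes are the natural candidates for severity-aware assignment.
    return {"technical": tech, "social": soc, "dual": tech & soc}
```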
Where Pith is reading between the lines
- Project leads could better retain key contributors by tracking developers who score highly on both technical and social metrics, rather than labeling heroes along a single dimension.
- Automated triage systems might assign high-severity bugs to developers who appear in both technical and social hero sets to improve outcomes.
- The observed asymmetry suggests that strong technical output tends to generate or accompany discussion activity more often than the reverse in open-source settings.
- Testing the same metrics on non-Apache ecosystems would show whether the low overlap is general or specific to Apache governance practices.
Load-bearing premise
The five selected metrics validly and sufficiently capture distinct technical and social facets of contribution, and fix and reopen rates serve as appropriate proxies for bug-fixing outcomes across severity levels.
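As a concrete reading of those outcome proxies, a minimal sketch of severity-stratified fix and reopen rates. The record shape (`severity`, `fixed`, `reopened` fields) is an assumption; the paper computes these quantities from Apache issue-tracker data not reproduced here.

```python
# Sketch of severity-stratified fix and reopen rates. The field names and
# record shape are assumptions, not the paper's data schema.
from collections import defaultdict

def rates_by_severity(issues: list[dict]) -> dict[str, tuple[float, float]]:
    """(fix_rate, reopen_rate) per severity level."""
    buckets = defaultdict(list)
    for issue in issues:
        buckets[issue["severity"]].append(issue)
    out = {}
    for sev, group in buckets.items():
        fixed = [i for i in group if i["fixed"]]
        fix_rate = len(fixed) / len(group)
        # Reopen rate among fixed issues; 0.0 when nothing was fixed.
        reopen_rate = (sum(i["reopened"] for i in fixed) / len(fixed)
                       if fixed else 0.0)
        out[sev] = (fix_rate, reopen_rate)
    return out

issues = [
    {"severity": "blocker", "fixed": True,  "reopened": False},
    {"severity": "blocker", "fixed": True,  "reopened": True},
    {"severity": "minor",   "fixed": False, "reopened": False},
]
print(rates_by_severity(issues))  # {'blocker': (1.0, 0.5), 'minor': (0.0, 0.0)}
```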
What would settle it
A replication on the same or similar projects that finds a Jaccard overlap above 0.4 between technical and social hero sets, or finds performance rankings that remain stable across severity levels, would falsify the claim that heroism is metric-dependent.
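One way to operationalise that test, under our own illustrative thresholds: an overlap well above the reported 0.10, or category rankings whose rank correlation stays near 1 across severity levels, would count against the metric-dependence claim. The Kendall-tau stability check is our reading, not the paper's protocol.

```python
# Sketch of the two falsification criteria named above. The 0.4 threshold
# comes from the text; the Kendall-tau stability check is an assumption.
from scipy.stats import kendalltau

def overlap_falsifies(pooled_jaccard: float, threshold: float = 0.4) -> bool:
    """Criterion 1: technical/social overlap well above the reported 0.10."""
    return pooled_jaccard > threshold

def rankings_stable(rankings: dict[str, list[str]]) -> bool:
    """Criterion 2: hero-category rankings agree at every severity level
    (assumes each list ranks the same categories)."""
    levels = list(rankings.values())
    base = levels[0]
    for other in levels[1:]:
        tau, _ = kendalltau(range(len(base)),
                            [other.index(c) for c in base])
        if tau < 0.99:
            return False
    return True

# Hypothetical rankings by severity: the flip at 'blocker' breaks stability,
# consistent with the severity-dependent rankings the paper reports.
rankings = {"minor":   ["technical", "social", "dual"],
            "blocker": ["dual", "technical", "social"]}
print(rankings_stable(rankings))  # False
```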
Original abstract
Open-source projects often rely on a small group of highly active contributors known as hero developers. Prior work shows that hero developers are common in many OSS and enterprise projects, yet who qualifies as a hero depends heavily on the chosen contribution metric. Code-based metrics identify implementation-focused developers, whereas discussion-based metrics highlight coordination and communication; these metrics capture distinct facets of contribution. We conducted a measurement-sensitive study of multifaceted heroism across 77 Apache Software Foundation projects using three technical measures (commit count, distinct files touched, churn) and two social measures (issue-comment count, number of distinct issues commented on). We examined hero prevalence, overlap among hero sets, and severity-wise bug-fixing outcomes via fix and reopen rates. Results show that hero projects are common under all measures, but identified heroes differ substantially across facets. The pooled Jaccard overlap between technical and social hero sets is only 0.10. Cross-facet asymmetry is evident: 71.4% of technical heroes exhibit strong social activity, while only 24.2% of social heroes show strong technical activity. Fix-rate and reopen-rate differences are modest, yet hero-category rankings vary across severity levels and outcome measures. These findings indicate that heroism is not a single, metric-independent role. A multifaceted perspective offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical measurement study across 77 Apache Software Foundation projects, using three technical metrics (commit count, distinct files touched, churn) and two social metrics (issue-comment count, distinct issues commented on) to identify hero developers. It reports high hero prevalence under all metrics, low pooled Jaccard overlap (0.10) between technical and social hero sets, cross-facet asymmetry (71.4% vs. 24.2%), and modest differences in fix and reopen rates that vary by severity level and hero category. The central claim is that heroism is metric-dependent and that a multifaceted perspective provides a more reliable understanding of key contributors while better supporting developer prioritization and severity-aware bug assignment.
Significance. If the results hold after addressing the gaps below, the work provides a sizable-sample demonstration that single-metric hero definitions are insufficient in OSS projects, with concrete evidence of low overlap and outcome variation across severity. The explicit use of multiple outcome proxies (fix/reopen rates) stratified by severity and the distinction between technical and social facets are strengths that could inform contributor analysis and assignment practices. The study does not include machine-checked proofs or parameter-free derivations but does rely on direct computation from project data rather than fitted models.
Major comments (2)
- [Abstract] Abstract and Discussion: The claim that a multifaceted perspective 'offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment' is not supported by direct evidence. The reported results establish low overlap and varying rankings but contain no head-to-head evaluation (e.g., simulation of assignment effectiveness, predictive utility for fix/reopen rates, or comparison of union/ensemble identification against any single-metric baseline). Without such a test, the leap from distinct sets to superior decision support remains an untested assumption.
- [Methodology] Methodology section: No details are provided on the hero-identification thresholds (e.g., top-X% cutoff, absolute count, or percentile), data-cleaning steps, handling of project-size confounds, or error estimation for the overlap and rate calculations. These omissions are load-bearing because the central claims rest on the stability of the hero sets and the modest rate differences; without them, reproducibility and robustness cannot be assessed.
Minor comments (2)
- [Abstract] Abstract: The selection criteria and time window for the 77 projects are not stated, which affects interpretation of prevalence and generalizability.
- [Results] Results: Clarify whether the reported fix/reopen rates are raw percentages or adjusted; if the latter, state the controls used.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.
Point-by-point responses
Referee: [Abstract] Abstract and Discussion: The claim that a multifaceted perspective 'offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment' is not supported by direct evidence. The reported results establish low overlap and varying rankings but contain no head-to-head evaluation (e.g., simulation of assignment effectiveness, predictive utility for fix/reopen rates, or comparison of union/ensemble identification against any single-metric baseline). Without such a test, the leap from distinct sets to superior decision support remains an untested assumption.
Authors: We acknowledge that our study is primarily descriptive and does not include direct empirical tests of decision support, such as simulations comparing assignment strategies. The central contribution is the demonstration of low overlap between technical and social hero sets (Jaccard 0.10) and the variation in bug-fixing outcomes across severity levels, which we interpret as evidence that single-metric approaches are insufficient. To address this, we will revise the abstract and discussion sections to qualify the claim, changing 'offers a more reliable understanding' to 'suggests that a multifaceted perspective may provide a more reliable understanding' and similarly for the prioritization aspect, emphasizing that this is an implication rather than a directly tested outcome. We believe this interpretation is supported by the observed asymmetries and outcome differences, but agree that stronger validation would require additional experiments beyond the scope of this measurement study.
Revision: partial
Referee: [Methodology] Methodology section: No details are provided on the hero-identification thresholds (e.g., top-X% cutoff, absolute count, or percentile), data-cleaning steps, handling of project-size confounds, or error estimation for the overlap and rate calculations. These omissions are load-bearing because the central claims rest on the stability of the hero sets and the modest rate differences; without them, reproducibility and robustness cannot be assessed.
Authors: We thank the referee for pointing out these omissions, which are indeed important for reproducibility. In the revised manuscript, we will expand the Methodology section with a new subsection that explicitly details the hero-identification thresholds used for each metric (normalized per project to account for size differences), the data-cleaning steps including filtering of automated accounts and commit types, how project-size confounds were handled through relative metrics and per-project analysis, and error estimation via sensitivity analyses and confidence intervals for the reported overlap and rate differences. These additions will enhance reproducibility and allow assessment of the robustness of our hero sets and findings.
Revision: yes
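A hedged sketch of the two additions the authors promise, as we read them: a within-project percentile cutoff to normalise for project size, and a percentile-bootstrap confidence interval for the pooled overlap. The 80th-percentile cutoff, the resample count, and the mean-of-per-project-overlaps pooling are illustrative assumptions.

```python
# Sketch of per-project normalization plus error estimation. The percentile,
# resample count, and pooling rule are assumptions, not the authors' exact
# revised procedure.
import random

def project_heroes(scores: dict[str, float], pct: float = 80.0) -> set[str]:
    """Developers at or above the `pct`th percentile of one project's
    metric, so the cutoff adapts to project size."""
    vals = sorted(scores.values())
    cut = vals[min(len(vals) - 1, int(len(vals) * pct / 100))]
    return {dev for dev, s in scores.items() if s >= cut}

def bootstrap_ci(per_project_jaccards: list[float], n: int = 2000,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Percentile-bootstrap CI for the mean per-project Jaccard overlap
    (one hedged reading of the promised confidence intervals)."""
    k = len(per_project_jaccards)
    means = sorted(sum(random.choices(per_project_jaccards, k=k)) / k
                   for _ in range(n))
    return means[int(n * alpha / 2)], means[int(n * (1 - alpha / 2)) - 1]
```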
Circularity Check
No circularity: direct empirical computations from project data
Full rationale
The paper is a measurement study that defines hero sets via five explicit contribution metrics, computes Jaccard overlaps and fix/reopen rates directly from Apache project data, and reports observed differences. No equations, fitted parameters, derivations, or self-citation chains appear in the load-bearing steps; all reported quantities (prevalence, 0.10 pooled overlap, 71.4%/24.2% asymmetry, category rankings) are independent calculations rather than reductions of the inputs by construction. The interpretive conclusion that a multifaceted view is more reliable follows from the empirical patterns without self-definitional or fitted-input circularity.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Hero identification threshold
Axioms (2)
- Domain assumption: The selected technical and social metrics capture distinct and meaningful facets of developer contribution.
- Domain assumption: Fix rate and reopen rate are valid proxies for bug-fixing outcomes.