Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity
Pith reviewed 2026-05-07 06:47 UTC · model grok-4.3
The pith
Hero developers in open-source projects form largely distinct groups depending on whether contribution is measured by code activity or by discussion activity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 77 Apache Software Foundation projects, developers identified as heroes by commit count, distinct files touched, and code churn differ substantially from those identified by issue-comment count and number of distinct issues commented on. The pooled Jaccard overlap between the technical and social hero sets is 0.10. Technical heroes exhibit strong social activity in 71.4 percent of cases, while only 24.2 percent of social heroes exhibit strong technical activity. Fix-rate and reopen-rate differences across hero categories are modest, but the performance ranking of hero categories changes across bug severity levels.
What carries the argument
Comparison of hero sets defined by five contribution metrics (three technical: commit count, distinct files touched, churn; two social: issue-comment count, distinct issues commented on) and their linkage to fix rates and reopen rates stratified by bug severity.
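To make the set comparison concrete, here is a minimal Python sketch of hero-set construction and the Jaccard overlap, assuming a top-20% cutoff per metric; the cutoff, the per-developer counts, and the function names are illustrative assumptions, since the paper does not publish its threshold or code.

```python
# Illustrative sketch: hero sets from one technical and one social metric,
# and their Jaccard overlap. The top-20% cutoff and the per-developer counts
# are assumptions for demonstration, not the paper's published procedure.

def heroes(scores: dict[str, float], top_frac: float = 0.2) -> set[str]:
    """Developers in the top `top_frac` of one contribution metric."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: max(1, round(len(ranked) * top_frac))])

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical per-developer metrics for a single project.
commit_count  = {"alice": 120, "bob": 15, "carol": 90, "dan": 4, "eve": 60}
comment_count = {"alice": 30, "bob": 210, "carol": 12, "dan": 95, "eve": 8}

technical = heroes(commit_count)
social    = heroes(comment_count)
print(jaccard(technical, social))  # a low value would mirror the pooled 0.10
```

Pooling these per-project overlaps across the 77 projects yields the figure the paper reports as 0.10.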
If this is right
- Hero projects are common under every one of the five metrics examined.
- Technical and social hero sets overlap by only 0.10 on average.
- Technical heroes show strong social activity far more often than social heroes show strong technical activity.
- Fix and reopen rates exhibit only modest differences across hero categories, yet category rankings vary with bug severity.
- A single-metric definition of heroism is insufficient for reliable contributor identification or severity-aware assignment; a sketch of one multifaceted roster follows this list.
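A hedged sketch of what a multifaceted roster could look like, assuming the five metrics named above are available per developer. The metric names and the pooling rule (per-facet union, with dual-facet heroes flagged) are our illustration, not a method the paper specifies.

```python
# Sketch of a multifaceted hero roster built from the paper's five metrics.
# Metric names and the pooling rule (per-facet union, dual-facet
# intersection) are illustrative assumptions, not the paper's method.

TECHNICAL = ("commit_count", "files_touched", "churn")
SOCIAL = ("comment_count", "issues_commented")

def top_set(metric: dict[str, float], frac: float = 0.2) -> set[str]:
    ranked = sorted(metric, key=metric.get, reverse=True)
    return set(ranked[: max(1, round(len(ranked) * frac))])

def multifaceted_heroes(metrics: dict[str, dict[str, float]]) -> dict[str, set[str]]:
    tech = set().union(*(top_set(metrics[m]) for m in TECHNICAL))
    soc = set().union(*(top_set(metrics[m]) for m in SOCIAL))
    # Dual heroes are the natural candidates for severity-aware assignment.
    return {"technical": tech, "social": soc, "dual": tech & soc}
```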
Where Pith is reading between the lines
- Project leads could better retain key contributors by tracking developers who score highly on both technical and social metrics, rather than labeling heroes along a single dimension.
- Automated triage systems might assign high-severity bugs to developers who appear in both technical and social hero sets to improve outcomes.
- The observed asymmetry suggests that strong technical output tends to generate or accompany discussion activity more often than the reverse in open-source settings.
- Testing the same metrics on non-Apache ecosystems would show whether the low overlap is general or specific to Apache governance practices.
Load-bearing premise
The five selected metrics validly and sufficiently capture distinct technical and social facets of contribution, and fix and reopen rates serve as appropriate proxies for bug-fixing outcomes across severity levels.
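As a concrete reading of those outcome proxies, a minimal sketch of severity-stratified fix and reopen rates. The record shape (`severity`, `fixed`, `reopened` fields) is an assumption; the paper computes these quantities from Apache issue-tracker data not reproduced here.

```python
# Sketch of severity-stratified fix and reopen rates. The field names and
# record shape are assumptions, not the paper's data schema.
from collections import defaultdict

def rates_by_severity(issues: list[dict]) -> dict[str, tuple[float, float]]:
    """(fix_rate, reopen_rate) per severity level."""
    buckets = defaultdict(list)
    for issue in issues:
        buckets[issue["severity"]].append(issue)
    out = {}
    for sev, group in buckets.items():
        fixed = [i for i in group if i["fixed"]]
        fix_rate = len(fixed) / len(group)
        # Reopen rate among fixed issues; 0.0 when nothing was fixed.
        reopen_rate = (sum(i["reopened"] for i in fixed) / len(fixed)
                       if fixed else 0.0)
        out[sev] = (fix_rate, reopen_rate)
    return out

issues = [
    {"severity": "blocker", "fixed": True,  "reopened": False},
    {"severity": "blocker", "fixed": True,  "reopened": True},
    {"severity": "minor",   "fixed": False, "reopened": False},
]
print(rates_by_severity(issues))  # {'blocker': (1.0, 0.5), 'minor': (0.0, 0.0)}
```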
What would settle it
A replication on the same or similar projects that finds a Jaccard overlap above 0.4 between technical and social hero sets, or finds performance rankings that remain stable across severity levels, would falsify the claim that heroism is metric-dependent.
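One way to operationalise that test, under our own illustrative thresholds: an overlap well above the reported 0.10, or category rankings whose rank correlation stays near 1 across severity levels, would count against the metric-dependence claim. The Kendall-tau stability check is our reading, not the paper's protocol.

```python
# Sketch of the two falsification criteria named above. The 0.4 threshold
# comes from the text; the Kendall-tau stability check is an assumption.
from scipy.stats import kendalltau

def overlap_falsifies(pooled_jaccard: float, threshold: float = 0.4) -> bool:
    """Criterion 1: technical/social overlap well above the reported 0.10."""
    return pooled_jaccard > threshold

def rankings_stable(rankings: dict[str, list[str]]) -> bool:
    """Criterion 2: hero-category rankings agree at every severity level
    (assumes each list ranks the same categories)."""
    levels = list(rankings.values())
    base = levels[0]
    for other in levels[1:]:
        tau, _ = kendalltau(range(len(base)),
                            [other.index(c) for c in base])
        if tau < 0.99:
            return False
    return True

# Hypothetical rankings by severity: the flip at 'blocker' breaks stability,
# consistent with the severity-dependent rankings the paper reports.
rankings = {"minor":   ["technical", "social", "dual"],
            "blocker": ["dual", "technical", "social"]}
print(rankings_stable(rankings))  # False
```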
Original abstract
Open-source projects often rely on a small group of highly active contributors known as hero developers. Prior work shows that hero developers are common in many OSS and enterprise projects, yet who qualifies as a hero depends heavily on the chosen contribution metric. Code-based metrics identify implementation-focused developers, whereas discussion-based metrics highlight coordination and communication; these metrics capture distinct facets of contribution. We conducted a measurement-sensitive study of multifaceted heroism across 77 Apache Software Foundation projects using three technical measures (commit count, distinct files touched, churn) and two social measures (issue-comment count, number of distinct issues commented on). We examined hero prevalence, overlap among hero sets, and severity-wise bug-fixing outcomes via fix and reopen rates. Results show that hero projects are common under all measures, but identified heroes differ substantially across facets. The pooled Jaccard overlap between technical and social hero sets is only 0.10. Cross-facet asymmetry is evident: 71.4% of technical heroes exhibit strong social activity, while only 24.2% of social heroes show strong technical activity. Fix-rate and reopen-rate differences are modest, yet hero-category rankings vary across severity levels and outcome measures. These findings indicate that heroism is not a single, metric-independent role. A multifaceted perspective offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical measurement study across 77 Apache Software Foundation projects, using three technical metrics (commit count, distinct files touched, churn) and two social metrics (issue-comment count, distinct issues commented on) to identify hero developers. It reports high hero prevalence under all metrics, low pooled Jaccard overlap (0.10) between technical and social hero sets, cross-facet asymmetry (71.4% vs. 24.2%), and modest differences in fix and reopen rates that vary by severity level and hero category. The central claim is that heroism is metric-dependent and that a multifaceted perspective provides a more reliable understanding of key contributors while better supporting developer prioritization and severity-aware bug assignment.
Significance. If the results hold after addressing the gaps below, the work provides a sizable-sample demonstration that single-metric hero definitions are insufficient in OSS projects, with concrete evidence of low overlap and outcome variation across severity. The explicit use of multiple outcome proxies (fix/reopen rates) stratified by severity and the distinction between technical and social facets are strengths that could inform contributor analysis and assignment practices. The study does not include machine-checked proofs or parameter-free derivations but does rely on direct computation from project data rather than fitted models.
Major comments (2)
- [Abstract] Abstract and Discussion: The claim that a multifaceted perspective 'offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment' is not supported by direct evidence. The reported results establish low overlap and varying rankings but contain no head-to-head evaluation (e.g., simulation of assignment effectiveness, predictive utility for fix/reopen rates, or comparison of union/ensemble identification against any single-metric baseline). Without such a test, the leap from distinct sets to superior decision support remains an untested assumption.
- [Methodology] Methodology section: No details are provided on the hero-identification thresholds (e.g., top-X% cutoff, absolute count, or percentile), data-cleaning steps, handling of project-size confounds, or error estimation for the overlap and rate calculations. These omissions are load-bearing because the central claims rest on the stability of the hero sets and the modest rate differences; without them, reproducibility and robustness cannot be assessed.
Minor comments (2)
- [Abstract] Abstract: The selection criteria and time window for the 77 projects are not stated, which affects interpretation of prevalence and generalizability.
- [Results] Results: Clarify whether the reported fix/reopen rates are raw percentages or adjusted; if the latter, state the controls used.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.
Point-by-point responses
Referee: [Abstract] Abstract and Discussion: The claim that a multifaceted perspective 'offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment' is not supported by direct evidence. The reported results establish low overlap and varying rankings but contain no head-to-head evaluation (e.g., simulation of assignment effectiveness, predictive utility for fix/reopen rates, or comparison of union/ensemble identification against any single-metric baseline). Without such a test, the leap from distinct sets to superior decision support remains an untested assumption.
Authors: We acknowledge that our study is primarily descriptive and does not include direct empirical tests of decision support, such as simulations comparing assignment strategies. The central contribution is the demonstration of low overlap between technical and social hero sets (Jaccard 0.10) and the variation in bug-fixing outcomes across severity levels, which we interpret as evidence that single-metric approaches are insufficient. To address this, we will revise the abstract and discussion sections to qualify the claim, changing 'offers a more reliable understanding' to 'suggests that a multifaceted perspective may provide a more reliable understanding' and similarly for the prioritization aspect, emphasizing that this is an implication rather than a directly tested outcome. We believe this interpretation is supported by the observed asymmetries and outcome differences, but agree that stronger validation would require additional experiments beyond the scope of this measurement study.
Revision: partial
Referee: [Methodology] Methodology section: No details are provided on the hero-identification thresholds (e.g., top-X% cutoff, absolute count, or percentile), data-cleaning steps, handling of project-size confounds, or error estimation for the overlap and rate calculations. These omissions are load-bearing because the central claims rest on the stability of the hero sets and the modest rate differences; without them, reproducibility and robustness cannot be assessed.
Authors: We thank the referee for pointing out these omissions, which are indeed important for reproducibility. In the revised manuscript, we will expand the Methodology section with a new subsection that explicitly details the hero-identification thresholds used for each metric (normalized per project to account for size differences), the data-cleaning steps including filtering of automated accounts and commit types, how project-size confounds were handled through relative metrics and per-project analysis, and error estimation via sensitivity analyses and confidence intervals for the reported overlap and rate differences. These additions will enhance reproducibility and allow assessment of the robustness of our hero sets and findings.
Revision: yes
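A hedged sketch of the two additions the authors promise, as we read them: a within-project percentile cutoff to normalise for project size, and a percentile-bootstrap confidence interval for the pooled overlap. The 80th-percentile cutoff, the resample count, and the mean-of-per-project-overlaps pooling are illustrative assumptions.

```python
# Sketch of per-project normalization plus error estimation. The percentile,
# resample count, and pooling rule are assumptions, not the authors' exact
# revised procedure.
import random

def project_heroes(scores: dict[str, float], pct: float = 80.0) -> set[str]:
    """Developers at or above the `pct`th percentile of one project's
    metric, so the cutoff adapts to project size."""
    vals = sorted(scores.values())
    cut = vals[min(len(vals) - 1, int(len(vals) * pct / 100))]
    return {dev for dev, s in scores.items() if s >= cut}

def bootstrap_ci(per_project_jaccards: list[float], n: int = 2000,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Percentile-bootstrap CI for the mean per-project Jaccard overlap
    (one hedged reading of the promised confidence intervals)."""
    k = len(per_project_jaccards)
    means = sorted(sum(random.choices(per_project_jaccards, k=k)) / k
                   for _ in range(n))
    return means[int(n * alpha / 2)], means[int(n * (1 - alpha / 2)) - 1]
```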
Circularity Check
No circularity: direct empirical computations from project data
Full rationale
The paper is a measurement study that defines hero sets via five explicit contribution metrics, computes Jaccard overlaps and fix/reopen rates directly from Apache project data, and reports observed differences. No equations, fitted parameters, derivations, or self-citation chains appear in the load-bearing steps; all reported quantities (prevalence, 0.10 pooled overlap, 71.4%/24.2% asymmetry, category rankings) are independent calculations rather than reductions of the inputs by construction. The interpretive conclusion that a multifaceted view is more reliable follows from the empirical patterns without self-definitional or fitted-input circularity.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Hero identification threshold
Axioms (2)
- Domain assumption: The selected technical and social metrics capture distinct and meaningful facets of developer contribution.
- Domain assumption: Fix rate and reopen rate are valid proxies for bug-fixing outcomes.