Predicting Merge Conflicts in Collaborative Software Development
Pith reviewed 2026-05-24 21:24 UTC · model grok-4.3
The pith
A classifier using nine lightweight Git features predicts safe merges with F1-scores of 0.95 to 0.97 across languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design a classifier for predicting merge conflicts, based on 9 light-weight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages. Our results show that we achieve high f1-scores, varying from 0.95 to 0.97 for different programming languages, when predicting safe merge scenarios. The f1-score is between 0.57 and 0.68 for the conflicting merge scenarios. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging.
What carries the argument
A machine-learning classifier based on nine light-weight Git feature sets that distinguishes safe from conflicting merge scenarios.
If this is right
- Speculative merging systems can safely skip most merges the classifier labels safe, cutting background computation.
- Developers receive earlier warnings about likely conflicts before the changes grow large and complex.
- The same nine-feature approach delivers high accuracy on safe merges in all seven languages tested.
- The technique is positioned as a practical pre-filter rather than a complete replacement for full merge simulation.
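The pre-filter idea in the bullets above can be made concrete with a small sketch. It assumes one classifier prediction and one ground-truth outcome per merge scenario; the function name `prefilter_savings` and the toy data are hypothetical and are not taken from the paper.

```python
def prefilter_savings(predicted_safe, actually_conflicting):
    """Summarise what skipping predicted-safe merges would save and cost.

    Both arguments are equal-length lists of booleans, one entry per
    merge scenario.
    """
    if len(predicted_safe) != len(actually_conflicting):
        raise ValueError("inputs must have the same length")
    total = len(predicted_safe)
    skipped = sum(predicted_safe)
    conflicts = sum(actually_conflicting)
    # Conflicts hidden by the filter: labeled safe but really conflicting.
    missed = sum(p and c for p, c in zip(predicted_safe, actually_conflicting))
    return {
        "skipped_fraction": skipped / total if total else 0.0,
        "missed_conflicts": missed,
        "missed_conflict_rate": missed / conflicts if conflicts else 0.0,
    }


# Toy data (hypothetical): 10 merge scenarios, 2 of them conflicting.
predicted_safe = [True, True, True, True, True, True, True, False, True, False]
actually_conflicting = [False] * 7 + [True, True, False]
summary = prefilter_savings(predicted_safe, actually_conflicting)
```

In this toy run the filter skips 8 of 10 speculative merges but hides 1 of the 2 real conflicts, which is the trade-off the referee report below asks the authors to quantify.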
Where Pith is reading between the lines
- The same lightweight features could be extracted from other version-control systems to test whether the prediction approach generalizes beyond Git.
- Adding simple change-type or file-overlap signals might raise the lower F1-scores observed for conflicting merges.
- Embedding the classifier in continuous-integration pipelines could automatically route only risky merges to human review.
- The gap between safe-merge and conflict-merge accuracy suggests the features capture absence of conflict more readily than presence of conflict.
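As an illustration of the file-overlap signal mentioned above, the following sketch computes which files both branches touched. It is an assumption about what such a signal could look like, not one of the paper's nine feature sets; the function name `file_overlap` is hypothetical, and the input lists are assumed to come from something like `git diff --name-only` run for each branch against the merge base.

```python
def file_overlap(files_a, files_b):
    """Return the files changed on both branches and their Jaccard overlap."""
    a, b = set(files_a), set(files_b)
    shared = a & b
    union = a | b
    return {
        "shared_files": sorted(shared),
        # Jaccard index: |A ∩ B| / |A ∪ B|, defined as 0.0 when both are empty.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical changed-file lists for two branches.
overlap = file_overlap(["src/a.py", "README.md"], ["src/a.py", "src/b.py"])
```

A merge with no shared files cannot produce a textual conflict in the same file, so a signal of this kind mainly helps on the conflicting class, where the reported F1 is lowest.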
Load-bearing premise
The nine light-weight Git feature sets provide sufficient information to distinguish between safe and conflicting merge scenarios with high accuracy.
What would settle it
Running the trained classifier on merge scenarios from a fresh collection of repositories outside the original 744 and finding that the F1-score for safe merges drops below 0.9 would challenge the feasibility result.
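The check described above reduces to computing the safe-class F1 on the fresh repositories and comparing it with 0.9. A minimal sketch, assuming confusion counts for the safe class are available; the function names and the example counts are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def challenges_feasibility(tp, fp, fn, threshold=0.9):
    """True when the safe-class F1 on new data falls below the threshold."""
    return f1_score(tp, fp, fn) < threshold


# Hypothetical counts for the safe class on a fresh set of repositories.
drops_below = challenges_feasibility(tp=80, fp=20, fn=20)
holds_up = challenges_feasibility(tp=95, fp=5, fn=5)
```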
Original abstract
Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations of branches in the background to notify developers as soon as a conflict occurs, which is a computationally expensive process. One potential way of reducing this cost is to use a machine-learning-based conflict predictor that filters out the merge scenarios that are not likely to have conflicts, i.e., safe merge scenarios. Aims. In this paper, we assess if conflict prediction is feasible. Method. We design a classifier for predicting merge conflicts, based on 9 light-weight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages. Results. Our results show that we achieve high f1-scores, varying from 0.95 to 0.97 for different programming languages, when predicting safe merge scenarios. The f1-score is between 0.57 and 0.68 for the conflicting merge scenarios. Conclusions. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a machine-learning classifier based on 9 lightweight Git feature sets can predict merge conflicts with high F1 (0.95-0.97) on safe merges and moderate F1 (0.57-0.68) on conflicting merges. The evaluation uses 267,657 merge scenarios from 744 GitHub repositories across 7 languages. The central conclusion is that conflict prediction is feasible in practice, particularly as a pre-filter for safe merges to reduce the cost of speculative merging.
Significance. A reliable pre-filter for safe merges would meaningfully lower the computational overhead of continuous speculative merging in collaborative development. The scale of the empirical study (hundreds of thousands of real merges) is a positive aspect. However, the reported F1 disparity between classes limits the immediate practical significance unless the conflict-class performance can be shown to be adequate for the pre-filter role or the use-case is narrowed.
major comments (3)
- [Abstract] Abstract: The F1 scores of 0.57-0.68 on conflicting merges are moderate and undermine the pre-filtering claim, because low recall on conflicts would let conflicting merges be labeled safe and skipped by the speculative step, while low precision would cause unnecessary speculative work on safe merges.
- [Method] Method/Results: No feature definitions, ablation studies, feature-importance analysis, or baseline comparisons (e.g., simple file-overlap heuristics) are described, so it is unclear whether the nine Git features capture non-trivial conflict signals or merely reproduce obvious overlap statistics.
- [Evaluation] Evaluation: The manuscript provides no details on the cross-validation procedure, class imbalance handling, or precision/recall breakdown per class, which are required to assess whether the reported F1 values generalize or are artifacts of majority-class bias.
minor comments (2)
- [Abstract] The abstract and conclusions should explicitly state the class distribution (safe vs. conflicting) to contextualize the F1 gap.
- [Method] Notation for the nine feature sets should be introduced with a table or enumerated list for reproducibility.
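The concern about majority-class bias can be illustrated with a trivial baseline. The sketch below assumes a hypothetical 90/10 split between safe and conflicting merges (the actual class distribution is not stated in the abstract) and scores a classifier that labels every merge safe; `per_class_report` is an illustrative helper, not the paper's evaluation code.

```python
def per_class_report(y_true, y_pred, labels=("safe", "conflict")):
    """Precision, recall and F1 for each label, computed one-vs-rest."""
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical class distribution: 90 safe merges, 10 conflicting merges.
y_true = ["safe"] * 90 + ["conflict"] * 10
# Trivial baseline: label every merge scenario as safe.
y_pred = ["safe"] * 100
baseline = per_class_report(y_true, y_pred)
```

Under this assumed split the always-safe baseline reaches a safe-class F1 of about 0.947 while scoring 0 on the conflict class, which is why the class distribution and a per-class breakdown are needed to interpret the reported 0.95 to 0.97.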
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. The feedback highlights important areas for clarification and strengthening, particularly around the practical implications of the F1 scores, feature analysis, and evaluation details. We address each major comment below and will revise the manuscript to incorporate additional explanations, analyses, and details as outlined.
- Referee: [Abstract] Abstract: The F1 scores of 0.57-0.68 on conflicting merges are moderate and undermine the pre-filtering claim, because low recall on conflicts would let conflicting merges be labeled safe and skipped by the speculative step, while low precision would cause unnecessary speculative work on safe merges.
  Authors: We agree that the moderate F1 on the conflict class requires careful interpretation for the pre-filter use case. However, the primary intended application is to identify safe merges with high confidence (F1 0.95-0.97) so they can be skipped in speculative merging, directly reducing computational cost. For the conflict class, even moderate performance provides value by catching some conflicts early; the system can still fall back to full speculative merging for uncertain cases. Low recall on conflicts would indeed allow some through, but this is acceptable if the goal is cost reduction rather than perfect filtering. We will revise the abstract, introduction, and discussion sections to explicitly frame the use case this way, add per-class precision/recall breakdowns, and discuss the precision-recall trade-offs to show when the pre-filter remains beneficial.
  Revision: yes
- Referee: [Method] Method/Results: No feature definitions, ablation studies, feature-importance analysis, or baseline comparisons (e.g., simple file-overlap heuristics) are described, so it is unclear whether the nine Git features capture non-trivial conflict signals or merely reproduce obvious overlap statistics.
  Authors: The nine feature sets are introduced in Section 3 with high-level descriptions drawn from Git metadata (e.g., commit counts, file changes, branch divergence metrics). We acknowledge that explicit definitions, ablation results, feature-importance rankings, and a baseline comparison (such as a simple file-overlap heuristic) were not included. These additions would clarify whether the features provide non-trivial signals. We will expand Section 3 with precise feature definitions, add an ablation study removing feature groups, include feature-importance analysis (e.g., via permutation importance or SHAP), and compare against a file-overlap baseline to demonstrate incremental value of the ML model.
  Revision: yes
- Referee: [Evaluation] Evaluation: The manuscript provides no details on the cross-validation procedure, class imbalance handling, or precision/recall breakdown per class, which are required to assess whether the reported F1 values generalize or are artifacts of majority-class bias.
  Authors: The evaluation used 10-fold cross-validation with stratification by repository to prevent leakage across projects, and class imbalance was addressed via class weighting in the classifier. We agree that these procedural details, along with full per-class precision, recall, and F1 scores for each language, are essential for assessing generalization and potential majority-class bias. We will add a dedicated subsection in the evaluation describing the CV procedure, imbalance handling method, and complete per-class metrics (including confusion matrices or PR curves) to allow readers to verify the results are not artifacts of imbalance.
  Revision: yes
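One way to read the procedure described in the last response is that folds are built so no repository appears in more than one fold, and that the minority class is up-weighted during training. The sketch below shows that reading under stated assumptions: the greedy fold assignment and the `n / (k * n_c)` weighting rule are common conventions chosen here for illustration and are not confirmed by the source, and both function names are hypothetical.

```python
from collections import Counter


def grouped_folds(repo_ids, n_folds):
    """Assign each sample a fold index so that no repository spans two folds."""
    sizes = Counter(repo_ids)
    fold_load = [0] * n_folds
    repo_to_fold = {}
    # Greedy balancing: place the largest repositories first, each into
    # the fold that currently holds the fewest samples.
    for repo, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_load.index(min(fold_load))
        repo_to_fold[repo] = target
        fold_load[target] += size
    return [repo_to_fold[repo] for repo in repo_ids]


def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical data: six merge scenarios from three repositories.
folds = grouped_folds(["a", "a", "a", "b", "b", "c"], n_folds=2)
weights = balanced_class_weights(["safe"] * 9 + ["conflict"])
```

Keeping each repository inside a single fold is what prevents a model from being tested on merges from a project it has already seen during training.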
Circularity Check
No circularity: empirical ML evaluation on held-out repository data
The paper trains a classifier on nine Git-derived feature sets and reports F1 scores on held-out data drawn from 267,657 merge scenarios in 744 real GitHub repositories. No derivation chain exists; the central claim rests on standard supervised learning performance metrics rather than any self-definition, fitted-input-as-prediction, or self-citation load-bearing step. The evaluation protocol (train/test split across repositories) is externally falsifiable and independent of the reported numbers.
Axiom & Free-Parameter Ledger
free parameters (1)
- ML model parameters
axioms (1)
- domain assumption: Light-weight Git features are predictive of merge conflicts