Rover: Context-aware Conflict Resolution with LLM
Pith reviewed 2026-05-19 23:01 UTC · model grok-4.3
pith:DBJPT6B2
The pith
Rover builds multi-layer code graphs so that LLMs can resolve merge conflicts more accurately than prior tools or LLMs used alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rover integrates program analysis with large language models. It constructs a Multi-layer Code Property Graph (MtCPG) that records inter-file dependencies, then applies graph connectivity clustering to form meaningful contexts around each conflict. Given these contexts, the LLM generates resolutions that are more similar to ground-truth human resolutions, at the character, lexical, and semantic levels, than those of standalone LLMs, the MergeGen baseline, or WizardMerge supplied with adjacent code.
What carries the argument
The Multi-layer Code Property Graph (MtCPG) plus graph connectivity clustering: the graph captures inter-file dependencies, and the clustering groups them into focused contexts that steer the LLM toward accurate conflict resolutions.
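The page does not spell out MtCPG's layers or the exact connectivity algorithm, so what follows is only a minimal sketch under stated assumptions: nodes are code entities (conflict hunks, changed methods), edges from every layer are treated as undirected, and a "context" is one connected component, found here with a union-find structure. The node ids and layer labels are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen nodes lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_contexts(nodes, edges):
    """Group code entities into connected components ("contexts").

    nodes: comparable entity ids, e.g. conflict hunks and changed methods
           (hypothetical naming).
    edges: (src, dst, layer) dependency edges; the layer label is ignored here.
    Returns sorted clusters so the output is deterministic.
    """
    uf = UnionFind()
    for n in nodes:
        uf.find(n)
    for src, dst, _layer in edges:
        uf.union(src, dst)  # every layer contributes connectivity, undirected
    groups = defaultdict(list)
    for n in list(uf.parent):
        groups[uf.find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())
```

For example, with hunks `h1`, `h2`, `h3` and methods `f`, `g`, the edges `("h1", "f", "call")`, `("f", "h2", "data")` and `("h3", "g", "call")` yield two contexts, `["f", "h1", "h2"]` and `["g", "h3"]`: the two hunks linked through `f` are resolved together. Treating all layers alike is the simplest choice; a real system might weight or filter edges by layer to keep each context small enough for the LLM's prompt.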
If this is right
- Rover produces resolutions closer to human ground truth than standalone LLMs or existing machine-learning and suggestion tools.
- The method handles merge conflicts that involve complex cross-file dependencies more effectively than approaches lacking explicit context construction.
- By reducing the number of conflicts left for manual resolution, the technique could shorten the time required to integrate changes in large codebases.
- The combination of graph-based context extraction and LLM generation can be evaluated directly against similarity metrics at multiple levels of code representation.
Where Pith is reading between the lines
- The same graph-plus-clustering technique could be tested on other software-engineering tasks that require understanding scattered changes, such as refactoring or bug-fix propagation.
- If the graph construction remains efficient at scale, the approach might be integrated into version-control systems to resolve a larger fraction of merges without human intervention.
- Failures in resolution could be traced back to specific missing edges in the multi-layer graph, offering a route to improve the context-building step itself.
Load-bearing premise
The multi-layer graph and clustering step will reliably extract contexts that let the LLM infer the right developer intentions without adding new errors or hallucinations.
What would settle it
Run Rover and the three baseline systems on the same collection of merge conflicts that have recorded ground-truth resolutions and measure whether Rover still shows higher similarity scores at character, lexical, and semantic levels.
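The review does not define the three similarity metrics. As a hedged illustration of what character-level and lexical-level scores might look like, the sketch below uses normalized Levenshtein similarity and token-set Jaccard overlap. Both are stand-in choices, not necessarily the paper's, and the semantic level (which would typically need an embedding model) is omitted.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row over the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_similarity(candidate: str, truth: str) -> float:
    """1 - normalized edit distance; 1.0 means identical strings."""
    longest = max(len(candidate), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(candidate, truth) / longest


# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lexical_similarity(candidate: str, truth: str) -> float:
    """Jaccard overlap of token sets, ignoring whitespace and layout."""
    a, b = set(TOKEN.findall(candidate)), set(TOKEN.findall(truth))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Scoring each system's output against the same ground-truth resolution, conflict by conflict, yields the paired per-conflict scores that a head-to-head comparison needs. Note that `lexical_similarity("int x = 1;", "int  x=1;")` is 1.0 while the character score is not, which is why reporting several levels is informative.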
Original abstract
Code merging is a significant challenge, particularly in large-scale projects. Existing solutions, including program analysis and machine learning, show promise but face critical limitations. Program analysis lacks the ability to infer developers' intentions, relying on conservative strategies that offload unresolved conflicts for manual handling. Meanwhile, model-based approaches struggle with conflicts involving complex code dependencies due to insufficient contextual awareness. To address these gaps, we introduce Rover, a novel conflict resolution system that integrates program analysis with large language models (LLMs). To obtain context-aware prompts, we propose Multi-layer Code Property Graph (MtCPG), a new representation capturing inter-file dependencies and enabling contextual analysis for a given conflict. Using graph connectivity algorithms, Rover further clusters conflicting code and associated changes into meaningful "contexts" that guide the LLM in generating accurate resolutions. We compared Rover with standalone LLMs, machine learning baseline MergeGen, and suggestion provider tool WizardMerge with adjacent code as the contexts. Evaluation results show that Rover surpasses all of these approaches in terms of conflict resolution, achieving higher similarity to ground-truth resolutions at character, lexical, and semantic levels.
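Before any clustering, a resolver has to locate the conflicting regions themselves. The abstract does not say how Rover does this; a plausible starting point, sketched here as an assumption, is to parse Git's standard conflict markers (default marker length of 7), including the optional `|||||||` merge-base section that Git emits when `merge.conflictStyle` is set to `diff3`. The `ConflictHunk` and `parse_conflicts` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConflictHunk:
    start_line: int                               # 1-based line of "<<<<<<<"
    ours: list = field(default_factory=list)
    base: list = field(default_factory=list)      # empty unless diff3 style
    theirs: list = field(default_factory=list)


def parse_conflicts(text: str) -> list:
    """Extract conflict hunks delimited by Git's conflict markers.

    Handles both the default "merge" style and the "diff3" style, which adds
    a "|||||||" section holding the merge-base version of the region.
    """
    hunks, current, section = [], None, None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<<"):
            current, section = ConflictHunk(start_line=lineno), "ours"
        elif current is not None and line.startswith("|||||||"):
            section = "base"
        elif current is not None and line.startswith("======="):
            section = "theirs"
        elif current is not None and line.startswith(">>>>>>>"):
            hunks.append(current)
            current, section = None, None
        elif current is not None:
            # Inside a hunk: append the line to whichever side we are in.
            getattr(current, section).append(line)
    return hunks
```

Each parsed hunk, with its ours, base, and theirs sides, is the kind of unit that could become a conflict node in a dependency graph before contexts are clustered.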
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Rover, a context-aware conflict resolution system for code merging that combines program analysis using a Multi-layer Code Property Graph (MtCPG) with large language models (LLMs). By clustering conflicting code and changes via graph connectivity algorithms, Rover generates context-aware prompts for the LLM to produce resolutions. Evaluation shows Rover achieves higher similarity to ground-truth resolutions compared to standalone LLMs, MergeGen, and WizardMerge at character, lexical, and semantic levels.
Significance. If the results hold, Rover could improve automated merge conflict resolution in large projects by better inferring developer intentions through enhanced contextual analysis, potentially reducing the need for manual handling of unresolved conflicts and advancing the integration of LLMs with program analysis techniques.
major comments (2)
- [Abstract] Abstract: The central claim that Rover surpasses baselines rests on higher similarity to ground-truth resolutions, but the manuscript provides no details on dataset size, conflict selection criteria, statistical significance, or exact evaluation protocol. This makes the empirical superiority difficult to verify and is load-bearing for the headline result.
- [Approach] MtCPG and clustering description: The approach assumes that the Multi-layer Code Property Graph plus graph connectivity clustering will reliably produce contexts allowing the LLM to infer developer intentions without new errors or hallucinations, yet no verification is provided that extracted contexts are complete (e.g., inter-file dependencies) or that LLM outputs preserve original semantics beyond the reported similarity metrics. This assumption directly supports the claim of improved conflict resolution.
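On the first major comment's request for statistical significance: the page does not say which test the paper applies. One standard option for paired per-conflict scores is an exact sign-flip permutation test, sketched below as an illustrative choice rather than the authors' protocol. Enumerating all 2^n sign patterns only scales to small samples; larger ones would use random sign draws instead.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Under the null hypothesis each per-conflict difference is equally likely
    to be positive or negative, so all 2**n sign patterns are equally likely.
    Returns (mean difference a - b, two-sided p-value).
    Enumerates every pattern, so keep n small (roughly 20 or fewer).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(stat) >= abs(observed) - 1e-12:  # tolerance for float round-off
            extreme += 1
    return observed, extreme / 2 ** n
```

Even if one system wins every conflict, six paired conflicts can give at best p = 2/64 ≈ 0.031 under this test, which illustrates why the referee's request for dataset size matters as much as the significance test itself.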
minor comments (1)
- [Abstract] Abstract: Consider briefly noting the scale of the evaluation (e.g., number of conflicts or projects) to give readers an immediate sense of the empirical basis.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made to the manuscript.
- Referee: [Abstract] Abstract: The central claim that Rover surpasses baselines rests on higher similarity to ground-truth resolutions, but the manuscript provides no details on dataset size, conflict selection criteria, statistical significance, or exact evaluation protocol. This makes the empirical superiority difficult to verify and is load-bearing for the headline result.
Authors: We agree that the abstract omits these specifics for brevity. Section 4 of the manuscript describes the evaluation dataset, conflict selection from open-source repositories based on dependency complexity, the protocol for computing character, lexical, and semantic similarity, and statistical significance testing. We will revise the abstract to include a concise summary of dataset scale, selection approach, and significance to support verifiability of the claims. revision: yes
- Referee: [Approach] MtCPG and clustering description: The approach assumes that the Multi-layer Code Property Graph plus graph connectivity clustering will reliably produce contexts allowing the LLM to infer developer intentions without new errors or hallucinations, yet no verification is provided that extracted contexts are complete (e.g., inter-file dependencies) or that LLM outputs preserve original semantics beyond the reported similarity metrics. This assumption directly supports the claim of improved conflict resolution.
Authors: The referee correctly identifies that we did not include separate verification steps for context completeness or semantic preservation beyond the similarity metrics. The MtCPG construction and connectivity-based clustering are intended to capture inter-file dependencies, but we acknowledge the lack of explicit checks such as manual audits or hallucination analysis. We will add a discussion subsection with qualitative examples of extracted contexts and resolutions to address this in the revision. revision: partial
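The hallucination analysis the authors concede is missing could start with a cheap lexical audit, sketched here under the assumption of Java-like input: flag identifiers in the LLM's resolution that appear nowhere in the text the model was shown. The keyword list is an illustrative subset and the function names are hypothetical; a flagged name is a cue for manual review, not proof of an error.

```python
import re

# Identifier-like tokens; the leading \b skips suffixes such as "x1F" in "0x1F".
IDENT = re.compile(r"\b[A-Za-z_]\w*")

# Illustrative subset of Java keywords and literals; extend for real use.
KEYWORDS = {
    "if", "else", "for", "while", "return", "new", "class", "public",
    "private", "protected", "static", "final", "void", "int", "boolean",
    "true", "false", "null", "this", "import", "package",
}


def identifiers(code):
    """Identifier tokens in code minus keywords (string contents included)."""
    return set(IDENT.findall(code)) - KEYWORDS


def unsupported_identifiers(resolution, *inputs):
    """Identifiers in the resolution that occur in none of the prompt inputs.

    inputs: everything the model was shown, e.g. ours, base, theirs, and the
    extracted context. A non-empty result flags the resolution for manual
    audit; a correct merge may still legitimately introduce a new name.
    """
    seen = set()
    for text in inputs:
        seen |= identifiers(text)
    return identifiers(resolution) - seen
```

Run over a whole benchmark, the share of resolutions with a non-empty result gives a rough, reproducible companion to the qualitative examples the authors propose, and comparing it with and without the extracted context would hint at whether the context reduces invented names.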
Circularity Check
No circularity; empirical evaluation of proposed system is self-contained
full rationale
The paper proposes a new MtCPG representation and graph-connectivity clustering to supply context to an LLM for merge-conflict resolution, then reports empirical similarity scores against baselines and ground-truth patches. No equations, fitted parameters, or predictions appear; the central result is an external comparison rather than a derivation that reduces to its own inputs by construction. No load-bearing self-citations or uniqueness theorems imported from prior author work are invoked. The approach is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
invented entities (1)
- Multi-layer Code Property Graph (MtCPG): no independent evidence
Reference graph
Works this paper leans on
- [1] Paola Accioly, Paulo Borba, and Guilherme Cavalcanti. 2018. Understanding semi-structured merge conflict characteristics in open-source Java projects. Empirical Software Engineering 23 (2018), 2051–2085.
- [2] Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. 2025. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925 (2025).
- [3] Mehdi Ahmed-Nacer, Pascal Urso, and François Charoy. 2013. Improving textual merge result. In 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 390–399.
- [5] Waad Aldndni, Na Meng, and Francisco Servant. 2023. Automatic prediction of developers’ resolutions for software merge conflicts. Journal of Systems and Software 206 (2023), 111836.
- [6] Sven Apel, Olaf Leßenich, and Christian Lengauer. 2012. Structured merge with auto-tuning: balancing precision and performance. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. 120–129.
- [7] Sven Apel, Jörg Liebig, Benjamin Brandl, Christian Lengauer, and Christian Kästner. 2011. Semistructured merge: rethinking merge in revision control systems. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. 190–200.
- [8] Eric Bruneton, Romain Lenglet, and Thierry Coupaye. 2002. ASM: a code manipulation tool to implement adaptable systems. Adaptable and Extensible Component Systems 30, 19 (2002).
- [9] Max Brunsfeld. 2025. Tree-sitter. https://github.com/tree-sitter/tree-sitter. Accessed: 2025-3-9.
- [10] Guilherme Cavalcanti, Paulo Borba, Georg Seibt, and Sven Apel. 2019. The impact of structure on software merging: Semistructured versus structured merge. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1002–1013.
- [11] Cp-Algorithms. 2023. Segment Tree. https://cp-algorithms.com/data_structures/segment_tree.html. Accessed: 2024-Nov.
- [12] Léuson Da Silva, Paulo Borba, Toni Maciel, Wardah Mahmood, Thorsten Berger, João Moisakis, Aldiberg Gomes, and Vinícius Leite. 2024. Detecting semantic conflicts with unit tests. Journal of Systems and Software 214 (2024), 112070.
- [13] Léuson Da Silva, Paulo Borba, and Arthur Pires. 2022. Build conflicts in the wild. Journal of Software: Evolution and Process 34, 4 (2022), e2441.
- [14]
- [15] Elizabeth Dinella, Todd Mytkowicz, Alexey Svyatkovskiy, Christian Bird, Mayur Naik, and Shuvendu Lahiri. 2022. DeepMerge: Learning to merge programs. IEEE Transactions on Software Engineering 49, 4 (2022), 1599–1614.
- [16] Jinhao Dong, Qihao Zhu, Zeyu Sun, Yiling Lou, and Dan Hao. 2023. Merge Conflict Resolution: Classification or Generation? In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1652–1663.
- [17] Paulo Elias, Heleno de S Campos Junior, Eduardo Ogasawara, and Leonardo Gresta Paulino Murta. 2023. Towards accurate recommendations of merge conflicts resolution strategies. Information and Software Technology (2023), 107332.
- [18] Paulo Elias, Heleno de S Campos Junior, Eduardo Ogasawara, and Leonardo Gresta Paulino Murta. 2023. Towards accurate recommendations of merge conflicts resolution strategies. Information and Software Technology 164 (2023), 107332.
- [19] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024).
- [20] Xiangming Gu, Tianyu Pang, Chao Du, Qian Liu, Fengzhuo Zhang, Cunxiao Du, Ye Wang, and Min Lin. 2024. When Attention Sink Emerges in Language Models: An Empirical View. arXiv preprint arXiv:2410.10781 (2024).
- [21] Mário Luís Guimarães and António Rito Silva. 2012. Improving early detection of software merge conflicts. In 2012 34th International Conference on Software Engineering (ICSE). IEEE, 342–352.
- [22] Simon Larsén, Jean-Rémy Falleri, Benoit Baudry, and Martin Monperrus. 2022. Spork: Structured Merge for Java with Formatting Preservation. IEEE Transactions on Software Engineering 49, 1 (2022), 64–83.
- [23] Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004. IEEE, 75–86.
- [24] Jon Loeliger and Matthew McCullough. 2012. Version Control with Git: Powerful tools and techniques for collaborative software development. O’Reilly Media, Inc.
- [25] Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, et al. 2024. RepoAgent: An LLM-powered open-source framework for repository-level code documentation generation. arXiv preprint arXiv:...
- [26] Dewayne E Perry, Harvey P Siy, and Lawrence G Votta. 2001. Parallel changes in large-scale software development: an observational case study. ACM Transactions on Software Engineering and Methodology (TOSEM) 10, 3 (2001), 308–337.
- [27] C Michael Pilato, Ben Collins-Sussman, and Brian W Fitzpatrick. 2008. Version control with Subversion: next generation open source version control. O’Reilly Media, Inc.
- [28] Eric Sven Ristad and Peter N Yianilos. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 5 (1998), 522–532.
- [29] Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. 2003. Winnowing: local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 76–85.
- [30] Georg Seibt, Florian Heck, Guilherme Cavalcanti, Paulo Borba, and Sven Apel. 2021. Leveraging structure in software merge: An empirical study. IEEE Transactions on Software Engineering 48, 11 (2021), 4590–4610.
- [31] Bo Shen, Wei Zhang, Haiyan Zhao, Guangtai Liang, Zhi Jin, and Qianxiang Wang. 2019. IntelliMerge: a refactoring-aware software merging technique. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–28.
- [32] Chaochao Shen, Wenhua Yang, Minxue Pan, and Yu Zhou. 2023. Git Merge Conflict Resolution Leveraging Strategy Classification and LLM. In 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS). IEEE, 228–239.
- [33] Marcelo Sousa, Isil Dillig, and Shuvendu K Lahiri. 2018. Verified three-way program merge. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–29.
- [34] Diomidis Spinellis. 2005. Version control systems. IEEE Software 22, 5 (2005), 108–109.
- [35] Alexey Svyatkovskiy, Sarah Fakhoury, Negar Ghorbani, Todd Mytkowicz, Elizabeth Dinella, Christian Bird, Jinu Jang, Neel Sundaresan, and Shuvendu K Lahiri. 2022. Program merge conflict resolution via neural transformers. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 822–833.
- [36] Manuel Then, Moritz Kaufmann, Fernando Chirigati, Tuan-Anh Hoang-Vu, Kien Pham, Alfons Kemper, Thomas Neumann, and Huy T Vo. 2014. The more the merrier: Efficient multi-source graph traversal. Proceedings of the VLDB Endowment 8, 4 (2014), 449–460.
- [37] Gustavo Vale, Claus Hunsen, Eduardo Figueiredo, and Sven Apel. 2021. Challenges of resolving merge conflicts: A mining and survey study. IEEE Transactions on Software Engineering 48, 12 (2021), 4964–4985.
- [38] Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 2010. Soot: A Java bytecode optimization framework. In CASCON First Decade High Impact Papers. 214–224.
- [39] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
- [40] Peipei Xia, Li Zhang, and Fanzhang Li. 2015. Learning similarity with cosine similarity ensemble. Information Sciences 307 (2015), 39–52.
- [41] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium on Security and Privacy. IEEE, 590–604.
- [42] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).
- [43] John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-computer interfaces enable automated software engineering. Advances in Neural Information Processing Systems 37 (2024), 50528–50652.
- [44] Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, and Shuvendu K Lahiri. 2022. Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper). In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 77–88.
- [45] Qingyu Zhang, Junzhe Li, Jiayi Lin, Jie Ding, Lanteng Lin, and Chenxiong Qian. 2025. WizardMerge—Save Us from Merging without Any Clues. ACM Transactions on Software Engineering and Methodology 35, 1 (2025), 1–28.
- [46]
- [47] Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. AutoCodeRover: Autonomous program improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1592–1604.
- [48] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
- [49] Fengmin Zhu and Fei He. 2018. Conflict resolution for structured merge via version space algebra. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–25.
- [50] Fengmin Zhu, Xingyu Xie, Dongyu Feng, Na Meng, and Fei He. 2022. Mastery: Shifted-Code-Aware Structured Merging. In International Symposium on Dependable Software Engineering: Theories, Tools, and Applications. Springer, 70–87.
- [51] Nazatul Nurlisa Zolkifli, Amir Ngah, and Aziz Deraman. 2018. Version control system: A review. Procedia Computer Science 135 (2018), 408–415.
Paper authors (from the paper's running header): Qingyu Zhang, Junzhe Li, Jiayi Lin, Changhua Luo, and Chenxiong Qian.