pith. sign in

arxiv: 2605.17279 · v1 · pith:DBJPT6B2new · submitted 2026-05-17 · 💻 cs.SE · cs.AI

Rover: Context-aware Conflict Resolution with LLM

Pith reviewed 2026-05-19 23:01 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords code mergingconflict resolutionlarge language modelsprogram analysiscode property graphsoftware engineeringmerge conflictscontext-aware prompting
0
0 comments X p. Extension
pith:DBJPT6B2 Add to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{DBJPT6B2}

Prints a linked pith:DBJPT6B2 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

Rover builds multi-layer code graphs so LLMs can resolve merge conflicts more accurately than prior tools or models alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that code merge conflicts in large projects can be handled automatically by feeding large language models carefully chosen contexts built from program analysis. Existing program-analysis tools are too conservative and hand off too many cases to humans, while pure AI approaches often miss the dependencies that connect changes across files. Rover creates a Multi-layer Code Property Graph to capture those dependencies and then uses graph clustering to turn the relevant pieces into focused prompts that guide the LLM toward resolutions that match what developers actually chose. A sympathetic reader would care because successful automation here would cut down the manual work that slows large-team development and reduce the errors that slip through when conflicts are resolved incorrectly.

Core claim

Rover integrates program analysis with large language models by constructing a Multi-layer Code Property Graph that records inter-file dependencies and then applying graph connectivity clustering to form meaningful contexts around each conflict; these contexts allow the LLM to generate resolutions that achieve higher similarity to ground-truth human resolutions at character, lexical, and semantic levels than standalone LLMs, the MergeGen baseline, or WizardMerge supplied with adjacent code.

What carries the argument

Multi-layer Code Property Graph (MtCPG) plus graph connectivity clustering, which extracts inter-file dependencies and groups them into focused contexts that steer the LLM toward accurate conflict resolutions.

If this is right

  • Rover produces resolutions closer to human ground truth than standalone LLMs or existing machine-learning and suggestion tools.
  • The method handles merge conflicts that involve complex cross-file dependencies more effectively than approaches lacking explicit context construction.
  • By reducing the number of conflicts left for manual resolution, the technique could shorten the time required to integrate changes in large codebases.
  • The combination of graph-based context extraction and LLM generation can be evaluated directly against similarity metrics at multiple levels of code representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-plus-clustering technique could be tested on other software-engineering tasks that require understanding scattered changes, such as refactoring or bug-fix propagation.
  • If the graph construction remains efficient at scale, the approach might be integrated into version-control systems to resolve a larger fraction of merges without human intervention.
  • Failures in resolution could be traced back to specific missing edges in the multi-layer graph, offering a route to improve the context-building step itself.

Load-bearing premise

The multi-layer graph and clustering step will reliably extract contexts that let the LLM infer the right developer intentions without adding new errors or hallucinations.

What would settle it

Run Rover and the three baseline systems on the same collection of merge conflicts that have recorded ground-truth resolutions and measure whether Rover still shows higher similarity scores at character, lexical, and semantic levels.

Figures

Figures reproduced from arXiv: 2605.17279 by Changhua Luo, Chenxiong Qian, Jiayi Lin, Junzhe Li, Qingyu Zhang.

Figure 1
Figure 1. Figure 1: A code context reliance example. Although extensive research has been conducted on code merging and conflict resolution, existing approaches still face fundamental limitations that hinder fully automated and reliable resolution. On the one hand, neither textual nor structural merging tools can eliminate all conflicts, leaving developers responsible for manually resolving unresolved cases. On the other hand… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of Rover System. 3.1 Overview We present the overview of Rover in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Concept graph of multi-layer code property graph. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Construction example of multi-layer code property graph. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The workflow of Rover’s graph-text alignment. To align the preliminary merge results with the MtCPG, we extend the "coloring" idea from WizardMerge [45]. In WizardMerge’s graph-text alignment, a segment tree [11] is used to manage all definition nodes, with modified code blocks serving as the colors to fill these nodes. However, this method can fragment one code block into multiple pieces in the preliminar… view at source ↗
Figure 6
Figure 6. Figure 6: Relative improvement ration compared with [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Case study of the effectivenss of Rover. To explore how Rover can impact other models, we evaluated Rover and LoC=20 with gpt-oss￾20b and Llama-3.1-8B-Instruct respectively on the same dataset. The results are shown in [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
read the original abstract

Code merging is a significant challenge, particularly in large-scale projects. Existing solutions, including program analysis and machine learning, show promise but face critical limitations. Program analysis lacks the ability to infer developers' intentions, relying on conservative strategies that offload unresolved conflicts for manual handling. Meanwhile, model-based approaches struggle with conflicts involving complex code dependencies due to insufficient contextual awareness. To address these gaps, we introduce Rover, a novel conflict resolution system that integrates program analysis with large language models (LLMs). To obtain context-aware prompts, we propose Multi-layer Code Property Graph (MtCPG), a new representation capturing inter-file dependencies and enabling contextual analysis for a given conflict. Using graph connectivity algorithms, Rover further clusters conflicting code and associated changes into meaningful "contexts" that guide the LLM in generating accurate resolutions. We compared Rover with standalone LLMs, machine learning baseline MergeGen, and suggestion provider tool WizardMerge with adjacent code as the contexts. Evaluation results show that Rover surpasses all of these approaches in terms of conflict resolution, achieving higher similarity to ground-truth resolutions at character, lexical, and semantic levels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents Rover, a context-aware conflict resolution system for code merging that combines program analysis using a Multi-layer Code Property Graph (MtCPG) with large language models (LLMs). By clustering conflicting code and changes via graph connectivity algorithms, Rover generates context-aware prompts for the LLM to produce resolutions. Evaluation shows Rover achieves higher similarity to ground-truth resolutions compared to standalone LLMs, MergeGen, and WizardMerge at character, lexical, and semantic levels.

Significance. If the results hold, Rover could improve automated merge conflict resolution in large projects by better inferring developer intentions through enhanced contextual analysis, potentially reducing the need for manual handling of unresolved conflicts and advancing the integration of LLMs with program analysis techniques.

major comments (2)
  1. [Abstract] Abstract: The central claim that Rover surpasses baselines rests on higher similarity to ground-truth resolutions, but the manuscript provides no details on dataset size, conflict selection criteria, statistical significance, or exact evaluation protocol. This makes the empirical superiority difficult to verify and is load-bearing for the headline result.
  2. [Approach] MtCPG and clustering description: The approach assumes that the Multi-layer Code Property Graph plus graph connectivity clustering will reliably produce contexts allowing the LLM to infer developer intentions without new errors or hallucinations, yet no verification is provided that extracted contexts are complete (e.g., inter-file dependencies) or that LLM outputs preserve original semantics beyond the reported similarity metrics. This assumption directly supports the claim of improved conflict resolution.
minor comments (1)
  1. [Abstract] Abstract: Consider briefly noting the scale of the evaluation (e.g., number of conflicts or projects) to give readers an immediate sense of the empirical basis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that Rover surpasses baselines rests on higher similarity to ground-truth resolutions, but the manuscript provides no details on dataset size, conflict selection criteria, statistical significance, or exact evaluation protocol. This makes the empirical superiority difficult to verify and is load-bearing for the headline result.

    Authors: We agree that the abstract omits these specifics for brevity. Section 4 of the manuscript describes the evaluation dataset, conflict selection from open-source repositories based on dependency complexity, the protocol for computing character, lexical, and semantic similarity, and statistical significance testing. We will revise the abstract to include a concise summary of dataset scale, selection approach, and significance to support verifiability of the claims. revision: yes

  2. Referee: [Approach] MtCPG and clustering description: The approach assumes that the Multi-layer Code Property Graph plus graph connectivity clustering will reliably produce contexts allowing the LLM to infer developer intentions without new errors or hallucinations, yet no verification is provided that extracted contexts are complete (e.g., inter-file dependencies) or that LLM outputs preserve original semantics beyond the reported similarity metrics. This assumption directly supports the claim of improved conflict resolution.

    Authors: The referee correctly identifies that we did not include separate verification steps for context completeness or semantic preservation beyond the similarity metrics. The MtCPG construction and connectivity-based clustering are intended to capture inter-file dependencies, but we acknowledge the lack of explicit checks such as manual audits or hallucination analysis. We will add a discussion subsection with qualitative examples of extracted contexts and resolutions to address this in the revision. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical evaluation of proposed system is self-contained

full rationale

The paper proposes a new MtCPG representation and graph-connectivity clustering to supply context to an LLM for merge-conflict resolution, then reports empirical similarity scores against baselines and ground-truth patches. No equations, fitted parameters, or predictions appear; the central result is an external comparison rather than a derivation that reduces to its own inputs by construction. No load-bearing self-citations or uniqueness theorems imported from prior author work are invoked. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities beyond the proposed MtCPG representation can be extracted.

invented entities (1)
  • Multi-layer Code Property Graph (MtCPG) no independent evidence
    purpose: Capture inter-file dependencies to enable contextual analysis for code conflicts
    Proposed as a new representation in the abstract to address limitations of prior program analysis and ML approaches.

pith-pipeline@v0.9.0 · 5726 in / 1136 out tokens · 27008 ms · 2026-05-19T23:01:10.989483+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 5 internal anchors

  1. [1]

    Paola Accioly, Paulo Borba, and Guilherme Cavalcanti. 2018. Understanding semi-structured merge conflict character- istics in open-source java projects.Empirical Software Engineering23 (2018), 2051–2085

  2. [2]

    Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. 2025. gpt-oss-120b & gpt-oss-20b model card.arXiv preprint arXiv:2508.10925(2025)

  3. [3]

    Mehdi Ahmed-Nacer, Pascal Urso, and François Charoy. 2013. Improving textual merge result. In9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 390–399

  4. [5]

    Waad Aldndni, Na Meng, and Francisco Servant. 2023. Automatic prediction of developers’ resolutions for software merge conflicts.Journal of Systems and Software206 (2023), 111836

  5. [6]

    Sven Apel, Olaf Leßenich, and Christian Lengauer. 2012. Structured merge with auto-tuning: balancing precision and performance. InProceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. 120–129

  6. [7]

    Sven Apel, Jörg Liebig, Benjamin Brandl, Christian Lengauer, and Christian Kästner. 2011. Semistructured merge: rethinking merge in revision control systems. InProceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. 190–200

  7. [8]

    Eric Bruneton, Romain Lenglet, and Thierry Coupaye. 2002. ASM: a code manipulation tool to implement adaptable systems.Adaptable and extensible component systems30, 19 (2002)

  8. [9]

    Max Brunsfeld. 2025. Tree-sitter. https://github.com/tree-sitter/tree-sitter. Accessed: 2025-3-9

  9. [10]

    Guilherme Cavalcanti, Paulo Borba, Georg Seibt, and Sven Apel. 2019. The impact of structure on software merging: Semistructured versus structured merge. In2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1002–1013

  10. [11]

    Cp-Algorithms. 2023. Segment Tree. https://cp-algorithms.com/data_structures/segment_tree.html. Accessed: 2024-Nov

  11. [12]

    Léuson Da Silva, Paulo Borba, Toni Maciel, Wardah Mahmood, Thorsten Berger, João Moisakis, Aldiberg Gomes, and Vinícius Leite. 2024. Detecting semantic conflicts with unit tests.Journal of Systems and Software214 (2024), 112070

  12. [13]

    Léuson Da Silva, Paulo Borba, and Arthur Pires. 2022. Build conflicts in the wild.Journal of Software: Evolution and Process34, 4 (2022), e2441

  13. [14]

    Pantazis Deligiannis, Akash Lal, Nikita Mehrotra, and Aseem Rastogi. 2023. Fixing rust compilation errors using llms. arXiv preprint arXiv:2308.05177(2023)

  14. [15]

    Elizabeth Dinella, Todd Mytkowicz, Alexey Svyatkovskiy, Christian Bird, Mayur Naik, and Shuvendu Lahiri. 2022. Deepmerge: Learning to merge programs.IEEE Transactions on Software Engineering49, 4 (2022), 1599–1614

  15. [16]

    Jinhao Dong, Qihao Zhu, Zeyu Sun, Yiling Lou, and Dan Hao. 2023. Merge Conflict Resolution: Classification or Generation?. In2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1652– 1663

  16. [17]

    Paulo Elias, Heleno de S Campos Junior, Eduardo Ogasawara, and Leonardo Gresta Paulino Murta. 2023. Towards accurate recommendations of merge conflicts resolution strategies.Information and Software Technology(2023), 107332

  17. [18]

    Paulo Elias, Heleno de S Campos Junior, Eduardo Ogasawara, and Leonardo Gresta Paulino Murta. 2023. Towards accurate recommendations of merge conflicts resolution strategies.Information and Software Technology164 (2023), 107332

  18. [19]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al . 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783(2024)

  19. [20]

    Xiangming Gu, Tianyu Pang, Chao Du, Qian Liu, Fengzhuo Zhang, Cunxiao Du, Ye Wang, and Min Lin. 2024. When Attention Sink Emerges in Language Models: An Empirical View.arXiv preprint arXiv:2410.10781(2024)

  20. [21]

    Mário Luís Guimarães and António Rito Silva. 2012. Improving early detection of software merge conflicts. In2012 34th International Conference on Software Engineering (ICSE). IEEE, 342–352

  21. [22]

    Simon Larsén, Jean-Rémy Falleri, Benoit Baudry, and Martin Monperrus. 2022. Spork: Structured Merge for Java with Formatting Preservation.IEEE Transactions on Software Engineering49, 1 (2022), 64–83

  22. [23]

    Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. InInternational symposium on code generation and optimization, 2004. CGO 2004.IEEE, 75–86

  23. [24]

    O’Reilly Media, Inc

    Jon Loeliger and Matthew McCullough. 2012.Version Control with Git: Powerful tools and techniques for collaborative software development. " O’Reilly Media, Inc. "

  24. [25]

    Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, et al. 2024. Repoagent: An llm-powered open-source framework for repository-level code documentation generation. , Vol. 1, No. 1, Article . Publication date: May 2018. Rover: Context-aware Conflict Resolution with LLM 21 arXiv preprint arXiv:...

  25. [26]

    Dewayne E Perry, Harvey P Siy, and Lawrence G Votta. 2001. Parallel changes in large-scale software development: an observational case study.ACM Transactions on Software Engineering and Methodology (TOSEM)10, 3 (2001), 308–337

  26. [27]

    O’Reilly Media, Inc

    C Michael Pilato, Ben Collins-Sussman, and Brian W Fitzpatrick. 2008.Version control with subversion: next generation open source version control. " O’Reilly Media, Inc. "

  27. [28]

    Eric Sven Ristad and Peter N Yianilos. 1998. Learning string-edit distance.IEEE Transactions on Pattern Analysis and Machine Intelligence20, 5 (1998), 522–532

  28. [29]

    Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. 2003. Winnowing: local algorithms for document fingerprinting. InProceedings of the 2003 ACM SIGMOD international conference on Management of data. 76–85

  29. [30]

    Georg Seibt, Florian Heck, Guilherme Cavalcanti, Paulo Borba, and Sven Apel. 2021. Leveraging structure in software merge: An empirical study.IEEE Transactions on Software Engineering48, 11 (2021), 4590–4610

  30. [31]

    Bo Shen, Wei Zhang, Haiyan Zhao, Guangtai Liang, Zhi Jin, and Qianxiang Wang. 2019. IntelliMerge: a refactoring- aware software merging technique.Proceedings of the ACM on Programming Languages3, OOPSLA (2019), 1–28

  31. [32]

    Chaochao Shen, Wenhua Yang, Minxue Pan, and Yu Zhou. 2023. Git Merge Conflict Resolution Leveraging Strategy Classification and LLM. In2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS). IEEE, 228–239

  32. [33]

    Marcelo Sousa, Isil Dillig, and Shuvendu K Lahiri. 2018. Verified three-way program merge.Proceedings of the ACM on Programming Languages2, OOPSLA (2018), 1–29

  33. [34]

    Diomidis Spinellis. 2005. Version control systems.IEEE software22, 5 (2005), 108–109

  34. [35]

    Alexey Svyatkovskiy, Sarah Fakhoury, Negar Ghorbani, Todd Mytkowicz, Elizabeth Dinella, Christian Bird, Jinu Jang, Neel Sundaresan, and Shuvendu K Lahiri. 2022. Program merge conflict resolution via neural transformers. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 822–833

  35. [36]

    Manuel Then, Moritz Kaufmann, Fernando Chirigati, Tuan-Anh Hoang-Vu, Kien Pham, Alfons Kemper, Thomas Neumann, and Huy T Vo. 2014. The more the merrier: Efficient multi-source graph traversal.Proceedings of the VLDB Endowment8, 4 (2014), 449–460

  36. [37]

    Gustavo Vale, Claus Hunsen, Eduardo Figueiredo, and Sven Apel. 2021. Challenges of resolving merge conflicts: A mining and survey study.IEEE Transactions on Software Engineering48, 12 (2021), 4964–4985

  37. [38]

    Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 2010. Soot: A Java bytecode optimization framework. InCASCON First Decade High Impact Papers. 214–224

  38. [39]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al . 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837

  39. [40]

    Peipei Xia, Li Zhang, and Fanzhang Li. 2015. Learning similarity with cosine similarity ensemble.Information sciences 307 (2015), 39–52

  40. [41]

    Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and discovering vulnerabilities with code property graphs. In2014 IEEE symposium on security and privacy. IEEE, 590–604

  41. [42]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report.arXiv preprint arXiv:2505.09388(2025)

  42. [43]

    John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. Swe-agent: Agent-computer interfaces enable automated software engineering.Advances in Neural Information Processing Systems37 (2024), 50528–50652

  43. [44]

    Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, and Shuvendu K Lahiri. 2022. Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper). InProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 77–88

  44. [45]

    Qingyu Zhang, Junzhe Li, Jiayi Lin, Jie Ding, Lanteng Lin, and Chenxiong Qian. 2025. WizardMerge—Save Us from Merging without Any Clues.ACM Transactions on Software Engineering and Methodology35, 1 (2025), 1–28

  45. [46]

    Qingyu Zhang, Liangcai Su, Kai Ye, and Chenxiong Qian. 2024. CONGRA: Benchmarking Automatic Conflict Resolution. arXiv preprint arXiv:2409.14121(2024)

  46. [47]

    Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. Autocoderover: Autonomous program improvement. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1592–1604

  47. [48]

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models.arXiv preprint arXiv:2303.18223(2023)

  48. [49]

    Fengmin Zhu and Fei He. 2018. Conflict resolution for structured merge via version space algebra.Proceedings of the ACM on Programming Languages2, OOPSLA (2018), 1–25. , Vol. 1, No. 1, Article . Publication date: May 2018. 22 Qingyu Zhang, Junzhe Li, Jiayi Lin, Changhua Luo, and Chenxiong Qian

  49. [50]

    Fengmin Zhu, Xingyu Xie, Dongyu Feng, Na Meng, and Fei He. 2022. Mastery: Shifted-Code-Aware Structured Merging. InInternational Symposium on Dependable Software Engineering: Theories, Tools, and Applications. Springer, 70–87

  50. [51]

    Nazatul Nurlisa Zolkifli, Amir Ngah, and Aziz Deraman. 2018. Version control system: A review.Procedia Computer Science135 (2018), 408–415. , Vol. 1, No. 1, Article . Publication date: May 2018