
arxiv: 2606.00953 · v1 · pith:J2BCK4FM · submitted 2026-05-31 · 💻 cs.LG · cs.MA

When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding

Pith reviewed 2026-06-28 17:52 UTC · model grok-4.3

classification 💻 cs.LG cs.MA
keywords multi-agent LLM systems, code generation, graph partitioning, task decomposition, dependency analysis, software engineering, agent orchestration

The pith

Cohesion-aware partitioning of code dependency graphs lets multi-agent LLM coders raise pass rates while cutting time and cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats multi-agent coding as a graph partitioning task that trades off shorter critical paths against the cost of moving context between agents. It builds Co-Coder, which extracts a dependency graph via static analysis, isolates hub files, applies community detection to form cohesive partitions, and runs them with a dependency-aware scheduler. On 28 repository-level tasks the method improves pass rate, wall-clock time, and API cost over sequential and file-based parallel baselines and over Claude Code with Agent Teams, with the biggest gains on projects that have dense inter-file dependencies. The work argues that assigning tasks to agents in a way that respects code structure yields measurable efficiency gains that naive parallelism does not.
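To make the first step concrete, here is a minimal sketch of how a static import graph could be pulled from a Python repository (the zxcvbn files shown in Figure 3 are Python). It is an illustration under stated assumptions, not Co-Coder's extractor: the function name `import_graph` is hypothetical, only absolute `import` and `from ... import` statements between in-repo modules become edges, and any call edges or edge weights the paper may also use are left out.

```python
import ast

def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the set of in-repo modules it imports.

    `sources` maps a dotted module name (e.g. "app.scoring") to its source
    text. Only absolute imports of modules present in `sources` become
    edges; relative and dynamic imports are ignored in this sketch.
    """
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # "from a.b import c" may name module a.b or submodule a.b.c
                targets = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
            else:
                continue
            for target in targets:
                if target in sources and target != name:
                    graph[name].add(target)
    return graph

# Hypothetical three-module project.
repo = {
    "app.main": "from app import scoring\nimport app.util\n",
    "app.scoring": "from app.util import clamp\n",
    "app.util": "def clamp(x):\n    return x\n",
}
graph = import_graph(repo)
print(graph["app.main"] == {"app.scoring", "app.util"})  # True
```

Relative imports, `importlib` calls, and other dynamic loading are invisible to this pass, which is the kind of gap the load-bearing premise further down is concerned with.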

Core claim

Treating repository-level coding tasks as a graph partitioning problem on static dependency graphs, solved by isolating hubs, applying community detection, and then scheduling with dependency awareness, yields task assignments that raise pass rate by up to 14.0%, deliver up to a 2.10x wall-clock speedup, and cut API cost by up to 35% compared with sequential and file-based parallel baselines.
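A toy cost model shows why neither extreme wins. The sketch is our own simplification, not the paper's formalization: each file has a compute cost, each dependency that crosses a partition boundary adds a transfer cost, every partition gets its own agent, and a partition may start only after the partitions it depends on have finished and shipped their context. The function `evaluate_partition` and all numbers are hypothetical.

```python
from collections import defaultdict, deque

def evaluate_partition(work, deps, transfer, assign):
    """Return (estimated makespan, total cross-agent transfer cost).

    work:     file -> compute cost of writing it (hypothetical units)
    deps:     file -> set of files it depends on
    transfer: (file, dependency) -> cost of shipping that context to another agent
    assign:   file -> partition id (one agent per partition)
    """
    parts = set(assign.values())
    part_work = defaultdict(float)
    for f, w in work.items():
        part_work[assign[f]] += w
    # Aggregate file-level edges that cross partitions into partition-level edges.
    edge_cost = defaultdict(float)
    for f, ds in deps.items():
        for d in ds:
            if assign[f] != assign[d]:
                edge_cost[(assign[f], assign[d])] += transfer[(f, d)]
    communication = sum(edge_cost.values())
    # Kahn's algorithm: a partition starts once every partition it depends on
    # has finished and the corresponding context has been transferred.
    waiting = {p: 0 for p in parts}
    dependents = defaultdict(list)
    for p, q in edge_cost:
        waiting[p] += 1
        dependents[q].append(p)
    ready = deque(p for p in parts if waiting[p] == 0)
    start = defaultdict(float)
    finish = {}
    while ready:
        q = ready.popleft()
        finish[q] = start[q] + part_work[q]
        for p in dependents[q]:
            start[p] = max(start[p], finish[q] + edge_cost[(p, q)])
            waiting[p] -= 1
            if waiting[p] == 0:
                ready.append(p)
    if len(finish) != len(parts):
        raise ValueError("partition-level dependencies form a cycle")
    return max(finish.values()), communication

# Two independent two-file clusters; every number is invented.
work = {"a": 2, "b": 2, "c": 2, "d": 2}
deps = {"a": set(), "b": {"a"}, "c": set(), "d": {"c"}}
transfer = {("b", "a"): 1, ("d", "c"): 1}
sequential = evaluate_partition(work, deps, transfer, {f: 0 for f in work})  # (8.0, 0)
per_file = evaluate_partition(work, deps, transfer, {f: f for f in work})    # (5.0, 2.0)
cohesive = evaluate_partition(work, deps, transfer, {"a": 0, "b": 0, "c": 1, "d": 1})  # (4.0, 0)
```

In this toy, a single agent pays no transfer cost but serializes everything, one agent per file parallelizes but pays for every edge, and the cohesive split keeps both edges internal while still running the two clusters side by side.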

What carries the argument

Static-analysis dependency graph partitioned by community detection after hub isolation, executed by a dependency-aware scheduler that assigns cohesive subgraphs to separate agents.
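The hub-isolation and community-detection steps can be sketched in a few lines. This is a stand-in, not Co-Coder's algorithm: the hub rule (degree above 1.5 times the mean degree) and the use of plain label propagation are assumptions made for brevity. The paper's reference list includes heavier tools, such as the random-walk community method of Rosvall and Bergstrom [25] and multilevel graph partitioning [11], and this summary does not say which one Co-Coder uses. All function names are hypothetical.

```python
from collections import Counter

def to_undirected(deps):
    """Symmetrize a file -> dependencies mapping into an adjacency map."""
    adj = {v: set() for v in deps}
    for v, ds in deps.items():
        for d in ds:
            adj[v].add(d)
            adj.setdefault(d, set()).add(v)
    return adj

def isolate_hubs(adj, hub_factor=1.5):
    """Hypothetical hub rule: degree above hub_factor x mean degree.
    Returns the hub set and the adjacency map with hubs removed."""
    mean_degree = sum(len(n) for n in adj.values()) / len(adj)
    hubs = {v for v, n in adj.items() if len(n) > hub_factor * mean_degree}
    rest = {v: n - hubs for v, n in adj.items() if v not in hubs}
    return hubs, rest

def label_propagation(adj, max_iter=20):
    """Deterministic label propagation: each node repeatedly adopts the most
    common label among its neighbours, breaking ties by the smallest label."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            new = min(label for label, c in counts.items() if c == top)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    groups = {}
    for v, label in labels.items():
        groups.setdefault(label, set()).add(v)
    return sorted(groups.values(), key=min)

# Two tightly knit clusters that both import a shared "util" file (invented).
deps = {"a": {"b", "c", "util"}, "b": {"c", "util"}, "c": {"util"},
        "d": {"e", "f", "util"}, "e": {"f", "util"}, "f": {"util"}, "util": set()}
hubs, rest = isolate_hubs(to_undirected(deps))
communities = label_propagation(rest)
print(hubs, communities)  # {'util'} [{'a', 'b', 'c'}, {'d', 'e', 'f'}] (set order may vary)
```

Removing the hub first matters here: run on the full graph, the same procedure absorbs `util` into the first community, so one agent would own a file that every other file depends on.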

If this is right

  • Partitions that group files with many shared dependencies reduce the volume of context that must cross agent boundaries.
  • Community detection on the dependency graph produces more efficient task splits than assigning whole files or running everything sequentially.
  • The largest efficiency and quality gains appear on projects whose dependency graphs are densest.
  • Dependency-aware scheduling prevents agents from starting work on tasks whose prerequisites have not yet completed.
  • Reducing redundant context transfers directly lowers the number of tokens sent to the underlying LLM.
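The scheduling behaviour described in the fourth bullet can be illustrated with greedy list scheduling in the spirit of Graham [6], run on a fixed pool of agents. This is an assumption-laden sketch rather than Co-Coder's scheduler: each task stands for a partition, the task names, durations, and alphabetical tie-break are hypothetical, and context-transfer costs are ignored.

```python
import heapq

def list_schedule(duration, deps, num_agents):
    """Greedy list scheduling of a task DAG on identical agents.

    duration: task -> run time; deps: task -> set of prerequisite tasks.
    A task is dispatched only after all of its prerequisites have finished.
    Returns (start time of each task, makespan).
    """
    if num_agents < 1:
        raise ValueError("need at least one agent")
    remaining = {t: len(deps.get(t, ())) for t in duration}
    dependents = {t: [] for t in duration}
    for t, ds in deps.items():
        for d in ds:
            dependents[d].append(t)
    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []  # min-heap of (finish_time, task)
    free, now, start = num_agents, 0.0, {}
    while ready or running:
        # Hand ready tasks to idle agents (alphabetical priority, an arbitrary choice).
        while ready and free > 0:
            task = ready.pop(0)
            start[task] = now
            heapq.heappush(running, (now + duration[task], task))
            free -= 1
        # Jump to the next completion and release tasks that were waiting on it.
        now, done = heapq.heappop(running)
        free += 1
        for t in dependents[done]:
            remaining[t] -= 1
            if remaining[t] == 0:
                ready.append(t)
        ready.sort()
    if len(start) != len(duration):
        raise ValueError("dependency cycle: some tasks never became ready")
    return start, now

# Hypothetical plan: shared hub files first, then three partitions.
duration = {"hubs": 2, "P1": 4, "P2": 3, "P3": 2}
deps = {"P1": {"hubs"}, "P2": {"hubs"}, "P3": {"P1"}}
start2, span2 = list_schedule(duration, deps, num_agents=2)  # span2 == 8.0
start1, span1 = list_schedule(duration, deps, num_agents=1)  # span1 == 11.0
```

With two agents, the two partitions that only need the hub files run side by side (makespan 8); with one agent the same plan degrades to the sequential total of 11.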

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same graph-partitioning logic could be applied to other multi-agent workflows that involve shared state, such as collaborative planning or distributed data analysis.
  • If the static graph underestimates certain runtime dependencies, an online refinement step that updates the partition during execution could recover some of the lost gains.
  • Replacing the fixed community-detection heuristic with a learned cost model that predicts actual LLM context-transfer expense might tighten the partitions further.

Load-bearing premise

Static analysis of the repository produces dependency graphs that accurately reflect the real communication costs agents incur when passing context to one another.

What would settle it

Measure whether the reported gains disappear on a high-dependency project when the static graph is replaced by one built from actual runtime call traces that include dynamic dependencies missed by static analysis.
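The gap this test targets can be seen in miniature. The sketch compares function-level call edges found statically with those observed at run time through Python's `sys.setprofile` hook. The toy module, its function names, and the helper names are hypothetical; a real experiment would trace file-level edges across a whole test suite rather than a single entry point.

```python
import ast
import sys

def static_call_edges(source):
    """(caller, callee) pairs for direct calls by bare name, found without running the code."""
    edges = set()
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((fn.name, node.func.id))
    return edges

def runtime_call_edges(entry):
    """(caller, callee) pairs for Python-level calls observed while running entry()."""
    edges = set()
    def profiler(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))
    sys.setprofile(profiler)
    try:
        entry()
    finally:
        sys.setprofile(None)
    # Drop the edge from this helper into the entry point itself.
    return {e for e in edges if e[0] != "runtime_call_edges"}

SRC = '''
def score(pw):
    return len(pw)

def feedback(pw):
    return "weak" if score(pw) < 8 else "ok"

def main():
    handler = globals()["feed" + "back"]  # callee chosen by a runtime string
    return handler("hunter2")
'''
namespace = {}
exec(SRC, namespace)
static = static_call_edges(SRC)
runtime = runtime_call_edges(namespace["main"])
print(runtime - static)  # {('main', 'feedback')}
```

The static pass sees `main` calling `globals` and `handler`, but only execution reveals that `handler` is `feedback`, so the `main` to `feedback` dependency exists only in the runtime graph.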

Figures

Figures reproduced from arXiv: 2606.00953 by Ethan Chandra, Fangru Lin, Lunyiu Nie, Stanislav Gannutin, Swarat Chaudhuri, Xu Yang.

Figure 1: Average pass rate on DevEval and CodeProjectEval. Labels above bars: wall-clock speedup vs. Sequential version; labels inside bars: avg. API cost per task (USD). Inspired by classic distributed computing systems, we propose Cohesion-aware Coder (Co-Coder), an orchestration framework that uses directed acyclic graphs to partition and schedule multi-agent systems. Co-Coder represents the project as a weight…
Figure 2: Overview of Co-Coder. The input specifications from the benchmarks are first parsed into …
Figure 3: Condensed RIB for the zxcvbn project. Three of six files shown; remaining files (feedback.py, time_estimates.py, __main__.py) omitted for space.
Original abstract

Multi-agent Large Language Model (LLM) systems offer a way to decompose complex tasks, such as coding, through parallelization and context isolation. However, adding agents in practice introduces inter-agent communication overhead, which incurs extra cost and can sometimes offset the efficiency gains. We formalize multi-agent orchestration as a graph partitioning problem that captures the communication-to-computation trade-off: task decomposition can shorten critical-path computation, but cross-agent dependencies require costly context transfer. We instantiate this view in repository-level software engineering and present Cohesion-aware Coder (Co-Coder), which builds dependency graphs from static analysis, isolates structural hub files, partitions the graph via community detection, and executes the partition with a dependency-aware scheduler. Across 28 real-world tasks on DevEval and CodeProjectEval, Co-Coder advances the Pareto-frontier over sequential and file-based parallel baselines as well as Claude Code with Agent Teams, lifting pass rate by up to 14.0%, achieving up to a 2.10x wall-clock speedup, and reducing API cost by up to 35%, with the largest gains on the most dependency-dense projects. Co-coder demonstrates how cohesion-aware orchestration can make parallel coding agents both theoretically grounded and practically efficient, suggesting a broader design principle for multi-agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that formalizing multi-agent LLM coding as a graph-partitioning problem—where static-analysis dependency graphs are partitioned via community detection to balance communication-to-computation costs—yields Co-Coder, which improves over sequential, file-based parallel, and Claude Code baselines. On 28 tasks from DevEval and CodeProjectEval it reports up to 14% higher pass rate, 2.1× wall-clock speedup, and 35% lower API cost, with largest gains on dependency-dense projects.

Significance. If the empirical results survive scrutiny of baseline fairness, statistical significance, and the validity of static graphs as proxies for LLM context-transfer costs, the work supplies a concrete, graph-theoretic design principle for orchestrating parallel LLM agents. The explicit modeling of the communication-to-computation trade-off and the use of community detection on real repository graphs are strengths that could generalize beyond coding.

major comments (2)
  1. [Evaluation] Evaluation section: the reported 14.0% / 2.10× / 35% gains are presented as point estimates without reported standard deviations, number of runs, or statistical significance tests; this makes it impossible to judge whether the Pareto-frontier advance is robust or could be explained by run-to-run variance or post-hoc partition selection.
  2. [Method] Method (graph construction and scheduler): the central claim that cohesion-aware partitioning improves the relevant trade-off rests on the untested assumption that static-analysis edges (imports, calls) accurately encode the context-transfer costs actually incurred by LLM agents; no ablation, correlation study, or human annotation validates that syntactic dependencies predict semantic context volume or latency.
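The correlation study requested in the second major comment could start from something as small as a Spearman rank correlation between a static edge weight and the context actually transferred across that edge. The sketch assumes both quantities have been logged per cross-partition edge; the helper names and the numbers in the example are invented placeholders, not data from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("a sample is constant; correlation is undefined")
    return cov / (vx * vy) ** 0.5

# Invented per-edge measurements.
static_weight = [1, 3, 2, 5]             # e.g. symbols imported across the edge
context_tokens = [120, 900, 300, 2000]   # tokens an agent actually received
rho = spearman(static_weight, context_tokens)  # 1.0: identical orderings
```

A rank correlation fits the referee's question because it asks only whether heavier static edges tend to carry more context, without assuming the relationship is linear.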
minor comments (2)
  1. [Abstract] Abstract and §1: the phrase "advances the Pareto-frontier" is used without defining the axes or showing the full frontier plot; a single sentence clarifying the three metrics and how the frontier is constructed would help.
  2. [Throughout] Notation: the paper introduces "Cohesion-aware Coder (Co-Coder)" but later refers to "Co-coder"; consistent capitalization and acronym usage throughout would reduce minor confusion.
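One way to make the frontier in the first minor comment precise: with pass rate to be maximized and wall-clock time and API cost to be minimized, a method lies on the Pareto frontier if no other method is at least as good on all three axes and strictly better on at least one. A minimal sketch, with hypothetical method names and invented numbers:

```python
from typing import NamedTuple

class Result(NamedTuple):
    method: str
    pass_rate: float   # higher is better
    wall_clock: float  # seconds, lower is better
    cost: float        # USD per task, lower is better

def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = a.pass_rate >= b.pass_rate and a.wall_clock <= b.wall_clock and a.cost <= b.cost
    better = a.pass_rate > b.pass_rate or a.wall_clock < b.wall_clock or a.cost < b.cost
    return no_worse and better

def pareto_front(results):
    """Results not dominated by any other result (a result never dominates itself)."""
    return [r for r in results if not any(dominates(o, r) for o in results)]

results = [
    Result("method_a", 0.50, 100.0, 1.00),
    Result("method_b", 0.60, 60.0, 0.80),
    Result("method_c", 0.70, 120.0, 2.00),
]
print([r.method for r in pareto_front(results)])  # ['method_b', 'method_c']
```

"Advancing the frontier" would then mean that adding the new method's point removes at least one previous method from this list.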

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on evaluation robustness and the grounding of our graph-partitioning assumptions. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported 14.0% / 2.10× / 35% gains are presented as point estimates without reported standard deviations, number of runs, or statistical significance tests; this makes it impossible to judge whether the Pareto-frontier advance is robust or could be explained by run-to-run variance or post-hoc partition selection.

    Authors: We acknowledge the limitation in the current presentation. The reported figures are single-run point estimates obtained under fixed random seeds for reproducibility. In the revised manuscript we will re-execute all 28 tasks with at least five independent runs per method, report means and standard deviations, and include paired statistical tests (Wilcoxon signed-rank) together with p-values. We will also document the deterministic partition-selection procedure to rule out post-hoc bias. revision: yes

  2. Referee: [Method] Method (graph construction and scheduler): the central claim that cohesion-aware partitioning improves the relevant trade-off rests on the untested assumption that static-analysis edges (imports, calls) accurately encode the context-transfer costs actually incurred by LLM agents; no ablation, correlation study, or human annotation validates that syntactic dependencies predict semantic context volume or latency.

    Authors: Static-analysis edges are a standard, inexpensive proxy in repository-level software engineering; our results already show that gains scale with dependency density, providing indirect empirical support. Nevertheless, we agree that an explicit ablation would strengthen the claim. The revision will add (i) a comparison of community-detection partitions against random and file-based baselines on the same graphs and (ii) a brief discussion of the proxy’s limitations. A dedicated human annotation or correlation study of context volume is beyond the scope of the present work and is noted as future research. revision: partial
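The statistical protocol promised in the first response (repeated runs, means with standard deviations, and paired Wilcoxon signed-rank tests) can be sketched with the standard library alone. This version computes the exact distribution of the signed-rank statistic by dynamic programming, rounds differences to absorb floating-point noise, drops zero differences, and gives tied magnitudes average ranks; in practice `scipy.stats.wilcoxon` covers the same ground. The function names and example values are hypothetical.

```python
import statistics

def summarize(runs):
    """Mean and sample standard deviation across repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)

def wilcoxon_signed_rank(x, y, ndigits=9):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Returns (W_plus, p_value)."""
    d = [round(a - b, ndigits) for a, b in zip(x, y)]
    d = [v for v in d if v != 0]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n  # twice the average rank, so every value stays an integer
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1
    w2 = sum(r for r, v in zip(ranks2, d) if v > 0)
    # counts[s]: number of sign patterns whose doubled positive-rank sum is s
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    mean2 = total / 2
    dev = abs(w2 - mean2)
    extreme = sum(c for s, c in enumerate(counts) if abs(s - mean2) >= dev)
    return w2 / 2, extreme / 2 ** n

runs = [0.61, 0.64, 0.59, 0.66, 0.62]   # invented pass rates from five runs
mean, std = summarize(runs)
method = [0.80, 0.60, 0.90, 0.70, 0.50]    # invented per-task scores
baseline = [0.70, 0.55, 0.60, 0.60, 0.10]
w_plus, p = wilcoxon_signed_rank(method, baseline)
print(w_plus, p)  # 15.0 0.0625
```

Paired here means one value per task for each method, so the test asks whether per-task gains are consistently positive rather than driven by a few tasks. With only five tasks even a clean sweep gives p = 0.0625, which is one reason the number of tasks and runs matters.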

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks

Full rationale

The paper formalizes orchestration as graph partitioning but supports its results solely through empirical evaluation on external benchmarks (DevEval, CodeProjectEval) against published baselines. No equations, fitted parameters, or predictions reduce the reported pass-rate, speedup, or cost gains to quantities defined inside the paper, and no self-citation chains or ansatzes are load-bearing for the central claims. The claims are therefore tested against external data rather than derived from the paper's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This ledger is based on the abstract only, which states no explicit free parameters, axioms, or invented entities. The approach implicitly assumes that static dependency edges are a sufficient proxy for LLM context-transfer cost.

pith-pipeline@v0.9.1-grok · 5779 in / 1161 out tokens · 18532 ms · 2026-06-28T17:52:10.901081+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references

  1. Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D. C., Arun Iyer, Suresh Parthasarathy, Sriram K. Rajamani, Balasubramanyan Ashok, and Shashank Shet. CodePlan: Repository-level coding using LLMs and planning. CoRR, abs/2309.12499, 2023

  2. Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya G. Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, and Ion Stoica. Why do multi-agent LLM systems fail? CoRR, abs/2503.13657, 2025

  3. David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Klaus E. Schauser, Eunice E. Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: Towards a realistic model of parallel computation. In Marina C. Chen and Robert Halstead, editors, Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOP...

  4. Zifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, and Andreas Vlachos. TCP: a benchmark for temporal constraint-based planning. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9...

  5. Pareesa Ameneh Golnari, Adarsh Kumarappan, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, and Elsie Nallipogu. DevBench: A realistic, developer-informed benchmark for code generation models. CoRR, abs/2601.11895, 2026

  6. R. L. Graham. Bounds for certain multiprocessing anomalies. The Bell System Technical Journal, 45(9):1563–1581, 1966

  7. Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, ICL...

  8. Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. LiveCodeBench: Holistic and contamination free evaluation of large language models for code. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  9. Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  10. Josef Kallrath. Planning and scheduling in the process industry. OR Spectr., 24(3):219–250, 2002

  11. George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998

  12. Brian W. Kernighan and Shen Lin. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J., 49(2):291–307, 1970

  13. Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, and Diyi Yang. CooperBench: Why coding agents cannot be your teammates yet. CoRR, abs/2601.13295, 2026

  14. Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. An LLM compiler for parallel function calling. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024,...

  15. Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, and Kai Chen. Prompting large language models to tackle the full software development lifecycle: A case study. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eu...

  16. Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: communicative agents for "mind" exploration of large language model society. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Informatio...

  17. Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li, Bin Gu, and Mengfei Yang. DevEval: A manually-annotated code generation benchmark aligned with real-world code repositories. In Lun-Wei Ku, Andre Ma...

  18. Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony G. Cohn, and Janet B. Pierrehumbert. Graph-enhanced large language models in asynchronous plan reasoning. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference ...

  19. Tianyang Liu, Canwen Xu, and Julian J. McAuley. RepoBench: Benchmarking repository-level code auto-completion systems. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  20. Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Michael Luck, Philip Torr, and Michael J. Wooldridge. Large language models miss the multi-agent mark. CoRR, abs/2505.21298, 2025

  21. R.C. Martin. Agile Software Development: Principles, Patterns, and Practices. Alan Apt series. Pearson Education, 2003

  22. Elizabeth Mieczkowski, Katherine M. Collins, Ilia Sucholutsky, Natalia Vélez, and Thomas L. Griffiths. Language model teams as distributed systems. CoRR, abs/2603.12229, 2026

  23. Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. ChatDev: Communicative agents for software development. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Li...

  24. Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Scaling large language model-based multi-agent collaboration. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  25. Martin Rosvall and Carl T Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008

  26. Alvin Wei Ming Tan, Chunhua Yu, Bria Long, Wanjing Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, and Michael C. Frank. DevBench: A multimodal developmental benchmark for language learning. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Informatio...

  27. Haluk Topcuoglu, Salim Hariri, and Min-You Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distributed Syst., 13(3):260–274, 2002

  28. Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, 1990

  29. Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, et al. OpenHands: An open platform for AI software developers as generalist agents. In The...

  30. Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, and Xiaofei Ma. Repoformer: Selective retrieval for repository-level code completion. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024,...

  31. Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. CoRR, abs/2308.08155, 2023

  32. Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. Agentless: Demystifying LLM-based software engineering agents. CoRR, abs/2407.01489, 2024

  33. John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processing Syst...

  34. Qianhui Zhao, Li Zhang, Fang Liu, Junhang Cheng, Chengru Wu, Junchen Ai, Qiaoyuanhe Meng, Lichen Zhang, Xiaoli Lian, Shubin Song, and Yuanping Guo. Towards realistic project-level code generation via multi-agent collaboration and semantic architecture modeling. CoRR, abs/2511.03404, 2025

  35. Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language agents as optimizable graphs. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024...