On the Reliability of Networks of AI Agents: Density Evolution, Stopping Sets, and Architecture Optimization
Pith reviewed 2026-06-26 21:46 UTC · model grok-4.3
The pith
A density-evolution theorem predicts the asymptotic fraction of unresolved subclaims in AI agent networks modeled as role-typed factor graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove a density-evolution theorem that predicts the asymptotic fraction of unresolved subclaims on random role-typed architectures, with an extension to deterministic, locally tree-like graph sequences. The XOR case recovers the classical LDPC recursion on the binary erasure channel (BEC); the AND case exposes an asymmetry between positive and negative verifier certificates.
What carries the argument
Density evolution on sparse role-typed factor graphs whose noisy Boolean check nodes combine set-valued messages via a single logical-forcing rule.
If this is right
- Random architectures admit a computable reliability threshold below which the fraction of unresolved subclaims vanishes.
- The XOR verifier recovers the classical LDPC threshold on the binary erasure channel.
- The AND verifier produces distinct thresholds for positive versus negative certificates.
- The same recursion applies to any locally tree-like sequence of deterministic graphs.
- Stopping-set analysis becomes available for finite-length performance bounds.
Where Pith is reading between the lines
- The same message-passing model could be used to compare reliability of different role assignments before deployment.
- Asymmetry results for AND suggest that certificate polarity should be balanced when designing verifier prompts.
- The framework supplies a concrete way to test whether a proposed multi-agent workflow will leave a positive fraction of claims unresolved.
- Stopping sets identified by the analysis point to minimal subgraphs whose repair would improve overall resolution.
Load-bearing premise
The three distinct failure modes can be modeled uniformly as erasures that propagate through set-valued messages combined by a single logical-forcing rule on the factor graph.
What would settle it
Monte Carlo simulation on large random role-typed graphs whose measured fraction of unresolved subclaims deviates from the density-evolution prediction by more than the predicted variance.
Figures
read the original abstract
Modern AI systems increasingly solve a task not with a single model call but with several imperfect agents working together: some propose pieces of a solution, others verify them, and the results are combined. These systems often outperform any single model, yet it is rarely clear why they succeed or when they will fail. We model such a system as message passing on a sparse graph, the structure that underlies low-density parity-check (LDPC) codes, and extend the density-evolution machinery of coding theory to this richer setting. In our model a task is a set of coupled binary subclaims, and an agent architecture is a sparse, role-typed factor graph whose check nodes are noisy Boolean verifier nodes, each computing a local Boolean function of the subclaims it touches. Three distinct failure modes, all modeled as erasures (an agent abstaining, a verifier returning no usable output, and a message lost between two agents), propagate as the agents exchange set-valued messages. The check agents combine these messages by a single logical-forcing rule that specializes to XOR, AND, OR, implication, and Horn constraints. This is more than a relabeling of LDPC theory: the verifier functions are nonlinear and value-asymmetric, and the three failure modes do not reduce to a single effective channel, so they require new threshold, finite-length, and converse results rather than a direct reuse of parity-check density evolution. We prove a density-evolution theorem that predicts the asymptotic fraction of unresolved subclaims on random role-typed architectures, with an extension to deterministic, locally tree-like graph sequences. The XOR case recovers the classical LDPC recursion on the binary erasure channel (BEC); the AND case exposes an asymmetry between positive and negative verifier certificates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models networks of AI agents solving tasks via coupled binary subclaims as message passing on sparse, role-typed factor graphs whose check nodes are noisy Boolean verifiers. Three failure modes (agent abstention, verifier abstention, message loss) are modeled uniformly as erasures that propagate via set-valued messages combined at checks by a single logical-forcing rule. The authors prove a density-evolution theorem predicting the asymptotic fraction of unresolved subclaims on random role-typed architectures (with an extension to deterministic locally tree-like sequences), recovering the classical LDPC BEC recursion for XOR and exposing positive/negative asymmetry for AND.
Significance. If the density-evolution theorem is correct, the work supplies a non-trivial extension of coding-theory tools to multi-agent reliability analysis, including new threshold and finite-length results that do not reduce to standard LDPC. The explicit recovery of the XOR case and the identification of AND asymmetry are concrete strengths; the manuscript also ships a theorem rather than only simulations.
major comments (1)
- [Abstract / density-evolution theorem] Abstract / density-evolution theorem: the claimed single DE recursion requires that the three failure modes can be treated uniformly as erasures under one logical-forcing rule on set-valued messages for arbitrary Boolean verifier functions. The abstract itself states that the verifiers are nonlinear and value-asymmetric and that the modes "do not reduce to a single effective channel"; any mode-specific effect on admissible message sets would necessitate distinct alphabets or separate recursions, undermining the single-theorem claim. This assumption is load-bearing for the central result.
minor comments (1)
- Clarify whether the extension to deterministic locally tree-like sequences is proved as a separate statement or follows directly as a corollary of the random-graph theorem.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the modeling assumptions underlying the density-evolution theorem. We address the single major comment below and believe the central claim remains intact.
read point-by-point responses
-
Referee: [Abstract / density-evolution theorem] Abstract / density-evolution theorem: the claimed single DE recursion requires that the three failure modes can be treated uniformly as erasures under one logical-forcing rule on set-valued messages for arbitrary Boolean verifier functions. The abstract itself states that the verifiers are nonlinear and value-asymmetric and that the modes "do not reduce to a single effective channel"; any mode-specific effect on admissible message sets would necessitate distinct alphabets or separate recursions, undermining the single-theorem claim. This assumption is load-bearing for the central result.
Authors: The three erasure modes are deliberately modeled with the same message alphabet (subsets of {0,1}) and the same logical-forcing combination rule at every check node; the rule depends only on the verifier function, not on which physical mechanism produced the erasure. Consequently the message-passing dynamics remain identical across modes, permitting a single density-evolution recursion (Theorem 1) whose state is the distribution over unresolved sets. The statement that the modes “do not reduce to a single effective channel” refers to the fact that the resulting thresholds, asymmetry for AND/OR, and finite-length scaling differ from classical BEC-LDPC and therefore require fresh analysis; it does not imply that the message alphabet or recursion must be mode-specific. The derivation in Sections 3–4 makes this uniformity explicit for arbitrary Boolean verifiers. We are happy to insert one clarifying sentence in the abstract to separate the two senses of “channel.” revision: partial
Circularity Check
Density-evolution theorem extends LDPC machinery to role-typed graphs with independent content
full rationale
The paper frames its central result as a proved density-evolution theorem for the asymptotic unresolved fraction under set-valued messages and a logical-forcing rule on role-typed factor graphs. It explicitly distinguishes the setting from direct LDPC reuse by noting nonlinear asymmetric verifiers and multiple non-equivalent erasure modes, and recovers the classical BEC recursion only as the special XOR case while deriving new asymmetry for AND. No quoted step reduces the theorem to a fitted parameter, self-citation chain, or input by construction; the derivation is presented as an extension of existing coding-theory tools to a new domain with stated differences that require fresh analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Density evolution applies to large random sparse graphs that are locally tree-like
Reference graph
Works this paper leans on
-
[1]
Hilbert: Recursively building formal proofs with informal reasoning,
S. Varambally, T. Voice, Y. Sun, Z. Chen, R. Yu, and K. Ye, “Hilbert: Recursively building formal proofs with informal reasoning,” inThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025. [Online]. Available: https://openreview.net/forum?id=ljAHonPrs1 57
2025
-
[2]
Coder: Issue resolving with multi-agent and task graphs,
D. Chen, S. Lin, M. Zeng, D. Zan, J.-G. Wang, A. Cheshkov, J. Sun, H. Yu, G. Dong, A. Aliev, J. Wang, X. Cheng, G. Liang, Y. Ma, P. Bian, T. Xie, and Q. Wang, “Coder: Issue resolving with multi-agent and task graphs,” 2024. [Online]. Available: https://arxiv.org/abs/2406.01304
arXiv 2024
-
[3]
Debating with more persuasive llms leads to more truthful answers,
A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rocktäschel, and E. Perez, “Debating with more persuasive llms leads to more truthful answers,” inProceedings of the 41st International Conference on Machine Learning, ser. ICML’24. JMLR.org, 2024
2024
-
[4]
Towards a science of ai agent reliability,
S. Rabanser, S. Kapoor, P. Kirgis, K. Liu, S. Utpala, and A. Narayanan, “Towards a science of ai agent reliability,” 2026. [Online]. Available: https://arxiv.org/abs/2602.16666
Pith/arXiv arXiv 2026
-
[5]
Richardson and R
T. Richardson and R. Urbanke,Modern Coding Theory. Cambridge University Press, 2008
2008
-
[6]
Analysis of low density codes and improved designs using irregular graphs,
M. Luby, M. Mitzenmacher, A. Shokrollah, and D. Spielman, “Analysis of low density codes and improved designs using irregular graphs,” inProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, ser. STOC ’98. New York, NY, USA: Association for Computing Machinery, 1998, p. 249–258. [Online]. Available: https://doi.org/10.1145/276698.276756
-
[7]
The capacity of low-density parity-check codes under message-passing decoding,
T. Richardson and R. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,”IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 599–618, 2001
2001
-
[8]
Design of capacity-approaching irregular low-density parity-check codes,
T. Richardson, M. Shokrollahi, and R. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001
2001
-
[9]
Finite-length analysis of low-density parity-check codes on the binary erasure channel,
C. Di, D. Proietti, I. E. Telatar, T. J. Richardson, and R. L. Urbanke, “Finite-length analysis of low-density parity-check codes on the binary erasure channel,”IEEE Trans. Inf. Theor., vol. 48, no. 6, p. 1570–1579, Sep. 2006. [Online]. Available: https://doi.org/10.1109/TIT.2002.1003839
-
[10]
Nonuniform error correction using low-density parity-check codes,
H. Pishro-Nik, N. Rahnavard, and F. Fekri, “Nonuniform error correction using low-density parity-check codes,”IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2702–2714, 2005
2005
-
[11]
Design of ldpc codes robust to noisy message-passing decoding,
A. Tarighati, H. Farhadi, and F. Lahouti, “Design of ldpc codes robust to noisy message-passing decoding,” 2015. [Online]. Available: https://arxiv.org/abs/1501.02483
Pith/arXiv arXiv 2015
-
[12]
Noisy density evolution with asymmetric deviation models,
E. Dupraz and F. Leduc-Primeau, “Noisy density evolution with asymmetric deviation models,”IEEE Transactions on Communications, vol. 69, no. 3, pp. 1403–1416, 2021
2021
-
[13]
SWE-bench: Can language models resolve real-world github issues?
C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan, “SWE-bench: Can language models resolve real-world github issues?” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=VTF8yNQM66
2024
-
[14]
SWE-agent: Agent-computer interfaces enable automated software engineering,
J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. R. Narasimhan, and O. Press, “SWE-agent: Agent-computer interfaces enable automated software engineering,” inThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=mXpq6ut8J3
2024
-
[15]
CLEVER: A curated benchmark for formally verified code generation,
A. Thakur, J. Lee, G. Tsoukalas, M. Sistla, M. Zhao, S. Zetzsche, G. Durrett, Y. Yue, and S. Chaudhuri, “CLEVER: A curated benchmark for formally verified code generation,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2026. [Online]. Available: https: //openreview.net/forum?id=IbOacMF5qd
2026
-
[16]
Proving the coding interview: A benchmark for formally verified code generation,
Q. Dougherty and R. Mehta, “Proving the coding interview: A benchmark for formally verified code generation,” in2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), 2025, pp. 72–79
2025
-
[17]
Solving a million-step llm task with zero errors,
E. Meyerson, G. Paolo, R. Dailey, H. Shahrzad, O. Francon, C. F. Hayes, X. Qiu, B. Hodjat, and R. Miikkulainen, “Solving a million-step llm task with zero errors,” 2025. [Online]. Available: https://arxiv.org/abs/2511.09030
arXiv 2025
-
[18]
Improving factuality and reasoning in language models through multiagent debate,
Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” inProceedings of the 41st International Conference on Machine Learning, ser. ICML’24. JMLR.org, 2024
2024
-
[19]
On scalable oversight with weak LLMs judging strong LLMs,
Z. Kenton, N. Y. Siegel, J. Kramar, J. Brown-Cohen, S. Albanie, J. Bulian, R. Agarwal, D. Lindner, Y. Tang, N. Goodman, and R. Shah, “On scalable oversight with weak LLMs judging strong LLMs,” inThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=O1fp9nVraj
2024
-
[20]
Debate or vote: Which yields better decisions in multi-agent large language models?
H. K. Choi, J. Zhu, and S. Li, “Debate or vote: Which yields better decisions in multi-agent large language models?” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. [Online]. Available: https://openreview.net/forum?id=iUjGNJzrF1
2026
-
[21]
Factor graphs and the sum-product algorithm,
F. Kschischang, B. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,”IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001
2001
-
[22]
Low-density parity-check codes,
R. Gallager, “Low-density parity-check codes,”IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962
1962
-
[23]
On decoding of low-density parity-check codes over the binary erasure channel,
H. Pishro-Nik and F. Fekri, “On decoding of low-density parity-check codes over the binary erasure channel,”IEEE Transactions on Information Theory, vol. 50, no. 3, pp. 439–454, 2004
2004
-
[24]
Results on punctured low-density parity-check codes and improved iterative decoding techniques,
——, “Results on punctured low-density parity-check codes and improved iterative decoding techniques,”IEEE Transactions on Information Theory, vol. 53, no. 2, pp. 599–614, 2007
2007
-
[25]
Generalized and doubly generalized ldpc codes with random component codes for the binary erasure channel,
E. Paolini, M. P. C. Fossorier, and M. Chiani, “Generalized and doubly generalized ldpc codes with random component codes for the binary erasure channel,”IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1651–1672, 2010
2010
-
[26]
Analysis of absorbing sets and fully absorbing sets of array-based ldpc codes,
L. Dolecek, Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolic, “Analysis of absorbing sets and fully absorbing sets of array-based ldpc codes,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 181–201, 2010
2010
-
[27]
Density evolution for asymmetric memoryless channels,
C.-C. Wang, S. Kulkarni, and H. Poor, “Density evolution for asymmetric memoryless channels,”IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4216–4236, 2005
2005
-
[28]
Reflexion: language agents with verbal reinforcement learning,
N. Shinn, F. Cassano, A. Gopinath, K. R. Narasimhan, and S. Yao, “Reflexion: language agents with verbal reinforcement learning,” inThirty-seventh Conference on Neural Information Processing Systems, 2023. [Online]. Available: https://openreview.net/forum?id=vAElhFcKW6 58
2023
-
[29]
React: Synergizing reasoning and acting in language models,
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao, “React: Synergizing reasoning and acting in language models,” inThe Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=WE_vluYUL-X
2023
-
[30]
Tree of thoughts: Deliberate problem solving with large language models,
S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. R. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” inThirty-seventh Conference on Neural Information Processing Systems,
-
[31]
Available: https://openreview.net/forum?id=5Xc1ecxO1h
[Online]. Available: https://openreview.net/forum?id=5Xc1ecxO1h
-
[32]
Self-consistency improves chain of thought reasoning in language models,
X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” inThe Eleventh International Conference on Learning Representations,
-
[33]
Available: https://openreview.net/forum?id=1PL1NIMMrw
[Online]. Available: https://openreview.net/forum?id=1PL1NIMMrw
-
[34]
Training verifiers to solve math word problems,
K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman, “Training verifiers to solve math word problems,” 2021. [Online]. Available: https://arxiv.org/abs/2110.14168
Pith/arXiv arXiv 2021
-
[35]
Let’s verify step by step,
H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe, “Let’s verify step by step,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=v8L0pN6EOi
2024
-
[36]
minif2f: a cross-system benchmark for formal olympiad-level mathematics,
K. Zheng, J. M. Han, and S. Polu, “minif2f: a cross-system benchmark for formal olympiad-level mathematics,” inInternational Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id= 9ZPegFuFTFv
2022
-
[37]
Proofnet: Autoformalizing and formally proving undergraduate-level mathematics,
Z. Azerbayev, B. Piotrowski, H. Schoelkopf, E. W. Ayers, D. Radev, and J. Avigad, “Proofnet: Autoformalizing and formally proving undergraduate-level mathematics,” 2023. [Online]. Available: https://arxiv.org/abs/2302.12433
arXiv 2023
-
[38]
Lean-STar: Learning to interleave thinking and proving,
H. Lin, Z. Sun, S. Welleck, and Y. Yang, “Lean-STar: Learning to interleave thinking and proving,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id= SOWZ59UyNc
2025
-
[39]
On the reliability limits of llm-based multi-agent planning,
R. Ao, S. Gao, and D. Simchi-Levi, “On the reliability limits of llm-based multi-agent planning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.26993
arXiv 2026
-
[40]
M. Mézard and A. Montanari,Information, Physics, and Computation. Oxford University Press, 01 2009. [Online]. Available: https://doi.org/10.1093/acprof:oso/9780198570837.001.0001
work page doi:10.1093/acprof:oso/9780198570837.001.0001 2009
-
[41]
On the method of bounded differences,
C. McDiarmid, “On the method of bounded differences,” inSurveys in combinatorics, 1989 (Norwich, 1989), ser. London Math. Soc. Lecture Note Ser. Cambridge Univ. Press, Cambridge, 1989, vol. 141, pp. 148–188
1989
-
[42]
Mas-fire: Fault injection and reliability evaluation for llm-based multi-agent systems,
J. Jia, Z. Deng, Z. Chen, Y. Wang, and Z. Zheng, “Mas-fire: Fault injection and reliability evaluation for llm-based multi-agent systems,” 2026. [Online]. Available: https://arxiv.org/abs/2602.19843
arXiv 2026
-
[43]
The objective method: Probabilistic combinatorial optimization and local weak convergence,
D. Aldous and J. M. Steele, “The objective method: Probabilistic combinatorial optimization and local weak convergence,” inProbability on Discrete Structures, H. Kesten, Ed. Berlin: Springer, 2004, pp. 1–72
2004
-
[44]
Resolvent of large random graphs,
C. Bordenave and M. Lelarge, “Resolvent of large random graphs,”Random Structures and Algorithms, vol. 37, no. 3, pp. 332–352, 2010
2010
-
[45]
Interlacing families I: Bipartite Ramanujan graphs of all degrees,
A. W. Marcus, D. A. Spielman, and N. Srivastava, “Interlacing families I: Bipartite Ramanujan graphs of all degrees,” Annals of Mathematics, vol. 182, no. 1, pp. 307–325, 2015
2015
-
[46]
El Gamal and Y.-H
A. El Gamal and Y.-H. Kim,Network Information Theory. Cambridge University Press, 2011
2011
-
[47]
T. M. Cover and J. A. Thomas,Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, July 2006
2006
-
[48]
Large deviations for sums of partly dependent random variables,
S. Janson, “Large deviations for sums of partly dependent random variables,”Random Structures and Algorithms, vol. 24, no. 3, pp. 234–248, 2004
2004
-
[49]
Stein’s method for concentration inequalities,
S. Chatterjee, “Stein’s method for concentration inequalities,”Probability Theory and Related Fields, vol. 138, pp. 305–321, 2007
2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.