Multi-agent decision making: A Blackwell's informativeness approach

Cuong C. Nguyen; Gustavo Carneiro; Kevin Wells; Zheng Zhang

arxiv: 2605.06028 · v1 · submitted 2026-05-07 · 💻 cs.LG

Zheng Zhang, Cuong C. Nguyen, Kevin Wells, Gustavo Carneiro

Pith reviewed 2026-05-08 14:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords: Blackwell, approach, debate, informativeness, pooled, posterior, voting, agents

The pith

Voting and debate in multi-LLM systems induce information structures no more informative than pooled private information under Blackwell ordering, with a product-of-posteriors estimator outperforming them on six QA benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models can be treated as agents that each hold some private information about a question. When multiple models collaborate, common methods such as majority voting or debate are shown here to produce decisions that are no better informed than if all their private information had simply been combined into one pool. The paper uses a mathematical ordering called Blackwell informativeness to establish this limit and identifies the ideal rule as choosing the answer with the highest posterior probability given that full pool. Because computing the pooled posterior directly is hard in practice, the authors approximate it by multiplying the individual models' probability estimates for each candidate answer. Experiments on six standard question-answering datasets show this approximation beats both voting and debate baselines. The work therefore supplies both a theoretical ceiling on what collaboration can achieve and a concrete way to get closer to that ceiling without needing to share raw model internals.

Core claim

We show that voting and debate induce information structures that are no more informative than the pooled private information of all agents. This result identifies Bayesian pooled posterior maximisation as an information-theoretic upper-bound decision rule under the Blackwell ordering.
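To make the garbling argument concrete, here is a minimal toy sketch, not the paper's construction. Every modelling choice is a hypothetical assumption: a binary state with a uniform prior, three agents whose private signals are conditionally independent and correct with probabilities 0.8, 0.7 and 0.6, and majority voting modelled as a deterministic function of the signal tuple. Because the vote is computed from the signals alone, it is a garbling of the pooled structure, and Blackwell's theorem says it cannot do better on any decision problem; the code checks one such problem, 0-1 accuracy under the MAP rule.

```python
from itertools import product

# Toy model (hypothetical): binary state theta in {0, 1}; agent i's private
# signal equals theta with probability accuracies[i], independently given theta.


def pooled_kernel(accuracies):
    """Pooled structure: map each joint signal to [P(s | theta=0), P(s | theta=1)]."""
    kernel = {}
    for signal in product((0, 1), repeat=len(accuracies)):
        row = []
        for theta in (0, 1):
            p = 1.0
            for s, q in zip(signal, accuracies):
                p *= q if s == theta else 1.0 - q
            row.append(p)
        kernel[signal] = row
    return kernel


def garble(kernel, f):
    """Induced structure when only f(signal) is observed (a deterministic garbling)."""
    out = {}
    for signal, row in kernel.items():
        acc = out.setdefault(f(signal), [0.0, 0.0])
        acc[0] += row[0]
        acc[1] += row[1]
    return out


def bayes_accuracy(kernel, prior=(0.5, 0.5)):
    """Expected 0-1 utility of the MAP rule: sum over outcomes of max_theta P(theta) P(x | theta)."""
    return sum(max(prior[t] * row[t] for t in (0, 1)) for row in kernel.values())


def majority(signal):
    """Majority vote over binary signals (an odd number of agents is assumed)."""
    return int(sum(signal) > len(signal) / 2)


pooled = pooled_kernel([0.8, 0.7, 0.6])
vote = garble(pooled, majority)
print(round(bayes_accuracy(pooled), 3), round(bayes_accuracy(vote), 3))  # 0.8 0.788
```

In this toy setting the pooled MAP rule follows the strongest agent and reaches 0.8, while the majority vote reaches 0.788: the vote throws away how reliable each agent is.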

Load-bearing premise

That LLM agents' responses can be faithfully modeled as information structures within Blackwell's abstraction, and that the product-of-posteriors estimator approximates the true pooled posterior closely enough to explain the observed performance gains.

Original abstract

The rapid development of large language models (LLMs) has motivated research on decision-making in multi-agent systems, where multiple agents collaborate to achieve shared objectives. Existing aggregation approaches, such as voting and debate, are largely ad-hoc and lack formal guarantees regarding the informativeness of the resulting decisions. In this paper, we provide a principled approach to analyse decisions made in the multi-LLM setting using Blackwell's informativeness framework. Within the Blackwell information-structure abstraction, we show that voting and debate induce information structures that are no more informative than the pooled private information of all agents. This result identifies Bayesian pooled posterior maximisation as an information-theoretic upper-bound decision rule under the Blackwell ordering. Motivated by this theoretical analysis, we introduce a practical method for LLM-based question-answering (QA) tasks that estimates each agent's posterior and approximates the pooled posterior using a product-of-posteriors estimator. Extensive experiments on six QA benchmarks demonstrate that our approach outperforms state-of-the-art multi-LLM debate and voting methods.
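A minimal sketch of the aggregation step, assuming each agent's posterior over a fixed set of candidate answers has already been elicited (the abstract does not say how; option-token probabilities are one possibility). The names `product_of_posteriors` and `majority_vote` are hypothetical, and the optional prior correction is the standard conditional-independence identity rather than something the abstract specifies. The example shows where the two rules diverge: two lukewarm agents prefer A, one confident agent favours B, and the product follows the stronger evidence.

```python
import math
from collections import Counter


def product_of_posteriors(posteriors, prior=None, eps=1e-12):
    """Pool per-agent posteriors over candidate answers by multiplying them.

    posteriors: one dict per agent, answer -> P_i(answer | agent i's information).
    prior: optional dict answer -> P(answer); if given, divide by prior**(n - 1),
           which is exact when the agents' signals are conditionally independent
           given the true answer. prior=None corresponds to a uniform prior.
    eps: floor that avoids log(0) when an agent assigns zero probability.
    """
    answers = sorted(set().union(*posteriors))
    n = len(posteriors)
    log_scores = {}
    for a in answers:
        score = sum(math.log(max(p.get(a, 0.0), eps)) for p in posteriors)
        if prior is not None:
            score -= (n - 1) * math.log(max(prior[a], eps))
        log_scores[a] = score
    # Normalise in log space (log-sum-exp) to avoid underflow with many agents.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {a: math.exp(s - top) / z for a, s in log_scores.items()}


def majority_vote(posteriors):
    """Baseline: each agent votes for its own most probable answer."""
    votes = Counter(max(p, key=p.get) for p in posteriors)
    return votes.most_common(1)[0][0]


agents = [
    {"A": 0.40, "B": 0.35, "C": 0.25},
    {"A": 0.45, "B": 0.40, "C": 0.15},
    {"A": 0.05, "B": 0.90, "C": 0.05},
]
pooled = product_of_posteriors(agents)
print(majority_vote(agents), max(pooled, key=pooled.get))  # A B
```

Working in log space is a design choice for numerical stability; the argmax is the same as multiplying the raw probabilities.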

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Blackwell's framework shows voting and debate cannot beat pooled private signals in multi-LLM settings, and the product-of-posteriors estimator delivers measurable gains on QA tasks.

The core contribution is a direct application of Blackwell's informativeness ordering to multi-agent LLM decision making. The authors prove that standard methods like voting and debate produce information structures no more informative than the joint private signals from all agents, so maximising the Bayesian pooled posterior is the information-theoretic upper-bound decision rule. They then introduce a product-of-posteriors estimator as a practical surrogate and report that it outperforms debate and voting baselines across six QA benchmarks.

Referee Report

2 major / 3 minor

Summary. The paper applies Blackwell's informativeness ordering to multi-agent LLM decision making. It proves that voting and debate induce information structures no more informative than the pooled private signals of all agents, positioning Bayesian maximization of the pooled posterior as an information-theoretic upper bound. Motivated by this, it proposes a product-of-posteriors estimator as a practical surrogate for the pooled posterior and reports that the resulting method outperforms existing debate and voting baselines on six QA benchmarks.

Significance. If the central theoretical claim holds, the work supplies a clean decision-theoretic lens for comparing aggregation rules in multi-LLM systems and explains why ad-hoc procedures cannot exceed the information already present in the agents' private signals. The empirical results on multiple benchmarks provide initial evidence of practical utility. The explicit use of Blackwell ordering and the identification of a parameter-free upper bound are notable strengths that could guide future protocol design.

major comments (2)

[§4 (Theoretical Analysis), Theorem 1] The proof that debate produces an information structure no more informative than the pooled private signals requires an explicit construction of the signal space and the observation kernel induced by the debate protocol; without this construction it is difficult to confirm that the Blackwell ordering is preserved for both single-round and multi-turn debate.
[§5 (Proposed Method)] The product-of-posteriors estimator is introduced as an approximation to the Bayesian pooled posterior, yet the paper provides no error bound or set of sufficient conditions (e.g., conditional independence of agent responses) under which the approximation preserves the Blackwell ordering; this gap is load-bearing for the claim that the practical method approaches the theoretical upper bound.

minor comments (3)

[Abstract] The abstract states that experiments were run on six QA benchmarks but does not name them; listing the benchmarks and the exact evaluation protocol in the abstract would improve immediate readability.
[§2] Notation for information structures (e.g., the definition of the signal space and the Blackwell ordering) is introduced late; moving a compact definition to §2 would aid readers unfamiliar with the framework.
[Experiments] Table 1 (or equivalent results table) reports aggregate accuracy but omits per-benchmark standard deviations or statistical significance tests; adding these would strengthen the empirical claims.
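One way to address the significance request, sketched under the assumption that both methods are scored on the same questions and per-question correctness is available: an exact McNemar test on the discordant pairs. This is a standard choice for paired accuracy comparisons, not necessarily what the authors would use, and the function name and the 20-question data below are made up for illustration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test for two methods scored on the same questions.

    correct_a, correct_b: per-question 0/1 (or bool) correctness, in the same order.
    Only discordant questions matter: b = A right and B wrong, c = A wrong and B right.
    Under H0 (equal accuracy), b ~ Binomial(b + c, 1/2).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both methods must be scored on the same questions")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-question results on a 20-question benchmark.
method_a = [1] * 10 + [0] * 2 + [1] * 5 + [0] * 3
method_b = [0] * 10 + [1] * 2 + [1] * 5 + [0] * 3
print(round(mcnemar_exact(method_a, method_b), 4))  # 0.0386
```

Reporting this p-value per benchmark, alongside seed-to-seed standard deviations, would let readers see which of the six gains are robust.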

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the paper's contributions. We address each major comment below and will revise the manuscript accordingly.

Referee: [§4 (Theoretical Analysis), Theorem 1] The proof that debate produces an information structure no more informative than the pooled private signals requires an explicit construction of the signal space and the observation kernel induced by the debate protocol; without this construction it is difficult to confirm that the Blackwell ordering is preserved for both single-round and multi-turn debate.

Authors: We appreciate the referee's suggestion for greater explicitness. The current proof of Theorem 1 relies on the fact that any debate transcript (single-round or multi-turn) is generated as a (possibly stochastic) function of the agents' private signals alone, with any additional randomness independent of the true answer. The transcript is therefore a garbling of the pooled signals, and Blackwell's theorem immediately yields the claimed ordering. To address the concern directly, the revised manuscript will include an explicit construction: we define the debate signal space as the set of all possible finite-length transcripts over the vocabulary, with the observation kernel specified as the distribution over transcripts induced by the LLM agents' response functions conditioned on their private signals. This construction applies uniformly to both single-round and multi-turn protocols and confirms that the induced information structure is Blackwell-dominated by the pooled private signals.
Revision: yes.
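The construction described in this response can be illustrated with a toy one-round protocol. Everything here is a hypothetical stand-in for LLM agents: binary answers, two agents with private-signal accuracies 0.8 and 0.6, and a persuasion step whose randomness depends only on the announced signals. The point is structural: the debate's observation kernel is the pooled kernel composed with a state-independent Markov kernel (a garbling), so its Bayes accuracy cannot exceed the pooled one. A multi-turn debate would correspond to composing further kernels of the same kind.

```python
from itertools import product

# Hypothetical toy debate: 2 agents, binary state theta, uniform prior.
# Round 1: each agent announces its private signal.
# Round 2: if they disagree, agent i adopts the other's answer with probability
# persuade[i]; this randomness depends on the signals only, never on theta.
# The observer sees only the final answers.


def signal_kernel(accuracies):
    """P(s | theta) for conditionally independent binary signals."""
    kernel = {}
    for s in product((0, 1), repeat=len(accuracies)):
        row = []
        for theta in (0, 1):
            p = 1.0
            for si, q in zip(s, accuracies):
                p *= q if si == theta else 1.0 - q
            row.append(p)
        kernel[s] = row
    return kernel


def debate_protocol(signal, persuade):
    """Markov kernel from private signals to final answers (state-independent)."""
    s1, s2 = signal
    if s1 == s2:
        return {(s1, s2): 1.0}
    r1, r2 = persuade
    return {
        (s1, s2): (1 - r1) * (1 - r2),  # both hold their ground
        (s2, s2): r1 * (1 - r2),        # agent 1 is persuaded
        (s1, s1): (1 - r1) * r2,        # agent 2 is persuaded
        (s2, s1): r1 * r2,              # both switch
    }


def compose(kernel, protocol):
    """Debate observation kernel: P(t | theta) = sum_s P(s | theta) * P(t | s)."""
    out = {}
    for s, row in kernel.items():
        for t, w in protocol(s).items():
            acc = out.setdefault(t, [0.0, 0.0])
            for theta in (0, 1):
                acc[theta] += row[theta] * w
    return out


def bayes_accuracy(kernel, prior=(0.5, 0.5)):
    """Expected 0-1 utility of the MAP rule under an information structure."""
    return sum(max(prior[t] * row[t] for t in (0, 1)) for row in kernel.values())


pooled = signal_kernel([0.8, 0.6])
debate = compose(pooled, lambda s: debate_protocol(s, (0.5, 0.5)))
print(round(bayes_accuracy(pooled), 3), round(bayes_accuracy(debate), 3))  # 0.8 0.7
```

With both persuasion probabilities at 0.5 the final answers support 0.7 accuracy against 0.8 for the pooled signals; setting both to 0 (stubborn agents) makes the final answers equal the signals and closes the gap, which is the equality case of the ordering.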
Referee: [§5 (Proposed Method)] The product-of-posteriors estimator is introduced as an approximation to the Bayesian pooled posterior, yet the paper provides no error bound or set of sufficient conditions (e.g., conditional independence of agent responses) under which the approximation preserves the Blackwell ordering; this gap is load-bearing for the claim that the practical method approaches the theoretical upper bound.

Authors: We agree that a formal error analysis would tighten the link between the practical estimator and the theoretical upper bound. The revised version will add a new subsection discussing the product-of-posteriors estimator under the sufficient condition of conditional independence of agent responses given the ground truth. Under this condition the pooled posterior is proportional to the prior raised to the power 1 − n times the product of the n agents' posteriors, so the normalized product of posteriors equals the pooled posterior exactly when the prior over answers is uniform (and otherwise after dividing out that prior factor). We also note that a general non-asymptotic error bound is unavailable because LLM response distributions are black-box and may violate independence. Nevertheless, the method remains a computationally tractable surrogate whose empirical performance on the six QA benchmarks consistently approaches or exceeds that of voting and debate, consistent with the information-theoretic motivation.
Revision: partial.
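A short derivation of the sufficient condition discussed in this response, assuming each agent i holds a signal s_i, the signals are conditionally independent given the true answer y, and each agent reports its exact posterior p(y | s_i):

```latex
\begin{aligned}
p(y \mid s_1,\dots,s_n)
  &\propto p(y)\prod_{i=1}^{n} p(s_i \mid y)
  && \text{(conditional independence given } y\text{)}\\
  &= p(y)\prod_{i=1}^{n} \frac{p(y \mid s_i)\,p(s_i)}{p(y)}
  && \text{(Bayes' rule for each agent)}\\
  &\propto p(y)^{1-n}\prod_{i=1}^{n} p(y \mid s_i)
  && \text{(the } p(s_i) \text{ do not depend on } y\text{)}.
\end{aligned}
```

The normalized product of posteriors therefore equals the pooled posterior exactly when p(y) is uniform over the candidate answers; otherwise the factor p(y)^{1-n} is needed. When independence fails, for instance because agents' errors are correlated, the product double-counts shared evidence and no exactness claim applies.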

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained within external Blackwell framework

The paper's central theoretical claim—that voting and debate induce information structures no more informative than the pooled private signals—is derived by applying Blackwell's informativeness ordering to explicitly defined multi-agent information structures. This comparison is a direct consequence of the framework's partial order and does not reduce to any fitted parameter, self-citation, or internal definition. The product-of-posteriors estimator is presented only as a practical surrogate for the Bayesian pooled posterior, with its utility demonstrated empirically on external QA benchmarks rather than by algebraic identity. No self-definitional, fitted-input, or load-bearing self-citation patterns appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the applicability of Blackwell's information-structure abstraction to LLM agent responses and the validity of the product-of-posteriors approximation; no free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: LLM agent responses can be modeled as information structures comparable under Blackwell's informativeness ordering.
Invoked to compare voting, debate, and pooled information.
