Multi-agent decision making: A Blackwell's informativeness approach
Pith reviewed 2026-05-08 14:17 UTC · model grok-4.3
The pith
Voting and debate in multi-LLM systems induce information structures that are no more informative than the agents' pooled private information under the Blackwell ordering, and a product-of-posteriors estimator outperforms both on six QA benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
we show that voting and debate induce information structures that are no more informative than the pooled private information of all agents. This result identifies Bayesian pooled posterior maximisation as an information-theoretic upper-bound decision rule under the Blackwell ordering.
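For readers who want the ordering spelled out, the standard garbling form of Blackwell's criterion can be written as below. The symbols (state \theta, finite signal spaces S and S', kernels \sigma, \sigma', \gamma) are notation chosen here for illustration and are not taken from the paper.

```latex
% sigma  : information structure with signals s  in S   (e.g. pooled private signals)
% sigma' : information structure with signals s' in S'  (e.g. a vote tally or debate transcript)
\sigma \succeq_{B} \sigma'
\iff
\exists\, \gamma : S \to \Delta(S') \ \text{such that}\
\sigma'(s' \mid \theta) = \sum_{s \in S} \gamma(s' \mid s)\, \sigma(s \mid \theta)
\quad \text{for all } \theta,\ s'.
```

By Blackwell's theorem, this condition is equivalent to \sigma achieving at least as high an optimal expected utility as \sigma' in every decision problem.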
Load-bearing premise
That LLM agents' responses can be faithfully modeled as information structures within Blackwell's abstraction, and that the product-of-posteriors estimator approximates the true pooled posterior closely enough to account for the observed performance gains.
Original abstract
The rapid development of large language models (LLMs) has motivated research on decision-making in multi-agent systems, where multiple agents collaborate to achieve shared objectives. Existing aggregation approaches, such as voting and debate, are largely ad-hoc and lack formal guarantees regarding the informativeness of the resulting decisions. In this paper, we provide a principled approach to analyse decisions made in the multi-LLM setting using Blackwell's informativeness framework. Within the Blackwell information-structure abstraction, we show that voting and debate induce information structures that are no more informative than the pooled private information of all agents. This result identifies Bayesian pooled posterior maximisation as an information-theoretic upper-bound decision rule under the Blackwell ordering. Motivated by this theoretical analysis, we introduce a practical method for LLM-based question-answering (QA) tasks that estimates each agent's posterior and approximates the pooled posterior using a product-of-posteriors estimator. Extensive experiments on six QA benchmarks demonstrate that our approach outperforms state-of-the-art multi-LLM debate and voting methods.
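A minimal sketch of what a product-of-posteriors aggregation could look like, assuming each agent's posterior over a shared set of answer options has already been estimated. The function name, the `eps` floor, and the example numbers are hypothetical; how the paper estimates posteriors or normalises the product is not specified in the abstract.

```python
import math

def product_of_posteriors(posteriors, eps=1e-12):
    """Combine per-agent posteriors over the same answer options.

    posteriors[i][k] is agent i's probability for option k.
    Returns the normalised element-wise product, computed in log space.
    """
    n_options = len(posteriors[0])
    # Sum log-probabilities per option; eps guards against log(0).
    log_scores = [
        sum(math.log(max(p[k], eps)) for p in posteriors)
        for k in range(n_options)
    ]
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical posteriors from three agents over two options (A, B).
agents = [
    [0.55, 0.45],  # weakly prefers A
    [0.55, 0.45],  # weakly prefers A
    [0.05, 0.95],  # strongly prefers B
]
pooled = product_of_posteriors(agents)
best = max(range(len(pooled)), key=pooled.__getitem__)
```

In this toy input a plurality vote over each agent's top answer would pick A, while the normalised product picks B because the third agent's confidence outweighs two weak preferences. That contrast is the kind of information a vote discards.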
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies Blackwell's informativeness ordering to multi-agent LLM decision making. It proves that voting and debate induce information structures no more informative than the pooled private signals of all agents, positioning Bayesian maximization of the pooled posterior as an information-theoretic upper bound. Motivated by this, it proposes a product-of-posteriors estimator as a practical surrogate for the pooled posterior and reports that the resulting method outperforms existing debate and voting baselines on six QA benchmarks.
Significance. If the central theoretical claim holds, the work supplies a clean decision-theoretic lens for comparing aggregation rules in multi-LLM systems and explains why ad-hoc procedures cannot exceed the information already present in the agents' private signals. The empirical results on multiple benchmarks provide initial evidence of practical utility. The explicit use of Blackwell ordering and the identification of a parameter-free upper bound are notable strengths that could guide future protocol design.
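The point that an aggregation rule cannot exceed the information in the private signals can be checked numerically on a toy problem. The sketch below assumes a binary state, a uniform prior, and three conditionally independent agents with hypothetical accuracies. It compares the best achievable accuracy when the decision maker sees every private signal against the case where only the majority label is visible. None of these numbers or names come from the paper.

```python
from itertools import product

# Hypothetical setup: binary state with a uniform prior and three
# conditionally independent agents whose binary signals match the
# state with these probabilities.
ACCURACIES = [0.9, 0.6, 0.6]
PRIOR = {0: 0.5, 1: 0.5}

def joint(theta, signals):
    """P(theta, signals) under conditional independence given theta."""
    p = PRIOR[theta]
    for s, acc in zip(signals, ACCURACIES):
        p *= acc if s == theta else 1.0 - acc
    return p

def bayes_accuracy(summary):
    """Best accuracy for a decision maker who sees only summary(signals)."""
    mass = {}  # summary value -> [P(theta=0, summary), P(theta=1, summary)]
    for signals in product([0, 1], repeat=len(ACCURACIES)):
        cell = mass.setdefault(summary(signals), [0.0, 0.0])
        for theta in (0, 1):
            cell[theta] += joint(theta, signals)
    # For each observed summary, the MAP decision captures the larger joint mass.
    return sum(max(cell) for cell in mass.values())

pooled_acc = bayes_accuracy(lambda s: s)                       # sees all private signals
vote_acc = bayes_accuracy(lambda s: int(sum(s) * 2 > len(s)))  # sees only the majority label
print(pooled_acc, vote_acc)
```

With these assumed accuracies the pooled decision maker reaches about 0.90 while the majority-only decision maker reaches about 0.79. The majority label is a deterministic function of the signal profile, so it is a garbling of the pooled signals and can never do better.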
Major comments (2)
- [§4 (Theoretical Analysis), Theorem 1] The proof that debate produces an information structure no more informative than the pooled private signals requires an explicit construction of the signal space and the observation kernel induced by the debate protocol; without this construction it is difficult to confirm that the Blackwell ordering is preserved for both single-round and multi-turn debate.
- [§5 (Proposed Method)] The product-of-posteriors estimator is introduced as an approximation to the Bayesian pooled posterior, yet the paper provides no error bound or set of sufficient conditions (e.g., conditional independence of agent responses) under which the approximation preserves the Blackwell ordering; this gap is load-bearing for the claim that the practical method approaches the theoretical upper bound.
Minor comments (3)
- [Abstract] The abstract states that experiments were run on six QA benchmarks but does not name them; listing the benchmarks and the exact evaluation protocol in the abstract would improve immediate readability.
- [§2] Notation for information structures (e.g., the definition of the signal space and the Blackwell ordering) is introduced late; moving a compact definition to §2 would aid readers unfamiliar with the framework.
- [Experiments] Table 1 (or equivalent results table) reports aggregate accuracy but omits per-benchmark standard deviations or statistical significance tests; adding these would strengthen the empirical claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the paper's contributions. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§4 (Theoretical Analysis), Theorem 1] The proof that debate produces an information structure no more informative than the pooled private signals requires an explicit construction of the signal space and the observation kernel induced by the debate protocol; without this construction it is difficult to confirm that the Blackwell ordering is preserved for both single-round and multi-turn debate.
  Authors: We appreciate the referee's suggestion for greater explicitness. The current proof of Theorem 1 relies on the fact that any debate transcript (single-round or multi-turn) is generated as a (possibly stochastic) function of the agents' private signals alone, which by Blackwell's theorem immediately yields the claimed ordering. To address the concern directly, the revised manuscript will include an explicit construction: we define the debate signal space as the set of all possible finite-length transcripts over the vocabulary, with the observation kernel specified as the distribution over transcripts induced by the LLM agents' response functions conditioned on their private signals. This construction applies uniformly to both single-round and multi-turn protocols and confirms that the induced information structure is Blackwell-dominated by the pooled private signals.
  Revision: yes
- Referee: [§5 (Proposed Method)] The product-of-posteriors estimator is introduced as an approximation to the Bayesian pooled posterior, yet the paper provides no error bound or set of sufficient conditions (e.g., conditional independence of agent responses) under which the approximation preserves the Blackwell ordering; this gap is load-bearing for the claim that the practical method approaches the theoretical upper bound.
  Authors: We agree that a formal error analysis would tighten the link between the practical estimator and the theoretical upper bound. The revised version will add a new subsection discussing the product-of-posteriors under the sufficient condition of conditional independence of agent responses given the ground truth; under this condition the estimator is exactly the normalized pooled posterior when agents report log-odds. We also note that a general non-asymptotic error bound is unavailable because LLM response distributions are black-box and may violate independence. Nevertheless, the method remains a computationally tractable surrogate whose empirical performance on the six QA benchmarks consistently approaches or exceeds that of voting and debate, consistent with the information-theoretic motivation.
  Revision: partial
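The conditional-independence condition mentioned in the response connects to the product of posteriors through a standard Bayes-rule identity, sketched here. The notation (\theta for the ground truth, s_i for agent i's private signal, n agents) is assumed for illustration and is not drawn from the paper.

```latex
% Assumes s_1, ..., s_n are conditionally independent given \theta.
p(\theta \mid s_1,\dots,s_n)
  \propto p(\theta) \prod_{i=1}^{n} p(s_i \mid \theta)
  = p(\theta) \prod_{i=1}^{n} \frac{p(\theta \mid s_i)\, p(s_i)}{p(\theta)}
  \propto p(\theta)^{\,1-n} \prod_{i=1}^{n} p(\theta \mid s_i).
```

When the prior over answer options is uniform, the factor p(\theta)^{1-n} is constant and the normalised product of posteriors coincides with the pooled posterior. With a non-uniform prior the correction factor remains. Whether the paper's estimator applies such a correction is not stated in the material above.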
Circularity Check
No significant circularity; derivation self-contained within external Blackwell framework
Full rationale
The paper's central theoretical claim—that voting and debate induce information structures no more informative than the pooled private signals—is derived by applying Blackwell's informativeness ordering to explicitly defined multi-agent information structures. This comparison is a direct consequence of the framework's partial order and does not reduce to any fitted parameter, self-citation, or internal definition. The product-of-posteriors estimator is presented only as a practical surrogate for the Bayesian pooled posterior, with its utility demonstrated empirically on external QA benchmarks rather than by algebraic identity. No self-definitional, fitted-input, or load-bearing self-citation patterns appear in the derivation chain.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLM agent responses can be modeled as information structures comparable under Blackwell's informativeness ordering.