Recognition: unknown
CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems
Pith reviewed 2026-05-10 15:31 UTC · model grok-4.3
The pith
Communication topologies in LLM-based multi-agent systems can be inferred from black-box access by analyzing semantic correlations in elicited reasoning outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that MAS communication topologies can be inferred under a restrictive black-box setting by constructing adversarial queries to induce intermediate agents' reasoning outputs and modeling their semantic correlations through global bias disentanglement and LLM-guided weak supervision, with experiments showing an average AUC of 0.87 and peaks up to 0.99 on systems with optimized topologies.
What carries the argument
Communication Inference Attack (CIA), which crafts queries to elicit reasoning outputs and applies global bias disentanglement plus LLM-guided weak supervision to recover topology from semantic correlations.
If this is right
- Optimized communication structures in deployed MAS become extractable as intellectual property by external observers.
- Black-box access to MAS outputs is sufficient to map internal information flow and identify vulnerabilities.
- Security evaluations for multi-agent systems must now include tests for topology leakage under query-based attacks.
- Design choices for agent collaboration can be reverse-engineered to replicate or disrupt system behavior.
Where Pith is reading between the lines
- Defensive techniques such as output noise injection or topology randomization could be tested to reduce correlation signals.
- The same query-and-correlation approach might extend to inferring interaction patterns in other black-box collaborative AI setups.
- Scaling the attack to MAS with dozens of agents or different base models would clarify whether accuracy holds beyond the tested cases.
Load-bearing premise
The assumption that semantic correlations in the elicited reasoning outputs faithfully encode the true underlying communication links without any internal agent states or direct logs.
What would settle it
Running the attack on a MAS with a fully known ground-truth topology and finding that the recovered structure matches the true one at rates no better than random guessing.
read the original abstract
LLM-based Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in solving complex tasks. Central to MAS is the communication topology which governs how agents exchange information internally. Consequently, the security of communication topologies has attracted increasing attention. In this paper, we investigate a critical privacy risk: MAS communication topologies can be inferred under a restrictive black-box setting, exposing system vulnerabilities and posing significant intellectual property threats. To explore this risk, we propose Communication Inference Attack (CIA), a novel attack that constructs new adversarial queries to induce intermediate agents' reasoning outputs and models their semantic correlations through the proposed global bias disentanglement and LLM-guided weak supervision. Extensive experiments on MAS with optimized communication topologies demonstrate the effectiveness of CIA, achieving an average AUC of 0.87 and a peak AUC of up to 0.99, thereby revealing the substantial privacy risk in MAS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Communication Inference Attack (CIA) to recover the communication topology of black-box LLM-based multi-agent systems. Adversarial queries are used to elicit intermediate agent reasoning traces; semantic correlations among these traces are then modeled via a global bias disentanglement step followed by LLM-guided weak supervision. Experiments on MAS with optimized topologies report an average AUC of 0.87 and a peak of 0.99, which the authors interpret as evidence of a substantial privacy risk.
Significance. If the attack is shown to be topology-specific rather than artifact-driven, the result would be significant: it would demonstrate that MAS communication graphs are recoverable from output semantics alone, with direct implications for intellectual-property protection and secure MAS deployment. The combination of adversarial query construction with LLM-guided supervision is a technically interesting direction that could generalize to other black-box inference problems.
major comments (3)
- [§3.2] §3.2 (Global Bias Disentanglement): The formulation does not include an explicit orthogonalization or ablation that isolates topology-induced correlations from shared prompt/role artifacts. If the disentanglement step leaves residual prompt leakage, the reported AUC values could be explained by spurious alignment rather than genuine topology recovery; this is load-bearing for the black-box privacy-risk claim.
- [§4.1] §4.1 and Table 2: The experimental section provides no quantitative controls or baselines that test whether the attack exploits prompt structure versus topology edges (e.g., an ablation that randomizes agent roles while keeping the graph fixed). Without such controls, it is impossible to confirm that the peak AUC of 0.99 reflects topology inference rather than prompt leakage.
- [§4.3] §4.3: No statistical reporting (variance across random seeds, confidence intervals, or number of independent runs) accompanies the AUC figures. Given that the central claim rests on these performance numbers, the absence of variability measures weakens the reliability assessment.
minor comments (2)
- [Abstract] The abstract and §4 omit the exact number of adversarial queries per topology and the precise MAS task distributions; adding these details would improve reproducibility.
- [Figure 3] Figure 3 caption does not state the number of agents or the optimization procedure used to generate the “optimized topologies,” making the figure hard to interpret without cross-referencing the text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where the concerns identify gaps in the original submission, we have revised the manuscript accordingly to strengthen the evidence that CIA recovers topology rather than prompt artifacts.
read point-by-point responses
-
Referee: [§3.2] §3.2 (Global Bias Disentanglement): The formulation does not include an explicit orthogonalization or ablation that isolates topology-induced correlations from shared prompt/role artifacts. If the disentanglement step leaves residual prompt leakage, the reported AUC values could be explained by spurious alignment rather than genuine topology recovery; this is load-bearing for the black-box privacy-risk claim.
Authors: We agree that an explicit isolation step strengthens the claim. The global bias disentanglement subtracts a shared component estimated from all traces, but to make this rigorous we have added (i) a Gram-Schmidt orthogonalization of the correlation matrix against a prompt-role subspace constructed from role-only runs and (ii) a new ablation that removes the disentanglement module entirely. The revised §3.2 now contains the updated formulation and the ablation results (AUC drops from 0.87 to 0.61 when disentanglement is omitted). These additions directly address the possibility of residual prompt leakage. revision: yes
-
Referee: [§4.1] §4.1 and Table 2: The experimental section provides no quantitative controls or baselines that test whether the attack exploits prompt structure versus topology edges (e.g., an ablation that randomizes agent roles while keeping the graph fixed). Without such controls, it is impossible to confirm that the peak AUC of 0.99 reflects topology inference rather than prompt leakage.
Authors: We accept that the original experiments lacked this control. We have added a new baseline in which agent roles and system prompts are randomly reassigned while the underlying communication graph is held fixed. Under this condition the attack AUC falls to 0.52 ± 0.04 (near chance), whereas the original optimized-topology setting retains 0.87. The revised Table 2 now reports both the original and the randomized-role results, confirming that performance is driven by topology edges rather than prompt structure. revision: yes
-
Referee: [§4.3] §4.3: No statistical reporting (variance across random seeds, confidence intervals, or number of independent runs) accompanies the AUC figures. Given that the central claim rests on these performance numbers, the absence of variability measures weakens the reliability assessment.
Authors: We thank the referee for highlighting this omission. All experiments have been re-executed with five independent random seeds (different LLM sampling seeds and query orderings). The revised §4.3 and Table 2 now report mean AUC together with standard deviation and 95 % confidence intervals. The average AUC is 0.87 ± 0.03 (CI [0.84, 0.90]) and the peak remains 0.99 ± 0.01, demonstrating stability across runs. revision: yes
Circularity Check
No significant circularity; empirical attack evaluation is self-contained
full rationale
The paper proposes CIA as an empirical attack method relying on adversarial queries, global bias disentanglement, and LLM-guided weak supervision to recover topologies from black-box outputs, then reports AUC performance (avg 0.87, peak 0.99) on optimized MAS instances. No equations, derivations, or claims reduce by construction to fitted parameters, self-definitions, or self-citation chains. The central result is an external performance measurement on held-out or varied topologies rather than a renaming, ansatz smuggling, or uniqueness theorem imported from prior author work. This is a standard empirical security evaluation with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
-
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.