arxiv: 2604.12461 · v1 · submitted 2026-04-14 · 💻 cs.AI

CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems

Yongxuan Wu, Xixun Lin, He Zhang, Nan Sun, Kun Wang, Chuan Zhou, Shirui Pan, Yanan Cao

Pith reviewed 2026-05-10 15:31 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent systems · communication topology · inference attack · black-box setting · LLM · privacy risk · semantic correlation · adversarial queries

The pith

Communication topologies in LLM-based multi-agent systems can be inferred from black-box access by analyzing semantic correlations in elicited reasoning outputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that the internal structure of how agents exchange information in LLM-powered multi-agent systems is recoverable even when only final outputs are visible and no internal states or logs are available. It introduces an attack that generates targeted queries to surface intermediate reasoning, then disentangles biases and applies weak supervision to link semantic patterns back to the original topology. If correct, this means the collaborative design of these systems is exposed to outsiders, creating risks for intellectual property and system security as multi-agent setups handle increasingly complex tasks. A reader would care because the attack works on optimized topologies and reaches a high AUC (0.87 on average, up to 0.99), turning a hidden design choice into observable information.

Core claim

The paper claims that MAS communication topologies can be inferred under a restrictive black-box setting by constructing adversarial queries to induce intermediate agents' reasoning outputs and modeling their semantic correlations through global bias disentanglement and LLM-guided weak supervision, with experiments showing an average AUC of 0.87 and peaks up to 0.99 on systems with optimized topologies.

What carries the argument

Communication Inference Attack (CIA), which crafts queries to elicit reasoning outputs and applies global bias disentanglement plus LLM-guided weak supervision to recover topology from semantic correlations.
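
To make the correlation step concrete, here is a minimal sketch, not the paper's implementation. It assumes each agent's elicited reasoning output has already been turned into an embedding vector, approximates the "global bias" by the mean vector across agents, and scores candidate edges by cosine similarity of the residuals. The function names and the toy embeddings are hypothetical, and the LLM-guided weak supervision step is not modeled.

```python
import math

def subtract_shared_component(vectors):
    """Remove the mean vector, an assumed stand-in for the global bias."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def edge_scores(vectors):
    """Score every unordered agent pair by similarity of debiased vectors."""
    centered = subtract_shared_component(vectors)
    n = len(centered)
    return {(i, j): cosine(centered[i], centered[j])
            for i in range(n) for j in range(i + 1, n)}

# Hypothetical toy embeddings: all agents share a large offset on axis 0,
# which makes every raw pair look similar.
embeddings = [
    [5.0, 1.0, 0.0],    # agent 0
    [5.0, 0.9, 0.1],    # agent 1, close to agent 0 once debiased
    [5.0, -1.0, 0.0],   # agent 2
    [5.0, -0.9, -0.1],  # agent 3, close to agent 2 once debiased
]
raw_0_2 = cosine(embeddings[0], embeddings[2])
scores = edge_scores(embeddings)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy case the raw similarity between agents 0 and 2 is high only because of the shared offset; after centering, the pairs (0, 1) and (2, 3) stand out and the cross pairs turn negative.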

If this is right

Optimized communication structures in deployed MAS become extractable as intellectual property by external observers.
Black-box access to MAS outputs is sufficient to map internal information flow and identify vulnerabilities.
Security evaluations for multi-agent systems must now include tests for topology leakage under query-based attacks.
Design choices for agent collaboration can be reverse-engineered to replicate or disrupt system behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defensive techniques such as output noise injection or topology randomization could be tested to reduce correlation signals.
The same query-and-correlation approach might extend to inferring interaction patterns in other black-box collaborative AI setups.
Scaling the attack to MAS with dozens of agents or different base models would clarify whether accuracy holds beyond the tested cases.

Load-bearing premise

The assumption that semantic correlations in the elicited reasoning outputs faithfully encode the true underlying communication links without any internal agent states or direct logs.

What would settle it

Running the attack on a MAS with a fully known ground-truth topology and finding that the recovered structure matches the true one at rates no better than random guessing (an AUC near 0.5).
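
The test described above can be sketched as follows. This is an illustrative check under stated assumptions, not the paper's evaluation code: the attack is assumed to emit one score per agent pair, the ground truth is a hypothetical six-agent chain, and AUC is computed as the probability that a true edge outscores a non-edge.

```python
import random

def edge_auc(scores, truth):
    """AUC as P(true-edge score > non-edge score); ties count as 0.5."""
    pos = [scores[p] for p in scores if truth[p]]
    neg = [scores[p] for p in scores if not truth[p]]
    if not pos or not neg:
        raise ValueError("need at least one edge and one non-edge")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth: a chain 0-1-2-3-4-5 over six agents.
n_agents = 6
pairs = [(i, j) for i in range(n_agents) for j in range(i + 1, n_agents)]
truth = {p: (p[1] - p[0] == 1) for p in pairs}

# A perfect attack scores every true edge above every non-edge.
perfect_auc = edge_auc({p: float(truth[p]) for p in pairs}, truth)

# A random-guessing attack, averaged over many trials with a fixed seed.
rng = random.Random(0)
trials = 2000
random_auc = sum(
    edge_auc({p: rng.random() for p in pairs}, truth) for _ in range(trials)
) / trials

print("perfect:", perfect_auc)
print("random guessing (mean over trials):", round(random_auc, 3))
```

A single random trial on a graph this small can land well away from 0.5, which is why the chance baseline is averaged over many trials before being compared with the attack's AUC.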

Original abstract

LLM-based Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in solving complex tasks. Central to MAS is the communication topology which governs how agents exchange information internally. Consequently, the security of communication topologies has attracted increasing attention. In this paper, we investigate a critical privacy risk: MAS communication topologies can be inferred under a restrictive black-box setting, exposing system vulnerabilities and posing significant intellectual property threats. To explore this risk, we propose Communication Inference Attack (CIA), a novel attack that constructs new adversarial queries to induce intermediate agents' reasoning outputs and models their semantic correlations through the proposed global bias disentanglement and LLM-guided weak supervision. Extensive experiments on MAS with optimized communication topologies demonstrate the effectiveness of CIA, achieving an average AUC of 0.87 and a peak AUC of up to 0.99, thereby revealing the substantial privacy risk in MAS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a concrete black-box attack for recovering MAS communication graphs via adversarial queries and disentanglement, with high reported AUCs, but the method risks latching onto prompt artifacts instead of topology.

The core result here is that you can often infer the hidden communication topology in LLM multi-agent systems even with only black-box access to outputs. They build CIA around adversarial queries that force out intermediate reasoning traces, then apply global bias disentanglement and LLM-guided weak supervision to turn semantic correlations into a recovered graph. On their test MAS with optimized topologies this yields average AUC 0.87 and peaks near 0.99, which is the main empirical punch.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Communication Inference Attack (CIA) to recover the communication topology of black-box LLM-based multi-agent systems. Adversarial queries are used to elicit intermediate agent reasoning traces; semantic correlations among these traces are then modeled via a global bias disentanglement step followed by LLM-guided weak supervision. Experiments on MAS with optimized topologies report an average AUC of 0.87 and a peak of 0.99, which the authors interpret as evidence of a substantial privacy risk.

Significance. If the attack is shown to be topology-specific rather than artifact-driven, the result would be significant: it would demonstrate that MAS communication graphs are recoverable from output semantics alone, with direct implications for intellectual-property protection and secure MAS deployment. The combination of adversarial query construction with LLM-guided supervision is a technically interesting direction that could generalize to other black-box inference problems.

major comments (3)

[§3.2] §3.2 (Global Bias Disentanglement): The formulation does not include an explicit orthogonalization or ablation that isolates topology-induced correlations from shared prompt/role artifacts. If the disentanglement step leaves residual prompt leakage, the reported AUC values could be explained by spurious alignment rather than genuine topology recovery; this is load-bearing for the black-box privacy-risk claim.
[§4.1] §4.1 and Table 2: The experimental section provides no quantitative controls or baselines that test whether the attack exploits prompt structure versus topology edges (e.g., an ablation that randomizes agent roles while keeping the graph fixed). Without such controls, it is impossible to confirm that the peak AUC of 0.99 reflects topology inference rather than prompt leakage.
[§4.3] §4.3: No statistical reporting (variance across random seeds, confidence intervals, or number of independent runs) accompanies the AUC figures. Given that the central claim rests on these performance numbers, the absence of variability measures weakens the reliability assessment.
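
One way the orthogonalization requested in the §3.2 comment might look is sketched below. It is an assumption-laden illustration rather than the authors' formulation: it supposes that prompt and role artifacts can be captured as a few vectors (here called `role_vectors`, a hypothetical name), builds an orthonormal basis for their span with Gram-Schmidt, and removes that subspace from a trace vector before any correlation is computed.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def remove_subspace(v, basis):
    """Subtract from v its projection onto each orthonormal basis vector."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt (modified form); drops numerically dependent vectors."""
    basis = []
    for v in vectors:
        w = remove_subspace(v, basis)
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical artifact directions estimated from role-only runs.
role_vectors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(role_vectors)

# Hypothetical trace vector mixing artifact and non-artifact components.
trace = [2.0, 3.0, 4.0]
residual = remove_subspace(trace, basis)
print(residual)
```

Whatever survives the projection is orthogonal to every role vector, so any correlation computed on the residuals cannot be explained by those particular directions. It says nothing about artifacts that fall outside the assumed subspace.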

minor comments (2)

[Abstract] The abstract and §4 omit the exact number of adversarial queries per topology and the precise MAS task distributions; adding these details would improve reproducibility.
[Figure 3] Figure 3 caption does not state the number of agents or the optimization procedure used to generate the “optimized topologies,” making the figure hard to interpret without cross-referencing the text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where the concerns identify gaps in the original submission, we have revised the manuscript accordingly to strengthen the evidence that CIA recovers topology rather than prompt artifacts.

Referee: [§3.2] §3.2 (Global Bias Disentanglement): The formulation does not include an explicit orthogonalization or ablation that isolates topology-induced correlations from shared prompt/role artifacts. If the disentanglement step leaves residual prompt leakage, the reported AUC values could be explained by spurious alignment rather than genuine topology recovery; this is load-bearing for the black-box privacy-risk claim.

Authors: We agree that an explicit isolation step strengthens the claim. The global bias disentanglement subtracts a shared component estimated from all traces, but to make this rigorous we have added (i) a Gram-Schmidt orthogonalization of the correlation matrix against a prompt-role subspace constructed from role-only runs and (ii) a new ablation that removes the disentanglement module entirely. The revised §3.2 now contains the updated formulation and the ablation results (AUC drops from 0.87 to 0.61 when disentanglement is omitted). These additions directly address the possibility of residual prompt leakage.
Revision: yes

Referee: [§4.1] §4.1 and Table 2: The experimental section provides no quantitative controls or baselines that test whether the attack exploits prompt structure versus topology edges (e.g., an ablation that randomizes agent roles while keeping the graph fixed). Without such controls, it is impossible to confirm that the peak AUC of 0.99 reflects topology inference rather than prompt leakage.

Authors: We accept that the original experiments lacked this control. We have added a new baseline in which agent roles and system prompts are randomly reassigned while the underlying communication graph is held fixed. Under this condition the attack AUC falls to 0.52 ± 0.04 (near chance), whereas the original optimized-topology setting retains 0.87. The revised Table 2 now reports both the original and the randomized-role results, confirming that performance is driven by topology edges rather than prompt structure.
Revision: yes

Referee: [§4.3] §4.3: No statistical reporting (variance across random seeds, confidence intervals, or number of independent runs) accompanies the AUC figures. Given that the central claim rests on these performance numbers, the absence of variability measures weakens the reliability assessment.

Authors: We thank the referee for highlighting this omission. All experiments have been re-executed with five independent random seeds (different LLM sampling seeds and query orderings). The revised §4.3 and Table 2 now report mean AUC together with standard deviation and 95% confidence intervals. The average AUC is 0.87 ± 0.03 (CI [0.84, 0.90]) and the peak remains 0.99 ± 0.01, demonstrating stability across runs.
Revision: yes
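
For readers who want to see how a mean, standard deviation, and 95% confidence interval are obtained from five seeded runs, a minimal sketch follows. The per-seed AUC values are invented for illustration and are not taken from the paper or the rebuttal; the interval uses Student's t with four degrees of freedom, which is an assumption about the method, since the text does not say how the interval was computed.

```python
import math
import statistics

# Hypothetical per-seed AUC values (illustrative only, not reported data).
seed_aucs = [0.84, 0.86, 0.87, 0.88, 0.90]

mean_auc = statistics.mean(seed_aucs)
std_auc = statistics.stdev(seed_aucs)  # sample standard deviation (n - 1)

# Two-sided 95% critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4 = 2.776
half_width = T_CRIT_DF4 * std_auc / math.sqrt(len(seed_aucs))
ci_low, ci_high = mean_auc - half_width, mean_auc + half_width

print(f"mean={mean_auc:.3f} std={std_auc:.3f} "
      f"95% CI=[{ci_low:.3f}, {ci_high:.3f}]")
```

With only five runs, the t-based interval is noticeably wider than a normal-approximation interval (critical value 1.96), so it helps to state which one a reported interval uses.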

Circularity Check

0 steps flagged

No significant circularity; empirical attack evaluation is self-contained

The paper proposes CIA as an empirical attack method relying on adversarial queries, global bias disentanglement, and LLM-guided weak supervision to recover topologies from black-box outputs, then reports AUC performance (avg 0.87, peak 0.99) on optimized MAS instances. No equations, derivations, or claims reduce by construction to fitted parameters, self-definitions, or self-citation chains. The central result is an external performance measurement on held-out or varied topologies rather than a renaming, ansatz smuggling, or uniqueness theorem imported from prior author work. This is a standard empirical security evaluation with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the attack relies on standard ML techniques and on LLM capabilities assumed from prior work.

pith-pipeline@v0.9.0 · 5459 in / 904 out tokens · 46567 ms · 2026-05-10T15:31:46.130800+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
cs.CR · 2026-05 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.