Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems
Pith reviewed 2026-06-26 01:40 UTC · model grok-4.3
The pith
Architectural non-isolation in transformers causes measurable interference between prompt modules in agentic systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Compositional behavioral leakage is interference between modules sharing a context window due to transformer self-attention providing no formal boundary. On a deployed job-evaluation agent across 144 trials, a three-channel perturbation protocol that alters non-focal modules along volume, content, and form dimensions shows a detectable paired effect only in the content channel (Cohen's d = 0.63, bootstrap CI excluding zero). No recommendation is flipped, placing the phenomenon in a sub-threshold regime invisible to ordinary quality assurance but potentially compounding across thousands of decisions. The effect is orthogonal to adversarial injection, cognitive degradation, multi-agent fault p
What carries the argument
The three-channel perturbation protocol that perturbs non-focal modules along volume, content, and form to isolate cross-module interference.
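To make the shape of the protocol concrete, here is a minimal Python sketch. The source does not publish the prompt texts or the exact perturbation operators, so the module names, the `perturb` function, and the three transformations below are hypothetical stand-ins chosen only to illustrate "perturb the non-focal modules along volume, content, and form while leaving the focal module untouched".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Module:
    name: str
    text: str


def perturb(module: Module, channel: str) -> Module:
    """Return a perturbed copy of a non-focal module (hypothetical operators)."""
    if channel == "volume":
        # Change length without adding new instructions: repeat the text.
        return replace(module, text=module.text + "\n" + module.text)
    if channel == "content":
        # Change what the module says: append a new instruction.
        return replace(module, text=module.text + "\nPrefer concise summaries.")
    if channel == "form":
        # Change surface form only: same words, different casing.
        return replace(module, text=module.text.upper())
    raise ValueError(f"unknown channel: {channel}")


def compose(modules: list[Module]) -> str:
    """Concatenate modules into one context window."""
    return "\n\n".join(f"## {m.name}\n{m.text}" for m in modules)


def build_conditions(modules: list[Module], focal: str) -> dict[str, str]:
    """Baseline prompt plus one perturbed prompt per channel; focal module untouched."""
    conditions = {"baseline": compose(modules)}
    for channel in ("volume", "content", "form"):
        perturbed = [m if m.name == focal else perturb(m, channel) for m in modules]
        conditions[channel] = compose(perturbed)
    return conditions


# Hypothetical two-module agent prompt; "scoring" is the focal module.
modules = [
    Module("scoring", "Score the candidate from 1 to 10."),
    Module("tone", "Write in a neutral tone."),
]
conditions = build_conditions(modules, focal="scoring")
```

Each entry in `conditions` would be sent to the agent with the same input, and the focal module's output compared against the baseline. Any shift is then attributable to a change outside the focal module, which is the quantity the protocol is meant to expose.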
If this is right
- Standard QA procedures miss sub-threshold cross-module effects that may still compound over repeated agent decisions.
- Cross-module interference measurement becomes a required part of evaluating prompt-composed agents.
- CBL is orthogonal to existing agent failure categories such as adversarial injection and multi-agent fault propagation.
- The reusable protocol and falsifiable prediction set allow systematic detection of this interference.
Where Pith is reading between the lines
- If the effect compounds as described, agents running over long sessions could exhibit gradual unexplained shifts in behavior.
- System builders may need to test for content bleed when combining many prompt modules into one context.
- The same non-isolation mechanism could affect any multi-module transformer application that relies on concatenated instructions.
Load-bearing premise
The three-channel perturbation protocol isolates effects caused by architectural non-isolation rather than task-specific confounds or the particular job-evaluation prompt structure.
What would settle it
A replication of the 144-trial experiment on the same job-evaluation agent setup that finds no paired effect in the content channel would falsify the claim of detectable compositional behavioral leakage.
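A replication check of this kind could be sketched as follows. The source does not define the pairing or say which variant of Cohen's d was used, so this sketch assumes each pair is the same input scored under the baseline and the content-perturbed prompt, and it uses the paired-differences form (mean difference divided by the standard deviation of the differences). The scores are synthetic placeholders and the pair count is arbitrary; none of it is the paper's data.

```python
import random
import statistics


def paired_cohens_d(baseline, perturbed):
    """Mean of paired differences divided by their sample standard deviation."""
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(baseline, perturbed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the paired d, resampling pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, perturbed))
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        b, p = zip(*sample)
        diffs = [y - x for x, y in zip(b, p)]
        if statistics.stdev(diffs) == 0:
            continue  # degenerate resample: d is undefined, skip it
        stats.append(paired_cohens_d(b, p))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Synthetic scores for illustration only (40 pairs is an arbitrary choice).
rng = random.Random(42)
baseline = [rng.gauss(7.0, 1.0) for _ in range(40)]
perturbed = [b + rng.gauss(0.3, 0.5) for b in baseline]

d = paired_cohens_d(baseline, perturbed)
lo, hi = bootstrap_ci(baseline, perturbed)
effect_detected = lo > 0 or hi < 0  # CI excludes zero
```

Under these assumptions, a replication would count against the claim if `effect_detected` came out false for the content channel on the same agent setup.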
Original abstract
Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces compositional behavioral leakage (CBL) as interference between prompt modules in agentic systems enabled by transformer self-attention lacking formal boundaries. It reports an empirical probe on a job-evaluation agent (Claude Sonnet 4.6, 144 trials) using a three-channel perturbation protocol on non-focal modules (volume, content, form), finding a detectable paired effect only in the content channel (Cohen's d = 0.63, bootstrap 95% CI excluding zero) with no recommendation flips; this is characterized as a sub-threshold regime that compounds across many decisions. The work supplies an operational definition, reusable protocol, falsifiable prediction set, and system-class characterization, claiming orthogonality to other agent-failure modes.
Significance. If the central empirical result holds after controls for confounds, the paper would establish cross-module interference as a measurable evaluation requirement for prompt-composed agents, supplying a reusable protocol and falsifiable predictions that could be adopted in deployed-system testing. The explicit reporting of effect size, bootstrap CI, and trial count is a strength, as is the attempt to isolate a new failure axis.
major comments (3)
- [§3] §3 (three-channel protocol): the protocol perturbs non-focal modules along volume/content/form but reports no controls such as module-order randomization or placement of semantically matched content into focal vs. non-focal positions; without these, the content-channel effect (Cohen's d = 0.63) cannot be attributed specifically to architectural non-isolation rather than task-specific semantic integration or prompt-structure confounds, which is load-bearing for the central claim.
- [Results] Results (sub-threshold compounding claim): the extrapolation that the observed effect 'compounds across the thousands of decisions a deployed agent makes' is stated without direct measurement of accumulation or multi-decision effects on the same agent, weakening the practical-significance argument.
- [Methods] Methods (statistical details): exact prompt texts, exclusion rules, baseline comparisons, and a priori power analysis are not supplied, leaving open whether the reported paired effect and CI fully support the claim without post-hoc choices.
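The order-randomization control requested in the first major comment could look roughly like the sketch below. It is not part of the paper's protocol: the function name, the module names, and the choice to pin the focal module at its original position while shuffling the others are all assumptions made for illustration.

```python
import random


def randomized_orders(module_names, focal, n_trials, seed=0):
    """Return one module ordering per trial.

    The focal module keeps its original index; non-focal modules are
    shuffled around it with a seeded generator so runs are reproducible.
    """
    rng = random.Random(seed)
    focal_idx = module_names.index(focal)
    non_focal = [m for m in module_names if m != focal]
    orders = []
    for _ in range(n_trials):
        shuffled = non_focal[:]
        rng.shuffle(shuffled)
        shuffled.insert(focal_idx, focal)
        orders.append(shuffled)
    return orders


# Hypothetical module names for a four-module agent prompt.
orders = randomized_orders(
    ["scoring", "tone", "format", "safety"], focal="scoring", n_trials=6
)
```

If the content-channel effect persisted across such orderings, position-specific explanations would become less plausible. A complementary control, also not in the source, would move the focal module as well.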
minor comments (2)
- [Abstract] Abstract and Methods: the term 'Compositional behavioral leakage (CBL)' is introduced as an invented entity without prior literature citation; a brief related-work paragraph would clarify novelty.
- [Results] Notation: 'paired effect' is used without an explicit definition of the pairing (e.g., which trials are paired); a short clarifying sentence would improve reproducibility.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We address each of the major comments below and indicate where revisions will be made.
Point-by-point responses
- Referee: [§3] §3 (three-channel protocol): the protocol perturbs non-focal modules along volume/content/form but reports no controls such as module-order randomization or placement of semantically matched content into focal vs. non-focal positions; without these, the content-channel effect (Cohen's d = 0.63) cannot be attributed specifically to architectural non-isolation rather than task-specific semantic integration or prompt-structure confounds, which is load-bearing for the central claim.
  Authors: We agree that module-order randomization and explicit controls for semantic matching would provide stronger isolation of the architectural effect. Our current design relies on the differential impact across perturbation channels (volume and form showing no effect) to argue against general confounds. However, to address this, we will add a dedicated limitations subsection discussing these potential confounds and include order randomization in future protocol iterations. This is a partial revision as the core protocol remains as is but with added discussion.
  Revision: partial
- Referee: [Results] Results (sub-threshold compounding claim): the extrapolation that the observed effect 'compounds across the thousands of decisions a deployed agent makes' is stated without direct measurement of accumulation or multi-decision effects on the same agent, weakening the practical-significance argument.
  Authors: The referee is correct that we do not provide direct empirical measurement of compounding across multiple decisions. The claim is an extrapolation based on the sub-threshold effect size and the scale of deployed agent usage. We will revise the manuscript to frame this as a hypothesized implication rather than a demonstrated result, and suggest it as an avenue for future work. This addresses the concern without altering the reported findings.
  Revision: yes
- Referee: [Methods] Methods (statistical details): exact prompt texts, exclusion rules, baseline comparisons, and a priori power analysis are not supplied, leaving open whether the reported paired effect and CI fully support the claim without post-hoc choices.
  Authors: We will include the exact prompt templates, exclusion criteria, baseline conditions, and details on the power analysis (based on pilot data targeting d=0.5) in the revised supplementary materials or methods section. This will allow full reproducibility and verification of the statistical claims.
  Revision: yes
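For readers who want a rough sense of what a power analysis targeting d = 0.5 implies, the sketch below uses the standard normal-approximation formula for a two-sided paired test. The significance level (0.05) and target power (0.80) are assumptions; the rebuttal states only the target effect size. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(d, alpha=0.05, power=0.80):
    """Approximate number of pairs needed for a two-sided paired test.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


n_pairs = paired_sample_size(0.5)  # 32 pairs under these assumptions
```

An exact calculation based on the t distribution gives a slightly larger number, so this should be read as a lower-bound estimate rather than the authors' actual analysis.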
Circularity Check
No circularity; empirical measurement stands on experimental data
Full rationale
The paper defines CBL operationally, describes a three-channel perturbation protocol, and reports a measured Cohen's d effect size with bootstrap CI from 144 trials on Claude Sonnet 4.6. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the provided text. The central result is a direct statistical observation rather than a quantity derived by construction from prior inputs or author-defined relations. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Transformer self-attention provides no formal boundary between concatenated prompt modules.
invented entities (1)
- Compositional behavioral leakage (CBL): no independent evidence