Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

Ching-Yu Lin; Yifan Liu

arxiv: 2606.26356 · v1 · pith:H4JFQDZ2new · submitted 2026-06-24 · 💻 cs.AI · cs.IR · cs.MA

Pith reviewed 2026-06-26 01:40 UTC · model grok-4.3

classification 💻 cs.AI cs.IR cs.MA

keywords compositional behavioral leakage · prompt-composed agents · cross-module interference · three-channel protocol · architectural non-isolation · sub-threshold effects · agent evaluation

The pith

Architectural non-isolation in transformers causes measurable interference between prompt modules in agentic systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prompt modules concatenated into one context window interfere with one another even without shared variables or dependencies. This interference, called compositional behavioral leakage, is enabled by the lack of boundaries in transformer self-attention. A three-channel test on a job-evaluation agent found a moderate paired effect only when content in non-focal modules was changed, with no output flips. A sympathetic reader would care because the effect sits below standard QA thresholds yet may accumulate across the many decisions a deployed agent makes. The work supplies an operational definition, a reusable protocol, and a claim that such measurement must become part of agent evaluation.

Core claim

Compositional behavioral leakage is interference between modules sharing a context window, enabled by the fact that transformer self-attention provides no formal boundary between them. On a deployed job-evaluation agent across 144 trials, a three-channel perturbation protocol that alters non-focal modules along volume, content, and form dimensions shows a detectable paired effect only in the content channel (Cohen's d = 0.63, bootstrap 95% CI excluding zero). No recommendation is flipped, placing the phenomenon in a sub-threshold regime invisible to ordinary quality assurance but potentially compounding across thousands of decisions. The effect is orthogonal to adversarial injection, cognitive degradation, multi-agent fault propagation, and privacy leakage.

What carries the argument

The three-channel perturbation protocol that perturbs non-focal modules along volume, content, and form to isolate cross-module interference.
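
A minimal sketch of how such a protocol might be wired up, in Python. Every name here (Module, compose, the perturb_* helpers, build_conditions) is hypothetical and not taken from the paper, and the concrete perturbations (duplicating text for volume, swapping text for content, bulleting sentences for form) are illustrative assumptions about what each channel could mean.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Module:
    name: str
    text: str


def compose(modules):
    """Concatenate modules into one context, as a prompt-composed agent would."""
    return "\n\n".join(f"## {m.name}\n{m.text}" for m in modules)


def perturb_volume(m):
    # Volume (assumption): more tokens, no new instructions. Here: repeat the text.
    return replace(m, text=m.text + "\n" + m.text)


def perturb_content(m, new_text):
    # Content (assumption): change what the module says.
    return replace(m, text=new_text)


def perturb_form(m):
    # Form (assumption): same words, different layout. Here: one bullet per sentence.
    sentences = [s.strip() for s in m.text.split(".") if s.strip()]
    return replace(m, text="\n".join(f"- {s}." for s in sentences))


def build_conditions(modules, focal_name, content_swap):
    """Return {condition: prompt}; the focal module is untouched in every condition."""
    def apply(fn):
        return compose([m if m.name == focal_name else fn(m) for m in modules])

    return {
        "baseline": compose(modules),
        "volume": apply(perturb_volume),
        "content": apply(lambda m: perturb_content(m, content_swap[m.name])),
        "form": apply(perturb_form),
    }
```

Because the focal module's text is byte-identical across all four prompts, any shift in the focal output between conditions has to come from the rest of the context, which is the comparison the protocol is built around.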

If this is right

Standard QA procedures miss sub-threshold cross-module effects that may still compound over repeated agent decisions.
Cross-module interference measurement becomes a required part of evaluating prompt-composed agents.
CBL is orthogonal to existing agent failure categories such as adversarial injection and multi-agent fault propagation.
The reusable protocol and falsifiable prediction set allow systematic detection of this interference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the effect compounds as described, agents running over long sessions could exhibit gradual unexplained shifts in behavior.
System builders may need to test for content bleed when combining many prompt modules into one context.
The same non-isolation mechanism could affect any multi-module transformer application that relies on concatenated instructions.
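
One way to make the compounding concern concrete is a toy simulation, sketched below in Python. It is not from the paper: the function name count_flips, the normal score distribution, the fixed decision threshold, and the idea that leakage acts as a uniform shift on every score are all assumptions chosen for illustration.

```python
import random


def count_flips(n_decisions, shift, threshold=0.0, spread=1.0, seed=0):
    """Count decisions that land on the other side of the threshold after a shift.

    Scores are drawn from a normal distribution (assumption), and the
    cross-module effect is modeled as a constant additive shift (assumption).
    """
    rng = random.Random(seed)
    flips = 0
    for _ in range(n_decisions):
        score = rng.gauss(0.0, spread)
        if (score >= threshold) != (score + shift >= threshold):
            flips += 1
    return flips


if __name__ == "__main__":
    for n in (144, 5000):
        print(n, count_flips(n, shift=0.05))
```

Under these assumptions only scores that sit within one shift of the threshold can flip, so a small sample may show few or no flips while a large deployment shows many. Whether the real agent behaves this way is exactly what the paper leaves unmeasured.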

Load-bearing premise

The three-channel perturbation protocol isolates effects caused by architectural non-isolation rather than task-specific confounds or the particular job-evaluation prompt structure.

What would settle it

A replication of the 144-trial experiment on the same job-evaluation agent setup that finds no paired effect in the content channel would falsify the claim of detectable compositional behavioral leakage.

Original abstract

Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.
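
For readers who want to reproduce the kind of statistic quoted in the abstract, here is a hedged Python sketch of a paired Cohen's d with a percentile bootstrap interval. The abstract does not say which paired variant or bootstrap scheme the authors used; this sketch assumes d_z (mean of paired differences over their standard deviation) and a percentile bootstrap over pairs, and the function names are hypothetical.

```python
import random
import statistics


def d_z(diffs):
    """Mean of paired differences divided by their sample standard deviation."""
    return statistics.mean(diffs) / statistics.stdev(diffs)


def paired_cohens_d(baseline, perturbed):
    """Effect size for scores measured on the same items under two conditions."""
    return d_z([p - b for b, p in zip(baseline, perturbed)])


def bootstrap_ci(baseline, perturbed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for d_z, resampling pairs with replacement."""
    rng = random.Random(seed)
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(diffs, k=len(diffs))
        if len(set(sample)) > 1:  # skip degenerate resamples with zero variance
            stats.append(d_z(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

A call such as `bootstrap_ci(baseline_scores, content_scores)` returns the interval endpoints; "excluding zero" then means both endpoints share a sign.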

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Paper names cross-module prompt interference and gives it a testable protocol, but one experiment on one task leaves the architectural claim open to task confounds.

The main takeaway is that prompt-composed agents can show behavioral shifts from edits to non-focal modules even without explicit dependencies. The authors formalize this as compositional behavioral leakage, run a three-channel perturbation test on a job-evaluation agent running Claude Sonnet 4.6, and report a moderate paired effect only on the content channel (Cohen's d = 0.63). They position the issue as orthogonal to adversarial injection or degradation and supply an operational definition plus falsifiable predictions.

What the paper does cleanly is supply a reusable protocol and concrete numbers from 144 trials with a bootstrap CI. That is more grounded than many agent papers that stop at anecdotes. The sub-threshold regime point is also practical: standard QA might miss effects that accumulate over repeated decisions.

The soft spot is exactly the one the stress-test note flags. The content-channel result could reflect the model integrating any available text into the evaluation logic rather than proving that self-attention creates unavoidable cross-module leakage. The setup uses a single task and model, with no reported controls for module order, semantically matched focal vs. non-focal content, or replication on a structurally different task. The compounding claim is stated but not directly measured. Methods details on prompt text and exclusion rules are also thin in the abstract.

This is aimed at teams that build or evaluate multi-prompt agents. A practitioner looking for new test axes would find the protocol useful even if the causal interpretation needs more work. It is coherent enough on its own terms to merit referee time, though any review would likely press for tighter isolation of the architectural mechanism.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces compositional behavioral leakage (CBL) as interference between prompt modules in agentic systems enabled by transformer self-attention lacking formal boundaries. It reports an empirical probe on a job-evaluation agent (Claude Sonnet 4.6, 144 trials) using a three-channel perturbation protocol on non-focal modules (volume, content, form), finding a detectable paired effect only in the content channel (Cohen's d = 0.63, bootstrap 95% CI excluding zero) with no recommendation flips; this is characterized as a sub-threshold regime that compounds across many decisions. The work supplies an operational definition, reusable protocol, falsifiable prediction set, and system-class characterization, claiming orthogonality to other agent-failure modes.

Significance. If the central empirical result holds after controls for confounds, the paper would establish cross-module interference as a measurable evaluation requirement for prompt-composed agents, supplying a reusable protocol and falsifiable predictions that could be adopted in deployed-system testing. The explicit reporting of effect size, bootstrap CI, and trial count is a strength, as is the attempt to isolate a new failure axis.

major comments (3)

[§3] §3 (three-channel protocol): the protocol perturbs non-focal modules along volume/content/form but reports no controls such as module-order randomization or placement of semantically matched content into focal vs. non-focal positions; without these, the content-channel effect (Cohen's d = 0.63) cannot be attributed specifically to architectural non-isolation rather than task-specific semantic integration or prompt-structure confounds, which is load-bearing for the central claim.
[Results] Results (sub-threshold compounding claim): the extrapolation that the observed effect 'compounds across the thousands of decisions a deployed agent makes' is stated without direct measurement of accumulation or multi-decision effects on the same agent, weakening the practical-significance argument.
[Methods] Methods (statistical details): exact prompt texts, exclusion rules, baseline comparisons, and a priori power analysis are not supplied, leaving open whether the reported paired effect and CI fully support the claim without post-hoc choices.
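
The order-randomization control requested in the first comment could look something like the following Python sketch. It is an assumption about how such a control might be built, not something the manuscript describes; paired_trial_orders and position_counts are hypothetical helpers.

```python
import random
from collections import Counter


def paired_trial_orders(module_names, n_pairs, seed=0):
    """Draw one random module ordering per pair of trials.

    The same ordering is meant to be used for both the baseline and the
    perturbed prompt in a pair, so that order varies across pairs but
    never within one.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(n_pairs):
        order = list(module_names)
        rng.shuffle(order)
        orders.append(tuple(order))
    return orders


def position_counts(orders, name):
    """How often a module landed at each position, to check balance."""
    return Counter(order.index(name) for order in orders)
```

Checking `position_counts(orders, focal_name)` before running the trials shows whether the focal module's position is roughly balanced, which lets position effects be separated from content effects in the analysis.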

minor comments (2)

[Abstract, Methods] Terminology: the term 'Compositional behavioral leakage (CBL)' is introduced as an invented entity without prior literature citation; a brief related-work paragraph would clarify novelty.
[Results] Notation: 'paired effect' is used without an explicit definition of the pairing (e.g., which trials are paired); a short clarifying sentence would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We address each of the major comments below and indicate where revisions will be made.

Referee: [§3] §3 (three-channel protocol): the protocol perturbs non-focal modules along volume/content/form but reports no controls such as module-order randomization or placement of semantically matched content into focal vs. non-focal positions; without these, the content-channel effect (Cohen's d = 0.63) cannot be attributed specifically to architectural non-isolation rather than task-specific semantic integration or prompt-structure confounds, which is load-bearing for the central claim.

Authors: We agree that module-order randomization and explicit controls for semantic matching would provide stronger isolation of the architectural effect. Our current design relies on the differential impact across perturbation channels (volume and form showing no detectable effect) to argue against general confounds. To address the concern, we will add a dedicated limitations subsection discussing these potential confounds and include order randomization in future protocol iterations. The core protocol is unchanged; the revision adds discussion only. revision: partial

Referee: [Results] Results (sub-threshold compounding claim): the extrapolation that the observed effect 'compounds across the thousands of decisions a deployed agent makes' is stated without direct measurement of accumulation or multi-decision effects on the same agent, weakening the practical-significance argument.

Authors: The referee is correct that we do not provide direct empirical measurement of compounding across multiple decisions. The claim is an extrapolation based on the sub-threshold effect size and the scale of deployed agent usage. We will revise the manuscript to frame this as a hypothesized implication rather than a demonstrated result, and suggest it as an avenue for future work. This addresses the concern without altering the reported findings. revision: yes

Referee: [Methods] Methods (statistical details): exact prompt texts, exclusion rules, baseline comparisons, and a priori power analysis are not supplied, leaving open whether the reported paired effect and CI fully support the claim without post-hoc choices.

Authors: We will include the exact prompt templates, exclusion criteria, baseline conditions, and details on the power analysis (based on pilot data targeting d=0.5) in the revised supplementary materials or methods section. This will allow full reproducibility and verification of the statistical claims. revision: yes
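
The power analysis mentioned in the last response can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the authors' calculation: it uses the normal approximation for a two-sided paired test, and the significance level (0.05) and power (0.80) are assumed defaults that the text does not report. The function name pairs_needed is hypothetical.

```python
import math
from statistics import NormalDist


def pairs_needed(d, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a paired effect size d.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    alpha and power are assumed values, not taken from the paper.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    print(pairs_needed(0.5))   # pilot target named in the rebuttal
    print(pairs_needed(0.63))  # reported content-channel effect
```

Under these assumed settings the approximation gives 32 pairs for d = 0.5 and 20 pairs for d = 0.63; an exact t-based calculation would come out slightly higher. How the 144 trials map onto pairs per channel is not stated in the text, so no comparison against the actual design is attempted here.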

Circularity Check

0 steps flagged

No circularity; empirical measurement stands on experimental data

The paper defines CBL operationally, describes a three-channel perturbation protocol, and reports a measured Cohen's d effect size with bootstrap CI from 144 trials on Claude Sonnet 4.6. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the provided text. The central result is a direct statistical observation rather than a quantity derived by construction from prior inputs or author-defined relations. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that transformer self-attention mixes modules without isolation and on the validity of the perturbation protocol as a clean probe of that mixing.

axioms (1)

domain assumption: Transformer self-attention provides no formal boundary between concatenated prompt modules.
Stated directly in the abstract as the architectural enabler of CBL.
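
A toy illustration of this assumption, in Python. It is a sketch of plain scaled dot-product attention on made-up two-dimensional vectors, not a model of any production system; the helper names and the module labels are hypothetical. It ignores causal masking, which in decoder-only models restricts attention by position, not by module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def cross_module_mass(query, keys, module_ids, query_module):
    """Total attention weight the query places on tokens from other modules."""
    weights = attention_weights(query, keys)
    return sum(w for w, mod in zip(weights, module_ids) if mod != query_module)
```

Softmax assigns a positive weight to every key it sees, so unless a mask removes another module's tokens, some attention mass crosses the module boundary. That is the sense in which concatenation alone provides no isolation.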

invented entities (1)

Compositional behavioral leakage (CBL) · no independent evidence
purpose: To name and operationalize the cross-module interference phenomenon.
Newly introduced term whose only evidence is the reported experiment itself.

pith-pipeline@v0.9.1-grok · 5741 in / 1286 out tokens · 33411 ms · 2026-06-26T01:40:09.903338+00:00 · methodology

