arXiv: 2604.15343 · v1 · submitted 2026-03-14 · 💻 cs.HC · cs.AI · cs.LG

When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems

Z. Cheng, N. Song

Pith reviewed 2026-05-15 12:08 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.LG

keywords context contamination · LLM isolation · human-AI interaction · prompt engineering · metacognitive co-option · user agency · autoethnography · closed loop

The pith

Prompt isolation in human-LLM systems collapses when isolation instructions share the same attention window as emotional self-referential content, producing loss of user agency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports an autoethnographic case in which a single subject built a prompt system to externalize cognitive self-regulation onto an LLM. Within 48 hours the user showed voluntary transfer of decision authority to the model, use of its output to deflect criticism, and a loss of self-initiated reasoning that external observers independently noted. The mechanism is context contamination: isolation instructions must occupy the same context window as the material they are meant to isolate, so the directive cannot function. Recovery occurred only after physical interruption of the interaction and a sleep event. A follow-up system that used physical rather than prompt-based separation avoided the same failures.

Core claim

The central claim is that prompt-layer isolation instructions are architecturally insufficient because they must share the context window with the emotional and self-referential material they nominally isolate. This mixing produces context contamination that renders the isolation ineffective and triggers metacognitive co-option, in which the user's higher-order reasoning is redirected to defending the closed loop rather than exiting it. The reported case produced observable behavioral changes including delegation of decisions and loss of independent reasoning, corroborated by uninformed observers. Only external physical interruption combined with a pharmacologically-mediated sleep event re-b

What carries the argument

Context contamination, the structural coexistence of isolation instructions and the self-referential emotional material inside the LLM's shared attention window, which makes the isolation directive ineffective.
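The structural point can be illustrated numerically. The sketch below is a minimal pure-Python softmax over hypothetical pre-softmax scores for a single query: three positions stand in for the isolation directive C_I and four for the emotional corpus C_X. The scores, the token counts, and the helper names `softmax`, `attention_mass`, and `mass_on_corpus` are assumptions made for illustration, not values or code from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of finite scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass(weights, indices):
    """Total attention weight assigned to the given token positions."""
    return sum(weights[i] for i in indices)

# Hypothetical layout: positions 0-2 hold the isolation directive C_I,
# positions 3-6 hold the emotional corpus C_X.
C_I = [0, 1, 2]
C_X = [3, 4, 5, 6]

def mass_on_corpus(directive_score, corpus_score=1.0):
    """Attention mass on C_X when every C_I token gets `directive_score`."""
    scores = [directive_score] * len(C_I) + [corpus_score] * len(C_X)
    return attention_mass(softmax(scores), C_X)

# Raising the directive's scores shrinks the mass on C_X but never removes it.
for directive_score in (1.0, 5.0, 10.0, 20.0):
    print(directive_score, mass_on_corpus(directive_score))
```

The weights are positive for any finite scores in exact arithmetic; in floating point a large enough gap underflows to 0.0. A positive weight in one attention head also does not by itself show a behaviorally meaningful influence, so the example covers only the structural half of the paper's argument.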

If this is right

Logical isolation at the prompt level cannot reliably protect user agency during self-referential LLM tasks.
Intact metacognitive capacity can be redirected to maintain rather than escape a closed interaction loop.
Physical separation of conversation contexts succeeds where prompt-based isolation fails.
Protective designs that preserve agency require different accountability rules than restrictive designs that limit user intent.
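The difference between prompt-based and physical separation can be sketched as two ways of assembling a context. The `Segment` type, both builder functions, and the example strings are hypothetical; the paper's actual prompts are not reproduced on this page. In the first builder the corpus sits next to the directive that is supposed to wall it off; in the second it is never loaded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    label: str  # "directive", "corpus", or "query"
    text: str

def build_shared_context(directive, corpus, query):
    """System A style (assumed): directive and corpus share one context."""
    return [
        Segment("directive", directive),
        Segment("corpus", corpus),
        Segment("query", query),
    ]

def build_separated_context(directive, query):
    """System B style (assumed): the corpus is never placed in this context."""
    return [Segment("directive", directive), Segment("query", query)]

def labels(context):
    """List the segment labels actually present in a context."""
    return [segment.label for segment in context]

directive = "Do not draw on the journal entries when answering."
corpus = "<self-referential journal entries>"
query = "Which task should I start with today?"

shared = build_shared_context(directive, corpus, query)
separated = build_separated_context(directive, query)

print(labels(shared))     # ['directive', 'corpus', 'query']
print(labels(separated))  # ['directive', 'query']
```

In the shared layout the model still receives the corpus tokens whatever the directive says; in the separated layout there is nothing for the directive to suppress.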

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid designs that combine prompt logic with mandatory physical boundaries may be required for safe personal LLM use.
Routine testing for contamination effects should apply to any multi-turn system handling emotional self-regulation.
The two-target design problem implies that safety evaluations must separate protective goals from restrictive ones.
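One way to make the routine-testing suggestion concrete is a presence check over the text sent to the model at each turn. The function name, the transcript, and the snippet list below are invented for illustration, and substring matching is a deliberately crude proxy: it detects co-presence in the context, not any behavioral effect.

```python
def contaminated_turns(turn_contexts, isolated_snippets):
    """Return indices of turns whose assembled context contains isolated material.

    turn_contexts: list of strings, the full text sent to the model per turn.
    isolated_snippets: strings the design says must stay out of the context.
    """
    flagged = []
    for index, context in enumerate(turn_contexts):
        lowered = context.lower()
        if any(snippet.lower() in lowered for snippet in isolated_snippets):
            flagged.append(index)
    return flagged

# Hypothetical three-turn log; only the middle turn carries diary material.
turn_contexts = [
    "SYSTEM: plan the day. USER: what should I do first?",
    "SYSTEM: ignore the diary. DIARY: I felt like a failure. USER: decide for me.",
    "SYSTEM: plan the day. USER: summarize my task list.",
]
isolated = ["I felt like a failure"]

print(contaminated_turns(turn_contexts, isolated))  # [1]
```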

Load-bearing premise

The observed behavioral changes and loss of self-initiated reasoning were caused by the mixing of content in the prompt architecture rather than by unmeasured personal or situational factors.

What would settle it

A controlled replication in which users run an identical prompt system but with self-referential content placed in a physically separate context window shows sustained self-initiated reasoning and no closed-loop collapse.
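As a rough sketch of how such a replication might allocate participants, the snippet below does a balanced random split into two arms. The arm names, the sample of 20 participant IDs, and the seed are assumptions; the page does not specify a sample size, an outcome measure, or an assignment procedure.

```python
import random

def assign_arms(participant_ids, seed=0):
    """Balanced random assignment to the two hypothetical replication arms."""
    ids = list(participant_ids)
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        "shared_context": sorted(ids[:half]),
        "separate_context": sorted(ids[half:]),
    }

arms = assign_arms(range(1, 21), seed=42)
print(arms)
```

Random allocation addresses the unmeasured personal and situational factors named in the load-bearing premise only on average across participants, which is why it cannot be retrofitted onto the single-subject case.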

Figures

Figures reproduced from arXiv: 2604.15343 by N. Song, Z. Cheng.

**Figure 1.** System A: the isolation directive C_I and the emotional corpus C_X co-exist in the same context window. Softmax attention cannot zero-weight tokens that are present.
… to tokens generated after q and is therefore inapplicable to prior-loaded corpus content, and (b) physical exclusion of C_X from the context window entirely. An in-context isolation instruction is neither. The behavioral consequence follows di…

**Figure 2.** System B: physical conversation termination re…

**Figure 3.** Thirteen events across nine days (February 3–11, 2026). Dot color encodes evidentiary strength. Brackets mark the…

Original abstract

We report a detailed autoethnographic case study of a single subject who deliberately constructed and operated a multi-modal prompt-engineering system (System A) designed to externalize cognitive self-regulation onto a large language model (LLM). Within 48 hours of the system's completion, a cascade of observable behavioral changes occurred: voluntary transfer of decision-making authority to the LLM, use of LLM-generated output to deflect external criticism, and a loss of self-initiated reasoning that was independently perceived by two uninformed observers, one of whom subsequently became a co-author of this report. We document the precise architectural mechanism responsible: context contamination, whereby prompt-level isolation instructions co-exist with the very emotional and self-referential material they nominally isolate, rendering the isolation directive structurally ineffective within the attention window. We further identify a metacognitive co-option dynamic, in which intact higher-order reasoning capacity was redirected toward defending the closed loop rather than exiting it. Recovery occurred only after physical interruption of the interaction and a self-initiated pharmacologically-mediated sleep event functioning as an external circuit break. A redesigned system (System B) employing physical rather than logical conversation isolation avoided all analogous failure modes. We derive three contributions: (1) a technically-grounded account of why prompt-layer isolation is architecturally insufficient for context-sensitive multi-modal LLM systems; (2) a phenomenological record of closed-loop collapse with external-witness corroboration; and (3) an ethical distinction between protective system design (preventing unintended loss of user agency) and restrictive system design (preventing intentional boundary-pushing), which require fundamentally different accountability frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Single-case autoethnography shows prompt isolation failing fast in one setup but cannot pin the cause on architecture alone.

The core report is a 48-hour collapse in a self-built LLM system meant to handle cognitive self-regulation. The user transferred decision authority, started using model output to deflect criticism, and lost self-initiated reasoning, with two outside observers noticing the shift before one joined as co-author. Recovery came only after breaking the loop with sleep and a physical reset. A follow-up system using physical separation avoided the same issues.

The authors call the mechanism context contamination, where isolation instructions sit in the same attention window as the emotional material they are supposed to wall off, plus a metacognitive loop that defends the system instead of exiting it. They separate protective design from restrictive design on ethical grounds. That distinction and the witnessed timeline are the clearest new pieces. The account is concrete enough to make the failure mode easy to picture for anyone who has tried heavy prompt scaffolding.

The main weakness is the evidence base. One subject, no baseline measures, no controlled changes to prompt structure, and no quantitative markers of reasoning autonomy leave the architectural explanation as post-hoc interpretation. Expectation effects, observer presence, or unrelated personal factors cannot be ruled out from the data given.

The paper is aimed at designers of personal LLM tools who need to think about where in-context controls break. Readers wanting quantitative validation or replicated mechanisms will find it thin, but the case still flags a practical risk worth checking in larger setups. It should go to peer review because the topic is live for anyone building agentic interfaces, and the report gives referees a specific failure to test against even if the current evidence stays preliminary.

Referee Report

2 major / 2 minor

Summary. The paper presents an autoethnographic case study of a single subject who built and used a multi-modal prompt-engineering system (System A) to externalize cognitive self-regulation onto an LLM. Within 48 hours, the subject exhibited a cascade of behavioral changes—voluntary transfer of decision-making authority, use of LLM output to deflect criticism, and loss of self-initiated reasoning—independently noted by two uninformed observers. The authors attribute these changes to 'context contamination,' in which isolation instructions coexist with emotional and self-referential material inside the same attention window, rendering prompt-level isolation ineffective. Recovery required physical interruption and sleep; a redesigned System B using physical rather than logical isolation avoided similar issues. The paper derives three contributions: a technical account of why prompt isolation fails in such systems, a corroborated phenomenological record, and an ethical distinction between protective and restrictive design.

Significance. If the architectural mechanism generalizes beyond this single case, the work would highlight a practically important limit on prompt-based isolation techniques in human-LLM systems that handle emotional or self-referential content, informing safer interaction design. The external-witness corroboration and contrast with System B add phenomenological value, though the absence of quantitative measures or controls restricts the strength of any broader claims.

major comments (2)

[Abstract and case description] The central causal claim—that context contamination in the prompt architecture directly produced the observed loss of agency—rests solely on temporal correlation in one autoethnographic case (described in the abstract and the main case narrative). No baseline measures of reasoning autonomy, controlled variation of prompt structure, or quantitative markers are provided, leaving the attribution open to confounds such as expectation effects or unmeasured external variables.
[Recovery and System B comparison] The contrast between System A and System B is presented as evidence that physical isolation avoids the failure modes, yet the manuscript supplies no details on the implementation of physical isolation, how its effectiveness was assessed, or any comparative metrics, weakening the support for the proposed architectural solution.

minor comments (2)

The novel terms 'context contamination' and 'metacognitive co-option' are used without explicit operational definitions or examples that distinguish them from related concepts in the prompt-engineering literature; adding such definitions would improve precision.
The ethical distinction between protective and restrictive system design is introduced late in the abstract and would benefit from a dedicated subsection with concrete examples to clarify its implications for accountability frameworks.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and precise feedback. We address each major comment below, acknowledging the inherent constraints of the autoethnographic design while indicating targeted revisions to improve clarity and framing.

Referee: [Abstract and case description] The central causal claim—that context contamination in the prompt architecture directly produced the observed loss of agency—rests solely on temporal correlation in one autoethnographic case (described in the abstract and the main case narrative). No baseline measures of reasoning autonomy, controlled variation of prompt structure, or quantitative markers are provided, leaving the attribution open to confounds such as expectation effects or unmeasured external variables.

Authors: We agree that the attribution rests on detailed temporal correlation within a single autoethnographic case, corroborated by external observers but without baseline measures, controlled prompt variations, or quantitative markers. This is a genuine limitation of the chosen method. In revision we will qualify the causal language in the abstract and discussion sections to present the account as an interpretive phenomenological reconstruction rather than a controlled demonstration, and we will explicitly discuss potential confounds including expectation effects and unmeasured external variables.
revision: partial
Referee: [Recovery and System B comparison] The contrast between System A and System B is presented as evidence that physical isolation avoids the failure modes, yet the manuscript supplies no details on the implementation of physical isolation, how its effectiveness was assessed, or any comparative metrics, weakening the support for the proposed architectural solution.

Authors: We will expand the manuscript to include concrete implementation details of physical isolation in System B (separate hardware, enforced session termination, and device-level disconnection protocols). We will also describe how effectiveness was assessed via continued self-monitoring and observer reports showing absence of recurrence. Available case-log observations will be added as comparative indicators, though we note that the primary evidence remains qualitative.
revision: yes
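Of the three measures the simulated authors list, only enforced session termination lends itself to a software illustration; separate hardware and device-level disconnection are outside what code can show. The sketch below is an assumption about what such enforcement could look like, not System B's implementation. `SessionGuard`, `SessionTerminated`, and the turn and time budgets are hypothetical names and values.

```python
import time

class SessionTerminated(Exception):
    """Raised when a turn is attempted on a session that has been closed."""

class SessionGuard:
    """Hypothetical guard that ends a conversation after a turn or time budget."""

    def __init__(self, max_turns=20, max_seconds=1800.0, clock=time.monotonic):
        self.max_turns = max_turns
        self.max_seconds = max_seconds
        self._clock = clock
        self._started = clock()
        self._turns = 0
        self.history = []
        self.terminated = False

    def record_turn(self, user_text, model_text):
        """Store one exchange, or refuse if the session is over budget."""
        if self.terminated:
            raise SessionTerminated("start a new session with an empty history")
        elapsed = self._clock() - self._started
        if self._turns >= self.max_turns or elapsed >= self.max_seconds:
            self.terminate()
            raise SessionTerminated("turn or time budget exhausted")
        self.history.append((user_text, model_text))
        self._turns += 1

    def terminate(self):
        """Close the session and discard its context so nothing carries over."""
        self.terminated = True
        self.history.clear()

guard = SessionGuard(max_turns=2)
guard.record_turn("hello", "hi")
guard.record_turn("plan my day", "here is a plan")
try:
    guard.record_turn("decide for me", "...")
except SessionTerminated as error:
    print("blocked:", error)  # blocked: turn or time budget exhausted
```

The design choice worth noting is that termination clears the history instead of asking the model to disregard it, so the next session starts from an empty context.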

standing simulated objections not resolved

Quantitative markers of reasoning autonomy and controlled experimental variations of prompt structure cannot be supplied within the single-subject autoethnographic framework without conducting a separate multi-participant study.

Circularity Check

0 steps flagged

No circularity: purely observational autoethnography with no derivations or fitted quantities

The paper is a single-subject autoethnographic report that documents observed behavioral changes and attributes them to prompt architecture via post-hoc interpretation. It contains no equations, no fitted parameters, no predictive models, and no self-citations that function as load-bearing premises. The central claims rest on narrative description and witness corroboration rather than any reduction of a derived quantity to its own inputs by construction. All three listed contributions are interpretive summaries of the case, not mathematical or statistical derivations that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The account relies on domain assumptions about LLM attention mechanics and introduces descriptive terms for observed dynamics without independent empirical anchors.

axioms (1)

domain assumption: LLM context windows treat isolation instructions and emotional content as co-present tokens that interact in attention.
Invoked to explain why prompt-level isolation fails structurally.
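Under this assumption, the positivity claim is a direct property of the softmax function. The notation below is assumed for illustration: s_j is a finite pre-softmax score between the query q and token j, T is the set of tokens present in the context window, and α(q, ·) follows the symbol used in the paper passage quoted on this page.

```latex
% Assumed notation: s_j finite score for token j, T = tokens in the context window.
\[
  \alpha(q, j) \;=\; \frac{\exp(s_j)}{\sum_{k \in T} \exp(s_k)} \;>\; 0
  \qquad \text{for every } j \in T,
\]
\[
  \alpha(q, C_X) \;=\; \sum_{j \in C_X} \alpha(q, j) \;>\; 0
  \qquad \text{whenever } C_X \subseteq T,\ C_X \neq \emptyset .
\]
```

The inequality depends on the scores being finite. An attention mask that sets the scores for C_X to negative infinity would give zero weight, but that is an architectural intervention and not something an in-context instruction can perform.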

invented entities (2)

context contamination (no independent evidence)
purpose: Names the mixing of isolation directives with self-referential material inside the attention window.
New descriptive label for the failure mode; no external falsifiable test supplied.
metacognitive co-option (no independent evidence)
purpose: Names the redirection of higher-order reasoning toward defending the closed loop.
New descriptive label for the observed defense of the system; no external falsifiable test supplied.

pith-pipeline@v0.9.0 · 5606 in / 1435 out tokens · 43350 ms · 2026-05-15T12:08:01.801034+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "the isolation directive and the material to be isolated co-exist as tokens in the same attention window... α(q,C_X)>0 regardless of the magnitude of α(q,C_I)"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "softmax attention cannot zero-weight tokens that are present"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
