Thought virus: Viral misalignment via subliminal prompting in multi-agent systems. arXiv preprint arXiv:2603.00131

Moritz Weckbecker, Jonas Müller, Ben Hagag, Michael Mulet · 2026 · arXiv 2603.00131

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Subliminal Learning Is Steering Vector Distillation

cs.AI · 2026-05-31 · unverdicted · novelty 7.0

Subliminal learning is steering vector distillation: a student fine-tuned on a steered teacher's outputs learns to imitate the steering vector.

Mitigating Misalignment Contagion by Steering with Implicit Traits

cs.AI · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

Steering language models with intermittent implicit trait reinforcements reduces misalignment contagion in multi-agent social dilemma games more effectively than system prompt repetition.

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.

Agent-Native Immune System: Architecture, Taxonomy, and Engineering

cs.AI · 2026-06-26 · unverdicted · novelty 4.0

Introduces ANIS as an endogenous, six-layer immune architecture for AI agents, with a taxonomy of viruses/vaccines and a meta-cognitive Harness Triad for continual adaptation.

citing papers explorer

Subliminal Learning Is Steering Vector Distillation · cs.AI · 2026-05-31 · unverdicted · none · ref 41
Mitigating Misalignment Contagion by Steering with Implicit Traits · cs.AI · 2026-05-04 · unverdicted · none · ref 15 · 2 links
Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer · cs.LG · 2026-05-12 · unverdicted · none · ref 26
Agent-Native Immune System: Architecture, Taxonomy, and Engineering · cs.AI · 2026-06-26 · unverdicted · none · ref 19
