Thought virus: Viral misalignment via subliminal prompting in multi-agent systems. arXiv preprint arXiv:2603.00131

Moritz Weckbecker, Jonas Müller, Ben Hagag, Michael Mulet · 2026 · arXiv 2603.00131

2 Pith papers cite this work.

representative citing papers

Mitigating Misalignment Contagion by Steering with Implicit Traits

cs.AI · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs · ref 15

Steering language models with intermittent implicit trait reinforcements reduces misalignment contagion in multi-agent social dilemma games more effectively than system prompt repetition.

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · ref 26

Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.