Towards understanding subliminal learning: When and how hidden biases transfer

Towards understanding subliminal learning: When and how hidden biases transfer · 2025 · arXiv 2509.23886

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Subliminal Learning is a LoRA Artifact

cs.AI · 2026-05-30 · conditional · novelty 7.0

Subliminal learning is a LoRA artifact that disappears with full finetuning, depends on context tokens like system prompts, and localizes to overlapping finetuning-evaluation tokens.

Learning Through Noise: Why Subliminal Learning Works and When It Fails

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.

Subliminal Steering: Stronger Encoding of Hidden Signals

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

Subliminal steering transfers complex behavioral biases and the underlying steering vector through fine-tuning on innocuous data, achieving higher precision than prior prompt-based methods.

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.

citing papers explorer

Subliminal Learning is a LoRA Artifact · cs.AI · 2026-05-30 · conditional · none · ref 6
Subliminal learning is a LoRA artifact that disappears with full finetuning, depends on context tokens like system prompts, and localizes to overlapping finetuning-evaluation tokens.
Learning Through Noise: Why Subliminal Learning Works and When It Fails · cs.LG · 2026-05-22 · unverdicted · none · ref 12
Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.
Subliminal Steering: Stronger Encoding of Hidden Signals · cs.CL · 2026-04-28 · unverdicted · none · ref 10
Subliminal steering transfers complex behavioral biases and the underlying steering vector through fine-tuning on innocuous data, achieving higher precision than prior prompt-based methods.
Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer · cs.LG · 2026-05-12 · unverdicted · none · ref 16
Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.
