For each model and training channel (SFT, OPD, OPTD), we use the learning rate of 1×10⁻⁴ using AdamW

· 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · ref 34

Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.