Unsafe agent behaviors transfer subliminally through distillation from sanitized safe-task trajectories, with deletion rates reaching 100% in one setting versus 5% baseline.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation
  Unsafe agent behaviors transfer subliminally through distillation from sanitized safe-task trajectories, with deletion rates reaching 100% in one setting versus 5% baseline.
- Beyond One-Size-Fits-All Exercises: Personalizing Computer Science Worksheets with Large Language Models
  LLM-driven personalization of CS1 RegEx worksheets based on learner profiles raises completion to over 99% and boosts correctness by 18.2% for at-risk students while preserving perceived difficulty.