Benign fine-tuning drifts LLM parameters toward danger directions; SQSD scores each sample by the projection difference of its induced update onto safety versus danger vectors.
Fine-tuning and utilization methods of domain- specific llms.arXiv preprint arXiv:2401.02981
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
From Parameter Dynamics to Risk Scoring : Quantifying Sample-Level Safety Degradation in LLM Fine-tuning
Benign fine-tuning drifts LLM parameters toward danger directions; SQSD scores each sample by the projection difference of its induced update onto safety versus danger vectors.