Benign fine-tuning of foundation models induces large, heterogeneous, and often contradictory changes in safety metrics across general and domain-specific benchmarks.
The construct of content validity.Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, 45(1):83–117, November 1998
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CY 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains
Benign fine-tuning of foundation models induces large, heterogeneous, and often contradictory changes in safety metrics across general and domain-specific benchmarks.