Prior-leaning probability objectives outperform NLL for strong base models on SFT while NLL dominates for weak models, with the switch governed by a model-capability continuum.
Then $\max_{q \in \Delta^{V-1}} F(q) = \frac{11\sqrt{33} - 59}{768} \le 0.00546$, and the maximizer is attained by a vector with $q_i = q_j = \frac{9 - \sqrt{33}}{24}$, with all remaining mass $1 - 2q_i$ placed on one coordinate.
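The constants in this statement can be checked numerically. The sketch below only evaluates the two closed-form expressions and confirms that the stated maximizer is a valid point on the simplex; it does not evaluate F itself, since F is not defined in this excerpt. The variable names (`max_value`, `q_pair`, `q_rest`) are illustrative and not taken from the paper.

```python
import math

# Closed-form maximum stated in the text: (11*sqrt(33) - 59) / 768
max_value = (11 * math.sqrt(33) - 59) / 768

# Value shared by the two equal coordinates q_i = q_j: (9 - sqrt(33)) / 24
q_pair = (9 - math.sqrt(33)) / 24

# Mass left over for the single remaining coordinate: 1 - 2*q_i
q_rest = 1 - 2 * q_pair

# The stated numerical bound should hold for the closed form.
assert max_value <= 0.00546

# The maximizer must be a valid probability vector:
# non-negative entries that sum to one.
assert q_pair > 0 and q_rest > 0
assert math.isclose(2 * q_pair + q_rest, 1.0)

print(f"max F(q)       = {max_value:.6f}")
print(f"q_i = q_j      = {q_pair:.6f}")
print(f"remaining mass = {q_rest:.6f}")
```

If the assertions pass, the closed-form value sits just under the quoted bound of 0.00546, and the three nonzero coordinates form a valid distribution.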
1 Pith paper cites this work (field: cs.CL; year: 2025; verdict: UNVERDICTED):
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum