arXiv preprint arXiv:2003.00307
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2
verdicts
UNVERDICTED 2
representative citing papers
Class imbalance causes DNNs to underfit minority classes early in training and produce non-generalizable minority representations later by overfitting to minimize overall loss.
citing papers explorer
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
- On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight
Class imbalance causes DNNs to underfit minority classes early in training and produce non-generalizable minority representations later by overfitting to minimize overall loss.