pith:PNR7WYFD
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Self-play fine-tuning turns a weak supervised LLM into a strong one by iteratively contrasting its own generations against fixed human data.
arxiv:2401.01335 v3 · 2024-01-02 · cs.LG · cs.AI · cs.CL · stat.ML
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{PNR7WYFDY56BHRSVDE7CYNLHTD}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The global optimum of the training objective of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data.
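For reference, the objective as we understand it from the paper trains a new player $\theta$ against a frozen opponent $\theta_t$ (the previous iteration) with the logistic loss $\ell$ and a scaling parameter $\lambda > 0$; $q$ denotes the prompt distribution. The second line is our own derivation, not quoted from the paper: assuming an unrestricted policy class and full support, minimizing the objective pairwise over responses gives this closed-form next iterate, which equals $p_{\theta_t}$ exactly when $p_{\theta_t} = p_{\mathrm{data}}$. That is one way to read the global-optimum claim above.

```latex
\mathcal{L}_{\mathrm{SPIN}}(\theta,\theta_t)
  = \mathbb{E}_{x\sim q,\; y\sim p_{\mathrm{data}}(\cdot\mid x),\; y'\sim p_{\theta_t}(\cdot\mid x)}
    \left[\ell\!\left(\lambda\log\frac{p_{\theta}(y\mid x)}{p_{\theta_t}(y\mid x)}
      -\lambda\log\frac{p_{\theta}(y'\mid x)}{p_{\theta_t}(y'\mid x)}\right)\right],
  \qquad \ell(t)=\log\!\left(1+e^{-t}\right)

p_{\theta_{t+1}}(y\mid x)\;\propto\;p_{\theta_t}(y\mid x)\left(\frac{p_{\mathrm{data}}(y\mid x)}{p_{\theta_t}(y\mid x)}\right)^{1/\lambda}
```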
These claims rest on the assumption that self-generated responses from earlier model iterations provide useful contrastive signals without introducing persistent biases or distribution shifts that would block steady improvement toward the human data distribution.
SPIN strengthens a weak LLM by having its previous iteration generate training data and then training the model to prefer human-annotated responses over those self-generated outputs; on a range of benchmarks it outperforms DPO supplemented with extra GPT-4 preference data.
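To make the self-play loop concrete, here is a toy sketch under strong simplifying assumptions that are ours, not the paper's: a single prompt, a "policy" that is just a softmax over three candidate responses, exact expectations instead of sampled responses, and plain gradient descent with the logistic loss and λ = 2. The function names, step size, step count and `p_data` values are hypothetical. Each iteration freezes the previous model as the opponent and trains a fresh copy to score human responses above the opponent's own.

```python
import math

# Toy assumptions (hypothetical): one prompt, K candidate responses,
# policy = softmax(theta), exact expectations over y ~ p_data and y' ~ p_theta_t.

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def spin_iteration(theta_t, p_data, lam=2.0, lr=0.5, steps=300):
    """One self-play iteration: fit a new player against the frozen opponent theta_t."""
    p_t = softmax(theta_t)      # opponent's distribution over self-generated y'
    k = len(theta_t)
    theta = list(theta_t)       # initialize the new player at the opponent
    for _ in range(steps):
        grad = [0.0] * k
        for y in range(k):          # y  ~ p_data (human response)
            for yp in range(k):     # y' ~ p_theta_t (self-generated response)
                w = p_data[y] * p_t[yp]
                if y == yp or w == 0.0:
                    continue  # identical pair: margin is always 0, no net gradient
                # Difference of log-ratios; the softmax normalizer cancels here.
                margin = lam * ((theta[y] - theta_t[y]) - (theta[yp] - theta_t[yp]))
                # d/dm log(1 + exp(-m)) = -sigmoid(-m); chain rule gives +/- lam.
                g = -w * sigmoid(-margin) * lam
                grad[y] += g
                grad[yp] -= g
        theta = [th - lr * gr for th, gr in zip(theta, grad)]
    return theta

def run_spin(p_data, iterations=8, lam=2.0):
    theta = [0.0] * len(p_data)   # "weak" starting model: uniform over responses
    history = [softmax(theta)]
    for _ in range(iterations):
        theta = spin_iteration(theta, p_data, lam=lam)
        history.append(softmax(theta))
    return history

if __name__ == "__main__":
    for t, p in enumerate(run_spin([0.6, 0.3, 0.1])):
        print(t, [round(v, 4) for v in p])
```

With λ = 2 and an unrestricted policy, each fully optimized iteration should halve the gap between log p_t and log p_data (up to an additive constant), so the printed distributions approach `[0.6, 0.3, 0.1]`. A real implementation would replace the softmax with an LLM's sequence log-probabilities and the exact sums with sampled responses.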
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:21.380083Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
7b63fb60a3c77c13c655193e2c356798d5009e3bb0cd862eebc10e7ca5dd0fcf
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PNR7WYFDY56BHRSVDE7CYNLHTD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7b63fb60a3c77c13c655193e2c356798d5009e3bb0cd862eebc10e7ca5dd0fcf
```
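The same check can be run in-process. This sketch mirrors the one-liner's canonicalization (sorted keys at every nesting level, compact separators, UTF-8 output without ASCII escaping), assuming that is the complete canonical form behind the published hash. The function names are hypothetical and not part of any Pith API.

```python
import hashlib
import json

def canonical_bytes(record) -> bytes:
    # Same canonicalization as the shell one-liner: sorted keys, no whitespace,
    # non-ASCII characters emitted as UTF-8 rather than \u escapes.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def canonical_sha256(record) -> str:
    # Hex-encoded SHA-256 of the canonical bytes.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

def matches_published_hash(record_json: str, expected_hex: str) -> bool:
    # record_json: the canonical record as JSON text, in any key order or spacing.
    return canonical_sha256(json.loads(record_json)) == expected_hex.strip().lower()
```

For example, pass the text of the canonical record (as returned by `jq -c '.canonical_record'`) together with the hash listed under "Canonical hash" to `matches_published_hash`; because keys are re-sorted and whitespace is dropped, pretty-printed and compact copies of the record hash identically.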
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "925cc3c9884b19ea31170356b7ee90c6ebd9eec1148b0fe5e311970cc28cec29",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-01-02T18:53:13Z",
    "title_canon_sha256": "2f69f69cbc581696e830d29dd6d32aeed783be8aefed4b103ddfce31006cb938"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2401.01335",
    "kind": "arxiv",
    "version": 3
  }
}
```