pith:FEB6QFFT
SWE-smith: Scaling Data for Software Engineering Agents
SWE-smith automatically synthesizes 50k task instances from 128 Python repositories to train an open-source agent that resolves 40.2 percent of SWE-bench Verified issues.
arxiv:2504.21798 v2 · 2025-04-30 · cs.SE · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FEB6QFFTRDQKX5RSDAAB5MODZL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Using SWE-smith, we create a dataset of 50k instances sourced from 128 GitHub repositories, an order of magnitude larger than all previous works. We train SWE-agent-LM-32B, achieving 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark, state of the art among open-source models.
The automatically synthesized task instances that break tests are of sufficient quality, diversity, and realism to train models that generalize to real software engineering tasks, without requiring extensive human validation or filtering.
SWE-smith scales software engineering training data to 50k instances across 128 repositories, enabling SWE-agent-LM-32B to achieve 40.2% Pass@1 on SWE-bench Verified, state of the art among open-source models.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:52.791429Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FEB6QFFTRDQKX5RSDAAB5MODZL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "60c9c44a3ac8bee3645aa18c38296122a0252a61c6f42dcfb7d6f74ddfe56a75",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2025-04-30T16:56:06Z",
    "title_canon_sha256": "74e623afc4099bc2c0a46366219227b6101d99254dc432c65c1d7065bdfed02f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.21798",
    "kind": "arxiv",
    "version": 2
  }
}
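
The verification command can also be reproduced offline. The following is a minimal Python sketch, assuming the canonicalization is exactly what the one-liner applies: keys sorted, compact `,` and `:` separators, `ensure_ascii=False`, UTF-8 encoding. The names `canonical_sha256`, `RECORD`, and `EXPECTED` are illustrative and not part of any Pith tooling. `RECORD` is copied from the canonical record JSON on this page, and `EXPECTED` is the canonical hash listed on this page. Whether the two agree depends on the served record being identical to the copy shown here, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does.

    Serializes with sorted keys and compact separators, keeps non-ASCII
    characters unescaped, encodes as UTF-8, and returns the SHA-256 hex digest.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copied from the canonical record JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "60c9c44a3ac8bee3645aa18c38296122a0252a61c6f42dcfb7d6f74ddfe56a75",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2025-04-30T16:56:06Z",
        "title_canon_sha256": "74e623afc4099bc2c0a46366219227b6101d99254dc432c65c1d7065bdfed02f",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.21798", "kind": "arxiv", "version": 2},
}

# Canonical hash as listed on this page.
EXPECTED = "2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d"

if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written, only on their names and values.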