Pith Number

pith:FEB6QFFT

pith:2025:FEB6QFFTRDQKX5RSDAAB5MODZL
not attested · not anchored · not stored · refs resolved

SWE-smith: Scaling Data for Software Engineering Agents

Alexander Wettig, Binyuan Hui, Carlos E. Jimenez, Diyi Yang, John Yang, Kabir Khandpur, Kilian Lieret, Ludwig Schmidt, Ofir Press, Yanzhe Zhang

SWE-smith automatically synthesizes 50k task instances from 128 Python repositories to train an open-source agent that resolves 40.2 percent of SWE-bench Verified issues.

arxiv:2504.21798 v2 · 2025-04-30 · cs.SE · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FEB6QFFTRDQKX5RSDAAB5MODZL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Using SWE-smith, we create a dataset of 50k instances sourced from 128 GitHub repositories, an order of magnitude larger than all previous works. We train SWE-agent-LM-32B, achieving 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark, state of the art among open source models.

C2 · weakest assumption

The automatically synthesized task instances that break tests are of sufficient quality, diversity, and realism to train models that generalize to real software engineering tasks, without requiring extensive human validation or filtering.

C3 · one-line summary

SWE-smith scales software engineering training data to 50k instances across 128 repositories, enabling SWE-agent-LM-32B to achieve 40.2% Pass@1 on SWE-bench Verified, state of the art among open-source models.

References

32 extracted · 32 resolved · 1 Pith anchor

Formal links

2 machine-checked theorem links

Cited by

29 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.791429Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d

Aliases

arxiv: 2504.21798 · arxiv_version: 2504.21798v2 · doi: 10.48550/arxiv.2504.21798 · pith_short_12: FEB6QFFTRDQK · pith_short_16: FEB6QFFTRDQKX5RS · pith_short_8: FEB6QFFT
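The short aliases listed above are plain prefixes of the full Pith Number. For this record, the Pith Number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That is an observation about this one record, not a documented derivation rule, so treat the second check below as an assumption. A minimal Python sketch (the names `short_alias` and `matches_hash` are illustrative only):

```python
import base64

# Values copied from this record.
PITH_NUMBER = "FEB6QFFTRDQKX5RSDAAB5MODZL"
CANONICAL_HASH = "2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number."""
    return pith_number[:length]


# Rebuild the three short aliases as prefixes of the full number.
aliases = {f"pith_short_{n}": short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}

# Assumption: the number is a Base32 prefix of the SHA-256 digest.
# This holds for the values above; the page does not state it as a rule.
b32 = base64.b32encode(bytes.fromhex(CANONICAL_HASH)).decode("ascii")
matches_hash = b32.startswith(PITH_NUMBER)

print(aliases)
print(matches_hash)
```

If the prefix check fails for another record, that only means the assumed relationship does not generalize. The hash comparison in the verification section remains the authoritative check.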
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FEB6QFFTRDQKX5RSDAAB5MODZL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2903e814b388e0abf63218001eb1c3cadd13fc958cdc3344c85f333878871b2d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "60c9c44a3ac8bee3645aa18c38296122a0252a61c6f42dcfb7d6f74ddfe56a75",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2025-04-30T16:56:06Z",
    "title_canon_sha256": "74e623afc4099bc2c0a46366219227b6101d99254dc432c65c1d7065bdfed02f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.21798",
    "kind": "arxiv",
    "version": 2
  }
}
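
The same canonicalization used in the shell one-liner can be run offline against the record as printed here. The sketch below is a minimal example that assumes the listing above is byte-for-byte what the API returns under `canonical_record`. The function name `canonical_hash` is illustrative and not part of any Pith tooling.

```python
import hashlib
import json

# Canonical record copied from the listing above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "60c9c44a3ac8bee3645aa18c38296122a0252a61c6f42dcfb7d6f74ddfe56a75",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2025-04-30T16:56:06Z",
        "title_canon_sha256": "74e623afc4099bc2c0a46366219227b6101d99254dc432c65c1d7065bdfed02f",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.21798", "kind": "arxiv", "version": 2},
}


def canonical_hash(record: dict) -> str:
    """SHA-256 over key-sorted, whitespace-free JSON encoded as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_hash(RECORD)
print(digest)  # compare against the canonical hash shown on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields arrive. A mismatch would point to a difference in the record's content or in the serialization details.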