Pith Number

pith:IIEAK72Y

pith:2025:IIEAK72YB67YYZTRLIIH2KDEWJ
not attested · not anchored · not stored · refs resolved

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Daniel Fried, Gabriel Synnaeve, Jade Copet, Lingming Zhang, Olivier Duchenne, Quentin Carbonneaux, Rishabh Singh, Sida I. Wang, Yuxiang Wei

Reinforcement learning on open software evolution data enables LLMs to recover developer reasoning and solve 41.0% of the real GitHub issues in SWE-bench Verified.

arxiv:2502.18449 v2 · 2025-02-25 · cs.SE · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IIEAK72YB67YYZTRLIIH2KDEWJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

our resulting reasoning model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified -- a human-verified collection of real-world GitHub issues. To our knowledge, this is the best performance reported for medium-sized (<100B) LLMs to date, even comparable to leading proprietary LLMs like GPT-4o.

C2 weakest assumption

The assumption that a lightweight rule-based similarity score between ground-truth and generated solutions serves as an effective reward for learning genuine reasoning processes rather than superficial pattern matching.
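To make the assumption concrete, here is a minimal sketch of what a rule-based similarity reward could look like. The record only says the reward is a "lightweight rule-based similarity score"; the use of Python's `difflib.SequenceMatcher`, the function name `similarity_reward`, and the -1.0 penalty for a missing patch are illustrative assumptions, not details confirmed by this page.

```python
import difflib
from typing import Optional


def similarity_reward(generated_patch: Optional[str], reference_patch: str) -> float:
    """Hypothetical reward: textual similarity between two patches.

    Assumption: a response with no usable patch (None) gets a fixed
    penalty of -1.0; otherwise the reward is a ratio in [0.0, 1.0].
    """
    if generated_patch is None:
        return -1.0
    matcher = difflib.SequenceMatcher(None, generated_patch, reference_patch)
    return matcher.ratio()


reference = "-    return a - b\n+    return a + b\n"
exact = similarity_reward(reference, reference)
partial = similarity_reward("-    return a - b\n+    return b + a\n", reference)
missing = similarity_reward(None, reference)
```

A reward of this kind scores surface overlap with the reference patch, which is exactly why the assumption is flagged as weak: a patch can be textually close yet wrong, or textually distant yet correct.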

C3 one line summary

SWE-RL applies RL to software evolution data to train LLMs that reach 41% on SWE-bench Verified and generalize to other reasoning tasks.

References

192 extracted · 192 resolved · 15 Pith anchors

[1] Claude 3.5 sonnet model card addendum 2024
[2] Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet 2024
[4] Codet: Code generation with generated tests 2023
[5] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Mich 2021
[6] Meta large language model compiler: Foundation models of compiler optimization 2024

Formal links

3 machine-checked theorem links

Cited by

31 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.780436Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4208057f580fbf8c66715a107d2864b2548c8bef57c8e7cc713325e3ddea90f6

Aliases

arxiv: 2502.18449 · arxiv_version: 2502.18449v2 · doi: 10.48550/arxiv.2502.18449 · pith_short_12: IIEAK72YB67Y · pith_short_16: IIEAK72YB67YYZTR · pith_short_8: IIEAK72Y
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4208057f580fbf8c66715a107d2864b2548c8bef57c8e7cc713325e3ddea90f6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "730dfc0fa178a079f7ed7e2f80a86ff8ac9f804908893c4a1853b3b8bb5d84b9",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2025-02-25T18:45:04Z",
    "title_canon_sha256": "a91330707567ffa0ad37268e5f68258ff99c1316fd5dd035252f3e5e1a1cc008"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.18449",
    "kind": "arxiv",
    "version": 2
  }
}
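
The canonicalization step in the verification command above can also be run offline against the record shown here. The sketch below mirrors the same `json.dumps` options (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is a hypothetical helper, and the sketch has not been checked against the live service, so treat any match with the canonical hash listed above as something to confirm rather than assume.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # compact form, no extra whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "730dfc0fa178a079f7ed7e2f80a86ff8ac9f804908893c4a1853b3b8bb5d84b9",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2025-02-25T18:45:04Z",
        "title_canon_sha256": "a91330707567ffa0ad37268e5f68258ff99c1316fd5dd035252f3e5e1a1cc008",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.18449", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, two copies of the record that differ only in key order or whitespace produce the same digest, which is what lets a mirror recompute it independently.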