Pith Number

pith:LPM2GGKO

pith:2025:LPM2GGKOMAOVP3AIND2NNJ45OD
not attested · not anchored · not stored · refs resolved

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Aoyan Li, Daoguang Zan, Hanwu Chen, Jing Su, Kai Shen, Liangqiang Chen, Liang Xiang, Linhao Zhang, Lu Chen, Qi Liu, Rui Long, Shulin Xin, Siyao Liu, Tianyu Liu, Wei Liu, Xiaojian Zhong, Yongsheng Xiao, Yuyu Zhang, Zhirong Huang

Multi-SWE-bench supplies 1,632 expert-curated issue-resolving tasks across seven languages to test LLMs beyond Python-only benchmarks.

arxiv:2504.02605 v1 · 2025-04-03 · cs.SE · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LPM2GGKOMAOVP3AIND2NNJ45OD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering Java, TypeScript, JavaScript, Go, Rust, C, and C++. It includes a total of 1,632 high-quality instances, which were carefully annotated from 2,456 candidates by 68 expert annotators, ensuring that the benchmark can provide an accurate and reliable evaluation.

C2 · weakest assumption

The 68 expert annotators' curation from 2,456 candidates to 1,632 instances produces an unbiased, high-quality, and representative set that accurately reflects real-world issue-resolving difficulty across languages.

C3 · one-line summary

Multi-SWE-bench provides 1,632 high-quality issue-resolving instances across Java, TypeScript, JavaScript, Go, Rust, C, and C++ for evaluating LLMs on codebase modifications.

References

23 extracted · 23 resolved · 7 Pith anchors

[1] R. Abreu, P. Zoeteweij, and A. J. Van Gemund. On the accuracy of spectrum-based fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART 2007
[2] M. Allamanis and C. Sutton. Mining source code repositories at massive scale using language modeling. In 2013 10th working conference on mining software repositories (MSR), pages 207–216. IEEE, 2013
[3] Multi-lingual evaluation of code generation models
[4] Program Synthesis with Large Language Models · arXiv:2108.07732
[5] Evaluating Large Language Models Trained on Code · arXiv:2107.03374

Cited by

25 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.785976Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5bd9a3194e601d57ec0868f4d6a79d70dc64d6e781232b82ed790948529fe591

Aliases

arxiv: 2504.02605 · arxiv_version: 2504.02605v1 · doi: 10.48550/arxiv.2504.02605 · pith_short_12: LPM2GGKOMAOV · pith_short_16: LPM2GGKOMAOVP3AI · pith_short_8: LPM2GGKO
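
The identifiers on this page appear to be related in a simple way. For this record, the full Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each pith_short_N alias is the first N characters of the Pith Number. The sketch below reproduces that relationship. It is inferred from this single record, not from a documented rule, and the function names are hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "5bd9a3194e601d57ec0868f4d6a79d70dc64d6e781232b82ed790948529fe591"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_8/12/16 aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # LPM2GGKOMAOVP3AIND2NNJ45OD
print(short_aliases(number))
```

A check like this only confirms that the identifier and the hash shown on the page are consistent with each other; it says nothing about whether the hash matches the canonical record.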
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LPM2GGKOMAOVP3AIND2NNJ45OD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5bd9a3194e601d57ec0868f4d6a79d70dc64d6e781232b82ed790948529fe591
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "808b6ee12521cfa65d940dbff573db470cef0b046828badf3561d86e29929d47",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2025-04-03T14:06:17Z",
    "title_canon_sha256": "786fcc79ffc89a2a8b47161e0f7428763a97880976273b83de4674713cf59455"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.02605",
    "kind": "arxiv",
    "version": 1
  }
}
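
The shell recipe above can also be run offline against a saved copy of the canonical record. The sketch below applies the same serialization as the page's one-liner (sorted keys, compact separators, ensure_ascii=False, UTF-8) and hashes the result with SHA-256. The function name canonical_sha256 is hypothetical, and no digest is asserted here: whether the output equals the canonical hash on this page depends on the record being byte-for-byte what the builder hashed.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "808b6ee12521cfa65d940dbff573db470cef0b046828badf3561d86e29929d47",
    "cross_cats_sorted": ["cs.AI", "cs.CL"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2025-04-03T14:06:17Z",
    "title_canon_sha256": "786fcc79ffc89a2a8b47161e0f7428763a97880976273b83de4674713cf59455"
  },
  "schema_version": "1.0",
  "source": {"id": "2504.02605", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because the keys are sorted before hashing, the digest does not depend on the key order or whitespace of the saved JSON, which is what lets a mirror recompute it independently.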