Pith Number

pith:NS3TKUHL

pith:2026:NS3TKUHLE75KCFYPPQZFWZH7MD
not attested · not anchored · not stored · refs resolved

Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

Haoyu Wang, Hongjin Leng, Qiang Ke, Shengming Zhao, Yanjie Zhao

Retriever components, especially the algorithm, often influence RAG performance for software engineering tasks more than the generator model.

arxiv:2605.14503 v1 · 2026-05-14 · cs.SE

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NS3TKUHLE75KCFYPPQZFWZH7MD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
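To illustrate what "deterministic merge" can mean in practice, here is a minimal sketch, not the Pith algorithm itself. It assumes each signed event carries a unique `id`, a timestamp `at`, and a `field`/`value` update; all of these names and the sample events are hypothetical. Events are deduplicated by `id` and replayed in a fixed total order, so any mirror holding the same set of events arrives at the same state regardless of the order in which it received them.

```python
def merge_events(*event_lists):
    """Fold events into a state dict, independent of arrival order.

    Assumed (hypothetical) event shape:
    {"id": str, "at": ISO-8601 str, "field": str, "value": object}
    """
    unique = {}
    for events in event_lists:
        for event in events:
            # Events with the same id are assumed to be identical copies.
            unique[event["id"]] = event

    state = {}
    # A fixed total order (timestamp, then id as tie-breaker) makes the fold
    # reproducible; later events overwrite earlier ones for the same field.
    for event in sorted(unique.values(), key=lambda e: (e["at"], e["id"])):
        state[event["field"]] = event["value"]
    return state


if __name__ == "__main__":
    # Made-up events, for illustration only.
    e1 = {"id": "e1", "at": "2026-01-01T00:00:00Z", "field": "x", "value": 1}
    e2 = {"id": "e2", "at": "2026-01-02T00:00:00Z", "field": "x", "value": 2}
    e3 = {"id": "e3", "at": "2026-01-02T00:00:00Z", "field": "y", "value": 3}
    print(merge_events([e1, e2, e3]) == merge_events([e3, e1], [e2, e1]))
```

Sorting on a total order before folding is one common way to get order independence; the real bundle format may use a different rule.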

Claims

C1 · strongest claim

The retriever-side components, particularly the choice of the retrieval algorithm, often exert a more significant influence on final system performance than the selection of the generator model.

C2 · weakest assumption

That the three chosen SE tasks and the specific models and datasets used are representative enough for the observed component rankings to generalize to other software engineering contexts and real-world codebases.

C3 · one line summary

Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.

References

56 extracted · 56 resolved · 20 Pith anchors

[1] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 2024 · arXiv:2404.14219
[2] Code2vec: Learning distributed representations of code 2019 · doi:10.1145/3290353
[3] A Survey on RAG with LLMs 2024 · doi:10.1016/j.procs.2024.09.178
[4] Cweval: Outcome-driven evaluation on functionality and security of LLM code generation 2025
[5] Nguyen, Hridesh Rajan, Nikolaos Tsantalis, and Danny Dig 2025 · doi:10.1109/icsme64153.2025.00046
Receipt and verification
First computed: 2026-05-17T23:39:06.290692Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

6cb73550eb27faa1170f7c325b64ff60ea01fa8ebf2e13632b93574984016f39

Aliases

arxiv: 2605.14503 · arxiv_version: 2605.14503v1 · doi: 10.48550/arxiv.2605.14503 · pith_short_12: NS3TKUHLE75K · pith_short_16: NS3TKUHLE75KCFYP · pith_short_8: NS3TKUHL
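
The short aliases above are prefixes of the full Pith Number. For this record, the number itself also matches the first 26 characters of the standard Base32 (RFC 4648) encoding of the canonical hash. The sketch below checks both observations. Whether this is the official derivation is an assumption drawn from this single record, and the function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "6cb73550eb27faa1170f7c325b64ff60ea01fa8ebf2e13632b93574984016f39"
PITH_NUMBER = "NS3TKUHLE75KCFYPPQZFWZH7MD"


def pith_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest and keep the first `length` characters.

    Assumed derivation: it reproduces this record's number, but it is not
    confirmed as the registry's definition.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith):
    """Return the 8-, 12-, and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for name, value in short_aliases(PITH_NUMBER).items():
        print(name, value)
```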
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NS3TKUHLE75KCFYPPQZFWZH7MD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6cb73550eb27faa1170f7c325b64ff60ea01fa8ebf2e13632b93574984016f39
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "eefc97159af8dfc774e669a4c648396f531565b4c582975463e78d529d0c725a",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-14T07:47:44Z",
    "title_canon_sha256": "2d9a2a7d20840e20242d6d4a50784bdef00b1b46df3cf6b8b31aee1e2c1d2057"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14503",
    "kind": "arxiv",
    "version": 1
  }
}
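
For an offline check, the canonicalization step of the one-liner above can be written as a small Python function. This is a sketch that mirrors the `json.dumps` arguments from that command (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes). The `RECORD` literal is transcribed from the JSON shown here, and the name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Transcribed from the canonical record JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "eefc97159af8dfc774e669a4c648396f531565b4c582975463e78d529d0c725a",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-14T07:47:44Z",
        "title_canon_sha256": "2d9a2a7d20840e20242d6d4a50784bdef00b1b46df3cf6b8b31aee1e2c1d2057",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14503", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record):
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest with the canonical hash listed on this page.
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.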