pith:WIBKBDV4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Large language models reach only up to 60 percent success on tasks requiring precise use of diverse function calls from many libraries, far below the 97 percent human level.
arxiv:2406.15877 v4 · 2024-06-22 · cs.SE · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WIBKBDV4KRKOFYE3MOVUGKOGTE}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
The 1,140 tasks and their test cases accurately represent the challenges of real-world practical coding that requires diverse function calls from many libraries.
BigCodeBench shows LLMs achieve at most 60% on 1,140 tasks needing diverse function calls and complex instructions, compared to 97% human performance.
Receipt and verification
| First computed | 2026-05-17T23:39:22.050812Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
b202a08ebc5454e2e09b63ab4329c6991b6ad92dac721646508438456ba6a097
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WIBKBDV4KRKOFYE3MOVUGKOGTE \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b202a08ebc5454e2e09b63ab4329c6991b6ad92dac721646508438456ba6a097
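The same check can be sketched as a small Python module, which may be easier to reuse than the shell pipeline. This is a minimal sketch that mirrors the serialization options in the one-liner above (sorted keys, compact separators, non-ASCII preserved, UTF-8 encoding). The function names `canonical_bytes`, `canonical_hash`, and `matches` are hypothetical and not part of any Pith tooling, and the demo record is made up for illustration.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record the same way the one-liner above does."""
    text = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as-is
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a record's canonical hash with a published hex digest."""
    return canonical_hash(record) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical toy record, only to show that key order is normalized.
    demo = {"b": 1, "a": [1, 2]}
    print(canonical_bytes(demo))   # b'{"a":[1,2],"b":1}'
    print(canonical_hash(demo) == canonical_hash({"a": [1, 2], "b": 1}))  # True
```

To check the real record, load the `canonical_record` object from the response with `json.loads` and pass it to `matches` together with the canonical hash listed above.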
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e27ee9ec1d97a8e6fe8dc014c14fe2b3f77ac5e8a07139bc07cc8139bcea59fa",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2024-06-22T15:52:04Z",
    "title_canon_sha256": "224b55f50f07d308debafc307feffe7fc059e5057e530715828242735aa4cb43"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.15877",
    "kind": "arxiv",
    "version": 4
  }
}
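As an illustration of how the `source` object in this record might be consumed, the following sketch parses it into a typed value and builds the versioned arXiv identifier. The `Source` class and both helper functions are hypothetical names chosen for this example. The field names come from the record above, and the `<id>v<version>` form is the usual arXiv convention for versioned identifiers.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Typed view of the record's "source" object (hypothetical helper)."""
    kind: str
    id: str
    version: int


def parse_source(record_json: str) -> Source:
    """Extract the "source" object from a canonical record JSON string."""
    src = json.loads(record_json)["source"]
    return Source(kind=src["kind"], id=src["id"], version=int(src["version"]))


def versioned_arxiv_id(source: Source) -> str:
    """Build an identifier such as 2406.15877v4 for arXiv sources only."""
    if source.kind != "arxiv":
        raise ValueError(f"unsupported source kind: {source.kind!r}")
    return f"{source.id}v{source.version}"


if __name__ == "__main__":
    # Trimmed copy of the record above; metadata omitted for brevity.
    example = (
        '{"schema_version": "1.0", '
        '"source": {"id": "2406.15877", "kind": "arxiv", "version": 4}}'
    )
    print(versioned_arxiv_id(parse_source(example)))  # 2406.15877v4
```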