pith:HVF3LOTT
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Augmenting HumanEval with 80 times more test cases reveals that LLM-generated code contains substantially more functional errors than prior benchmarks detected.
arxiv:2305.01210 v3 · 2023-05-02 · cs.SE · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HVF3LOTTC34HDWC2LTG2VQFFI3}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our extensive evaluation across 26 popular LLMs demonstrates that HumanEval+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing pass@k by up to 19.3-28.9%.
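We assume pass@k here is the standard unbiased estimator from the original HumanEval/Codex evaluation (Chen et al., 2021). With n samples per problem, c of which pass every test, a problem scores 1 - C(n-c, k)/C(n, k), and the benchmark score is the mean over problems. The minimal sketch below uses hypothetical correctness counts, not figures from the paper. It shows the mechanism behind the claimed drop: re-checking the same samples against a larger test suite can only lower c, and a lower c lowers pass@k.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass all tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - P(a uniformly random size-k subset contains no correct sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimate over all problems in a benchmark."""
    return mean(pass_at_k(n, c, k) for c in correct_counts)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples per problem, 3 problems.
    base = [6, 4, 10]  # samples passing the original (base) tests
    plus = [3, 4, 7]   # the same samples re-checked against a larger test suite
    for k in (1, 5):
        b = benchmark_pass_at_k(base, n=10, k=k)
        p = benchmark_pass_at_k(plus, n=10, k=k)
        print(f"pass@{k}: base={b:.3f} plus={p:.3f} relative drop={(b - p) / b:.1%}")
```

The exact `comb` ratio is fine for benchmark-sized n. The Codex paper computes the same quantity as a running product, which is more numerically stable for very large n.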
The automatically generated test cases are functionally correct and do not introduce false failures or miss important edge cases in the code under test.
EvalPlus augments HumanEval with 80x more tests via LLM and mutation strategies, exposing up to 28.9% more incorrect LLM-generated code and reversing some model performance rankings.
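The core idea behind catching more wrong code is differential testing: generate many extra inputs and flag any input on which the LLM's solution disagrees with a ground-truth solution. The toy sketch below illustrates this with a simple type-aware mutation of integer lists. It is not EvalPlus's actual input generator or operator set. The problem, the `reference` and `candidate` functions, `mutate`, and `find_counterexample` are all hypothetical. The buggy candidate passes the three base tests but is exposed once mutation produces a list with equal neighbours.

```python
import random
from typing import Callable, List, Optional

IntList = List[int]


def reference(xs: IntList) -> bool:
    """Hypothetical ground truth: is xs sorted in non-decreasing order?"""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def candidate(xs: IntList) -> bool:
    """Hypothetical LLM output: subtly wrong, rejects lists with equal neighbours."""
    return all(a < b for a, b in zip(xs, xs[1:]))


BASE_TESTS: List[IntList] = [[1, 2, 3], [3, 1], []]


def mutate(xs: IntList, rng: random.Random) -> IntList:
    """Toy type-aware mutation for a list of ints (illustrative operators only)."""
    ys = list(xs)
    op = rng.choice(["tweak", "duplicate", "remove", "insert"])
    if op == "tweak" and ys:
        i = rng.randrange(len(ys))
        ys[i] += rng.choice([-1, 1])  # nudge one element
    elif op == "duplicate" and ys:
        i = rng.randrange(len(ys))
        ys.insert(i, ys[i])  # creates an equal adjacent pair
    elif op == "remove" and ys:
        ys.pop(rng.randrange(len(ys)))
    else:
        # "insert", or any operator that needs a non-empty list when ys is empty
        ys.insert(rng.randrange(len(ys) + 1), rng.randint(-5, 5))
    return ys


def find_counterexample(
    ref: Callable[[IntList], bool],
    cand: Callable[[IntList], bool],
    seeds: List[IntList],
    budget: int = 500,
    seed: int = 0,
) -> Optional[IntList]:
    """Grow an input pool by mutation; return the first input where ref and cand disagree."""
    rng = random.Random(seed)
    pool = [list(s) for s in seeds]
    for _ in range(budget):
        x = mutate(rng.choice(pool), rng)
        if ref(x) != cand(x):
            return x
        pool.append(x)
    return None


if __name__ == "__main__":
    # The candidate agrees with the reference on every base test...
    print("passes base tests:", all(reference(x) == candidate(x) for x in BASE_TESTS))
    # ...but differential testing on mutated inputs finds a disagreement.
    print("counterexample:", find_counterexample(reference, candidate, BASE_TESTS))
```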
Receipt and verification
| First computed | 2026-05-18T02:44:08.792377Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
3d4bb5ba7316f871d85a5ccdaac0a546e902c3c27765137d80ddee3bd3d8c681
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HVF3LOTTC34HDWC2LTG2VQFFI3 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3d4bb5ba7316f871d85a5ccdaac0a546e902c3c27765137d80ddee3bd3d8c681
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "04b221cba3aa676fae32679075e1337b1723cea41e1848f41ca3a87499c2be97",
"cross_cats_sorted": [
"cs.CL",
"cs.LG"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.SE",
"submitted_at": "2023-05-02T05:46:48Z",
"title_canon_sha256": "e6912ff5b6a9a8d99edc6c7fc3fed66c47a34217d52c0df0c24b74db00f741a2"
},
"schema_version": "1.0",
"source": {
"id": "2305.01210",
"kind": "arxiv",
"version": 3
}
}
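The shell one-liner above hashes the record by re-serializing it canonically: keys sorted at every level, no whitespace (`separators=(',', ':')`), non-ASCII characters left as-is, UTF-8 encoded, then SHA-256. The offline sketch below applies the same steps to the record shown above. It assumes this JSON is exactly what the API returns as `.canonical_record`. It prints whether the digest matches the expected hash instead of asserting a match. Canonicalization makes the digest independent of key order but sensitive to any change in values.

```python
import hashlib
import json

# Assumption: this dict is exactly the `.canonical_record` payload served by the Pith API.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "04b221cba3aa676fae32679075e1337b1723cea41e1848f41ca3a87499c2be97",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2023-05-02T05:46:48Z",
        "title_canon_sha256": "e6912ff5b6a9a8d99edc6c7fc3fed66c47a34217d52c0df0c24b74db00f741a2",
    },
    "schema_version": "1.0",
    "source": {"id": "2305.01210", "kind": "arxiv", "version": 3},
}

EXPECTED = "3d4bb5ba7316f871d85a5ccdaac0a546e902c3c27765137d80ddee3bd3d8c681"


def canonical_sha256(record: dict) -> str:
    """Same steps as the one-liner: sorted keys, compact separators, UTF-8, SHA-256."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest, "match" if digest == EXPECTED else "MISMATCH")
```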