Pith Number

pith:44ZJFCPE

pith:2026:44ZJFCPEX3VWDDBSJAXRZRE5F3
not attested · not anchored · not stored · refs resolved

Task Abstention for Large Language Models in Code Generation

Senrong Xu, Taolue Chen, Xiaoxing Ma, Yanke Zhou, Yuan Yao, Yuhao Tan, Zenan Li

Code-generating LLMs can abstain from tasks likely to produce hallucinations by checking consistency of execution results across multiple generations.

arxiv:2605.17029 v1 · 2026-05-16 · cs.SE · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{44ZJFCPEX3VWDDBSJAXRZRE5F3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We prove that our approach provides a rigorous, distribution-free theoretical guarantee on its abstention decisions.

C2 · weakest assumption

The approach assumes that consistency of code execution outcomes across multiple generations can serve as a reliable proxy for detecting likely hallucinations, without oracle test cases or external databases.

C3 · one-line summary

A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.
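
The execution-consistency idea in the claims above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: candidate programs are modeled as plain Python callables standing in for sampled generations, the probe inputs are assumed to be given, and the fixed `threshold` is a hypothetical placeholder for the calibrated, hypothesis-testing-based rule the record describes. All function names here are illustrative.

```python
from collections import Counter
from typing import Any, Callable, Sequence, Tuple


def execution_signature(program: Callable[[Any], Any], inputs: Sequence[Any]) -> Tuple:
    """Run one candidate on every probe input; exceptions count as outcomes too."""
    outcomes = []
    for x in inputs:
        try:
            outcomes.append(("ok", repr(program(x))))
        except Exception as exc:
            outcomes.append(("error", type(exc).__name__))
    return tuple(outcomes)


def consistency_score(programs: Sequence[Callable[[Any], Any]], inputs: Sequence[Any]) -> float:
    """Fraction of candidates that share the most common execution signature."""
    if not programs:
        raise ValueError("need at least one candidate program")
    signatures = [execution_signature(p, inputs) for p in programs]
    top_count = Counter(signatures).most_common(1)[0][1]
    return top_count / len(signatures)


def should_abstain(programs, inputs, threshold: float = 0.6) -> bool:
    """Abstain when agreement falls below a (hypothetical, uncalibrated) threshold."""
    return consistency_score(programs, inputs) < threshold


# Three stand-in "generations" for an absolute-value task; the third is wrong.
candidates = [lambda x: abs(x), lambda x: x if x >= 0 else -x, lambda x: x]
probe_inputs = [-2, 0, 3]
score = consistency_score(candidates, probe_inputs)
```

Here two of the three candidates agree on every probe input, so the score is 2/3. Whether that is enough to answer depends entirely on the threshold, which in this sketch is chosen by hand rather than derived with any statistical guarantee.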

Receipt and verification
First computed: 2026-05-20T00:03:36.725924Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

e7329289e4beeb618c32482f1cc49d2ee7e99d238b24795dda83cc3f25c1bdd8

Aliases

arxiv: 2605.17029 · arxiv_version: 2605.17029v1 · doi: 10.48550/arxiv.2605.17029 · pith_short_12: 44ZJFCPEX3VW · pith_short_16: 44ZJFCPEX3VWDDBS · pith_short_8: 44ZJFCPE
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/44ZJFCPEX3VWDDBSJAXRZRE5F3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e7329289e4beeb618c32482f1cc49d2ee7e99d238b24795dda83cc3f25c1bdd8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4f1ccf25eec5faf3f479ae818a7f7d356430bb51670bf0f7dbad50b02ea40975",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-16T14:58:11Z",
    "title_canon_sha256": "e57c261cd2650ec3a924438c8ec0b19466bad1bf27e72366a52e010d0c884128"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17029",
    "kind": "arxiv",
    "version": 1
  }
}
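
The shell pipeline under "Verify this Pith Number yourself" can also be sketched offline in Python, hashing the canonical record shown above instead of fetching it. This mirrors the serialization options in the one-liner (`sort_keys=True`, compact separators, `ensure_ascii=False`, UTF-8 encoding); the helper name `canonical_sha256` is an illustrative assumption, not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from the JSON block above.
record = {
    "metadata": {
        "abstract_canon_sha256": "4f1ccf25eec5faf3f479ae818a7f7d356430bb51670bf0f7dbad50b02ea40975",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-16T14:58:11Z",
        "title_canon_sha256": "e57c261cd2650ec3a924438c8ec0b19466bad1bf27e72366a52e010d0c884128",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17029", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare against the value listed under "Canonical hash"
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written, which is what lets a mirror recompute the same value from its own copy of the record.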