Pith Number

pith:XAR7ZRDS

pith:2021:XAR7ZRDSJJJ7TWLZ4LXLLIT572
not attested · not anchored · not stored · refs resolved

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Shafiq Joty, Steven C.H. Hoi, Weishi Wang, Yue Wang

CodeT5 is a unified encoder-decoder model that pre-trains by distinguishing and recovering developer-assigned identifiers to handle both code understanding and generation.

arxiv:2109.00859 v1 · 2021-09-02 · cs.CL · cs.PL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XAR7ZRDSJJJ7TWLZ4LXLLIT572}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL.

C2 weakest assumption

That the identifier-aware pre-training objective and bimodal dual generation task provide gains that generalize beyond the specific datasets and fine-tuning regimes used in the experiments.

C3 one line summary

CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.
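As a toy illustration of what "distinguishing and recovering" identifiers could look like on a single snippet, the sketch below tags each token as identifier or not and replaces every distinct identifier with one shared sentinel. It is not the paper's implementation: tokenizing with Python's standard `tokenize` module, the `mask_identifiers` helper, and the `<MASKn>` sentinel format are all assumptions made for this example.

```python
import io
import keyword
import tokenize

# Token types kept in this toy view of a program; layout tokens
# (NEWLINE, INDENT, DEDENT, ENDMARKER) are dropped.
KEPT = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}


def mask_identifiers(source):
    """Tag identifiers and replace each distinct one with a sentinel.

    Returns (masked_tokens, tags, targets). The <MASKn> sentinel format
    is a hypothetical choice for this sketch.
    """
    sentinels = {}  # identifier -> sentinel, in first-seen order
    masked, tags = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in KEPT:
            continue
        # A NAME token that is not a hard keyword is treated as an identifier.
        is_identifier = (
            tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        )
        if is_identifier:
            if tok.string not in sentinels:
                sentinels[tok.string] = f"<MASK{len(sentinels)}>"
            masked.append(sentinels[tok.string])
        else:
            masked.append(tok.string)
        tags.append(int(is_identifier))
    return masked, tags, sentinels


masked, tags, targets = mask_identifiers("def add(a, b):\n    return a + b\n")
print(" ".join(masked))  # encoder-side view with identifiers hidden
print(tags)              # 1 marks an identifier token
print(targets)           # what a decoder would be asked to recover
```

Every occurrence of the same name maps to the same sentinel, so recovering it requires looking at how the name is used across the whole snippet rather than at one position.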

References

71 extracted · 71 resolved · 7 Pith anchors

[2] Evaluating Large Language Models Trained on Code 2021 · arXiv:2107.03374
[3] Le, and Christopher D 2020
[5] Alexis Conneau and Guillaume Lample. 2019. https://proceedings.neurips.cc/paper/2019/hash/c04c19c2c2474dbf5f7ac4372c5b9af1-Abstract.html Cross-lingual language model pretraining. In Advances in Neura 2019
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://www.aclweb.org/anthology/N19-1423/ BERT: pre-training of deep bidirectional transformers for language understanding. 2019
[8] Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. https://proceedings.neurips.cc/paper/2019/hash/c20bb2d9a50d5ac1f713f8b34d9aac5a-Ab 2019

Formal links

2 machine-checked theorem links

Cited by

41 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.686273Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

b823fcc4724a53f9d979e2eeb5a27dfe87fba7ddacfa6c5ab32d9e3501f78fd2

Aliases

arxiv: 2109.00859 · arxiv_version: 2109.00859v1 · doi: 10.48550/arxiv.2109.00859 · pith_short_12: XAR7ZRDSJJJ7 · pith_short_16: XAR7ZRDSJJJ7TWLZ · pith_short_8: XAR7ZRDS
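The page does not say how the Pith Number relates to the canonical hash. For this record, though, the values shown are consistent with a simple reading: the identifier equals the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and the short aliases are prefixes of it. The sketch below checks that reading; the helper name `pith_number_from_hash` and the 26-character length are assumptions inferred from this one record, not a documented scheme.

```python
import base64


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page appears to be
    derived; the official derivation is not stated in the record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Canonical hash exactly as printed on this page.
canonical_hash = "b823fcc4724a53f9d979e2eeb5a27dfe87fba7ddacfa6c5ab32d9e3501f78fd2"

full = pith_number_from_hash(canonical_hash)
print(full)

# The short aliases listed above are prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full[:n]}")
```

If this reading holds for other records as well, an alias can be checked against a canonical hash offline, without calling the API.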
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XAR7ZRDSJJJ7TWLZ4LXLLIT572 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b823fcc4724a53f9d979e2eeb5a27dfe87fba7ddacfa6c5ab32d9e3501f78fd2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6a223f9984373fbed5eb2a35710c6cdf294e97eb542a5e7ab4dcf9dea6aa93de",
    "cross_cats_sorted": [
      "cs.PL"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-09-02T12:21:06Z",
    "title_canon_sha256": "8ad9c618477e458d6c6bbac1583f178e2fd6d63a222bf29a7dd18c5485e4ad34"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2109.00859",
    "kind": "arxiv",
    "version": 1
  }
}
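The inline Python in the verification command above can be unpacked into a small function for readers who prefer not to pipe through `curl` and `jq`. This sketch applies the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record as displayed here. The function name `canonical_sha256` is illustrative, and whether the digest equals the published hash depends on the displayed JSON being exactly what the API serves, which has not been verified here.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "6a223f9984373fbed5eb2a35710c6cdf294e97eb542a5e7ab4dcf9dea6aa93de",
        "cross_cats_sorted": ["cs.PL"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2021-09-02T12:21:06Z",
        "title_canon_sha256": "8ad9c618477e458d6c6bbac1583f178e2fd6d63a222bf29a7dd18c5485e4ad34",
    },
    "schema_version": "1.0",
    "source": {"id": "2109.00859", "kind": "arxiv", "version": 1},
}

# Hash the page says to expect.
expected = "b823fcc4724a53f9d979e2eeb5a27dfe87fba7ddacfa6c5ab32d9e3501f78fd2"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Sorting keys and fixing the separators is what makes the hash reproducible: two mirrors holding the same record with different key order or whitespace still arrive at the same digest.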