Pith Number

pith:FFKM5KHQ

pith:2021:FFKM5KHQJOWAOGZE2VQIBEC7J6
not attested · not anchored · not stored · refs resolved

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Daya Guo, Duyu Tang, Ge Li, Junjie Huang, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shuai Lu, Shujie Liu, Shuo Ren

CodeXGLUE introduces a benchmark with 10 tasks across 14 datasets for code understanding and generation.

arxiv:2102.04664 v2 · 2021-02-09 · cs.SE · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FFKM5KHQJOWAOGZE2VQIBEC7J6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
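A deterministic merge means that any mirror holding the same set of events arrives at the same state, regardless of the order in which the events were received. The page does not describe the actual algorithm, so the following is only a minimal sketch of one common way to get that property: deduplicate, impose a total order, then fold. The event fields (`id`, `at`, `field`, `value`), the sample events, and the last-write-wins rule are all hypothetical, and signature checking is left out.

```python
import json
import random


def merge_events(events):
    """Fold a set of events into one state, independent of arrival order."""
    # Deduplicate by event id so a mirror that received an event twice
    # ends up with the same set as one that received it once.
    unique = {event["id"]: event for event in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["at"], e["id"]))
    state = {}
    for event in ordered:
        # Hypothetical rule: a later event overwrites the field's value.
        state[event["field"]] = event["value"]
    return state


# Made-up events for illustration only; they are not taken from any bundle.
events = [
    {"id": "e1", "at": "2026-05-17T23:38:52Z", "field": "citations", "value": "open"},
    {"id": "e2", "at": "2026-05-18T08:00:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "at": "2026-05-19T09:30:00Z", "field": "citations", "value": "closed"},
]

shuffled = events[:]
random.shuffle(shuffled)

# Arrival order and duplicate deliveries do not change the result.
assert merge_events(events) == merge_events(shuffled + events[:1])
print(json.dumps(merge_events(shuffled), sort_keys=True))
```

The tie-breaker on `id` matters: without it, two events with the same timestamp could be applied in different orders on different mirrors.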

Claims

C1 · strongest claim

CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison.

C2 · weakest assumption

The selected 10 tasks and 14 datasets are assumed to be representative of the broader space of program understanding and generation problems.

C3 · one line summary

CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.

References

107 extracted · 107 resolved · 20 Pith anchors

[1] T., Devanbu, P., and Sutton, C 2018 · doi:10.1145/3212695
[2] Learning to represent programs with graphs 2017 · arXiv:1711.00740
[3] Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In International conference on machine learning. 2091–2100 2016
[4] Miltiadis Allamanis and Charles Sutton. 2013. Mining Source Code Repositories at Massive Scale using Language Modeling. In 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 20 2013
[5] Miltiadis Allamanis and Charles Sutton. 2014. Mining idioms from source code. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 472–483 2014

Formal links

3 machine-checked theorem links

Cited by

37 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.654437Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc
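The identifier at the top of this record appears to be the first 26 characters of the RFC 4648 Base32 encoding of this hash. That is an inference from the two values shown on this page, not a documented rule, so the sketch below should be read as a consistency check rather than a specification. The function name `pith_number_from_hash` and the default length of 26 are assumptions.

```python
import base64

CANONICAL_HASH = "2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the leading characters."""
    raw = bytes.fromhex(hex_digest)
    # Standard Base32 alphabet (A-Z, 2-7); padding only affects the tail.
    encoded = base64.b32encode(raw).decode("ascii")
    return encoded[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # FFKM5KHQJOWAOGZE2VQIBEC7J6
```

Twenty-six Base32 characters carry 130 bits, so under this reading the identifier is a truncated view of the 256-bit hash and the full hash is still needed for verification.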

Aliases

arxiv: 2102.04664 · arxiv_version: 2102.04664v2 · doi: 10.48550/arxiv.2102.04664 · pith_short_12: FFKM5KHQJOWA · pith_short_16: FFKM5KHQJOWAOGZE · pith_short_8: FFKM5KHQ
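The aliases listed above follow visible patterns: the short forms are prefixes of the full identifier, the versioned arXiv id appends `v` plus the version, and the DOI uses the `10.48550/arxiv.` prefix. A small sketch that rebuilds the list from three inputs is shown below; the function name `build_aliases` is hypothetical, and the prefix rule is inferred from the values on this page rather than taken from any documentation.

```python
def build_aliases(pith_number: str, arxiv_id: str, version: int) -> dict:
    """Rebuild the alias table from the identifier and the arXiv source."""
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    # Assumption: each short form is a plain prefix of the full identifier.
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]
    return aliases


aliases = build_aliases("FFKM5KHQJOWAOGZE2VQIBEC7J6", "2102.04664", 2)
# Keys sorted as strings reproduce the order used in the listing.
line = " · ".join(f"{key}: {aliases[key]}" for key in sorted(aliases))
print(line)
```

Sorting the keys as plain strings places `pith_short_12` and `pith_short_16` before `pith_short_8`, which matches the order in the listing.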
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FFKM5KHQJOWAOGZE2VQIBEC7J6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc
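The one-liner above can also be run offline as a short script. The sketch below applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to a copy of the canonical record shown on this page. The function name `canonical_sha256` is an assumption; compare the printed digest with the expected value yourself, since the record served by the API is the authoritative input.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON canonicalization as the shell command."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Copied from the canonical record JSON on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "577000571d1832c71d8988455cd2943e664fd4fd9798e9932abb6058814ac2be",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2021-02-09T06:16:25Z",
        "title_canon_sha256": "860be0ba9c5444e1c8b2bc4245a205e9d9c1e4bea18ae82f2556b8d32393d9c2",
    },
    "schema_version": "1.0",
    "source": {"id": "2102.04664", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON as it was received.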
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "577000571d1832c71d8988455cd2943e664fd4fd9798e9932abb6058814ac2be",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2021-02-09T06:16:25Z",
    "title_canon_sha256": "860be0ba9c5444e1c8b2bc4245a205e9d9c1e4bea18ae82f2556b8d32393d9c2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2102.04664",
    "kind": "arxiv",
    "version": 2
  }
}