Pith Number

pith:FFKM5KHQ

pith:2021:FFKM5KHQJOWAOGZE2VQIBEC7J6

not attested · not anchored · not stored · refs resolved

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Daya Guo, Duyu Tang, Ge Li, Junjie Huang, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shuai Lu, Shujie Liu, Shuo Ren

CodeXGLUE introduces a benchmark with 10 tasks across 14 datasets for code understanding and generation.

arxiv:2102.04664 v2 · 2021-02-09 · cs.SE · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{FFKM5KHQJOWAOGZE2VQIBEC7J6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
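
The page does not describe the merge algorithm itself. As a minimal sketch of what "deterministic" means here, the example below folds events in a total order, so the result does not depend on the order in which a mirror received them. The `Event` fields, the sort key, and the last-writer-wins rule are all hypothetical and chosen only for illustration; the real bundle format may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    # All field names are hypothetical, not taken from the Pith schema.
    event_id: str    # unique identifier, also used as a tie-breaker
    timestamp: str   # fixed-format ISO 8601 UTC string, sorts lexicographically
    field: str       # which part of the state this event sets
    value: Any


def merge(canonical_record: dict, events: list[Event]) -> dict:
    """Fold events into a state in an order independent of arrival."""
    state = {"record": canonical_record, "fields": {}}
    seen: set[str] = set()
    # Sorting on (timestamp, event_id) gives every mirror the same order.
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:
            continue  # the same event received twice is applied once
        seen.add(ev.event_id)
        state["fields"][ev.field] = ev.value  # last event in the order wins
    return state
```

Two mirrors that hold the same set of events, in any order and with duplicates, would compute the same state under these assumptions.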

Claims

C1 · strongest claim

CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison.

C2 · weakest assumption

The selected 10 tasks and 14 datasets are assumed to be representative of the broader space of program understanding and generation problems.

C3 · one-line summary

CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.

References

107 extracted · 107 resolved · 20 Pith anchors

[1] T., Devanbu, P., and Sutton, C 2018 · doi:10.1145/3212695

[2] Learning to represent programs with graphs 2017 · arXiv:1711.00740

[3] Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In International conference on machine learning. 2091–2100 2016

[4] Miltiadis Allamanis and Charles Sutton. 2013. Mining Source Code Repositories at Massive Scale using Language Modeling. In 2013 10th Working Conference on Mining Software Repositories (MSR) . IEEE, 20 2013

[5] Miltiadis Allamanis and Charles Sutton. 2014. Mining idioms from source code. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 472–483 2014

Formal links

3 machine-checked theorem links

Cited by

37 papers in Pith

CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs

A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG

XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants

PseudoBridge: Pseudo Code as the Bridge for Better Semantic and Logic Alignment in Code Retrieval

BioDefect: The First Dataset for Defect Detection in Bioinformatics Software

Receipt and verification

First computed	2026-05-17T23:38:52.654437Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc

Aliases

arxiv: 2102.04664 · arxiv_version: 2102.04664v2 · doi: 10.48550/arxiv.2102.04664 · pith_short_12: FFKM5KHQJOWA · pith_short_16: FFKM5KHQJOWAOGZE · pith_short_8: FFKM5KHQ
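
The short aliases listed above are prefixes of the full 26-character identifier. For this record, the identifier also coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That second point is an observation about this one record, not a rule documented on the page, so the sketch below treats it as an assumption. The function names are illustrative.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc"
PITH_ID = "FFKM5KHQJOWAOGZE2VQIBEC7J6"


def short_aliases(full_id: str) -> dict:
    """Return the 8-, 12-, and 16-character prefixes of an identifier."""
    return {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the leading characters.

    Assumption: the identifier is derived this way. It matches this
    record but is not confirmed as the general construction.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    print(short_aliases(PITH_ID))
    print(base32_prefix(CANONICAL_HASH) == PITH_ID)
```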

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/FFKM5KHQJOWAOGZE2VQIBEC7J6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2954cea8f04bac071b24d56080905f4f8157d97e96558f7a676ad31e51e54ecc

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "577000571d1832c71d8988455cd2943e664fd4fd9798e9932abb6058814ac2be",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2021-02-09T06:16:25Z",
    "title_canon_sha256": "860be0ba9c5444e1c8b2bc4245a205e9d9c1e4bea18ae82f2556b8d32393d9c2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2102.04664",
    "kind": "arxiv",
    "version": 2
  }
}
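
The verification command canonicalizes the record (sorted keys, compact separators, UTF-8) before hashing it with SHA-256. The sketch below applies the same steps offline to the record shown above. Whether it reproduces the published canonical hash depends on this displayed JSON matching the `canonical_record` returned by the resolver, which is assumed here and not verified. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "577000571d1832c71d8988455cd2943e664fd4fd9798e9932abb6058814ac2be",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2021-02-09T06:16:25Z",
        "title_canon_sha256": "860be0ba9c5444e1c8b2bc4245a205e9d9c1e4bea18ae82f2556b8d32393d9c2",
    },
    "schema_version": "1.0",
    "source": {"id": "2102.04664", "kind": "arxiv", "version": 2},
}

# Same content with top-level keys in a different order.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

if __name__ == "__main__":
    print(canonical_sha256(record))
    # Key order does not affect the digest because keys are sorted first.
    print(canonical_sha256(record) == canonical_sha256(reordered))
```

Sorting keys and fixing the separators is what makes the digest reproducible: two JSON documents with the same content but different key order or whitespace serialize to the same bytes.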