Pith Number

pith:65HLDQD3

pith:2019:65HLDQD3BXWBKONNJRWDL5CUMK
not attested · not anchored · not stored · refs resolved

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Alex Wang, Amanpreet Singh, Felix Hill, Julian Michael, Nikita Nangia, Omer Levy, Samuel R. Bowman, Yada Pruksachatkun

SuperGLUE introduces a new set of harder language understanding tasks after models surpass non-expert humans on GLUE.

arxiv:1905.00537 v3 · 2019-05-02 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{65HLDQD3BXWBKONNJRWDL5CUMK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Performance on the GLUE benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. This motivates SuperGLUE, a new set of more difficult language understanding tasks.

C2 weakest assumption

That the newly selected tasks are sufficiently harder and more diagnostic of general language understanding than the original GLUE tasks, without introducing new biases or artifacts that models can exploit.

C3 one line summary

SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and a leaderboard, intended to drive further progress beyond GLUE.

References

135 extracted · 135 resolved · 16 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:05.121814Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

f74eb1c07b0dec1539ad4c6c35f45462a8112c3b8c52c3a3a28d740bbc551349

Aliases

arxiv: 1905.00537 · arxiv_version: 1905.00537v3 · doi: 10.48550/arxiv.1905.00537 · pith_short_12: 65HLDQD3BXWB · pith_short_16: 65HLDQD3BXWBKONN · pith_short_8: 65HLDQD3
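
The identifiers in this record appear to be related by simple encoding and truncation. Base32-encoding the canonical hash bytes and keeping the first 26 characters reproduces the Pith Number shown above, and each pith_short_N alias is its first N characters. The sketch below checks this for this record only. It is an observation drawn from the values on this page, not a documented specification, and the helper names pith_number_from_hash and short_alias are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "f74eb1c07b0dec1539ad4c6c35f45462a8112c3b8c52c3a3a28d740bbc551349"
PITH_NUMBER = "65HLDQD3BXWBKONNJRWDL5CUMK"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode the digest bytes, then truncate.

    The 26-character length is inferred from this record, not from a spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Hypothetical helper: a short alias is a prefix of the full number."""
    return pith_number[:length]


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": short_alias(derived, n) for n in (8, 12, 16)}

print(derived == PITH_NUMBER)  # True for this record
for name, value in aliases.items():
    print(name, value)
```

Because the aliases are prefixes, a shorter alias is less specific than a longer one. The full 26-character number or the canonical hash is the safer choice when an exact match matters.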
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/65HLDQD3BXWBKONNJRWDL5CUMK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f74eb1c07b0dec1539ad4c6c35f45462a8112c3b8c52c3a3a28d740bbc551349
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1d7e3199da3cedeca7d3f9aef662c6c348f42f2d32022a1007ee3b8d5817fe51",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2019-05-02T00:41:50Z",
    "title_canon_sha256": "d3a724f90da0043f6d84299c8c492aaca2fc396dd0a908865bb9c87e418b0e94"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1905.00537",
    "kind": "arxiv",
    "version": 3
  }
}
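
The verification one-liner above can also be written as a standalone script. The following is a minimal sketch that applies the same canonicalization (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes) to the record JSON as printed on this page. The function name canonical_sha256 is hypothetical. The digest should equal the canonical hash only if the JSON shown here is identical to the record the service returns, which this sketch does not check.

```python
import hashlib
import json

# Canonical record as printed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "1d7e3199da3cedeca7d3f9aef662c6c348f42f2d32022a1007ee3b8d5817fe51",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2019-05-02T00:41:50Z",
    "title_canon_sha256": "d3a724f90da0043f6d84299c8c492aaca2fc396dd0a908865bb9c87e418b0e94"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1905.00537",
    "kind": "arxiv",
    "version": 3
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Hypothetical helper mirroring the shell one-liner's serialization."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the hash reproducible: two mirrors that parse the same record produce the same bytes regardless of the whitespace or key order in the copy they received.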