Pith Number

pith:JDVYZVME

pith:2021:JDVYZVMEZKIKYMTR5Y7GAZ2ICM
not attested · not anchored · not stored · refs resolved

SimCSE: Simple Contrastive Learning of Sentence Embeddings

Danqi Chen, Tianyu Gao, Xingcheng Yao

Contrastive learning with standard dropout as the only noise produces sentence embeddings that match or beat prior supervised results.

arxiv:2104.08821 v4 · 2021-04-18 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JDVYZVMEZKIKYMTR5Y7GAZ2ICM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively, a 4.2% and 2.2% improvement compared to the previous best results. We also show -- both theoretically and empirically -- that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform

C2 · weakest assumption

that standard dropout is sufficient as data augmentation to prevent representation collapse in the unsupervised contrastive objective, and that NLI entailment/contradiction pairs form appropriate positive and hard-negative pairs for learning general sentence embeddings.

C3 · one-line summary

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.
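
The unsupervised objective named in the claims can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python toy in which dropout is applied directly to fixed vectors (standing in for dropout inside a pre-trained encoder), and the helper names, the dropout rate, and the temperature default are assumptions chosen for illustration. Each sentence is encoded twice with different dropout masks; the two views of the same sentence form the positive pair, and the other sentences in the batch act as negatives.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dropout(vec, rate, rng):
    """Inverted dropout on a plain list (toy stand-in for encoder dropout)."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def contrastive_loss(views_a, views_b, temperature=0.05):
    """Mean cross-entropy over in-batch negatives.

    views_a[i] and views_b[i] are two views of sentence i (the positive pair);
    views_b[j] for j != i are the negatives for anchor i.
    The temperature default is an assumption for this sketch.
    """
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(views_a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "sentence embeddings": random vectors, one per sentence.
    sentences = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
    view_a = [dropout(s, 0.1, rng) for s in sentences]
    view_b = [dropout(s, 0.1, rng) for s in sentences]
    print("loss:", contrastive_loss(view_a, view_b))
```

The loss is small when each anchor is most similar to its own second view and grows when the positives are misaligned, which is the pressure that pulls the two dropout views of a sentence together and pushes different sentences apart.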

References

109 extracted · 109 resolved · 4 Pith anchors

[4] Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. https://www.aclweb.org/anthology/S12-1051 SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The Firs 2012
[5] Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. https://www.aclweb.org/anthology/S13-1004 *SEM 2013 shared task: Semantic textual similarity. In Second Joint Confer 2013
[6] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. https://openreview.net/forum?id=SyK00v5xx A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Repres 2017
[8] Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, and Magnus Sahlgren. 2021. https://openreview.net/forum?id=Ov_sMNau-PF Semantic re-tuning with contrastive ten 2021
[11] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. http://proceedings.mlr.press/v119/chen20j.html A simple framework for contrastive learning of visual representations. In Inter 2020

Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:53.095688Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

48eb8cd584ca90ac3271ee3e606748130a9b5d5b1e43938a82e86154b8b68519

Aliases

arxiv: 2104.08821 · arxiv_version: 2104.08821v4 · doi: 10.48550/arxiv.2104.08821 · pith_short_12: JDVYZVMEZKIK · pith_short_16: JDVYZVMEZKIKYMTR · pith_short_8: JDVYZVME
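
The values on this page suggest how the identifiers relate to one another, which the sketch below checks. The relationship is inferred from this record only and is not documented Pith behavior: the 26-character Pith Number appears to be the unpadded base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias appears to be its first N characters. The function names are hypothetical.

```python
import base64

CANONICAL_HASH = "48eb8cd584ca90ac3271ee3e606748130a9b5d5b1e43938a82e86154b8b68519"
PITH_NUMBER = "JDVYZVMEZKIKYMTR5Y7GAZ2ICM"


def pith_number_from_hash(hex_digest):
    """Assumed derivation: base32 of the first 16 digest bytes, padding stripped."""
    head = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(head).decode("ascii").rstrip("=")


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Assumed derivation: each short alias is a prefix of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```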
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JDVYZVMEZKIKYMTR5Y7GAZ2ICM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 48eb8cd584ca90ac3271ee3e606748130a9b5d5b1e43938a82e86154b8b68519
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0ba08acab8d5daef298eda40b969a5252fbad3b10cd5421f91a7fa217ca5a3b2",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-04-18T11:27:08Z",
    "title_canon_sha256": "521eac38e6d56e5902f623741973ee63232a52df03edcbb497fbb9d6cc2c02e0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2104.08821",
    "kind": "arxiv",
    "version": 4
  }
}
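
The same check can be run offline against the record printed above. The sketch below mirrors the serialization used in the shell command (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result with SHA-256. The function names are illustrative, and the digest will only equal the published canonical hash if the JSON shown on this page is identical in content to the `canonical_record` returned by the API, so compare the printed value rather than assuming a match.

```python
import hashlib
import json

# Record copied from this page; assumed to match the API's canonical_record.
CANONICAL_RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "0ba08acab8d5daef298eda40b969a5252fbad3b10cd5421f91a7fa217ca5a3b2",
    "cross_cats_sorted": ["cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-04-18T11:27:08Z",
    "title_canon_sha256": "521eac38e6d56e5902f623741973ee63232a52df03edcbb497fbb9d6cc2c02e0"
  },
  "schema_version": "1.0",
  "source": {"id": "2104.08821", "kind": "arxiv", "version": 4}
}
"""

EXPECTED_HASH = "48eb8cd584ca90ac3271ee3e606748130a9b5d5b1e43938a82e86154b8b68519"


def canonical_bytes(record):
    """Serialize with sorted keys and compact separators, as in the shell check."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(json.loads(CANONICAL_RECORD_JSON))
    print("computed:", digest)
    print("expected:", EXPECTED_HASH)
    print("match:   ", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, only on its content.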