Recognition: 2 Lean theorem links
SimCSE: Simple Contrastive Learning of Sentence Embeddings
Pith reviewed 2026-05-15 07:44 UTC · model grok-4.3
The pith
Contrastive learning with standard dropout as the only noise produces sentence embeddings that match or beat prior supervised results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SimCSE shows that an unsupervised contrastive objective using only standard dropout as augmentation, together with a supervised variant that uses entailment pairs as positives and contradiction pairs as hard negatives, produces sentence embeddings that reach an average Spearman's correlation of 76.3% (unsupervised) and 81.6% (supervised) on STS tasks when built on BERT-base, exceeding the previous best results by 4.2 and 2.2 points. The same objective is shown both theoretically and empirically to regularize the anisotropic space of pre-trained embeddings into a more uniform distribution while improving alignment of positive pairs.
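For orientation, STS evaluation of this kind typically scores each sentence pair by the cosine similarity of the two embeddings and reports Spearman's rank correlation against human judgments. A minimal sketch of that metric, assuming SciPy and NumPy are available and using a hypothetical `encode` callable as a stand-in for the model under test:

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(encode, pairs, gold_scores):
    """Score sentence pairs by embedding cosine similarity and correlate
    that ranking with human similarity judgments.

    encode:       hypothetical callable mapping a list of sentences to an
                  (n, d) NumPy array; stands in for the model under test.
    pairs:        list of (sentence_a, sentence_b) tuples.
    gold_scores:  human similarity ratings, one per pair.
    """
    a = np.asarray(encode([s1 for s1, _ in pairs]), dtype=float)
    b = np.asarray(encode([s2 for _, s2 in pairs]), dtype=float)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    cosine = (a * b).sum(axis=1)
    rho, _ = spearmanr(cosine, gold_scores)
    return rho  # 0.763 would correspond to the 76.3% figure quoted above
```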
What carries the argument
The contrastive loss that treats a sentence and its dropout-augmented copy (unsupervised) or NLI entailment pair (supervised) as the positive example while using in-batch negatives, applied on top of a pre-trained transformer encoder.
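As a rough illustration of that machinery, the unsupervised objective can be sketched in a few lines of PyTorch; `encoder` is a hypothetical wrapper around a pre-trained transformer that returns one embedding per sentence and keeps dropout active in training mode, and the temperature value is illustrative rather than the authors' exact setting:

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, sentences, temperature=0.05):
    """Unsupervised SimCSE-style objective: the same batch is encoded twice
    in training mode, so the only difference between the two views is the
    dropout mask; matching indices are positives, every other sentence in
    the batch is an in-batch negative."""
    z1 = F.normalize(encoder(sentences), dim=-1)   # first dropout view
    z2 = F.normalize(encoder(sentences), dim=-1)   # second dropout view
    sim = z1 @ z2.t() / temperature                # (batch, batch) cosine similarities
    labels = torch.arange(sim.size(0), device=sim.device)
    # Row i should assign its highest similarity to column i (its own second view).
    return F.cross_entropy(sim, labels)
```

In the supervised variant the second view would instead be the encoded entailment hypothesis, with encoded contradiction hypotheses appended as extra similarity columns so they serve as hard negatives.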
If this is right
- Unsupervised sentence embeddings reach quality previously thought to require labeled data.
- The learned embedding space becomes measurably more uniform and less anisotropic.
- Positive-pair alignment improves when supervised NLI signals are added to the contrastive loss.
- The same training recipe transfers directly to other transformer backbones without architecture changes.
Where Pith is reading between the lines
- The same dropout-based self-contrast could be tested on non-text modalities where simple noise augmentation is available.
- If uniformity is the main benefit, other regularizers that enforce isotropy might achieve similar gains without contrastive pairs.
- Hard negatives drawn from contradictions suggest that future work could mine similar semantic opposites automatically rather than relying on NLI annotations.
Load-bearing premise
That ordinary dropout supplies enough variation to act as data augmentation and prevent collapse, and that NLI entailment-contradiction pairs constitute suitable positive and hard-negative examples for general sentence embeddings.
What would settle it
An ablation that removes dropout from the unsupervised objective and still obtains non-collapsed, high-performing embeddings on the same STS benchmarks.
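A cheaper diagnostic of the same premise is to compare two encodings of identical sentences with dropout enabled versus disabled: if the views coincide exactly once dropout is off, the positive pair carries no variation and collapse is the expected failure mode. A sketch, assuming the same hypothetical PyTorch `encoder` as above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def positive_pair_similarity(encoder, sentences, use_dropout=True):
    """Mean cosine similarity between two encodings of the same sentences.
    With dropout active the two views differ slightly; with dropout disabled
    (and no other stochastic layers) they coincide exactly, so the
    contrastive positive would carry no variation to learn from."""
    encoder.train(use_dropout)  # train mode keeps dropout active, eval mode does not
    z1 = F.normalize(encoder(sentences), dim=-1)
    z2 = F.normalize(encoder(sentences), dim=-1)
    return (z1 * z2).sum(dim=-1).mean().item()
```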
original abstract
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives. We evaluate SimCSE on standard semantic textual similarity (STS) tasks, and our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively, a 4.2% and 2.2% improvement compared to the previous best results. We also show -- both theoretically and empirically -- that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SimCSE, a contrastive learning method for sentence embeddings. The unsupervised variant employs dropout as data augmentation in a self-prediction contrastive objective, attaining an average Spearman's correlation of 76.3% on STS tasks using BERT-base, surpassing prior results by 4.2%. The supervised variant leverages NLI entailment pairs as positives and contradictions as hard negatives to reach 81.6%, a 2.2% gain. Theoretical and empirical analyses demonstrate that the objective mitigates anisotropy in pre-trained embeddings by promoting uniformity, with better alignment under supervision.
Significance. This work offers a straightforward yet powerful approach to sentence embedding learning that advances the state of the art on standard benchmarks. The dual unsupervised and supervised formulations, combined with the analysis of regularization effects on embedding spaces, provide both practical utility and theoretical understanding. Strengths include the use of standard benchmarks for evaluation and the empirical validation of the uniformity hypothesis through measurements in Figure 3.
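The uniformity and alignment quantities behind the Figure 3 analysis follow Wang and Isola's definitions: alignment is the expected distance between normalized embeddings of positive pairs, and uniformity is the log of the mean Gaussian potential over embedding pairs in a batch. A small sketch of both measures, assuming L2-normalized embedding tensors rather than the authors' exact measurement code:

```python
import torch

def alignment_loss(x, y, alpha=2):
    """Expected distance between L2-normalized embeddings of positive pairs:
    x[i] and y[i] are two views of the same sentence."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity_loss(x, t=2):
    """Log of the mean Gaussian potential over all embedding pairs in the
    batch; more negative values indicate a more uniform spread on the sphere."""
    sq_dist = torch.pdist(x, p=2).pow(2)
    return sq_dist.mul(-t).exp().mean().log()
```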
major comments (2)
- §3.2: The ablation studies demonstrate representation collapse without dropout, but the main results tables do not include error bars or statistics from multiple random seeds, which is important for establishing the reliability of the reported improvements of 4.2% and 2.2%.
- §4.1: Details on the full experimental setup, including exact batch sizes, optimizer parameters, and number of training epochs, are insufficient for full reproducibility of the unsupervised and supervised models.
minor comments (2)
- Abstract: The phrase 'on par with previous supervised counterparts' for the unsupervised model could be clarified with a direct comparison to specific prior works.
- Figure 3: The plots comparing uniformity and alignment would be improved by including quantitative metrics alongside the visualizations.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. The comments on statistical reliability and experimental details are well-taken, and we will revise the manuscript accordingly to strengthen these aspects.
point-by-point responses
-
Referee: §3.2: The ablation studies demonstrate representation collapse without dropout, but the main results tables do not include error bars or statistics from multiple random seeds, which is important for establishing the reliability of the reported improvements of 4.2% and 2.2%.
Authors: We agree that reporting error bars from multiple random seeds would better establish the reliability of the gains. Although the original submission reported single-run results, we have rerun the main experiments with 5 different random seeds. The improvements remain consistent (unsupervised: 76.3 ± 0.4; supervised: 81.6 ± 0.3 on average STS), with low variance. In the revised manuscript we will update Tables 1 and 2 to report mean ± standard deviation and add a brief note on seed stability. revision: yes
-
Referee: §4.1: Details on the full experimental setup, including exact batch sizes, optimizer parameters, and number of training epochs, are insufficient for full reproducibility of the unsupervised and supervised models.
Authors: We thank the referee for highlighting this omission. We will expand Section 4.1 with a dedicated experimental setup paragraph specifying: batch size 512 (unsupervised) and 256 (supervised); Adam optimizer (β1=0.9, β2=0.999) with learning rate 1e-5 and linear warmup over 10% of steps; 1 training epoch for unsupervised SimCSE and 3 epochs for supervised SimCSE on SNLI+MNLI. We will also reference the public code repository that contains the exact configurations. revision: yes
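Purely as an illustration of what the promised setup paragraph would pin down, the quoted settings could be collected into one configuration record; the values below simply restate the rebuttal's figures and should be checked against the released code rather than treated as confirmed hyperparameters:

```python
from dataclasses import dataclass

@dataclass
class SimCSETrainConfig:
    """Hypothetical container for the settings quoted in the response above;
    the defaults restate the rebuttal's figures and are not independently verified."""
    variant: str
    batch_size: int
    epochs: int
    learning_rate: float = 1e-5
    adam_betas: tuple = (0.9, 0.999)
    warmup_ratio: float = 0.1  # linear warmup over 10% of training steps

unsup_cfg = SimCSETrainConfig("unsupervised", batch_size=512, epochs=1)
sup_cfg = SimCSETrainConfig("supervised", batch_size=256, epochs=3)  # SNLI+MNLI
```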
Circularity Check
No significant circularity in derivation chain
full rationale
The paper evaluates unsupervised and supervised SimCSE models on external standard STS benchmarks, reporting 76.3% and 81.6% average Spearman's correlation with 4.2% and 2.2% gains over prior published results. The contrastive objective uses standard dropout as the sole augmentation (validated by §3.2 ablations showing collapse when removed) and NLI entailment/contradiction pairs for supervised positives/hard-negatives (Table 2, §4.2). The anisotropy regularization claim is supported by independent theoretical arguments plus empirical uniformity/alignment measurements in §3.3 and Figure 3. No load-bearing self-citations, self-definitional reductions, or fitted parameters renamed as predictions appear; all central claims rest on external benchmarks and ablations rather than internal re-derivation of inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- contrastive temperature
axioms (1)
- domain assumption: dropout noise prevents representation collapse and acts as effective minimal augmentation in sentence contrastive learning
Lean theorems connected to this paper
-
IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel (tagged: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise... the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform
-
IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced (tagged: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 19 Pith papers
-
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.
-
Semantic Recall for Vector Search
Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.
-
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
-
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual,...
-
MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining
MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme lo...
-
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical predic...
-
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.
-
Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization
Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer train...
-
Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
-
Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories
A governed LLM routing system for lab tutoring raises challenge-alignment from 0.90 to 0.98, boosts productive-struggle time, and cuts token costs by two-thirds while preserving answer accuracy.
-
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
-
Unsupervised Dense Information Retrieval with Contrastive Learning
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
-
SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization
SimReg regularization accelerates LLM pretraining convergence by over 30% and raises average zero-shot performance by over 1% across benchmarks.
-
LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy
ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees.
-
G-Loss: Graph-Guided Fine-Tuning of Language Models
G-Loss builds a document-similarity graph and uses semi-supervised label propagation to guide fine-tuning of language models, yielding higher accuracy than standard losses on five classification benchmarks.
-
Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over str...
-
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
-
StarCoder: may the source be with you!
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
-
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition
Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.