SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Pith reviewed 2026-05-15 01:28 UTC · model grok-4.3
The pith
SuperGLUE introduces a new set of harder language understanding tasks after models surpass non-expert humans on GLUE.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Performance on the GLUE benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. The authors therefore present SuperGLUE, a new benchmark styled after GLUE that supplies a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard at super.gluebenchmark.com.
What carries the argument
The SuperGLUE benchmark, which replaces GLUE with a new collection of more challenging tasks chosen to remain diagnostic of general language understanding.
If this is right
- Language model development will shift evaluation focus to the new, more demanding tasks in SuperGLUE.
- Reported progress will reflect performance on tasks where models still fall below non-expert human levels.
- The public leaderboard will standardize comparison across systems on the harder task set.
- Research incentives will favor methods that handle the added difficulty rather than GLUE-specific shortcuts.
Where Pith is reading between the lines
- Adoption of SuperGLUE could accelerate models that transfer more reliably to unseen real-world language scenarios.
- Success on SuperGLUE might still require separate checks that the gains reflect understanding rather than benchmark-specific patterns.
- Future benchmark designers may need to repeat this cycle as performance on SuperGLUE itself saturates.
Load-bearing premise
The newly chosen tasks are harder and more diagnostic of general language understanding than the original GLUE tasks without introducing new exploitable biases or artifacts.
What would settle it
If leading models reach or exceed human performance on the full SuperGLUE suite within a year using only the same pretraining and transfer methods that saturated GLUE, the claim that SuperGLUE restores meaningful headroom would be undermined.
Original abstract
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper observes that recent advances in pretraining and transfer learning have driven model performance on the GLUE benchmark above non-expert human levels, implying limited headroom for further progress. It introduces SuperGLUE, a successor benchmark consisting of eight more challenging language-understanding tasks (BoolQ, CB, COPA, MultiRC, ReCoRD, RTE, WiC, WSC), together with a software toolkit and public leaderboard.
Significance. If the new tasks indeed offer greater headroom and more diagnostic evaluation of general language understanding, SuperGLUE would serve as a valuable next standard benchmark, extending the impact of GLUE. The accompanying toolkit and leaderboard constitute practical, reproducible contributions that lower barriers to adoption and enable consistent community comparisons.
minor comments (2)
- [Abstract] The saturation claim would be strengthened by a brief citation to the specific results or papers documenting model performance exceeding non-expert human baselines on GLUE.
- [§2] A short table or paragraph explicitly comparing average model-human gaps on GLUE versus the proposed SuperGLUE tasks would make the 'stickier' claim more concrete and easier to evaluate.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept the manuscript. We appreciate the recognition that SuperGLUE offers greater headroom and diagnostic value for general language understanding.
Circularity Check
No significant circularity
full rationale
The paper proposes SuperGLUE motivated by the external empirical observation that GLUE performance has surpassed non-expert human levels. No derivation chain, equations, fitted parameters, or predictions are present. The central premise relies on publicly verifiable model results rather than any self-citation that would reduce the argument to unverified inputs by construction. No self-definitional, fitted-input, uniqueness-imported, or ansatz-smuggled steps appear. The work is a benchmark proposal and toolkit release whose motivating premise rests on external, publicly verifiable performance data.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Language understanding can be meaningfully summarized by aggregate performance on a diverse but fixed set of tasks.
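This aggregate-score assumption can be made concrete. GLUE-style leaderboards collapse diverse per-task metrics into one headline number by averaging metrics within each task, then averaging across tasks. A minimal sketch of that macro-average (the task names match SuperGLUE tasks, but the scores below are illustrative, not results from the paper):

```python
def overall_score(task_metrics):
    """Macro-average a benchmark: average metrics within each task,
    then average the per-task scores across tasks.

    task_metrics: dict mapping task name -> list of metric values (0-100).
    Multi-metric tasks (e.g. accuracy and F1) are averaged within the
    task first, so every task contributes equally to the final number.
    """
    per_task = {t: sum(ms) / len(ms) for t, ms in task_metrics.items()}
    return sum(per_task.values()) / len(per_task)

# Hypothetical scores, for illustration only.
scores = {
    "BoolQ": [76.0],
    "CB": [90.0, 84.0],   # two metrics, averaged within the task
    "COPA": [73.0],
    "WiC": [69.0],
}
print(round(overall_score(scores), 2))  # prints 76.25
```

The axiom is precisely the claim that a number produced this way tracks general language understanding; the benchmark's fixed task set and equal task weighting are baked into the averaging.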
Forward citations
Cited by 23 Pith papers
- Measuring Massive Multitask Language Understanding
  Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.
- Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms
  Queryable LoRA adds dynamic routing over shared low-rank atoms with attention and language-instruction regularization to make parameter-efficient fine-tuning more adaptive across inputs and layers.
- Language Is Not All You Need: Aligning Perception with Language Models
  Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
- PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
  PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
- SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask
  SparseForge achieves 57.27% zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity using only 5B retraining tokens, beating the dense baseline and nearly matching a 40B-token SOTA method.
- Defending Against Indirect Prompt Injection Attacks With Spotlighting
  Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task performance.
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.
- Retentive Network: A Successor to Transformer for Large Language Models
  RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.
- Kosmos-2: Grounding Multimodal Large Language Models to the World
  Kosmos-2 grounds text to image regions by encoding refer expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text pairs.
- Language Models (Mostly) Know What They Know
  Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
- Ethical and social risks of harm from Language Models
  The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job...
- A General Language Assistant as a Laboratory for Alignment
  Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
- HuggingFace's Transformers: State-of-the-art Natural Language Processing
  Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
- Complexity Horizons of Compressed Models in Analog Circuit Analysis
  Prerequisite graphs map compressed LLM performance boundaries in analog circuit analysis to allow selecting the smallest viable model for a given task complexity.
- Uncertainty-Aware Transformers: Conformal Prediction for Language Models
  CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.
- Humanity's Last Exam
  Humanity's Last Exam is a new 2,500-question benchmark at the frontier of human knowledge where state-of-the-art LLMs show low accuracy.
- Detecting Language Model Attacks with Perplexity
  Jailbreak prompts with adversarial suffixes have high GPT-2 perplexity, and a LightGBM model on perplexity and length detects most attacks.
- PaLM 2 Technical Report
  PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.
- GLU Variants Improve Transformer
  Some GLU variants using non-sigmoid nonlinearities improve Transformer quality over ReLU and GELU in feed-forward sublayers.
- Scaling Laws for Neural Language Models