{"work":{"id":"1bb6fb0c-482d-43cf-94a8-ed18f72a5563","openalex_id":null,"doi":null,"arxiv_id":"1804.07461","raw_key":null,"title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","authors":null,"authors_text":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman","year":2018,"venue":"cs.CL","abstract":"For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. 
We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.","external_url":"https://arxiv.org/abs/1804.07461","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-27T18:31:07.652999+00:00","pith_arxiv_id":"1804.07461","created_at":"2026-05-10T09:48:48.246320+00:00","updated_at":"2026-06-27T18:31:07.652999+00:00","title_quality_ok":true,"display_title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","render_title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding"},"hub":{"state":{"work_id":"1bb6fb0c-482d-43cf-94a8-ed18f72a5563","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":68,"external_cited_by_count":null,"distinct_field_count":11,"first_pith_cited_at":"2019-07-09T04:46:31+00:00","last_pith_cited_at":"2026-06-07T20:07:24+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:08:47.584547+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":7},{"context_role":"dataset","n":7}],"polarity_counts":[{"context_polarity":"background","n":8},{"context_polarity":"use_dataset","n":6}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T18:10:02.185672+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language 
Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":12},{"title":"SQuAD: 100,000+ Questions for Machine Comprehension of Text","work_id":"0492dd16-26e8-48d9-874c-3dd90cae7b85","shared_citers":11},{"title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","work_id":"41fe12c4-e538-4890-a244-480650ed3078","shared_citers":9},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":9},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":7},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":7},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":6},{"title":"The Power of Scale for Parameter-Efficient Prompt Tuning","work_id":"1056ba8e-7b3f-4811-be8e-9a3ed9269acb","shared_citers":6},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":5},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":5},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":5},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":5},{"title":"Mixed Precision Training","work_id":"c525941b-ce20-4bcb-8509-a9968f1e89c3","shared_citers":5},{"title":"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer","work_id":"2c6b3f6d-54e4-4df7-baa7-475a490799af","shared_citers":5},{"title":"Prefix-Tuning: Optimizing Continuous Prompts for 
Generation","work_id":"23ddcedc-909d-4b18-b28a-943a50a8cef7","shared_citers":5},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":5},{"title":"Training Deep Nets with Sublinear Memory Cost","work_id":"f2c5c287-a500-40e4-a136-e7e3172db1d7","shared_citers":5},{"title":"AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning","work_id":"6fa49657-348b-42dd-b870-8758c71af878","shared_citers":4},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":4},{"title":"arXiv preprint arXiv:1806.03377 , year=","work_id":"335ca03b-43f7-43d8-af32-3eaeb6735100","shared_citers":4},{"title":"arXiv preprint arXiv:1905.00537 , year=","work_id":"54fdcd2d-ade5-4d5e-9b37-8d75abcbaae2","shared_citers":4},{"title":"Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models","work_id":"8505729c-c88b-43bd-8e2c-2c94644ca438","shared_citers":4},{"title":"BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions","work_id":"511eeb84-4b95-46d5-b14f-50da43f4f19f","shared_citers":4},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":4}],"time_series":[{"n":2,"year":2019},{"n":3,"year":2020},{"n":1,"year":2021},{"n":3,"year":2022},{"n":3,"year":2023},{"n":4,"year":2024},{"n":1,"year":2025},{"n":18,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T18:09:27.412686+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical 
Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T18:09:58.803632+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","claims":[{"claim_text":"For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited train","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T18:09:54.444189+00:00"}},"summary":{"title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","claims":[{"claim_text":"For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. 
In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited train","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":12},{"title":"SQuAD: 100,000+ Questions for Machine Comprehension of Text","work_id":"0492dd16-26e8-48d9-874c-3dd90cae7b85","shared_citers":11},{"title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","work_id":"41fe12c4-e538-4890-a244-480650ed3078","shared_citers":9},{"title":"Think you have Solved Question Answering? 
Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":9},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":7},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":7},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":6},{"title":"The Power of Scale for Parameter-Efficient Prompt Tuning","work_id":"1056ba8e-7b3f-4811-be8e-9a3ed9269acb","shared_citers":6},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":5},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":5},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":5},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":5},{"title":"Mixed Precision Training","work_id":"c525941b-ce20-4bcb-8509-a9968f1e89c3","shared_citers":5},{"title":"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer","work_id":"2c6b3f6d-54e4-4df7-baa7-475a490799af","shared_citers":5},{"title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation","work_id":"23ddcedc-909d-4b18-b28a-943a50a8cef7","shared_citers":5},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":5},{"title":"Training Deep Nets with Sublinear Memory Cost","work_id":"f2c5c287-a500-40e4-a136-e7e3172db1d7","shared_citers":5},{"title":"AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient 
Fine-Tuning","work_id":"6fa49657-348b-42dd-b870-8758c71af878","shared_citers":4},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":4},{"title":"arXiv preprint arXiv:1806.03377 , year=","work_id":"335ca03b-43f7-43d8-af32-3eaeb6735100","shared_citers":4},{"title":"arXiv preprint arXiv:1905.00537 , year=","work_id":"54fdcd2d-ade5-4d5e-9b37-8d75abcbaae2","shared_citers":4},{"title":"Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models","work_id":"8505729c-c88b-43bd-8e2c-2c94644ca438","shared_citers":4},{"title":"BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions","work_id":"511eeb84-4b95-46d5-b14f-50da43f4f19f","shared_citers":4},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":4}],"time_series":[{"n":2,"year":2019},{"n":3,"year":2020},{"n":1,"year":2021},{"n":3,"year":2022},{"n":3,"year":2023},{"n":4,"year":2024},{"n":1,"year":2025},{"n":18,"year":2026}],"dependency_candidates":[]},"authors":[]}}