Learning and Evaluating General Linguistic Intelligence

Angeliki Lazaridou; Chris Dyer; Cyprien de Masson d'Autume; Dani Yogatama; Jerome Connor; Lei Yu; Lingpeng Kong; Mike Chrzanowski; Phil Blunsom; Tomas Kocisky

arxiv: 1901.11373 · v1 · pith:FIZ6LXJTnew · submitted 2019-01-31 · cs.LG · cs.CL · stat.ML

Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

classification cs.LG cs.CL stat.ML

keywords general, intelligence, linguistic, models, tasks, acquired, knowledge, language

Abstract

We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.
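
The abstract does not spell out the proposed metric. One plausible reading of "an online encoding of the test data" is a prequential codelength: each label is encoded with the model as fitted on the examples seen so far, and the model is then updated with that example, so a model that adapts quickly yields a shorter total code. The sketch below illustrates that idea only. The function name `online_codelength` and the Laplace-smoothed label-frequency predictor (which ignores inputs entirely) are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Sequence


def online_codelength(labels: Sequence[int], num_classes: int) -> float:
    """Prequential codelength, in bits, of a label sequence.

    Hypothetical stand-in for a learned model: an add-one (Laplace)
    smoothed label-frequency predictor. Each label is scored with the
    counts gathered *before* it is observed, then added to the counts.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for y in labels:
        # Probability assigned to y before observing it.
        # For the first label this is uniform: 1 / num_classes.
        p = (counts[y] + 1) / (seen + num_classes)
        total_bits += -math.log2(p)
        # "Train" on the example only after it has been encoded.
        counts[y] += 1
        seen += 1
    return total_bits


if __name__ == "__main__":
    easy = online_codelength([0, 0, 0, 0], num_classes=2)
    hard = online_codelength([0, 1, 0, 1], num_classes=2)
    print(f"predictable sequence: {easy:.3f} bits")
    print(f"alternating sequence: {hard:.3f} bits")
```

Under this reading, a lower codelength means the predictor needed fewer examples to start assigning high probability to the correct labels; in the paper's setting, the frequency predictor would be replaced by the model being evaluated.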

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language Models are Few-Shot Learners
cs.CL · 2020-05 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Language Models as Knowledge Bases?
cs.CL · 2019-09 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
cs.CL · 2022-01 · unverdicted · novelty 5.0

Trained the largest monolithic 530B-parameter transformer language model to date and reported new state-of-the-art zero- and few-shot results on multiple NLP benchmarks.

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study
cs.CL · 2019-07 · unverdicted · novelty 4.0

Fine-tuning GPT-1 on 150,000 unlabeled Reachout.com posts and then feeding the resulting features into AutoML yields a new state-of-the-art macro F1 of 0.572 for triaging risk in 1,588 labeled CLPsych 2017 posts, without metadata or history.