arXiv: 1803.08240 · v1 · pith:MWFE5TYAnew · submitted 2018-03-22 · 💻 cs.CL · cs.AI · cs.NE

An Analysis of Neural Language Modeling at Multiple Scales

classification 💻 cs.CL cs.AI cs.NE
keywords language, character-level, enwik8, lstms, modeling, qrnns, results, state-of-the-art

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Augmenting Self-attention with Persistent Memory

    cs.LG 2019-07 unverdicted novelty 7.0

    Augmenting self-attention with persistent memory vectors allows removal of feed-forward layers from Transformers without degrading performance on character- and word-level language modeling benchmarks.

  2. Evaluating Computational Language Models with Scaling Properties of Natural Language

    cs.CL 2019-06 unverdicted novelty 5.0

    Only gated RNN language models reproduce the long-range correlation scaling of natural language among tested models, with Taylor's law exponent serving as a quality indicator.