On the State of the Art of Evaluation in Neural Language Models

Chris Dyer; Gábor Melis; Phil Blunsom

arxiv: 1707.05589 · v2 · pith:AVFLXBK6new · submitted 2017-07-18 · 💻 cs.CL

classification 💻 cs.CL

keywords: architectures, language, models, neural, state, apparently, arrive, automatic

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
cs.CL 2025-08 conditional novelty 6.0

A progressive training scheme with binary-aware initialization and dual-scaling allows pre-trained LLMs to be converted to high-performance 1-bit models without training from scratch.

ChemCrow: Augmenting large-language models with chemistry tools
physics.chem-ph 2023-04 conditional novelty 6.0

ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.

Compressive Transformers for Long-Range Sequence Modelling
cs.LG 2019-11 unverdicted novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior
q-bio.NC 2025-02 unverdicted novelty 3.0

Position paper advocating integration of naturalistic paradigms and AI models to create generalizable theories of natural human behavior and cognition.