On the State of the Art of Evaluation in Neural Language Models

2017 · cs.CL · arXiv 1707.05589

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

cs.CL · 2025-08-09 · conditional · novelty 6.0

A progressive training scheme with binary-aware initialization and dual-scaling allows pre-trained LLMs to be converted to high-performance 1-bit models without training from scratch.

ChemCrow: Augmenting large-language models with chemistry tools

physics.chem-ph · 2023-04-11 · conditional · novelty 6.0

ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.

Compressive Transformers for Long-Range Sequence Modelling

cs.LG · 2019-11-13 · unverdicted · novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

Reproducibility in Machine Learning for Health

cs.LG · 2019-07-02 · unverdicted · novelty 5.0

Systematic evaluation of over 100 ML4H papers finds poorer reproducibility than other ML fields, driven by limited data and code access, and offers recommendations to data providers, publishers, and researchers.

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

q-bio.NC · 2025-02-27 · unverdicted · novelty 4.0 · 2 refs

Advocates integrating naturalistic paradigms and AI progress into cognitive science to develop generalizable models of natural behavior while retaining experimental control and theoretical insight.

citing papers explorer

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
cs.CL · 2025-08-09 · conditional · none · ref 24 · internal anchor

ChemCrow: Augmenting large-language models with chemistry tools
physics.chem-ph · 2023-04-11 · conditional · none · ref 107 · internal anchor

Compressive Transformers for Long-Range Sequence Modelling
cs.LG · 2019-11-13 · unverdicted · none · ref 98 · internal anchor

Reproducibility in Machine Learning for Health
cs.LG · 2019-07-02 · unverdicted · none · ref 31 · internal anchor

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior
q-bio.NC · 2025-02-27 · unverdicted · none · ref 16 · 2 links · internal anchor
