LSTM: A Search Space Odyssey

Bas R. Steunebrink; Jan Koutník; Jürgen Schmidhuber; Klaus Greff; Rupesh Kumar Srivastava

arxiv: 1503.04069 · v2 · pith:TSE6BMATnew · submitted 2015-03-13 · 💻 cs.NE · cs.LG

keywords: LSTM, variants, networks, architecture, components, hyperparameters, recognition, results

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs ($\approx 15$ years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
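The abstract states that hyperparameters were tuned per task with random search. As a rough illustration of that general technique only, here is a minimal sketch. The hyperparameter names, their ranges, and the `toy_validation_error` objective are all hypothetical stand-ins and are not taken from the paper; a real study would train an LSTM for each sampled configuration and record its validation error.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter setting (ranges are assumptions)."""
    return {
        # Log-uniform draw, since learning rates are usually compared on a log scale.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": rng.randint(20, 200),
        "momentum": rng.uniform(0.0, 0.99),
    }


def toy_validation_error(config):
    """Stand-in objective: replaces 'train a network and measure validation error'."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    size_term = abs(config["hidden_size"] - 100) / 100.0
    return lr_term + size_term


def random_search(n_trials, seed=0):
    """Sample n_trials independent configurations and return the best one."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_trials):
        config = sample_config(rng)
        scored.append((toy_validation_error(config), config))
    best_error, best_config = min(scored, key=lambda pair: pair[0])
    return best_error, best_config, scored


if __name__ == "__main__":
    error, config, _ = random_search(n_trials=50)
    print("best error:", error)
    print("best config:", config)
```

Because every trial is drawn independently, the full list of scored trials can be kept and reused afterwards to study how much each hyperparameter influences the error, which is the kind of analysis the abstract attributes to fANOVA (not implemented here).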

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language Models as Knowledge Bases?
cs.CL 2019-09 accept novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
cs.AI 2023-08 unverdicted novelty 6.0

MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.