arxiv: 1807.03595 · v1 · pith:P6GVMDRVnew · submitted 2018-07-10 · 💻 cs.CL

Revisiting the Hierarchical Multiscale LSTM

classification 💻 cs.CL
keywords: architecture, model, hierarchical, language, lstm, multiscale, performance, ablation

Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training procedure and implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of re-purposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the overall performance of the model on language modeling.
