Revisiting the Hierarchical Multiscale LSTM

Afra Alishahi; Ákos Kádár; Grzegorz Chrupała; Marc-Alexandre Côté

arXiv: 1807.03595 · v1 · pith:P6GVMDRVnew · submitted 2018-07-10 · cs.CL

keywords: architecture, model, hierarchical, language, lstm, multiscale, performance, ablation

Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training procedure and implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of re-purposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the overall performance of the model on language modeling.
