pith. machine review for the scientific record.

arxiv: 1711.03953 · v4 · submitted 2017-11-10 · 💻 cs.CL · cs.LG

Recognition: unknown

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Authors on Pith: no claims yet
classification: 💻 cs.CL · cs.LG
keywords: language · softmax · bottleneck · method · model · models · natural · word
Original abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
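The bottleneck argument in brief: a softmax output layer produces the N-by-V logit matrix H W^T, whose rank is at most the embedding size d, so the model's log-probability matrix is confined to a roughly rank-d subspace; the paper argues that the true log-probability matrix of natural language has much higher rank, and its proposed fix, Mixture of Softmaxes (MoS), mixes several softmaxes with context-dependent weights so the log-probabilities escape that constraint. The NumPy sketch below illustrates the rank gap; the variable names, shapes, and the choice of K = 3 components are illustrative assumptions, not the authors' implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, d, V, K = 500, 16, 2000, 3        # contexts, hidden size, vocabulary, mixture components (illustrative)

H = rng.standard_normal((N, d))      # RNN context vectors
W = rng.standard_normal((V, d))      # output word embeddings

# Standard softmax head: the logit matrix factorizes as H @ W.T, so its rank is
# at most d, and the log-probability matrix has rank at most d + 1.
log_p_softmax = np.log(softmax(H @ W.T))

# Mixture of softmaxes (MoS): K softmaxes mixed with context-dependent weights pi(h).
# Taking the log of the convex combination is nonlinear, so the resulting
# log-probability matrix is no longer confined to a rank-d subspace.
P = rng.standard_normal((K, d, d))   # per-component context projections (illustrative)
W_pi = rng.standard_normal((d, K))   # mixture-weight projection (illustrative)

pi = softmax(H @ W_pi)               # (N, K) mixture weights, one row per context
probs = np.zeros((N, V))
for k in range(K):
    h_k = np.tanh(H @ P[k])          # component-specific context vector
    probs += pi[:, [k]] * softmax(h_k @ W.T)
log_p_mos = np.log(probs)

print("numerical rank, softmax log-probs:", np.linalg.matrix_rank(log_p_softmax))  # about d + 1
print("numerical rank, MoS log-probs:    ", np.linalg.matrix_rank(log_p_mos))      # typically far above d

With random inputs the standard head already shows the cap at d + 1, while the mixed head's log-probabilities have much higher numerical rank; the paper makes this argument formally and ties the rank of the true log-probability matrix to the context dependence of natural language.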

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than transformers for language data...

  2. Subliminal Steering: Stronger Encoding of Hidden Signals

    cs.CL · 2026-04 · unverdicted · novelty 7.0

    Subliminal steering transfers complex behavioral biases and the underlying steering vector through fine-tuning on innocuous data, achieving higher precision than prior prompt-based methods.

  3. BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.