Compression of Neural Machine Translation Models via Pruning

2016 · cs.AI · arXiv 1606.09274

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. Our main result is that with retraining, we can recover and even surpass the original performance with an 80%-pruned model.
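The abstract names the three schemes but only says they differ in how per-class thresholds are computed. The sketch below is one plausible reading, not the authors' implementation. It assumes that class-blind uses a single global magnitude threshold, that class-uniform prunes the same fraction from every class, and that class-distribution sets each class's threshold to a shared multiple `lam` of that class's standard deviation. The function names, the dictionary layout, and the class names `embed` and `softmax` are hypothetical.

```python
import numpy as np

def class_blind_masks(weights, frac):
    """Assumed reading: one global magnitude threshold shared by all classes."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    thresh = np.quantile(all_mags, frac)
    return {name: np.abs(w) > thresh for name, w in weights.items()}

def class_uniform_masks(weights, frac):
    """Assumed reading: a separate threshold per class, same fraction pruned in each."""
    return {name: np.abs(w) > np.quantile(np.abs(w), frac)
            for name, w in weights.items()}

def class_distribution_masks(weights, frac, iters=50):
    """Assumed reading: threshold for a class is lam * std(class).

    lam is found by bisection so that roughly `frac` of all weights are pruned.
    Assumes every class has non-zero standard deviation.
    """
    total = sum(w.size for w in weights.values())
    stds = {name: w.std() for name, w in weights.items()}

    def pruned_fraction(lam):
        pruned = sum((np.abs(w) <= lam * stds[name]).sum()
                     for name, w in weights.items())
        return pruned / total

    lo = 0.0
    hi = max(np.abs(w).max() / stds[name] for name, w in weights.items())
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pruned_fraction(mid) < frac:
            lo = mid
        else:
            hi = mid
    return {name: np.abs(w) > hi * stds[name] for name, w in weights.items()}

# Hypothetical toy "model": two weight classes with very different scales.
rng = np.random.default_rng(0)
toy = {
    "embed": rng.normal(0.0, 1.0, size=(50, 20)),
    "softmax": rng.normal(0.0, 0.1, size=(50, 20)),
}
for scheme in (class_blind_masks, class_uniform_masks, class_distribution_masks):
    masks = scheme(toy, 0.4)
    kept = {name: float(m.mean()) for name, m in masks.items()}
    print(scheme.__name__, kept)
```

Each function returns boolean masks in which `True` marks a weight that is kept; multiplying a weight matrix by its mask zeroes the pruned entries. Under these assumptions, the toy run illustrates the design difference: a single global threshold removes far more of the small-scale class, while the two per-class schemes spread the pruning more evenly.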

representative citing papers

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

cs.LG · 2025-10-26 · unverdicted · novelty 5.0

TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.

citing papers explorer

Showing 2 of 2 citing papers.

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability
cs.LG · 2026-05-14 · unverdicted · none · ref 42 · internal anchor
Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.
TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
cs.LG · 2025-10-26 · unverdicted · none · ref 13 · internal anchor
TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.
