fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Alexei Baevski; Angela Fan; David Grangier; Michael Auli; Myle Ott; Nathan Ng; Sam Gross; Sergey Edunov

arxiv: 1904.01038 · v1 · pith:26LC64CLnew · submitted 2019-04-01 · 💻 cs.CL

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

keywords modeling · toolkit · fairseq · fast · gpus · sequence · training · across

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
cs.LG · 2022-08 · conditional · novelty 7.0
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

InCoder: A Generative Model for Code Infilling and Synthesis
cs.SE · 2022-04 · unverdicted · novelty 7.0
InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on t...

Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation
cs.SE · 2025-03 · unverdicted · novelty 6.0
CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-p...

HuggingFace's Transformers: State-of-the-art Natural Language Processing
cs.CL · 2019-10 · accept · novelty 6.0
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

Gated Memory Policy
cs.RO · 2026-04 · unverdicted · novelty 5.0
GMP selectively activates and represents memory via a gate and lightweight cross-attention, yielding 30.1% higher success on non-Markovian robotic tasks while staying competitive on Markovian ones.

Video-guided Machine Translation with Global Video Context
cs.CV · 2026-04 · unverdicted · novelty 4.0
A globally video-guided multimodal translation framework retrieves semantically related video segments with a vector database and applies attention mechanisms to improve subtitle translation accuracy in long videos.