Bag of Tricks for Efficient Text Classification
Original abstract:
This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.
This paper has not been read by Pith yet.
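To make the abstract's workflow concrete, here is a minimal sketch using the open-source fastText Python bindings to train and query a supervised classifier. The file names (train.txt, valid.txt), the example sentence, and the hyperparameter values are illustrative assumptions, not settings reported in the paper.

```python
# Minimal supervised fastText sketch (pip install fasttext).
# Assumes train.txt / valid.txt contain one example per line in the
# library's format: "__label__<class> <text>". File names and the
# hyperparameters below are placeholders, not values from the paper.
import fasttext

# Train a linear classifier over bag-of-words plus bigram features.
model = fasttext.train_supervised(
    input="train.txt",
    lr=0.5,          # learning rate
    epoch=5,         # passes over the training data
    wordNgrams=2,    # include word bigrams as features
    loss="hs",       # hierarchical softmax for large label sets
)

# Evaluate: test() returns (number of examples, precision@1, recall@1).
n, p_at_1, r_at_1 = model.test("valid.txt")
print(f"P@1 = {p_at_1:.3f} on {n} examples")

# Predict the top label (with its probability) for a new sentence.
labels, probs = model.predict("this phone has great battery life")
print(labels[0], probs[0])
```

The loss="hs" option selects the hierarchical softmax, the trick the paper relies on to keep training fast when the number of output classes is very large.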
Forward citations
Cited by 11 Pith papers
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention
  Kathleen uses recurrent oscillator banks, an efficient wavetable encoder, and phase harmonics to classify text at the byte level with high accuracy and low parameter count.
- Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention
  Kathleen performs byte-level text classification via recurrent oscillator banks, FFT wavetable encoding, and phase harmonics, matching pretrained baselines on standard benchmarks with 36% fewer parameters.
- Moshi: a speech-text foundation model for real-time dialogue
  Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  GLUE is a multi-task benchmark for general natural language understanding that includes a diagnostic test suite and finds limited gains from current multi-task learning methods over single-task training.
- Finding Meaning in Embeddings: Concept Separation Curves
  Concept Separation Curves provide a classifier-independent method to visualize and quantify how sentence embeddings distinguish conceptual meaning from syntactic variations across languages and domains.
- SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport
  SubFLOT uses optimal transport to generate data-aware personalized submodels via server-side pruning and scaling-based adaptive regularization to mitigate parametric divergence in heterogeneous federated learning.
- Muon is Scalable for LLM Training
  The Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
- Galactica: A Large Language Model for Science
  Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.