Pith: machine review for the scientific record

arXiv: 2510.20797 · v2 · submitted 2025-10-23 · cs.CL · cs.AI · cs.LG


No Mean Feat: Simple, Strong Baselines for Context Compression

keywords: compression, baselines, context, simple, attention, BenchPress, bidirectional, compression-token
abstract

Context compression reduces Transformer inference costs by replacing lengthy inputs with shorter pre-computed representations. It carries significant benefits for retrieval-augmented generation (RAG) and has attracted growing research attention. However, progress remains difficult to measure due to inconsistent evaluations and baselines. We design a standard, easy-to-reproduce evaluation suite for context compression, BenchPress, along with simple, high-performance baselines for English reading comprehension. BenchPress supports benchmarking across model scales, datasets, compression ratios, and short ($<$1K tokens) to mid-range ($<$8K tokens) contexts. While the suite is applicable to any compression paradigm, our baselines target soft context compression. We establish two simple baselines that strongly outperform the widely used causal compression-token approach: mean pooling and a bidirectional compression-token variant. Our results show the benefit of bidirectional attention when computing compressed representations, and that simple pooling is an expressive compression operator.
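The mean-pooling baseline the abstract describes can be illustrated with a minimal sketch: compress a sequence of token hidden states by averaging each consecutive chunk of `ratio` tokens into a single vector. This is an assumption-laden illustration of chunk-wise pooling in general, not the paper's implementation; the function name and dimensions are illustrative.

```python
def mean_pool_compress(hidden_states, ratio):
    """Compress a sequence of token vectors by mean-pooling consecutive
    chunks of `ratio` tokens into one vector each.

    hidden_states: list of equal-length float lists (T x d token states)
    ratio: compression ratio (input tokens per compressed vector)

    Illustrative sketch only -- the paper's actual operator may differ.
    """
    d = len(hidden_states[0])
    compressed = []
    for start in range(0, len(hidden_states), ratio):
        chunk = hidden_states[start:start + ratio]
        # average each dimension across the tokens in this chunk
        compressed.append(
            [sum(vec[i] for vec in chunk) / len(chunk) for i in range(d)]
        )
    return compressed
```

A sequence of 8 tokens compressed at ratio 4 would yield 2 pooled vectors, which the decoder then attends to in place of the original context.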

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

    cs.CL · 2026-04 · unverdicted · novelty 7.0

    Modern text encoders resist second-order collapse under mean pooling because token embeddings concentrate tightly within texts, and this resistance correlates with stronger downstream performance.