LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
International Conference on Learning Representations
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: method 1
citation-polarity summary
polarities: use method 1
representative citing papers
Derives a blockwise resolvent-style attention operator that exploits structured sparsity for subquadratic O(n^{4/3}d) entity tracking while matching dense accuracy.
ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.
S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.
LaplacianFormer uses a Laplacian kernel with an injective feature map and efficient approximations to achieve linear attention that preserves mid-range interactions better than Gaussian-based linear attention in vision transformers.
citing papers explorer
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
  LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
- Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
  Derives a blockwise resolvent-style attention operator that exploits structured sparsity for subquadratic O(n^{4/3}d) entity tracking while matching dense accuracy.
- ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection
  ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.
- Simplified State Space Layers for Sequence Modeling
  S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.
- LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
  LaplacianFormer uses a Laplacian kernel with an injective feature map and efficient approximations to achieve linear attention that preserves mid-range interactions better than Gaussian-based linear attention in vision transformers.
- BioFormer: Rethinking Cross-Subject Generalization via Spectral Structural Alignment in Biomedical Time-Series