Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Block-Sphere Vector Quantization

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

BlockQuant is a new block quantization algorithm on the sphere after random rotation that theoretically improves reconstruction MSE and expected inner-product distortion over EDEN, RaBitQ, and TurboQuant.

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

GoLongRL releases a 23K-sample open long-context RL dataset spanning 9 tasks and introduces TMN-Reweight to improve multitask optimization, achieving performance comparable to much larger models under GRPO.

Training-Inference Consistent Segmented Execution for Long-Context LLMs

cs.CL · 2026-05-12 · conditional · novelty 6.0

A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.

Test-Time Speculation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.

Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

Gist Sparse Attention uses learnable gist compression tokens as both summaries and routing signals, then selectively unfolds relevant raw chunks for fine-grained attention, outperforming compression and sparse-attention baselines on LongBench and RAG tasks at 8x-32x compression.

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers

cs.DC · 2026-05-04 · unverdicted · novelty 5.0

PipeMax integrates pipeline parallelism with offloading to achieve up to 2.51x higher throughput than vLLM for offline LLM inference on commodity 8-GPU servers.

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

RaBitQ outperforms TurboQuant in most tested settings for inner-product estimation, nearest-neighbor search, and KV cache quantization, while several TurboQuant runtime and recall results could not be reproduced from the released code.

citing papers explorer

Block-Sphere Vector Quantization cs.LG · 2026-05-19 · unverdicted · none · ref 38
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment cs.CL · 2026-05-19 · unverdicted · none · ref 42
Training-Inference Consistent Segmented Execution for Long-Context LLMs cs.CL · 2026-05-12 · conditional · none · ref 4
Test-Time Speculation cs.CL · 2026-05-10 · unverdicted · none · ref 9 · 2 links
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention cs.LG · 2026-04-22 · unverdicted · none · ref 20
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing cs.CL · 2026-04-21 · unverdicted · none · ref 24
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 74
PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers cs.DC · 2026-05-04 · unverdicted · none · ref 20
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments cs.LG · 2026-04-21 · unverdicted · none · ref 18
