LongGenBench: Benchmarking Long-Form Generation in Long Context

Yuhao Wu, Ming Shan Hee, Zhiqing Hu, Roy Ka-Wei Lee · 2025 · arXiv 2409.02076

4 Pith papers cite this work.

representative citing papers

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

RedKnot decomposes the KV cache by attention heads to enable position-independent reuse, prefix compression, hot/cold separation, and distributed placement for long-context LLM serving without model changes.

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

cs.AR · 2026-05-17 · unverdicted · novelty 6.0

VeriCache turns lossy KV cache compression into lossless LLM inference by drafting with compressed cache and verifying drafts with full cache, achieving up to 4x throughput with identical outputs.

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

IceCache combines semantic token clustering with PagedAttention to keep only 25% of the KV cache tokens while retaining 99% accuracy on LongBench and matching or beating prior offloading methods in latency.

Language models fail at extended rule following

cs.CL · 2026-05-03 · unverdicted · novelty 5.0 · 2 refs

LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.

citing papers explorer

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

cs.AI · 2026-06-04 · unverdicted · none · ref 66

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

cs.AR · 2026-05-17 · unverdicted · none · ref 67

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

cs.LG · 2026-04-12 · unverdicted · none · ref 10

Language models fail at extended rule following

cs.CL · 2026-05-03 · unverdicted · none · ref 9 · 2 links