Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
{InfiniGen}: Efficient generative inference of large language models with dynamic {KV} cache management
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
baseline 1polarities
baseline 1representative citing papers
AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.
MTraining scales LLM training to 512K-token contexts on 32 A100 GPUs by integrating dynamic sparse training patterns with balanced and hierarchical sparse ring attention, achieving up to 6x throughput gains without accuracy loss on long-context benchmarks.
citing papers explorer
-
Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation
Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
-
AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference
AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.
-
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
MTraining scales LLM training to 512K-token contexts on 32 A100 GPUs by integrating dynamic sparse training patterns with balanced and hierarchical sparse ring attention, achieving up to 6x throughput gains without accuracy loss on long-context benchmarks.