hub Canonical reference

LongNet: Scaling transformers to 1,000,000,000 tokens.arXiv preprint arXiv:2307.02486

Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei · 2023 · arXiv 2307.02486

Canonical reference. 100% of citing Pith papers cite this work as background.

20 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 20 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

RULER: What's the Real Context Size of Your Long-Context Language Models?

cs.CL · 2024-04-09 · accept · novelty 8.0

RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.

Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics

cs.LG · 2025-12-14 · unverdicted · novelty 7.0

Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

cs.CL · 2024-04-10 · conditional · novelty 7.0

Infini-attention combines compressive memory with masked local attention and long-term linear attention inside each Transformer block to support infinite context length with bounded resources.

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

cs.CV · 2024-01-17 · conditional · novelty 7.0

Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

cs.CL · 2026-05-11 · conditional · novelty 6.0

EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

Stacked from One: Multi-Scale Self-Injection for Context Window Extension

cs.CL · 2026-03-05 · unverdicted · novelty 6.0

SharedLLM stacks two copies of a short-context LLM so the lower one compresses context into query-aware multi-grained tokens that are injected only at the lowest layers of the upper one, enabling generalization from 8K training to 128K+ inputs.

BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations

cs.IR · 2025-12-15 · unverdicted · novelty 6.0

BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.

Kimi Linear: An Expressive, Efficient Attention Architecture

cs.CL · 2025-10-30 · unverdicted · novelty 6.0

Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.

Positional Encoding via Token-Aware Phase Attention

cs.CL · 2025-09-16 · unverdicted · novelty 6.0

TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.

Accelerating Prefilling via Decoding-time Contribution Sparsity

cs.CL · 2025-07-29 · conditional · novelty 6.0

TriangleMix exploits decoding-time contribution sparsity via a training-free static attention pattern to accelerate LLM prefilling with nearly lossless performance.

eLLM: Elastic Memory Management Framework for Efficient LLM Serving

cs.DC · 2025-06-18 · unverdicted · novelty 6.0

eLLM unifies LLM memory management with virtual tensors and elastic ballooning to CPU memory, reporting 2.32x higher decoding throughput and 3x larger batch sizes for 128K inputs.

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

cs.IR · 2025-05-26 · unverdicted · novelty 6.0

A hierarchical QA framework converts RST discourse trees into enhanced sentence representations for structure-guided retrieval and reports consistent gains over baselines on four datasets across genres and languages.

MoBA: Mixture of Block Attention for Long-Context LLMs

cs.LG · 2025-02-18 · unverdicted · novelty 6.0

MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.

Sessa: Selective State Space Attention

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.

On Efficient Variants of Segment Anything Model: A Survey

cs.CV · 2024-10-07 · unverdicted · novelty 5.0

A survey that reviews efficient variants of the Segment Anything Model, categorizes acceleration strategies, and provides a unified hardware evaluation on benchmarks.

A Comprehensive Overview of Large Language Models

cs.CL · 2023-07-12 · unverdicted · novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · 2 refs

citing papers explorer

Showing 20 of 20 citing papers.

RULER: What's the Real Context Size of Your Long-Context Language Models? cs.CL · 2024-04-09 · accept · none · ref 10
RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces cs.LG · 2023-12-01 · unverdicted · none · ref 24
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding cs.CL · 2023-08-28 · unverdicted · none · ref 80
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling cs.LG · 2026-04-22 · unverdicted · none · ref 11
Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.
Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics cs.LG · 2025-12-14 · unverdicted · none · ref 6
Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention cs.CL · 2024-04-10 · conditional · none · ref 10
Infini-attention combines compressive memory with masked local attention and long-term linear attention inside each Transformer block to support infinite context length with bounded resources.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model cs.CV · 2024-01-17 · conditional · none · ref 11
Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing cs.CL · 2026-05-11 · conditional · none · ref 8
EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.
Stacked from One: Multi-Scale Self-Injection for Context Window Extension cs.CL · 2026-03-05 · unverdicted · none · ref 9
SharedLLM stacks two copies of a short-context LLM so the lower one compresses context into query-aware multi-grained tokens that are injected only at the lowest layers of the upper one, enabling generalization from 8K training to 128K+ inputs.
BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations cs.IR · 2025-12-15 · unverdicted · none · ref 10
BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.
Kimi Linear: An Expressive, Efficient Attention Architecture cs.CL · 2025-10-30 · unverdicted · none · ref 20
Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.
Positional Encoding via Token-Aware Phase Attention cs.CL · 2025-09-16 · unverdicted · none · ref 5
TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.
Accelerating Prefilling via Decoding-time Contribution Sparsity cs.CL · 2025-07-29 · conditional · none · ref 6
TriangleMix exploits decoding-time contribution sparsity via a training-free static attention pattern to accelerate LLM prefilling with nearly lossless performance.
eLLM: Elastic Memory Management Framework for Efficient LLM Serving cs.DC · 2025-06-18 · unverdicted · none · ref 10
eLLM unifies LLM memory management with virtual tensors and elastic ballooning to CPU memory, reporting 2.32x higher decoding throughput and 3x larger batch sizes for 128K inputs.
Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering cs.IR · 2025-05-26 · unverdicted · none · ref 13
A hierarchical QA framework converts RST discourse trees into enhanced sentence representations for structure-guided retrieval and reports consistent gains over baselines on four datasets across genres and languages.
MoBA: Mixture of Block Attention for Long-Context LLMs cs.LG · 2025-02-18 · unverdicted · none · ref 7
MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
Sessa: Selective State Space Attention cs.LG · 2026-04-20 · unverdicted · none · ref 7
Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
On Efficient Variants of Segment Anything Model: A Survey cs.CV · 2024-10-07 · unverdicted · none · ref 189
A survey that reviews efficient variants of the Segment Anything Model, categorizes acceleration strategies, and provides a unified hardware evaluation on benchmarks.
A Comprehensive Overview of Large Language Models cs.CL · 2023-07-12 · unverdicted · none · ref 188
A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.
RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference cs.LG · 2026-02-20 · unreviewed · ref 9 · 2 links

LongNet: Scaling transformers to 1,000,000,000 tokens.arXiv preprint arXiv:2307.02486

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer