Canonical reference

arXiv preprint arXiv:2307.02486 (2023)

Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang · 2023 · arXiv 2307.02486

Canonical reference. 100% of citing Pith papers cite this work as background.

26 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

RULER: What's the Real Context Size of Your Long-Context Language Models?

cs.CL · 2024-04-09 · accept · novelty 8.0

RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention

cs.LG · 2026-06-01 · conditional · novelty 7.0

Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.

Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics

cs.LG · 2025-12-14 · unverdicted · novelty 7.0

Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

cs.CL · 2024-04-10 · conditional · novelty 7.0

Infini-attention combines compressive memory with masked local attention and long-term linear attention inside each Transformer block to support infinite context length with bounded resources.

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

cs.CV · 2024-01-17 · conditional · novelty 7.0

Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Vortex provides a programmable frontend and backend for sparse attention in LLM serving, delivering up to 3.46x throughput over full attention while preserving accuracy.

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

cs.CL · 2026-05-11 · conditional · novelty 6.0

EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

Stacked from One: Multi-Scale Self-Injection for Context Window Extension

cs.CL · 2026-03-05 · unverdicted · novelty 6.0

SharedLLM stacks two copies of a short-context LLM so the lower one compresses context into query-aware multi-grained tokens that are injected only at the lowest layers of the upper one, enabling generalization from 8K training to 128K+ inputs.

BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations

cs.IR · 2025-12-15 · unverdicted · novelty 6.0

BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.

Kimi Linear: An Expressive, Efficient Attention Architecture

cs.CL · 2025-10-30 · unverdicted · novelty 6.0

Kimi Linear hybridizes linear attention with a new KDA module, outperforming full attention on the evaluated tasks while cutting KV cache usage by up to 75% and speeding up decoding by up to 6x.

Positional Encoding via Token-Aware Phase Attention

cs.CL · 2025-09-16 · unverdicted · novelty 6.0

TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.

Accelerating Prefilling via Decoding-time Contribution Sparsity

cs.CL · 2025-07-29 · conditional · novelty 6.0

TriangleMix exploits decoding-time contribution sparsity via a training-free static attention pattern to accelerate LLM prefilling with nearly lossless performance.

eLLM: Elastic Memory Management Framework for Efficient LLM Serving

cs.DC · 2025-06-18 · unverdicted · novelty 6.0

eLLM unifies LLM memory management with virtual tensors and elastic ballooning to CPU memory, reporting 2.32x higher decoding throughput and 3x larger batch sizes for 128K inputs.

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

cs.IR · 2025-05-26 · unverdicted · novelty 6.0

A hierarchical QA framework converts RST discourse trees into enhanced sentence representations for structure-guided retrieval and reports consistent gains over baselines on four datasets across genres and languages.

MoBA: Mixture of Block Attention for Long-Context LLMs

cs.LG · 2025-02-18 · unverdicted · novelty 6.0

MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.

Depth-Staggered Fibonacci Spacing for Sparse Attention: Static Schedules Beat Learned Dilation and Extrapolate Where Dense Attention Fails

cs.CL · 2026-06-26 · unverdicted · novelty 5.0

Static depth-staggered Fibonacci sparse attention improves perplexity over fixed/learned variants and extrapolates to 4x context while dense attention fails.

Sessa: Selective State Space Attention

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.

MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers

cs.CL · 2026-06-29 · unverdicted · novelty 4.0

MATCH augments sparsified attention with an efficient in-context retrieval system to boost performance on long-range recall tasks in transformers.

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

cs.CV · 2026-05-29 · unverdicted · novelty 4.0

A token-efficient VLM with frozen encoder, two-layer MLP aligner, and LLM decoder generates case-level synoptic pathology reports from multi-WSI inputs using 5x magnification patches and two-stage supervised training.

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

cs.CL · 2026-06-23 · unverdicted · novelty 2.0

A survey paper that taxonomizes transformer architectures, reviews domain applications, and critically assesses deployment trade-offs including parameter-energy costs and alignment issues.

A Comprehensive Overview of Large Language Models

cs.CL · 2023-07-12 · unverdicted · novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.

citing papers explorer

Showing 2 of 2 citing papers after filters.

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · unreviewed · ref 9 · 2 links

On Efficient Variants of Segment Anything Model: A Survey

cs.CV · 2024-10-07 · unreviewed · ref 189
