Title resolution pending

Longformer: The Long-Document Transformer , author= · 2020

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

cs.CL · 2026-05-20 · conditional · novelty 6.0

DPR-BAG generates factually grounded biomedical abstracts from full texts via structured BOMRC decomposition, parallel LLM prompting, and coherence refinement without any model training.

Predictive Prefetching for Retrieval-Augmented Generation

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Introduces predictive prefetching for RAG that anticipates retrieval needs several tokens ahead via three components, reporting up to 43.5% latency reduction and 62.4% TTFT improvement while preserving answer quality.

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

cs.LG · 2024-12-07 · unverdicted · novelty 6.0

FlexAttention supplies a compiler-driven interface that expresses common attention variants in a few lines of PyTorch and emits optimized kernels whose speed matches hand-written implementations.

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

cs.LG · 2023-09-25 · accept · novelty 6.0

DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.

Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets

cs.CL · 2026-05-06 · unverdicted · novelty 5.0

An evidence-based model generates queries from query-free datasets, yielding summaries with competitive ROUGE scores to those using original queries.

Mesh Based Simulations with Spatial and Temporal awareness

cs.LG · 2026-05-02 · unverdicted · novelty 5.0

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

cs.LG · 2026-04-24 · unverdicted · novelty 5.0

Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

cs.CL · 2026-04-06 · unverdicted · novelty 5.0 · 2 refs

ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

cs.CL · 2024-12-18 · unverdicted · novelty 5.0

ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

Simply Stabilizing the Loop via Fully Looped Transformer

cs.LG · 2026-05-11

citing papers explorer

Showing 11 of 11 citing papers.

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation cs.CL · 2026-05-20 · conditional · none · ref 22
DPR-BAG generates factually grounded biomedical abstracts from full texts via structured BOMRC decomposition, parallel LLM prompting, and coherence refinement without any model training.
Predictive Prefetching for Retrieval-Augmented Generation cs.CL · 2026-05-18 · unverdicted · none · ref 34
Introduces predictive prefetching for RAG that anticipates retrieval needs several tokens ahead via three components, reporting up to 43.5% latency reduction and 62.4% TTFT improvement while preserving answer quality.
Prefix-Adaptive Block Diffusion for Efficient Document Recognition cs.CV · 2026-05-16 · unverdicted · none · ref 96
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
Flex Attention: A Programming Model for Generating Optimized Attention Kernels cs.LG · 2024-12-07 · unverdicted · none · ref 8
FlexAttention supplies a compiler-driven interface that expresses common attention variants in a few lines of PyTorch and emits optimized kernels whose speed matches hand-written implementations.
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models cs.LG · 2023-09-25 · accept · none · ref 43
DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.
Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets cs.CL · 2026-05-06 · unverdicted · none · ref 27
An evidence-based model generates queries from query-free datasets, yielding summaries with competitive ROUGE scores to those using original queries.
Mesh Based Simulations with Spatial and Temporal awareness cs.LG · 2026-05-02 · unverdicted · none · ref 145
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models cs.LG · 2026-04-24 · unverdicted · none · ref 39
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning cs.CL · 2026-04-06 · unverdicted · none · ref 31 · 2 links
ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 110
ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.
Simply Stabilizing the Loop via Fully Looped Transformer cs.LG · 2026-05-11 · unreviewed · ref 41

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer