A systematic analysis of hybrid linear attention

URL https://arxiv · 2025 · arXiv 2507.06457

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Component-Aware Self-Speculative Decoding in Hybrid Language Models

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

Component-aware self-speculative decoding achieves high acceptance rates in parallel hybrid models like Falcon-H1 but fails in sequential ones like Qwen3.5, with the gap tied to how components are integrated.

Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

HyLo upcycles Transformer LLMs into hybrids with MLA and Mamba2/Gated DeltaNet blocks via staged training and distillation, extending context to 2M tokens and outperforming prior upcycled hybrids on long-context benchmarks.

Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.

Short window attention enables long-term memorization

cs.LG · 2025-09-29 · unverdicted · novelty 6.0

Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

cs.CL · 2025-10-06 · unverdicted · novelty 4.0

This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.

citing papers explorer

Showing 6 of 6 citing papers.

Component-Aware Self-Speculative Decoding in Hybrid Language Models cs.CL · 2026-05-01 · unverdicted · none · ref 15
Component-aware self-speculative decoding achieves high acceptance rates in parallel hybrid models like Falcon-H1 but fails in sequential ones like Qwen3.5, with the gap tied to how components are integrated.
Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling cs.CL · 2026-04-27 · unverdicted · none · ref 46
HyLo upcycles Transformer LLMs into hybrids with MLA and Mamba2/Gated DeltaNet blocks via staged training and distillation, extending context to 2M tokens and outperforming prior upcycled hybrids on long-context benchmarks.
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity cs.LG · 2026-04-20 · unverdicted · none · ref 45
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.
Short window attention enables long-term memorization cs.LG · 2025-09-29 · unverdicted · none · ref 34
Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 112
MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights cs.CL · 2025-10-06 · unverdicted · none · ref 55
This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.

A systematic analysis of hybrid linear attention

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer