2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
N-gram draft models give larger and more consistent speed-ups for multilingual speculative decoding than fine-tuned neural drafts, despite lower acceptance rates, across translation and story generation.
citing papers explorer
- ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation
  ConSA learns FA/SWA allocation via L0 masks and augmented Lagrangian constraints, outperforming rule-based baselines on 0.6B and 1.7B models with consistent layer patterns.
- Speculative Decoding Across Languages
  N-gram draft models give larger and more consistent speed-ups for multilingual speculative decoding than fine-tuned neural drafts, despite lower acceptance rates, across translation and story generation.