Dokl. Akad. Nauk SSSR, volume=
2 Pith papers cite this work.
Citing papers explorer

- Efficient Gradient Methods for Distributed Saddle Problems
  A novel decoupled method for distributed saddle problems achieves optimal communication complexity via multi-stage residual-norm minimization. The paper also proves a matching lower bound and extends the method to variational inequalities.
- MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
  MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical-systems analysis, yielding performance gains over Mamba2 and GDN on 400M- and 1.3B-parameter models.