Title resolution pending

2024 · arXiv 2410.10800

3 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Muon Does Not Converge on Convex Lipschitz Functions

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.

Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives

math.OC · 2026-05-12 · unverdicted · novelty 5.0

Proximal stochastic spectral preconditioning converges for nonconvex constrained objectives under heavy-tailed noise, with a variance-reduced version achieving faster rates and a refined analysis of Muon iterations.

DADA: Dual Averaging with Distance Adaptation

math.OC · 2025-01-17 · unverdicted · novelty 5.0

DADA is a parameter-free dual averaging method for convex optimization that adapts to local function growth. It applies to nonsmooth, smooth, Hölder-smooth, and other function classes, on both constrained and unbounded domains, without prior knowledge of the iteration count or target accuracy.

citing papers explorer

Showing 3 of 3 citing papers.

Muon Does Not Converge on Convex Lipschitz Functions · cs.LG · 2026-05-09 · unverdicted · none · ref 91
Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives · math.OC · 2026-05-12 · unverdicted · none · ref 64
DADA: Dual Averaging with Distance Adaptation · math.OC · 2025-01-17 · unverdicted · none · ref 23
