Improved Stochastic Optimization of LogSumExp

Alexey Kroshnin; Egor Gladin; Jia-Jie Zhu; Pavel Dvurechensky

arXiv: 2509.24894 · v4 · pith:MERUPQBD · submitted 2025-09-29 · math.OC · cs.LG

keywords optimization, divergence, logsumexp, stochastic, approximation, dual, gradient, advantages

Abstract

The LogSumExp function, dual to the Kullback-Leibler (KL) divergence, plays a central role in many important optimization problems, including entropy-regularized optimal transport (OT) and distributionally robust optimization (DRO). In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging because computing the gradient requires differentiating every term. We propose a novel convexity- and smoothness-preserving approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the Safe KL divergence. Our experiments and theoretical analysis of LogSumExp-based stochastic optimization problems arising in DRO and continuous OT demonstrate the advantages of our approach over existing baselines.
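
Two technical points in the abstract, that LogSumExp is dual to the KL divergence and that its exact gradient involves every exponential term, can be checked numerically. The sketch below is illustrative only: it does not implement the paper's Safe KL approximation, which the abstract does not define, and all function names (`logsumexp`, `logsumexp_grad`, `kl`, `dual_objective`, `naive_minibatch_lse`) are hypothetical. It assumes a finite set of terms with reference weights $q$ (uniform in the demo) and uses the standard variational identity $\log \sum_i q_i e^{x_i} = \max_{p \in \Delta} \langle p, x \rangle - \mathrm{KL}(p \,\|\, q)$, whose maximizer is the softmax $p_i \propto q_i e^{x_i}$.

```python
import math
import random


def logsumexp(x, weights=None):
    """Numerically stable log(sum_i q_i * exp(x_i)); q defaults to all ones."""
    if weights is None:
        weights = [1.0] * len(x)
    m = max(x)  # shift by the max to avoid overflow in exp
    return m + math.log(sum(q * math.exp(xi - m) for q, xi in zip(weights, x)))


def logsumexp_grad(x, weights=None):
    """Exact gradient of logsumexp: the softmax vector, one entry per term.

    Every entry is divided by the full normalizer, so all terms must be visited.
    """
    if weights is None:
        weights = [1.0] * len(x)
    m = max(x)
    e = [q * math.exp(xi - m) for q, xi in zip(weights, x)]
    s = sum(e)
    return [ei / s for ei in e]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_objective(p, x, q):
    """<p, x> - KL(p || q): the variational (dual) form of log sum_i q_i exp(x_i)."""
    return sum(pi * xi for pi, xi in zip(p, x)) - kl(p, q)


def naive_minibatch_lse(x, batch_size, rng):
    """Plug-in estimate of log(mean_i exp(x_i)) from a uniform minibatch (biased)."""
    batch = [rng.choice(x) for _ in range(batch_size)]
    return logsumexp(batch) - math.log(batch_size)


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.5]
    q = [1.0 / len(x)] * len(x)          # uniform reference weights (assumption)
    lse = logsumexp(x, q)                # log((1/n) * sum_i exp(x_i))
    p_star = logsumexp_grad(x, q)        # full gradient = softmax over all terms
    # The dual value at the softmax matches LogSumExp (up to rounding).
    print(abs(dual_objective(p_star, x, q) - lse) < 1e-12)
    # Any other p, e.g. p = q, gives a value no larger than LogSumExp.
    print(dual_objective(q, x, q) <= lse)
    # Averaging naive minibatch estimates stays below the true value (Jensen).
    rng = random.Random(0)
    est = sum(naive_minibatch_lse(x, 2, rng) for _ in range(20000)) / 20000
    print(est < lse)
```

Because the gradient is the softmax vector, each entry depends on the full normalizer. The last check illustrates why plugging a minibatch directly into the logarithm is not a fix on its own: by Jensen's inequality that estimate is biased downward, which is the kind of difficulty a stochastic-gradient-friendly reformulation has to address.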
