
arxiv: 2509.24894 · v4 · pith:MERUPQBDnew · submitted 2025-09-29 · 🧮 math.OC · cs.LG

Improved Stochastic Optimization of LogSumExp

classification 🧮 math.OC cs.LG
keywords optimization · divergence · logsumexp · stochastic · approximation · dual · gradient · advantages

The LogSumExp function, dual to the Kullback-Leibler (KL) divergence, plays a central role in many important optimization problems, including entropy-regularized optimal transport (OT) and distributionally robust optimization (DRO). In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging since computing the gradient requires differentiating every term. We propose a novel convexity- and smoothness-preserving approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the Safe KL divergence. Our experiments and theoretical analysis of the LogSumExp-based stochastic optimization, arising in DRO and continuous OT, demonstrate the advantages of our approach over existing baselines.
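To make the gradient difficulty mentioned in the abstract concrete, here is a minimal sketch of plain LogSumExp and its gradient in pure Python. It does not implement the paper's approximation or the Safe KL divergence, which the abstract does not specify. The function names `logsumexp` and `logsumexp_grad` and the sample inputs are illustrative assumptions. The sketch relies only on the standard fact that the gradient of LogSumExp with respect to its inputs is the softmax weight vector, so every term enters the normalizer.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)  # shift by the maximum to avoid overflow in exp
    return m + math.log(sum(math.exp(v - m) for v in values))

def logsumexp_grad(values):
    """Gradient of logsumexp w.r.t. each input: the softmax weights.

    Each weight is exp(v_i) / sum_j exp(v_j), so computing even one
    component exactly requires a pass over all terms.
    """
    lse = logsumexp(values)
    return [math.exp(v - lse) for v in values]

# Hypothetical inputs; the large entry would overflow a naive exp().
values = [0.5, -1.0, 2.0, 1000.0]
lse = logsumexp(values)
grad = logsumexp_grad(values)
```

Because the normalizer couples all terms, evaluating the softmax weights on a random subset of terms does not in general give an unbiased gradient estimate, which is the setting the abstract's stochastic approach targets.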
