pith.

hub

arXiv preprint arXiv:1704.00805

13 Pith papers cite this work. Polarity classification is still being indexed.

abstract

In this paper, we utilize results from convex analysis and monotone operator theory to derive additional properties of the softmax function that have not yet been covered in the existing literature. In particular, we show that the softmax function is the monotone gradient map of the log-sum-exp function. By exploiting this connection, we show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function. We then demonstrate the usefulness of these properties through an application in game-theoretic reinforcement learning.
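
As a rough numerical illustration of the abstract's claims, the sketch below assumes the usual parameterisation σ_λ(z)_i = exp(λ z_i) / Σ_j exp(λ z_j) with inverse temperature λ > 0, and the scaled log-sum-exp (1/λ) log Σ_j exp(λ z_j). On random inputs it checks that σ_λ matches the finite-difference gradient of that log-sum-exp, that it is λ-Lipschitz, and that it is (1/λ)-co-coercive. The function names, tolerances and sampling scheme are hypothetical choices made for illustration, and a passing check is evidence, not a proof.

```python
import numpy as np


def softmax(z, lam=1.0):
    """Softmax with inverse temperature lam, computed with a max-shift for stability."""
    s = lam * np.asarray(z, dtype=float)
    e = np.exp(s - s.max())  # softmax is invariant to adding a constant to all logits
    return e / e.sum()


def log_sum_exp(z, lam=1.0):
    """(1/lam) * log(sum_j exp(lam * z_j)); its gradient should equal softmax(z, lam)."""
    s = lam * np.asarray(z, dtype=float)
    m = s.max()
    return (m + np.log(np.exp(s - m).sum())) / lam


def numerical_gradient(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function f at z."""
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = h
        g[i] = (f(z + step) - f(z - step)) / (2 * h)
    return g


def check_properties(lam=2.0, n=5, trials=1000, seed=0):
    """Hypothetical helper: empirically test the gradient, Lipschitz and co-coercivity claims."""
    rng = np.random.default_rng(seed)
    z0 = rng.normal(size=n)
    fd_grad = numerical_gradient(lambda z: log_sum_exp(z, lam), z0)
    grad_err = float(np.max(np.abs(fd_grad - softmax(z0, lam))))

    lipschitz_ok = cocoercive_ok = True
    for _ in range(trials):
        z, w = 3 * rng.normal(size=n), 3 * rng.normal(size=n)
        ds = softmax(z, lam) - softmax(w, lam)
        dz = z - w
        # Lipschitz: ||sigma(z) - sigma(w)|| <= lam * ||z - w||
        if np.linalg.norm(ds) > lam * np.linalg.norm(dz) + 1e-12:
            lipschitz_ok = False
        # Co-coercivity: <sigma(z) - sigma(w), z - w> >= (1/lam) * ||sigma(z) - sigma(w)||^2
        if ds @ dz < (ds @ ds) / lam - 1e-12:
            cocoercive_ok = False
    return grad_err, lipschitz_ok, cocoercive_ok


grad_err, lipschitz_ok, cocoercive_ok = check_properties(lam=2.0)
print(f"max |grad error| = {grad_err:.2e}, Lipschitz: {lipschitz_ok}, co-coercive: {cocoercive_ok}")
```

Raising `lam` sharpens the softmax toward an argmax and, consistent with the abstract, loosens both the Lipschitz bound (λ) and the co-coercivity modulus (1/λ).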

citation-role summary

background 1

citation-polarity summary

background 1

verdicts

UNVERDICTED 13

representative citing papers

Sharp Spectral Thresholds for Logit Fixed Points

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

For finite-dimensional affine logit systems the sharp dimension-free stability threshold is β‖ΠWΠ‖_{T→T}<2, extending the certified regime beyond classical conservative bounds.
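
A hedged sketch of the sufficient direction of this threshold. It rests on several assumptions, none taken from the citing paper: the affine logit system is the fixed-point map x ↦ softmax(β(Wx + b)) on the probability simplex, T is the zero-sum subspace {v : Σ_i v_i = 0}, Π is the orthogonal projection onto T, and ‖·‖_{T→T} is the Euclidean operator norm. The softmax Jacobian diag(p) − ppᵀ has spectral norm at most 1/2 and annihilates the all-ones vector. Under these assumptions, β‖ΠWΠ‖ < 2 therefore makes the map a contraction on the simplex, and plain iteration converges to a unique fixed point. The helper names are hypothetical, and the code does not probe whether the threshold is sharp.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax (inverse temperature folded into the argument)."""
    e = np.exp(z - z.max())
    return e / e.sum()


def tangent_norm(W):
    """Euclidean operator norm of Pi W Pi, where Pi projects onto T = {v : sum(v) = 0}."""
    n = W.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.norm(P @ W @ P, 2)


def logit_fixed_point(W, b, beta, iters=500, x0=None):
    """Hypothetical helper: iterate x <- softmax(beta * (W x + b)) from x0 (uniform by default)."""
    n = W.shape[0]
    x = np.full(n, 1.0 / n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = softmax(beta * (W @ x + b))
    return x


rng = np.random.default_rng(1)
n = 6
W = rng.normal(size=(n, n))
b = rng.normal(size=n)
beta = 1.9 / tangent_norm(W)  # chosen so that beta * ||Pi W Pi|| = 1.9 < 2

x_a = logit_fixed_point(W, b, beta)                                 # start from uniform
x_b = logit_fixed_point(W, b, beta, x0=rng.dirichlet(np.ones(n)))   # random simplex start
residual = np.linalg.norm(x_a - softmax(beta * (W @ x_a + b)))      # fixed-point residual
print(f"residual = {residual:.2e}, gap between starts = {np.linalg.norm(x_a - x_b):.2e}")
```

With a contraction factor of at most 0.95 here, both starting points should land on the same fixed point to well below 1e-8 after 500 iterations.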

On Bayesian Softmax-Gated Mixture-of-Experts Models

stat.ML · 2026-04-22 · unverdicted · novelty 7.0

Bayesian softmax-gated mixture-of-experts models achieve posterior contraction for density estimation and parameter recovery using Voronoi losses, plus two strategies for choosing the number of experts.

Informative Graph Structure Learning

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

InGSL reduces edge redundancy in existing graph structure learning methods by adding a mutual-information-guided diversity term, delivering better results with fewer edges across six tested frameworks.

Structure-Centric Graph Foundation Model via Geometric Bases

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

SCGFM creates transferable graph representations by aligning heterogeneous topologies to shared learnable geometric bases via Gromov-Wasserstein distances and re-encoding features accordingly.

Learning Cut Distributions with Quantum Optimization

quant-ph · 2026-04-15 · unverdicted · novelty 5.0

A QAOA ansatz with a finite number of layers can capture any bitstring distribution and solves the Fair Cut Cover problem with provable and empirical advantages over classical approximations on certain graphs.
