PGP achieves global last-iterate convergence for constrained entropy maximization in RL via penalty regularization and hidden convexity despite non-convex policy parameterization.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
Workshop notes explain models, subproblems, globalization, and convergence assumptions for PIPA, monotone-LCP PIPA, implicit-programming, and PSQP algorithms applied to MPECs.
citing papers explorer
- Global Optimality for Constrained Exploration via Penalty Regularization
  PGP achieves global last-iterate convergence for constrained entropy maximization in RL via penalty regularization and hidden convexity despite non-convex policy parameterization.
- Optimization Workshop Notes for Mathematical Programming with Equilibrium Constraints Algorithms: Penalty Interior-Point, Implicit-Programming, and Piecewise SQP
  Workshop notes explain models, subproblems, globalization, and convergence assumptions for PIPA, monotone-LCP PIPA, implicit-programming, and PSQP algorithms applied to MPECs.