Machine Learning
2 Pith papers cite this work. Polarity classification is still being indexed.
citation-role summary
background 1
citation-polarity summary
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: unclear 1
representative citing papers
WeCon introduces gated residual fusion in the encoder, residual fusion in the decoder, and efficient preference optimization to match state-of-the-art hypervolume on MOCOPs while cutting inference time by 40%.
citing papers explorer
- Delightful Gradients Accelerate Corner Escape
  Delightful Policy Gradient removes exponential corner trapping in softmax policy optimization for bandits and tabular MDPs, achieving logarithmic escape times and global O(1/t) convergence.
- WeCon: An Efficient Weight-Conditioned Neural Solver for Multi-Objective Combinatorial Optimization Problems
  WeCon introduces gated residual fusion in the encoder, residual fusion in the decoder, and efficient preference optimization to match state-of-the-art hypervolume on MOCOPs while cutting inference time by 40%.