Machine Learning, volume=
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Equilibrium Selection in Multi-Agent Policy Gradients via Opponent-Aware Basin Entry
  Opponent-aware peer-learning corrections in finite-unroll Meta-MAPG increase entry probability into target stable-Nash basins relative to standard policy gradient, with annealing to recover local convergence.
- BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
  BoostAPR boosts automated program repair by training a sequence-level assessor and line-level credit allocator from execution outcomes, then applying them in PPO to reach 40.7% on SWE-bench Verified.
- Learning to Cut: Reinforcement Learning for Benders Decomposition
  RLBD trains a neural policy with REINFORCE to select cuts adaptively in Benders decomposition, yielding faster convergence and better generalization than standard BD or SVM-based LearnBD on an EV charging problem.