ACE-MAPPO combines genetic soft updates, evolutionary replay, and adversarial curriculum learning with MAPPO to improve stability, speed, and win rate in cooperative air combat simulations.
(2021). PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Randomly dropping 25% of transitions in PPO rollouts stabilizes training dynamics across five environments while matching vanilla PPO reward performance.
citing papers explorer
- Not All Transitions Matter: Evidence from PPO
Randomly dropping 25% of transitions in PPO rollouts stabilizes training dynamics across five environments while matching vanilla PPO reward performance.