2025.00098

2025 · DOI 10.1109/icsme64153

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

DVAO dynamically weights multi-objective advantages by rollout-group reward variance to bound their magnitudes, adds cross-objective regularization, and outperforms static baselines on math and tool-use tasks with Qwen models.

