PRO-CUA trains CUAs via decoupled on-policy rollouts and PRM-guided step-level optimization to enable dense credit assignment without expert trajectories or golden answers.
Gui-pra: Process reward agent for gui tasks
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.AI 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Xiaomi-GUI-0 reports 72.0% success on an in-house real-mobile benchmark and 78.9% on AndroidWorld after training a GUI agent in a real-device closed loop with an error-driven data flywheel and three-stage RL pipeline.
citing papers explorer
-
PRO-CUA: Process-Reward Optimization for Computer Use Agents
PRO-CUA trains CUAs via decoupled on-policy rollouts and PRM-guided step-level optimization to enable dense credit assignment without expert trajectories or golden answers.
-
Xiaomi-GUI-0 Technical Report
Xiaomi-GUI-0 reports 72.0% success on an in-house real-mobile benchmark and 78.9% on AndroidWorld after training a GUI agent in a real-device closed loop with an error-driven data flywheel and three-stage RL pipeline.