A new 590k-instance dataset built with hard-negative mining and dual-agent verification, plus progressive SFT-to-ORPO-to-GRPO training, yields 58.7% step success on Mind2Web, beating GPT-4.5 and Claude-4.5.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation
A new 590k-instance dataset built with hard-negative mining and dual-agent verification, plus progressive SFT-to-ORPO-to-GRPO training, yields 58.7% step success on Mind2Web, beating GPT-4.5 and Claude-4.5.