Weasel is a trajectory selection method that improves out-of-domain generalization for web agents while achieving 9.7-12.5x training speedups via importance-diversity optimization, AXTree pruning, and rationale style matching.
Data Diversity Matters for Robust Instruction Tuning
2 Pith papers cite this work, alongside 11 external citations. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.
citing papers explorer
-
Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
Weasel is a trajectory selection method that improves out-of-domain generalization for web agents while achieving 9.7-12.5x training speedups via importance-diversity optimization, AXTree pruning, and rationale style matching.
-
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.