Webexplorer: Explore and evolve for training long-horizon web agents

JuntengLiu,YunjiLi,ChiZhang,JingyangLi,AiliChen,KeJi,WeiyuCheng,ZijiaWu,ChengyuDu, QidiXu,etal · 2025 · arXiv 2509.06501

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

Argus: Evidence Assembly for Scalable Deep Research Agents

cs.CL · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning

cs.AI · 2026-05-02 · unverdicted · novelty 6.0

SciResearcher automates creation of diverse scientific reasoning tasks from academic evidence to train an 8B model that sets new SOTA at 19.46% on HLE-Bio/Chem-Gold and gains 13-15% on SuperGPQA-Hard-Biology and TRQA-Literature.

Learning to Retrieve from Agent Trajectories

cs.IR · 2026-03-30 · conditional · novelty 6.0

Retrievers trained on agent trajectories via the LRAT framework improve evidence recall, task success, and efficiency in agentic search benchmarks.

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

cs.CL · 2025-11-14 · unverdicted · novelty 6.0

MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.

Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

cs.SE · 2026-05-17 · unverdicted · novelty 5.0

RAG reaches 0.66 weighted F1 on invalid bug report subclassification while agentic web search reaches 68.9% judge success on no-code fix generation, using a new gold-standard benchmark.

citing papers explorer

Showing 7 of 7 citing papers.

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills cs.CL · 2026-05-20 · unverdicted · none · ref 25
Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 49
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
Argus: Evidence Assembly for Scalable Deep Research Agents cs.CL · 2026-05-15 · unverdicted · none · ref 53 · 2 links
Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.
SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning cs.AI · 2026-05-02 · unverdicted · none · ref 20
SciResearcher automates creation of diverse scientific reasoning tasks from academic evidence to train an 8B model that sets new SOTA at 19.46% on HLE-Bio/Chem-Gold and gains 13-15% on SuperGPQA-Hard-Biology and TRQA-Literature.
Learning to Retrieve from Agent Trajectories cs.IR · 2026-03-30 · conditional · none · ref 7
Retrievers trained on agent trajectories via the LRAT framework improve evidence recall, task success, and efficiency in agentic search benchmarks.
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling cs.CL · 2025-11-14 · unverdicted · none · ref 21
MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.
Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports cs.SE · 2026-05-17 · unverdicted · none · ref 34
RAG reaches 0.66 weighted F1 on invalid bug report subclassification while agentic web search reaches 68.9% judge success on no-code fix generation, using a new gold-standard benchmark.

Webexplorer: Explore and evolve for training long-horizon web agents

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer