WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models

· 2025 · cs.CL · arXiv 2510.14438

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

The hallmark of Deep Research agents lies in compositional reasoning, the capacity to aggregate distributed, heterogeneous information into coherent logical insights. However, current agentic systems are often retrieval-heavy but reasoning-light, where success is predominantly determined by simple entity-seeking rather than the multi-step aggregation of scattered evidence. To address this, we propose a data synthesis pipeline WebAggregator, designed to shift the agentic paradigm from retrieval-centric to compositional aggregation. Our approach first employs Proactive Explorer to collect interconnected knowledge, then Compositional Logic Proposer to weave knowledge into complex questions using over 12 composition guidelines derived from a rigorous deconstruction of the Deep Research problem setting. By leveraging 10K verifiable QA pairs grounded on 50K websites, we curate a high-quality SFT dataset via rejection sampling. Fine-tuning on this corpus fundamentally transforms agent behavior, fostering deliberate composition reasoning and reduced tool redundancy. The resulting WebAggregator-32B surpasses GPT-4.1 and matches Claude-3.7-Sonnet on GAIA, WebWalkerQA, and XBench. To address the lack of benchmarks that emphasize both reasoning and retrieval, we introduce the WebAggregatorQA testbed, which reveals that even with perfect retrieval, top-tier models still underperformed. These results demonstrate that compositional reasoning, not retrieval, is the true performance ceiling for next-generation research agents.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

FORT synthesizes shortcut-resistant search tasks by controlling four identified shortcut risks across entity selection, graph construction, question formulation, and refinement, producing training data that yields agents with longer search trajectories and top performance among open-source models on

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

GDCR assigns step-level rewards via distance to the answer node in a training-time ER graph and SAPO combines these with trajectory advantages for credit assignment in agentic search.

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

SciResearcher is a new agentic data-construction framework that trains an 8B model via supervised fine-tuning and reinforcement learning to reach 19.46% on HLE-Bio/Chem-Gold and 13-15% gains on related biology and literature benchmarks.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer