SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Jun Zhou; Kun Tao; Pu Ning; Qianggang Cao; Quan Chen; Tianshu Wang; Xinyu Kong; Xinyu Tang; Zhiqiang Zhang; Zujie Wen

arxiv: 2606.09730 · v1 · pith:OVV6Y4QNnew · submitted 2026-06-08 · 💻 cs.AI

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Pu Ning , Quan Chen , Kun Tao , Xinyu Tang , Tianshu Wang , Qianggang Cao , Xinyu Kong , Zujie Wen

show 2 more authors

Zhiqiang Zhang Jun Zhou

This is my paper

Pith reviewed 2026-06-27 16:20 UTC · model grok-4.3

classification 💻 cs.AI

keywords delegation intelligenceagentic LLMslong-horizon taskstask decompositionsupervised fine-tuningmulti-agent systemsdeep research

0 comments

The pith

A harness generates training trajectories that teach models when and how to delegate subtasks, producing the strongest results among 30B-scale models on deep research benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models struggle with long-horizon tasks because their fixed context windows cannot accommodate ever-growing information needs. The paper shows that delegation intelligence—deciding what to break off, when to hand it to subagents, and how to fold summaries back into the main workflow—can be acquired through supervised fine-tuning. The authors build a harness that steers generation toward high-quality decompositions while forcing subagents to return compact, usable results. The resulting trajectories supply the scarce training signal, and fine-tuning yields SearchSwarm-30B-A3B, which scores 68.1 on BrowseComp and 73.3 on BrowseComp-ZH. The harness, weights, and data are released so others can extend the approach.

Core claim

A harness that guides the main agent through task decomposition and constrains subagents to return properly formatted summaries produces trajectories that encode correct delegation decisions; supervised fine-tuning on these trajectories internalizes delegation intelligence into the model weights, enabling the 30B model to achieve state-of-the-art scores on long-horizon research benchmarks.

What carries the argument

The harness that guides task decomposition, enforces delegation points, and requires subagents to return concise results that conserve the main agent's context budget.

If this is right

Models can sustain workflows whose total context demand grows without bound.
Delegation decisions move from prompt design into learned model behavior.
Open release of harness and trajectories lets the community scale data collection for this skill.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The harness method could be adapted to generate delegation data for domains such as codebases or experimental workflows.
If the learned behavior generalizes, future agent systems might need smaller context windows than direct long-context approaches.
Iterative self-application of the trained model inside the harness could produce higher-quality trajectories without additional human design.

Load-bearing premise

Trajectories produced inside the constrained harness encode delegation decisions that still work when the model faces open-ended tasks without harness guidance.

What would settle it

Test the fine-tuned model on a set of research problems whose required decomposition and delegation steps were never present in the harness data; if accuracy falls below untuned baselines, the generalization claim is falsified.

read the original abstract

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The harness for generating delegation trajectories is a concrete step but the reported gains lack any controls or ablations to show the model learned independent delegation.

read the letter

The one thing to know is that this paper describes a harness to produce SFT trajectories for teaching LLMs when and how to delegate subtasks in long-horizon research, then claims the resulting 30B model leads on BrowseComp. The harness itself looks like the actual new piece.

What stands out is the specific setup that guides decomposition while constraining subagent outputs to usable summaries. That produces trajectories they fine-tune on. Prior agent papers already talk about decomposition and subagents, but the open-source harness for deep-research delegation data is not described in the references they cite. Releasing the harness, weights, and data is straightforward and helpful.

The framing of context limits and the scarcity of natural delegation examples is clear. The basic logic of using guided trajectories to internalize the behavior makes sense on paper.

The soft spots sit in the evaluation. The abstract gives the 68.1 and 73.3 numbers with no baselines, no ablation on the harness, no statistical detail, and no account of how BrowseComp was scored or whether the harness stayed active at test time. There is also no held-out test of delegation quality on fully unconstrained tasks. That leaves the central claim—that the model acquired delegation intelligence that generalizes—without direct support. The gains could still be harness artifacts.

This is for groups already building agentic systems and running SFT on research-style tasks. A reader in that space could extract the harness idea and try it.

I would send it for peer review once the methods and controls are added, because the pipeline is worth checking even if the current results stay preliminary.

Referee Report

1 major / 0 minor

Summary. The manuscript presents SearchSwarm, a preliminary method for acquiring delegation intelligence in agentic LLMs for long-horizon deep research. A harness guides task decomposition and constrains subagent returns to produce trajectories that are used as supervised fine-tuning data; the resulting SearchSwarm-30B-A3B model reports 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, stated as the best results among models of comparable scale. The authors note the scarcity of natural training data for delegation and commit to releasing the harness, model weights, and training data.

Significance. If the central claim holds, the work would supply a concrete, open-source recipe for synthesizing delegation trajectories at scale, addressing a recognized bottleneck for long-horizon agentic systems. The planned release of harness, weights, and data constitutes a concrete community contribution that would support reproducibility and follow-on experiments.

major comments (1)

[Abstract] Abstract, paragraph on harness-guided trajectories: the assertion that these trajectories 'naturally encode correct delegation decisions' which SFT then internalizes for generalization to unconstrained open-ended tasks is load-bearing for the central claim, yet the manuscript supplies no ablation comparing harness-on versus harness-off inference, no description of harness removal at test time, and no held-out evaluation of delegation quality outside the BrowseComp harness setting. This omission leaves open whether reported gains reflect learned delegation intelligence or continued harness effects.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for this constructive comment, which correctly identifies a key evidentiary gap in supporting the central claim of internalized delegation intelligence. We address the point directly below and commit to revisions.

read point-by-point responses

Referee: [Abstract] Abstract, paragraph on harness-guided trajectories: the assertion that these trajectories 'naturally encode correct delegation decisions' which SFT then internalizes for generalization to unconstrained open-ended tasks is load-bearing for the central claim, yet the manuscript supplies no ablation comparing harness-on versus harness-off inference, no description of harness removal at test time, and no held-out evaluation of delegation quality outside the BrowseComp harness setting. This omission leaves open whether reported gains reflect learned delegation intelligence or continued harness effects.

Authors: We agree the manuscript requires clarification on this point to substantiate the claim. The harness is employed exclusively during trajectory synthesis to produce high-quality SFT data; at inference the fine-tuned model is intended to operate without it. In the revised manuscript we will add: (1) an explicit section describing harness removal at test time and the unconstrained inference protocol; (2) an ablation comparing harness-on versus harness-off performance on a representative subset of BrowseComp tasks to isolate the contribution of learned delegation; and (3) a limitations discussion noting the current reliance on BrowseComp as the primary held-out benchmark while outlining plans for additional delegation-specific metrics. These changes will directly address whether observed gains derive from internalized capabilities rather than residual harness effects. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical SFT on harness trajectories

full rationale

The paper describes an empirical pipeline: a harness generates trajectories that are then used as SFT data to train delegation behavior. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the provided text. The central claim (benchmark gains after SFT) is a measured outcome rather than a quantity forced by construction from the inputs. Generalization from harness to open-ended use is an unproven assumption but does not constitute circularity under the defined patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard assumption that supervised fine-tuning on high-quality trajectories will cause the model to internalize the demonstrated delegation policy; no new physical or mathematical entities are introduced.

axioms (1)

domain assumption Supervised fine-tuning on trajectories generated under harness constraints will produce generalization to unconstrained tasks
Invoked when the abstract states that harness-guided trajectories are used as SFT data to internalize delegation intelligence

pith-pipeline@v0.9.1-grok · 5816 in / 1203 out tokens · 20863 ms · 2026-06-27T16:20:23.848083+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references

[1]

Second M1 funding locked in as part of economic recovery to create jobs https://statements.qld.gov.au/statements/908 28
[2]

First contract awarded for $1.53bn QLD Coomera Connector Stage 1 https://www.felix.net/project-news/first-contr act-awarded-for-1.53bn-qld-coomera-connector-stage-1
[3]

1093 (PDF) https://documents.parliament.qld.gov.au/tableoffice/questionsanswers /2021/1093-2021.pdf

Question on Notice No. 1093 (PDF) https://documents.parliament.qld.gov.au/tableoffice/questionsanswers /2021/1093-2021.pdf

2021
[4]

Coomera Connector Stage 1 North opens to traffic https://www.infrastructure.gov.au/department/media/news/co omera-connector-stage-1-north-opens-traffic
[5]

$3.4 billion Coomera Connector stage one to open after construction delays https://www.abc.net.au/news/2025-12-01/fi rst-stage-of-gold-coast-coomera-connector-to-open-to-motorists/106085710

arXiv 2025
[6]

Coomera Connector – Wikipediahttps://en.wikipedia.org/wiki/Coomera_Connector
[7]

Coomera Connector - Stage One - Central - Infrastructure Pipeline https://infrastructurepipeline.org/project/coome ra-connector-stage-one-central
[8]

INLink celebrates official commencement of Inland Rail project https://www.bmdgroup.global/news/inlink-celebrate s-official-commencement-of-inland-rail-project
[9]

Inland Rail construction begins (Senator’s media release)https://ministers.finance.gov.au/financeminister/media -release/2018/12/13/inland-rail-construction-begins(search snippet)

2018
[10]

Inland Rail Section 5: Parkes to Narromine (P2N) - Fulton Hogan https://www.fultonhogan.com/keyprojects/inland-r ail-section-5-parkes-to-narromine-p2n/
[11]

Parkes to Narromine Inland Rail complete - ARTC https://www.artc.com.au/2020/09/15/parkes-to-narromine-i nland-rail-complete/(search snippet)

2020
[12]

RTI Release – TMR Queensland https://www.tmr.qld.gov.au/_/media/aboutus/rti/disclog/2020/r_rti-100 3-release.pdf(search snippet)

2020
[13]

Name revealed for new $3.5 billion Gold Coast motorway Big Rigs https://bigrigs.com.au/2025/08/27/name-reveale d-for-new-3-5-billion-gold-coast-motorway/(search snippet)

2025
[14]

M12 Motorway (Sydney) – Wikipediahttps://en.wikipedia.org/wiki/M12_Motorway_(Sydney)(search snippet)
[15]

Northbound lanes open for first time on $2.2 billion Coffs Harbour bypass https://bigrigs.com.au/2025/05/02/northbou nd-lanes-open-for-first-time-on-2-2-billion-coffs-harbour-bypass/(search snippet)

2025
[16]

West Gate Tunnel Project Victoria’s Big Buildhttps://bigbuild.vic.gov.au/projects/west-gate-tunnel-project (search snippet) 25

[1] [1]

Second M1 funding locked in as part of economic recovery to create jobs https://statements.qld.gov.au/statements/908 28

[2] [2]

First contract awarded for $1.53bn QLD Coomera Connector Stage 1 https://www.felix.net/project-news/first-contr act-awarded-for-1.53bn-qld-coomera-connector-stage-1

[3] [3]

1093 (PDF) https://documents.parliament.qld.gov.au/tableoffice/questionsanswers /2021/1093-2021.pdf

Question on Notice No. 1093 (PDF) https://documents.parliament.qld.gov.au/tableoffice/questionsanswers /2021/1093-2021.pdf

2021

[4] [4]

Coomera Connector Stage 1 North opens to traffic https://www.infrastructure.gov.au/department/media/news/co omera-connector-stage-1-north-opens-traffic

[5] [5]

$3.4 billion Coomera Connector stage one to open after construction delays https://www.abc.net.au/news/2025-12-01/fi rst-stage-of-gold-coast-coomera-connector-to-open-to-motorists/106085710

arXiv 2025

[6] [6]

Coomera Connector – Wikipediahttps://en.wikipedia.org/wiki/Coomera_Connector

[7] [7]

Coomera Connector - Stage One - Central - Infrastructure Pipeline https://infrastructurepipeline.org/project/coome ra-connector-stage-one-central

[8] [8]

INLink celebrates official commencement of Inland Rail project https://www.bmdgroup.global/news/inlink-celebrate s-official-commencement-of-inland-rail-project

[9] [9]

Inland Rail construction begins (Senator’s media release)https://ministers.finance.gov.au/financeminister/media -release/2018/12/13/inland-rail-construction-begins(search snippet)

2018

[10] [10]

Inland Rail Section 5: Parkes to Narromine (P2N) - Fulton Hogan https://www.fultonhogan.com/keyprojects/inland-r ail-section-5-parkes-to-narromine-p2n/

[11] [11]

Parkes to Narromine Inland Rail complete - ARTC https://www.artc.com.au/2020/09/15/parkes-to-narromine-i nland-rail-complete/(search snippet)

2020

[12] [12]

RTI Release – TMR Queensland https://www.tmr.qld.gov.au/_/media/aboutus/rti/disclog/2020/r_rti-100 3-release.pdf(search snippet)

2020

[13] [13]

Name revealed for new $3.5 billion Gold Coast motorway Big Rigs https://bigrigs.com.au/2025/08/27/name-reveale d-for-new-3-5-billion-gold-coast-motorway/(search snippet)

2025

[14] [14]

M12 Motorway (Sydney) – Wikipediahttps://en.wikipedia.org/wiki/M12_Motorway_(Sydney)(search snippet)

[15] [15]

Northbound lanes open for first time on $2.2 billion Coffs Harbour bypass https://bigrigs.com.au/2025/05/02/northbou nd-lanes-open-for-first-time-on-2-2-billion-coffs-harbour-bypass/(search snippet)

2025

[16] [16]

West Gate Tunnel Project Victoria’s Big Buildhttps://bigbuild.vic.gov.au/projects/west-gate-tunnel-project (search snippet) 25