Pith Number

pith:STZE3XGY

pith:2025:STZE3XGYUA5FI64VOLYHDLMIWB

not attested · not anchored · not stored · refs resolved

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Chenxi Wang, Fei Huang, Jialong Wu, Jingren Zhou, Kuan Li, Pengjun Xie, Peng Xia, Qiuchen Wang, Ruixue Ding, Xinyu Geng, Xinyu Wang, Yida Zhao, Yong Jiang, Zhen Zhang

WebWatcher trains a vision-language agent on synthetic multimodal trajectories and reinforcement learning to outperform baselines on complex VQA tasks.

arxiv:2508.05748 v3 · 2025-08-07 · cs.IR

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{STZE3XGYUA5FI64VOLYHDLMIWB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
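The page does not describe the merge algorithm, so the following is only a minimal sketch of what "deterministic" could mean here: any mirror that receives the same set of events, in any order, arrives at the same state. The event shape (`id`, `ts`, `field`, `value`), the function name `merge_events`, and the last-writer-wins rule are all assumptions made for illustration, not the Pith format.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a single state, independent of arrival order.

    Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
    """
    # Drop repeated deliveries of the same event (assumes ids are unique).
    unique = {event["id"]: event for event in events}.values()
    # Impose a total order: ISO-8601 UTC timestamps sort lexicographically,
    # and the id breaks ties between events with equal timestamps.
    ordered = sorted(unique, key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        # Last writer wins per field (an assumed rule).
        state[event["field"]] = event["value"]
    return state


# Made-up events, used only to show order independence.
events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "status", "value": "updated"},
    {"id": "e1", "ts": "2026-05-17T23:38:50Z", "field": "status", "value": "created"},
    {"id": "e3", "ts": "2026-05-18T00:00:00Z", "field": "note", "value": "x"},
]
merged = merge_events(events)
merged_reversed = merge_events(list(reversed(events)))
print(merged == merged_reversed)
```

Sorting on a key that is unique per event is what removes the dependence on arrival order; a real implementation would also verify each event's signature before applying it.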

Claims

C1 · strongest claim

Experimental results show that WebWatcher significantly outperforms proprietary baselines, RAG workflows, and open-source agents on four challenging VQA benchmarks, paving the way for solving complex multimodal information-seeking tasks.

C2 · weakest assumption

That high-quality synthetic multimodal trajectories enable efficient cold-start training for agents that require stronger reasoning across perception, logic, and knowledge, and that reinforcement learning further improves generalization to complex tasks.

C3 · one-line summary

WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.

References

31 extracted · 31 resolved · 11 Pith anchors

[1] Qwen2.5-VL Technical Report · arXiv:2502.13923

[2] Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

[3] Evaluating Large Language Models Trained on Code · arXiv:2107.03374

[4] M3CoT: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought

[5] arXiv preprint arXiv:2302.11713

Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Gen-Searcher: Reinforcing Agentic Search for Image Generation

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition

Don't Guess, Just Ask: Resolving Ambiguity in Referring Segmentation via Multi-turn Clarification

Receipt and verification

First computed	2026-05-17T23:38:50.509905Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

94f24ddcd8a03a547b9572f071ad88b064a7504c02a0adb1f23fbe038cec5ac2

Aliases

arxiv: 2508.05748 · arxiv_version: 2508.05748v3 · doi: 10.48550/arxiv.2508.05748 · pith_short_12: STZE3XGYUA5F · pith_short_16: STZE3XGYUA5FI64V · pith_short_8: STZE3XGY
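The three `pith_short_*` aliases are prefixes of the full identifier. For this record, the full identifier also matches the first 26 characters of the Base32 (RFC 4648) encoding of the canonical hash. That second relationship is an observation from the values on this page, not a documented rule, so treat the sketch below as an assumption; the function name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the leading bytes of a hex digest and keep `length` characters."""
    raw = bytes.fromhex(hex_digest)
    # Each Base32 character carries 5 bits; take enough bytes to cover `length` characters.
    n_bytes = (length * 5 + 7) // 8
    encoded = base64.b32encode(raw[:n_bytes]).decode("ascii")
    return encoded[:length]


canonical_hash = "94f24ddcd8a03a547b9572f071ad88b064a7504c02a0adb1f23fbe038cec5ac2"
full = pith_number_from_hash(canonical_hash)
# Short aliases as listed in the record: plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)
print(aliases)
```

If the relationship holds for other records too, a client could check an identifier against its canonical hash without contacting the resolver.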

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/STZE3XGYUA5FI64VOLYHDLMIWB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 94f24ddcd8a03a547b9572f071ad88b064a7504c02a0adb1f23fbe038cec5ac2

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e5f2ae3615b247e22deaa32da02a6ac383263c0d2ad78dace4e467850ce21504",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2025-08-07T18:03:50Z",
    "title_canon_sha256": "a543c002b68a22ea3cccb801774aeff5d9c3a7cd3a2ef1ba117c6e419776e988"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2508.05748",
    "kind": "arxiv",
    "version": 3
  }
}
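
The record above can also be hashed offline, without the `curl` and `jq` steps. The sketch below applies the same serialization as the Python one-liner in the verification command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to a copy of the record. Whether the printed digest equals the canonical hash depends on the service canonicalizing exactly as that command implies, so compare the two rather than assuming a match; `canonical_sha256` is a name chosen here for illustration.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over a sorted-key, whitespace-free JSON serialization."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copy of the canonical record shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "e5f2ae3615b247e22deaa32da02a6ac383263c0d2ad78dace4e467850ce21504",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2025-08-07T18:03:50Z",
        "title_canon_sha256": "a543c002b68a22ea3cccb801774aeff5d9c3a7cd3a2ef1ba117c6e419776e988",
    },
    "schema_version": "1.0",
    "source": {"id": "2508.05748", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed under "Canonical hash"
```

Sorting keys and fixing the separators makes the digest independent of how the JSON was originally laid out, which is what lets independent parties recompute it.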