pith.
Pith Number

pith:M2KQFQTZ

pith:2025:M2KQFQTZY6GYBGKAP6QP7DYZUW
Status: not attested · not anchored · not stored · refs resolved

MMSearch-R1: Incentivizing LMMs to Search

Bo Li, Bo You, Jinming Wu, Wei Li, Yiding Liu, Zejun Ma, Zihao Deng, Ziwei Liu

Reinforcement learning lets multimodal models search the internet only when needed

arxiv:2506.20670 v1 · 2025-06-25 · cs.CV · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{M2KQFQTZY6GYBGKAP6QP7DYZUW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
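The page does not spell out the deterministic merge algorithm. As a hypothetical sketch only, assuming each event carries an `id`, an ISO 8601 UTC timestamp `ts`, and a `set` map of fields to overwrite (none of these names come from the Pith schema, and signature checking is omitted), determinism can be obtained by sorting events into one fixed total order before folding them onto the canonical record:

```python
import json
from typing import Any

# HYPOTHETICAL event shape, not Pith's actual schema:
#   {"id": str, "ts": str (ISO 8601 UTC), "set": {field: value}}
# Verification of each event's signature is deliberately left out.


def merge_state(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events onto the canonical record in a single fixed order."""
    # Deep-copy via JSON so the caller's canonical record is never mutated.
    state = json.loads(json.dumps(canonical))
    # Sorting by (ts, id) gives every mirror the same application order,
    # no matter in which order the events were received.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state.update(event["set"])  # later event in sorted order wins per field
    return state


if __name__ == "__main__":
    record = {"source": {"kind": "arxiv", "id": "2506.20670"}}
    events = [  # illustrative values only
        {"id": "b", "ts": "2026-05-18T00:00:00Z", "set": {"status": "claimed"}},
        {"id": "a", "ts": "2026-05-17T00:00:00Z", "set": {"status": "open"}},
    ]
    forward = merge_state(record, events)
    backward = merge_state(record, list(reversed(events)))
    print(forward == backward, forward["status"])  # True claimed
```

Breaking timestamp ties with the event `id` is what makes the order total; without it, two events with equal timestamps could be applied in different orders on different mirrors.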

Claims

C1 · strongest claim

We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables LMMs to perform on-demand, multi-turn search in real-world Internet environments... our model not only outperforms RAG-based baselines of the same model size, but also matches the performance of a larger RAG-based model while reducing search calls by over 30%.

C2 · weakest assumption

The outcome-based reward combined with a search penalty, together with the curated search-balanced dataset, is sufficient to produce efficient on-demand search behavior that generalizes beyond the training distribution.

C3 · one-line summary

MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.
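The claims above name an outcome-based reward combined with a search penalty but do not give its formula. Purely as a hypothetical illustration (not the paper's definition; the binary correctness signal and the 0.1 penalty value are assumptions), such a reward pays full credit for a correct answer and slightly less when a search call was needed to get there:

```python
def outcome_reward(correct: bool, used_search: bool, search_penalty: float = 0.1) -> float:
    """HYPOTHETICAL outcome reward with a search penalty (illustrative values only)."""
    if not correct:
        # Outcome-based: a wrong final answer earns nothing, searched or not.
        return 0.0
    # A correct answer earns full credit, reduced when a search call was made,
    # so the policy is nudged to search only when it cannot answer otherwise.
    return 1.0 - search_penalty if used_search else 1.0


for correct, searched in [(True, False), (True, True), (False, True)]:
    print(correct, searched, outcome_reward(correct, searched))
```

Because a correct answer with search still outscores a wrong answer without it, a penalty of this shape discourages unnecessary searches without discouraging necessary ones.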

References

94 extracted · 94 resolved · 26 Pith anchors

[1] Open deep search: Democratizing search with open-source reasoning agents · 2025
[2] Claude 3.5 Sonnet · 2024
[3] Self-RAG: Learning to retrieve, generate, and critique through self-reflection · 2023
[4] MINT-1T: Scaling open-source multimodal data by 10x: A multimodal dataset with one trillion tokens · 2024
[5] Qwen2.5-VL Technical Report · 2025 · arXiv:2502.13923

Cited by

22 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:47.444503Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

669502c279c78d8099407fa0ff8f19a5a7a416364ed7208ee54d159fe652127e

Aliases

arxiv: 2506.20670 · arxiv_version: 2506.20670v1 · doi: 10.48550/arxiv.2506.20670 · pith_short_12: M2KQFQTZY6GY · pith_short_16: M2KQFQTZY6GYBGKA · pith_short_8: M2KQFQTZ
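The identifiers on this page are consistent with the following relationship, inferred from the values shown rather than from a published specification: the full Pith Number is the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 canonical hash, and each `pith_short_N` alias is its first N characters. A minimal sketch (the function name is hypothetical):

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode the SHA-256 digest and keep the first `length` characters.

    Inferred from the values on this page, not from a documented spec.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


CANONICAL_HASH = "669502c279c78d8099407fa0ff8f19a5a7a416364ed7208ee54d159fe652127e"
full = pith_number_from_hash(CANONICAL_HASH)
print(full)  # M2KQFQTZY6GYBGKAP6QP7DYZUW

# The short aliases are plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}:", full[:n])
```

Twenty-six base32 characters carry 130 bits, so the full number keeps slightly more than the first 128 bits of the hash.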
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/M2KQFQTZY6GYBGKAP6QP7DYZUW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 669502c279c78d8099407fa0ff8f19a5a7a416364ed7208ee54d159fe652127e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d87506b60c8c6cbcb204c48af3d47e2d19d61c1bbb8a14d3a703a876367c7425",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-06-25T17:59:42Z",
    "title_canon_sha256": "8ecc7b053022d2d258e49fd9c9a94ad2866fd3f21cc2e0a15ea9923f28eb1f08"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.20670",
    "kind": "arxiv",
    "version": 1
  }
}
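To check the hash offline, the canonical record can be pasted into a script and serialized the same way as the Python step of the verification pipeline: sorted keys, no whitespace (`separators=(",", ":")`), and `ensure_ascii=False` with UTF-8 encoding. This is a sketch assuming the record shown here is complete and transcribed exactly; compare the printed digest with the canonical hash listed above.

```python
import hashlib
import json

# The canonical record as displayed on this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d87506b60c8c6cbcb204c48af3d47e2d19d61c1bbb8a14d3a703a876367c7425",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-06-25T17:59:42Z",
        "title_canon_sha256": "8ecc7b053022d2d258e49fd9c9a94ad2866fd3f21cc2e0a15ea9923f28eb1f08",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.20670", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(canonical_sha256(CANONICAL_RECORD))
```

Sorting keys and removing whitespace make the byte string independent of how the JSON happened to be formatted, which is why any mirror can recompute the same digest.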