Pith Number

pith:LCB3UHAA

pith:2025:LCB3UHAAN6GMCNM5457GPWNF6W
not attested · not anchored · not stored · refs pending

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Chao Qu, Fangzhen Lin, Haozhe Wang, Wei Chu, Wenhu Chen, Zuming Huang

Reinforcement learning with selective replay and forced rethinking steps lets vision-language models reflect on their answers and reach new highs on multimodal math benchmarks.

arxiv:2504.08837 v3 · 2025-04-10 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LCB3UHAAN6GMCNM5457GPWNF6W}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

By combining Selective Sample Replay and Forced Rethinking in RL training, VL-Rethinker advances state-of-the-art scores on MathVista to 80.4% and MathVerse to 63.5%, achieving open-source SoTA on MathVision, MMMU-Pro, EMMA, and MEGA-Bench.

C2 · weakest assumption

That the reported gains stem primarily from increased self-reflection and slow-thinking rather than from other side effects of the RL setup or from benchmark-specific optimizations.

C3 · one line summary

VL-Rethinker reaches 80.4% on MathVista and 63.5% on MathVerse by adapting GRPO with Selective Sample Replay and Forced Rethinking to promote self-reflection in vision-language models without distillation.

Formal links

2 machine-checked theorem links

Cited by

50 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:53.290891Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

5883ba1c006f8cc1359de77e67d9a5f5b64427899b130d9f6a57ce29a199434e

Aliases

arxiv: 2504.08837 · arxiv_version: 2504.08837v3 · doi: 10.48550/arxiv.2504.08837 · pith_short_12: LCB3UHAAN6GM · pith_short_16: LCB3UHAAN6GMCNM5 · pith_short_8: LCB3UHAA
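The short aliases above are plain prefixes of the 26-character Pith Number. For this record, the number also matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That second relationship is an observation about this one record, not a scheme the page documents, so the sketch below treats it as an assumption. The helper names `base32_prefix` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "5883ba1c006f8cc1359de77e67d9a5f5b64427899b130d9f6a57ce29a199434e"
PITH_NUMBER = "LCB3UHAAN6GMCNM5457GPWNF6W"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """ASSUMED derivation: Base32-encode the digest bytes, keep the first `length` chars."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as simple prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = base32_prefix(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)

print(derived == PITH_NUMBER)  # True for this record
print(aliases)
```

If the observation holds more generally, a mirror could recompute the identifier from the canonical hash alone, but that should be checked against the schema before relying on it.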
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LCB3UHAAN6GMCNM5457GPWNF6W \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5883ba1c006f8cc1359de77e67d9a5f5b64427899b130d9f6a57ce29a199434e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c32ff01e1d574c8407f7d747be89e3134028b09fc84bdb3106f98de9fc65a742",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-04-10T17:41:56Z",
    "title_canon_sha256": "44ab9014d6ac1819ca9747e6da46d768642c25e8940acc1356b7d8608ad8420c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.08837",
    "kind": "arxiv",
    "version": 3
  }
}
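The shell one-liner above can also be reproduced offline against the record shown here. The sketch below applies the same canonicalization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and the same SHA-256 step in Python. Whether it reproduces the published hash depends on the embedded record matching the served one byte for byte, so the result is compared rather than asserted. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

EXPECTED = "5883ba1c006f8cc1359de77e67d9a5f5b64427899b130d9f6a57ce29a199434e"

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "c32ff01e1d574c8407f7d747be89e3134028b09fc84bdb3106f98de9fc65a742",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-04-10T17:41:56Z",
    "title_canon_sha256": "44ab9014d6ac1819ca9747e6da46d768642c25e8940acc1356b7d8608ad8420c"
  },
  "schema_version": "1.0",
  "source": {"id": "2504.08837", "kind": "arxiv", "version": 3}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the order and whitespace of the input JSON do not affect the digest; only the keys and values do.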