Pith Number

pith:O672LEEW

pith:2025:O672LEEW77T7WB3SZGNDBUGL4U

not attested · not anchored · not stored · refs resolved

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Cihang Xie, Fali Wang, Haoqin Tu, Hardy Chen, Hui Liu, Xianfeng Tang, Xinya Du, Yuyin Zhou

SFT induces pseudo reasoning paths that undermine subsequent RL in vision-language models.

arxiv:2504.11468 v1 · 2025-04-10 · cs.CL

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

SFT can significantly undermine subsequent RL by inducing "pseudo reasoning paths" imitated from expert models. While these paths may resemble the native reasoning paths of RL models, they often involve prolonged, hesitant, less informative steps, and incorrect reasoning.

C2 weakest assumption

That the performance gap between SFT-then-RL and RL-only is caused by the induction of pseudo-reasoning paths rather than differences in data difficulty, reward design, or training hyperparameters.

C3 one line summary

SFT induces pseudo-reasoning paths that undermine RL in LVLMs, while RL with GRPO and mixed perception-cognition rewards on the new VLAA-Thinking dataset produces more genuine reasoning and top leaderboard performance.

References

25 extracted · 25 resolved · 0 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

20 papers in Pith

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions

Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots

WebSailor: Navigating Super-human Reasoning for Web Agent

Asking like Socrates: Socrates helps VLMs understand remote sensing images

Receipt and verification

First computed	2026-05-17T23:38:13.672223Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba

Aliases

arxiv: 2504.11468 · arxiv_version: 2504.11468v1 · doi: 10.48550/arxiv.2504.11468 · pith_short_12: O672LEEW77T7 · pith_short_16: O672LEEW77T7WB3S · pith_short_8: O672LEEW
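The aliases above look like prefixes of one identifier derived from the canonical hash. This record does not document that derivation, so treat it as an observation. Base32-encoding the first 16 bytes of the canonical hash and dropping the `=` padding reproduces the full identifier shown on this page. A minimal Python sketch under that assumption (the function name `pith_id_from_hash` is hypothetical):

```python
import base64


def pith_id_from_hash(canonical_hash_hex: str) -> str:
    """Assumed derivation: Base32 of the first 16 hash bytes, padding removed."""
    digest = bytes.fromhex(canonical_hash_hex)
    # 16 bytes encode to 26 Base32 characters plus 6 '=' padding characters.
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


# Canonical hash as listed in this record.
canonical_hash = "77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba"

full_id = pith_id_from_hash(canonical_hash)
print(full_id)       # O672LEEW77T7WB3SZGNDBUGL4U
print(full_id[:8])   # O672LEEW          (pith_short_8)
print(full_id[:12])  # O672LEEW77T7      (pith_short_12)
print(full_id[:16])  # O672LEEW77T7WB3S  (pith_short_16)
```

If the pattern holds for other records, a short alias can be checked against a canonical hash without contacting the resolver. The builder may use a different rule that happens to agree for this record.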

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/O672LEEW77T7WB3SZGNDBUGL4U \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "86f110acf42ce70033c15b9dfc44bfdb0cb15e4c6844ca5669f276b2aac0b858",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-10T16:54:05Z",
    "title_canon_sha256": "91c06de3452239265422fec0bb7bfc6768f80afb481f73766f6cdd37c08d11ad"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.11468",
    "kind": "arxiv",
    "version": 1
  }
}
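
The hashing step of the verification command can also be run offline against the record shown above. The sketch below uses the same serialization settings as that command: sorted keys, compact separators, and `ensure_ascii=False`. It assumes the record rendered on this page matches what the resolver returns, so compare the printed digest with the canonical hash listed earlier and do not take a match for granted. The helper name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the resolver's copy).
record = {
    "metadata": {
        "abstract_canon_sha256": "86f110acf42ce70033c15b9dfc44bfdb0cb15e4c6844ca5669f276b2aac0b858",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-04-10T16:54:05Z",
        "title_canon_sha256": "91c06de3452239265422fec0bb7bfc6768f80afb481f73766f6cdd37c08d11ad",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.11468", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed in this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the record's fields.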