pith. machine review for the scientific record.
Pith Number

pith:O672LEEW

pith:2025:O672LEEW77T7WB3SZGNDBUGL4U
not attested · not anchored · not stored · refs resolved

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Cihang Xie, Fali Wang, Haoqin Tu, Hardy Chen, Hui Liu, Xianfeng Tang, Xinya Du, Yuyin Zhou

SFT induces pseudo reasoning paths that undermine subsequent RL in vision-language models.

arxiv:2504.11468 v1 · 2025-04-10 · cs.CL

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
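The record does not describe the merge algorithm itself, so the following is only a minimal sketch of what "deterministic" can mean here: two mirrors that receive the same events in different orders end up with the same state. The event shape `(event_id, timestamp, field, value)`, the function name `merge_state`, and the last-writer-wins rule are all hypothetical and are not taken from the Pith schema.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event_id, timestamp, field, value).
# This is an illustration only, not the Pith event schema.
Event = Tuple[str, str, str, str]


def merge_state(events: Iterable[Event]) -> Dict[str, str]:
    """Fold events into a state dict independently of arrival order."""
    # Assumption: an event id uniquely identifies its content,
    # so dropping repeats by id cannot change the outcome.
    unique = {event[0]: event for event in events}
    # Impose one total order (timestamp, then id) before folding.
    ordered = sorted(unique.values(), key=lambda e: (e[1], e[0]))
    state: Dict[str, str] = {}
    for _event_id, _timestamp, field, value in ordered:
        state[field] = value  # the latest event in the total order wins
    return state


events = [
    ("e1", "2025-01-01T00:00:00Z", "citations", "open"),
    ("e2", "2025-02-01T00:00:00Z", "citations", "closed"),
    ("e3", "2025-01-15T00:00:00Z", "author_claim", "open"),
]
print(merge_state(events) == merge_state(list(reversed(events))))
```

Sorting on a key that is a total order over events is what makes the result independent of how a mirror happened to receive them; any real implementation may use a different ordering or conflict rule.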

Claims

C1 · strongest claim

SFT can significantly undermine subsequent RL by inducing "pseudo reasoning paths" imitated from expert models. While these paths may resemble the native reasoning paths of RL models, they often involve prolonged, hesitant, less informative steps, and incorrect reasoning.

C2 · weakest assumption

That the performance gap between SFT-then-RL and RL-only is caused by the induction of pseudo-reasoning paths rather than differences in data difficulty, reward design, or training hyperparameters.

C3 · one line summary

SFT induces pseudo-reasoning paths that undermine RL in LVLMs, while RL with GRPO and mixed perception-cognition rewards on the new VLAA-Thinking dataset produces more genuine reasoning and top leaderboard performance.

References

25 extracted · 25 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

20 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.672223Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba

Aliases

arxiv: 2504.11468 · arxiv_version: 2504.11468v1 · doi: 10.48550/arxiv.2504.11468 · pith_short_12: O672LEEW77T7 · pith_short_16: O672LEEW77T7WB3S · pith_short_8: O672LEEW
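The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded RFC 4648 Base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are its 8, 12 and 16 character prefixes. This is an observation from the values shown in this record, not a documented specification, and the helper name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding."""
    first_128_bits = bytes.fromhex(canonical_hash_hex)[:16]
    return base64.b32encode(first_128_bits).decode("ascii").rstrip("=")


# Canonical hash as published in this record.
canonical_hash = "77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba"

pith_number = pith_number_from_hash(canonical_hash)
# Short aliases observed on this page are plain prefixes of the full number.
short_aliases = {n: pith_number[:n] for n in (8, 12, 16)}

print(pith_number)        # O672LEEW77T7WB3SZGNDBUGL4U
print(short_aliases[8])   # O672LEEW
print(short_aliases[12])  # O672LEEW77T7
print(short_aliases[16])  # O672LEEW77T7WB3S
```

Whether other records follow the same derivation is an assumption that would need to be checked against the schema named in the receipt.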
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/O672LEEW77T7WB3SZGNDBUGL4U \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "86f110acf42ce70033c15b9dfc44bfdb0cb15e4c6844ca5669f276b2aac0b858",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-10T16:54:05Z",
    "title_canon_sha256": "91c06de3452239265422fec0bb7bfc6768f80afb481f73766f6cdd37c08d11ad"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.11468",
    "kind": "arxiv",
    "version": 1
  }
}
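The same check can be run offline against the canonical record printed above. This sketch repeats the serialization used in the shell one-liner (sorted keys, compact separators, UTF-8 without ASCII escaping) and compares the digest with the published canonical hash. The function name `canonical_sha256` is hypothetical, and the comparison result is printed rather than assumed.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "86f110acf42ce70033c15b9dfc44bfdb0cb15e4c6844ca5669f276b2aac0b858",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-04-10T16:54:05Z",
        "title_canon_sha256": "91c06de3452239265422fec0bb7bfc6768f80afb481f73766f6cdd37c08d11ad",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.11468", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj) -> str:
    """Hash the sorted-key, compact JSON serialization of obj."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


expected = "77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba"
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which the fields are typed or received does not affect the digest.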