pith:O672LEEW
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
SFT induces pseudo reasoning paths that undermine subsequent RL in vision-language models.
arxiv:2504.11468 v1 · 2025-04-10 · cs.CL
Claims
SFT can significantly undermine subsequent RL by inducing "pseudo reasoning paths" imitated from expert models. Although these paths may resemble the native reasoning paths of RL models, they often consist of prolonged, hesitant, less informative steps and incorrect reasoning.
The performance gap between SFT-then-RL and RL-only training is attributed to the induction of pseudo-reasoning paths rather than to differences in data difficulty, reward design, or training hyperparameters.
SFT induces pseudo-reasoning paths that undermine RL in LVLMs, while RL with GRPO and mixed perception-cognition rewards on the new VLAA-Thinking dataset produces more genuine reasoning and top leaderboard performance.
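The claims name GRPO as the RL algorithm. GRPO scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows that group-relative advantage. It assumes a hypothetical linear blend of a perception score and a cognition score. The helper names, the 0.5 default weight, and the choice of population standard deviation are illustrative assumptions, not the paper's reward design.

```python
import statistics
from typing import Sequence


def mixed_reward(perception: float, cognition: float, w_perc: float = 0.5) -> float:
    # Hypothetical mixed reward: a linear blend of a perception-side score
    # (e.g. did the answer read the image correctly) and a cognition-side score.
    return w_perc * perception + (1.0 - w_perc) * cognition


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style advantage: center each reward on its group's mean and scale by
    # the group's standard deviation (population form here; implementations differ).
    # eps avoids division by zero when every response gets the same reward.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to one image-question pair: (perception, cognition) scores.
    scores = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.5, 1.0)]
    rewards = [mixed_reward(p, c) for p, c in scores]
    print(group_relative_advantages(rewards))
```

Because advantages are relative within a group, a group where every response earns the same reward contributes no learning signal.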
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:13.672223Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba
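The identifiers on this page appear to be derived from the canonical hash. Encoding the raw SHA-256 digest with RFC 4648 base32 and keeping the first 26 characters gives the long form used in the curl URL (O672LEEW77T7WB3SZGNDBUGL4U). The first 8 characters give the short form pith:O672LEEW. The sketch below checks that relationship. The page does not state it as the Pith numbering rule, so treat the function `pith_id_from_hash` as a hypothetical helper.

```python
import base64


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    # Assumption: the identifier is the RFC 4648 base32 encoding of the raw
    # SHA-256 digest bytes, truncated to `length` characters (no padding kept).
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


CANONICAL_HASH = "77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba"

print(pith_id_from_hash(CANONICAL_HASH))     # O672LEEW77T7WB3SZGNDBUGL4U
print(pith_id_from_hash(CANONICAL_HASH, 8))  # O672LEEW
```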
Verify this Pith Number yourself
```sh
curl -sH 'Accept: application/ld+json' https://pith.science/pith/O672LEEW77T7WB3SZGNDBUGL4U \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 77bfa59096ffe7fb0772c99a30d0cbe50abc4ccafe74fad7fbc453e9469792ba
```
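The Python step of the verification pipeline can also be run offline on a record you have already saved. The sketch below reuses that one-liner's canonical form: sorted keys, compact separators, non-ASCII characters kept, and UTF-8 encoding. The function name `canonical_sha256` is illustrative, not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    # Same canonicalization as the one-liner: keys sorted at every level,
    # no whitespace after ',' or ':', non-ASCII characters emitted as-is,
    # then UTF-8 bytes hashed with SHA-256.
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Because keys are sorted before hashing, the digest does not depend on the key order in the served JSON. Load the record with `json.load` and compare `canonical_sha256(record)` against the expected hash above.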
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "86f110acf42ce70033c15b9dfc44bfdb0cb15e4c6844ca5669f276b2aac0b858",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-10T16:54:05Z",
    "title_canon_sha256": "91c06de3452239265422fec0bb7bfc6768f80afb481f73766f6cdd37c08d11ad"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.11468",
    "kind": "arxiv",
    "version": 1
  }
}
```