Lang2Act boosts VLM visual perception by over 4% by letting models self-generate linguistic toolchains through two-stage RL training.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains