ToolsRL trains MLLMs via a tool-specific then accuracy-focused RL curriculum to master visual tools for complex reasoning tasks.
Llava-onevision: Easy visual task transfer
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
Finetuning VLMs on perception tasks produces positive and negative transfers that can be mapped with a new normalized metric called Perfection Gap Factor across 13 tasks and three models.
citing papers explorer
-
Visual Reasoning through Tool-supervised Reinforcement Learning
ToolsRL trains MLLMs via a tool-specific then accuracy-focused RL curriculum to master visual tools for complex reasoning tasks.
-
Understanding Task Transfer in Vision-Language Models
Finetuning VLMs on perception tasks produces positive and negative transfers that can be mapped with a new normalized metric called Perfection Gap Factor across 13 tasks and three models.