InternVL3.5 advances open-source multimodal models with Cascade RL for +16% reasoning gains and ViR for 4x inference speedup, with the 241B model reaching SOTA among open-source MLLMs on multimodal, reasoning, and agentic tasks.
Language models are unsupervised multitask learners
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
fields
cs.CV 2roles
method 1polarities
use method 1representative citing papers
MMBench is a new bilingual benchmark that uses curated questions, CircularEval, and LLM-assisted answer conversion to provide objective, fine-grained evaluation of vision-language models.
citing papers explorer
-
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
InternVL3.5 advances open-source multimodal models with Cascade RL for +16% reasoning gains and ViR for 4x inference speedup, with the 241B model reaching SOTA among open-source MLLMs on multimodal, reasoning, and agentic tasks.
-
MMBench: Is Your Multi-modal Model an All-around Player?
MMBench is a new bilingual benchmark that uses curated questions, CircularEval, and LLM-assisted answer conversion to provide objective, fine-grained evaluation of vision-language models.