MM-ToolBench introduces 100 closed-loop multimodal tasks across two domains with 27 MCP servers and 324 tools, where agents must execute, inspect artifacts, and revise before final output.
Inspect these assets only when essential for task comprehension or result verification using available viewer tools
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents
MM-ToolBench introduces 100 closed-loop multimodal tasks across two domains with 27 MCP servers and 324 tools, where agents must execute, inspect artifacts, and revise before final output.