Examples include malformed code, invalid tool arguments, runtime errors, missing file saves, or other failures that prevent the intended tool action from being executed correctly.

Tool-Misexecution

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

cs.AI · 2026-04-03 · accept · novelty 7.0

Agentic-MME is a process-verified benchmark for multimodal agentic capabilities featuring 418 tasks, dual-axis checkpoints, and an overthinking metric; it reveals that even the best models achieve only 56.3% overall accuracy.
