Mmdeepresearch-bench: A benchmark for multimodal deep research agents

· 2026 · arXiv 2601.12346

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 2

representative citing papers

EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

EpiBench is a new episodic multi-turn multimodal benchmark where even leading AI agents score only 29.23% on hard tasks requiring cross-paper evidence integration from figures and tables.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents

cs.AI · 2026-05-16 · unverdicted · novelty 5.0

MM-ToolBench introduces 100 closed-loop multimodal tasks across two domains with 27 MCP servers and 324 tools, where agents must execute, inspect artifacts, and revise before final output.

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence

cs.CV · 2026-05-13 · unverdicted · novelty 5.0

ViDR treats source figures as retrievable and verifiable evidence objects in multimodal deep research reports and introduces MMR Bench+ to measure improvements in visual integration and verifiability.

citing papers explorer

Showing 4 of 4 citing papers.

EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents cs.CL · 2026-04-07 · unverdicted · none · ref 8
EpiBench is a new episodic multi-turn multimodal benchmark where even leading AI agents score only 29.23% on hard tasks requiring cross-paper evidence integration from figures and tables.
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application cs.CL · 2026-06-10 · unverdicted · none · ref 69
This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.
TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents cs.AI · 2026-05-16 · unverdicted · none · ref 26
MM-ToolBench introduces 100 closed-loop multimodal tasks across two domains with 27 MCP servers and 324 tools, where agents must execute, inspect artifacts, and revise before final output.
ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence cs.CV · 2026-05-13 · unverdicted · none · ref 7
ViDR treats source figures as retrievable and verifiable evidence objects in multimodal deep research reports and introduces MMR Bench+ to measure improvements in visual integration and verifiability.

Mmdeepresearch-bench: A benchmark for multimodal deep research agents

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer