Cmmmu: A chinese massive multi-discipline multimodal understanding benchmark

Zhang, G · 2024 · arXiv 2401.11944

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

PlantXpert benchmark shows fine-tuned VLMs reach up to 78% accuracy on plant phenotyping but scaling gains plateau and quantitative biological reasoning remains weak.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.

DeepSeek-VL: Towards Real-World Vision-Language Understanding

cs.AI · 2024-03-08 · unverdicted · novelty 4.0

DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

citing papers explorer

Showing 5 of 5 citing papers.

From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping cs.CV · 2026-04-10 · unverdicted · none · ref 22
PlantXpert benchmark shows fine-tuned VLMs reach up to 78% accuracy on plant phenotyping but scaling gains plateau and quantitative biological reasoning remains weak.
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning cs.CV · 2024-12-31 · accept · none · ref 131
OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.
OProver: A Unified Framework for Agentic Formal Theorem Proving cs.CL · 2026-05-17 · unverdicted · none · ref 56
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
DeepSeek-VL: Towards Real-World Vision-Language Understanding cs.AI · 2024-03-08 · unverdicted · none · ref 38
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning cs.CL · 2025-02-05 · unverdicted · none · ref 247
Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

Cmmmu: A chinese massive multi-discipline multimodal understanding benchmark

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer