How to bridge the gap between modalities: A comprehensive survey on multimodal large language model
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
  ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.
- Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images
  Authors build a synthetic data generator and two-stage training pipeline for structured abstractive reasoning on multi-modal relational knowledge images, releasing STAR-64K and showing 3B/7B models outperforming GPT-4o.
- A Survey on the Memory Mechanism of Large Language Model based Agents
  A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.