How to bridge the gap between modalities: A comprehensive survey on multimodal large language model

Shezheng Song, Xiaopeng Li, Shasha Li · 2023 · arXiv 2311.07594

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images

cs.CV · 2025-10-22 · unverdicted · novelty 6.0

Authors build a synthetic data generator and two-stage training pipeline for structured abstractive reasoning on multi-modal relational knowledge images, releasing STAR-64K and showing 3B/7B models outperforming GPT-4o.

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · novelty 3.0

A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

citing papers explorer

Showing 3 of 3 citing papers.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection · cs.CL · 2024-10-06 · unverdicted · none · ref 56
Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images · cs.CV · 2025-10-22 · unverdicted · none · ref 9
A Survey on the Memory Mechanism of Large Language Model based Agents · cs.AI · 2024-04-21 · accept · none · ref 23
