mplug-paperowl: Scientific diagram analysis with the multimodal large language model

· 2024 · arXiv 2311.18248

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

GENFIG1 is a new benchmark that tests whether vision-language models can create effective Figure 1 visuals capturing the central scientific idea from paper text.

DeepSeek-VL: Towards Real-World Vision-Language Understanding

cs.AI · 2024-03-08 · unverdicted · novelty 4.0

DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

citing papers explorer

Showing 3 of 3 citing papers.

GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models cs.CV · 2026-04-05 · unverdicted · none · ref 8
GENFIG1 is a new benchmark that tests whether vision-language models can create effective Figure 1 visuals capturing the central scientific idea from paper text.
DeepSeek-VL: Towards Real-World Vision-Language Understanding cs.AI · 2024-03-08 · unverdicted · none · ref 14
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.
A Survey on Multimodal Large Language Models cs.CV · 2023-06-23 · accept · none · ref 41
This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

mplug-paperowl: Scientific diagram analysis with the multimodal large language model

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer