UniChart: A universal vision-language pretrained model for chart comprehension and reasoning

Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty · 2023 · arXiv 2305.14761

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

cs.AI · 2024-10-06 · unverdicted · novelty 7.0

PolyMATH is a new 5,000-image benchmark where top MLLMs reach at most 41 percent accuracy on multi-modal mathematical reasoning, with ablation showing minimal gain from text over images.

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

cs.CV · 2026-03-25 · conditional · novelty 6.0

PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation

cs.CL · 2025-12-22 · unverdicted · novelty 6.0

CycleChart is a consistency-based framework that organizes chart generation, schema parsing, data parsing, and QA around single data instances to enforce bidirectional semantic alignment and improve cross-task generalization.

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

cs.CV · 2024-12-06 · unverdicted · novelty 6.0

InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

cs.CV · 2024-09-03 · unverdicted · novelty 5.0

GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.

DeepSeek-VL: Towards Real-World Vision-Language Understanding

cs.AI · 2024-03-08 · unverdicted · novelty 4.0

DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · novelty 2.0

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

citing papers explorer

Showing 7 of 7 citing papers.

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark cs.AI · 2024-10-06 · unverdicted · none · ref 31
PolyMATH is a new 5,000-image benchmark where top MLLMs reach at most 41 percent accuracy on multi-modal mathematical reasoning, with ablation showing minimal gain from text over images.
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing cs.CV · 2026-03-25 · conditional · none · ref 34
PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.
CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation cs.CL · 2025-12-22 · unverdicted · none · ref 21
CycleChart is a consistency-based framework that organizes chart generation, schema parsing, data parsing, and QA around single data instances to enforce bidirectional semantic alignment and improve cross-task generalization.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024-12-06 · unverdicted · none · ref 182
InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model cs.CV · 2024-09-03 · unverdicted · none · ref 31
GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.
DeepSeek-VL: Towards Real-World Vision-Language Understanding cs.AI · 2024-03-08 · unverdicted · none · ref 22
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.
Bridging Language Models and Financial Analysis q-fin.ST · 2025-03-14 · unverdicted · none · ref 65
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

UniChart: A universal vision-language pretrained model for chart comprehension and reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer