ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

Fangzhi Xu; Han Lai; Jie Ma; Jun Liu; Lingling Zhang; Muye Huang; Wenjun Wu; Yaqiang Wu; Yifei Li

arxiv: 2505.19076 · v1 · pith:QX4UWYFM · submitted 2025-05-25 · cs.CV

keywords: reasoning, understanding, chart, ChartSketcher, multimodal, visual, charts, complex

Abstract

Charts are high-density visualization carriers for complex data, serving as a crucial medium for information extraction and analysis. Automated chart understanding poses significant challenges to existing multimodal large language models (MLLMs) due to the need for precise and complex visual reasoning. Current step-by-step reasoning models primarily focus on text-based logical reasoning for chart understanding. However, they struggle to refine or correct their reasoning when errors stem from flawed visual understanding, as they lack the ability to leverage multimodal interaction for deeper comprehension. Inspired by human cognitive behavior, we propose ChartSketcher, a multimodal feedback-driven step-by-step reasoning method designed to address these limitations. ChartSketcher is a chart understanding model that employs Sketch-CoT, enabling MLLMs to annotate intermediate reasoning steps directly onto charts using a programmatic sketching library, iteratively feeding these visual annotations back into the reasoning process. This mechanism enables the model to visually ground its reasoning and refine its understanding over multiple steps. We employ a two-stage training strategy: a cold start phase to learn sketch-based reasoning patterns, followed by off-policy reinforcement learning to enhance reflection and generalization. Experiments demonstrate that ChartSketcher achieves promising performance on chart understanding benchmarks and general vision tasks, providing an interactive and interpretable approach to chart comprehension.
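
The abstract describes Sketch-CoT as a loop. The model draws annotations onto the chart, the annotated chart is fed back to it, and this repeats until it commits to an answer. The snippet below is a minimal sketch of that control flow only. It assumes rendering can be replaced by a text summary and uses a scripted stub in place of an MLLM. `Sketch`, `Answer`, `Canvas`, `sketch_cot`, and `toy_model` are hypothetical names. They are not the paper's code or the API of its sketching library.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Sketch:
    """Hypothetical drawing command (e.g. a marker) in chart coordinates."""
    kind: str          # e.g. "point" or "line"
    coords: tuple      # chart-space coordinates
    label: str = ""

@dataclass
class Answer:
    """Hypothetical terminal step: the model's final answer."""
    text: str

@dataclass
class Canvas:
    """Stand-in for a chart image plus the annotations drawn on it so far."""
    chart_id: str
    annotations: List[Sketch] = field(default_factory=list)

    def render(self) -> str:
        # A real system would rasterize the annotated chart. Here we return a
        # textual summary so the loop runs without an image library.
        marks = ", ".join(f"{a.kind}{a.coords}:{a.label}" for a in self.annotations)
        return f"{self.chart_id} [{marks}]"

Step = Union[Sketch, Answer]
Model = Callable[[str, str, List[str]], Step]

def sketch_cot(model: Model, question: str, canvas: Canvas, max_steps: int = 5):
    """Alternate between drawing and re-observing until an answer or the step limit."""
    history: List[str] = []                  # every annotated view shown to the model
    for _ in range(max_steps):
        view = canvas.render()               # feed the current annotated chart back
        history.append(view)
        step = model(question, view, history)
        if isinstance(step, Answer):
            return step.text, history
        canvas.annotations.append(step)      # draw, then loop to re-read the chart
    return None, history                     # no answer within the step budget

def toy_model(question: str, view: str, history: List[str]) -> Step:
    """Scripted stub: first mark a candidate peak, then answer once it is visible."""
    if "peak" not in view:
        return Sketch("point", (2021, 40), "peak")
    return Answer("2021")

answer, views = sketch_cot(toy_model, "Which year peaks?", Canvas("sales_chart"))
print(answer)      # 2021
print(len(views))  # 2
```

In this sketch, the visual feedback consists only of the model seeing its own earlier marks in the next view. The model does not reason over pixels. A real MLLM would receive the re-rendered image, which is what lets it notice and correct a misread value.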

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents
cs.CV 2026-05 unverdicted novelty 7.0

Visual CoT agents exhibit tool-use collapse, in which tool usage declines while task accuracy rises; adding entropy regularization for rollout diversity produces the strongest performance.

Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts
cs.CV 2026-05 unverdicted novelty 6.0

Chart-FR1 uses Focus-CoT to link reasoning to visual cues and Focus-GRPO reinforcement learning with efficiency rewards to outperform prior MLLMs on dense chart reasoning tasks.

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
cs.AI 2026-04 unverdicted novelty 6.0

Chart-RL uses RL policy optimization and LoRA to boost VLM chart reasoning, enabling a 4B model to reach 0.634 accuracy versus 0.580 for an 8B model with lower latency.