Chart2Code is a hierarchical benchmark with reproduction, editing, and table-to-chart tasks across 22 chart types. It shows that even top models such as GPT-5 achieve low scores on the editing tasks: 0.57 on code evaluation and 0.22 on chart quality.
The data should be defined directly within the code (e.g., in a pandas DataFrame loaded from a string), without needing to read any external files.
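The constraint quoted above can be illustrated with a minimal sketch, assuming pandas is used for the data layer of a plotting script: the table lives in a string literal and is parsed through `io.StringIO`, so the script never touches the file system. The column names, values, and the helper `load_inline_data` are hypothetical placeholders, not data or code from the benchmark.

```python
import io

import pandas as pd

# Hypothetical example table embedded as a CSV string; no external file is read.
CSV_DATA = """month,sales
Jan,120
Feb,135
Mar,150
"""


def load_inline_data() -> pd.DataFrame:
    """Parse the embedded CSV string into a DataFrame (hypothetical helper)."""
    # io.StringIO wraps the string in a file-like object that pd.read_csv accepts.
    return pd.read_csv(io.StringIO(CSV_DATA))


df = load_inline_data()
print(df)
```

Because the data travels inside the script, a generated chart script stays self-contained: it can be run and rendered without any accompanying data files.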
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
From Charts to Code: A Hierarchical Benchmark for Multimodal Models