CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation

Dazhen Deng; Sen Yang; Yingcai Wu; Yuan Tian; Yuchen He

arXiv: 2512.19173 · v2 · submitted 2025-12-22 · cs.CL · cs.CV

Pith reviewed 2026-05-16 20:51 UTC · model grok-4.3

classification cs.CL · cs.CV

keywords chart understanding · chart generation · consistency learning · bidirectional models · multimodal learning · data visualization

The pith

Enforcing generate-parse consistency on aligned chart data improves cross-task performance and generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CycleChart as a framework that trains models on chart tasks by cycling through generation and parsing for each data instance rather than handling tasks in isolation. From a table and natural-language query the model produces a chart specification, renders it to an image, then recovers the schema and data, with a consistency objective aligning the forward and reverse directions. This lifecycle approach is meant to capture the shared transformations between data, visual encoding, and structured recovery. A sympathetic reader would care because current chart work treats understanding and generation as separate problems, and linking them through consistency could yield models that handle new charts more reliably.

Core claim

CycleChart organizes all tasks around each single data instance in a lifecycle from source table and query through chart generation and rendering to schema and data parsing, with a generate-parse consistency objective that enforces semantic alignment between the forward and reverse directions, yielding strong results on four tasks and improved transfer to external benchmarks.

What carries the argument

The per-instance lifecycle design together with the generate-parse consistency objective that links generation from data to recovery from the rendered image.
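
The page does not reproduce the paper's exact consistency objective (the referee report below asks for an explicit equation), so the following is only a minimal sketch of what a generate-parse agreement check could look like. It assumes Vega-Lite-style specifications and scores agreement as an F1 over `(channel, field, type, aggregate)` tuples; the function names, the example specs, and the choice of F1 are all illustrative assumptions, not the authors' formulation.

```python
from typing import Any


def encoding_tuples(spec: dict[str, Any]) -> set[tuple]:
    """Flatten a Vega-Lite-style spec into comparable tuples (hypothetical helper)."""
    items: set[tuple] = {("mark", str(spec.get("mark")))}
    for channel, enc in spec.get("encoding", {}).items():
        items.add((channel, enc.get("field"), enc.get("type"), enc.get("aggregate")))
    return items


def consistency_f1(generated: dict[str, Any], parsed: dict[str, Any]) -> float:
    """F1 overlap between the generated spec and the spec recovered from the image."""
    gen, par = encoding_tuples(generated), encoding_tuples(parsed)
    overlap = len(gen & par)
    if overlap == 0:
        return 0.0
    precision = overlap / len(par)
    recall = overlap / len(gen)
    return 2 * precision * recall / (precision + recall)


# Hypothetical forward output (table + query -> spec).
generated_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative", "aggregate": "sum"},
    },
}
# Hypothetical reverse output (rendered image -> spec) with one wrong aggregate.
parsed_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative", "aggregate": "mean"},
    },
}

score = consistency_f1(generated_spec, parsed_spec)
print(f"generate-parse agreement: {score:.3f}")
```

A score like this is a discrete evaluation signal rather than a differentiable loss; how the paper turns agreement into a training objective is not stated on this page.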

If this is right

The model captures the full chain of transformations from raw data through visual encoding to structured recovery.
Performance improves simultaneously on NL2Chart generation, schema parsing, data parsing, and ChartQA.
The approach transfers effectively to unseen external benchmarks.
Cross-task generalization increases relative to conventional multi-task training that samples tasks independently.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same consistency cycle could be applied to other paired generation-understanding tasks such as diagram or map creation.
Training in this manner might reduce errors when charts are later edited or queried in new ways.
Extending the framework to charts with interactive elements or multiple linked views remains open for testing.

Load-bearing premise

That enforcing generate-parse consistency on the authors' lifecycle-aligned benchmark will produce models whose improvements generalize beyond the specific chart rendering pipeline and annotation style used.

What would settle it

Evaluating the trained model on charts rendered with a different library or on real-world charts that lack the aligned annotations from CycleChart-Bench.

Figures

Figures reproduced from arXiv: 2512.19173 by Dazhen Deng, Sen Yang, Yingcai Wu, Yuan Tian, Yuchen He.

**Figure 2.** CycleChart-Bench construction. nvBench-2.0 queries ...

**Figure 3.** Overview of the CycleChart training framework. Given a source table and NL query, the model generates a chart specification ...

**Figure 4.** Impact of training steps across benchmarks.

Original abstract

Current chart-related tasks, such as chart generation (NL2Chart), chart schema parsing, chart data parsing, and chart question answering (ChartQA), are typically studied in isolation, preventing models from learning the shared semantics that link chart creation and interpretation. We introduce CycleChart, a consistency-based learning framework for bidirectional chart understanding and generation. Unlike conventional multi-task approaches that draw training samples independently across tasks, CycleChart organizes all tasks around each single data instance. From a source table and natural-language query, the model generates a chart specification, renders and executes it, then learns to recover the schema and underlying data from the resulting chart image. This per-instance lifecycle design lets the model capture the full chain of transformations, from raw data through visual encoding to structured recovery, and a generate-parse consistency objective enforces semantic alignment between the forward generation and reverse parsing directions. To support this framework, we construct CycleChart-Bench, a lifecycle-aligned benchmark where every chart sample carries aligned annotations for generation, schema parsing, data parsing, and question answering. CycleChart achieves strong results across all four tasks and transfers effectively to unseen external benchmarks, demonstrating improved cross-task generalization and marking a step toward more general chart understanding models.
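
The contrast the abstract draws between per-instance organization and independent task sampling can be sketched as two batching strategies. This is an illustration under assumptions: the `ChartInstance` fields, the task input layouts, and the stored `image_path` are hypothetical (in the described framework the image comes from rendering the model's own generated specification), and the example data is invented.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartInstance:
    """One lifecycle-aligned sample; all field names are illustrative."""
    query: str
    table_info: str
    spec: str
    image_path: str
    data_csv: str
    question: str
    answer: str


def lifecycle_samples(inst):
    """Expand one instance into (task, inputs, target) triples for the four tasks."""
    image = inst.image_path
    return [
        ("nl2chart", {"query": inst.query, "table_info": inst.table_info}, inst.spec),
        ("schema_parsing", {"image": image, "table_info": inst.table_info}, inst.spec),
        ("data_parsing", {"image": image, "spec": inst.spec}, inst.data_csv),
        ("chartqa", {"image": image, "question": inst.question}, inst.answer),
    ]


def per_instance_batches(instances, seed=0):
    """Lifecycle-style batching: every batch holds all four tasks of ONE instance."""
    rng = random.Random(seed)
    order = list(instances)
    rng.shuffle(order)
    for inst in order:
        yield lifecycle_samples(inst)


def independent_batches(instances, batch_size=4, seed=0):
    """Conventional multi-task batching: samples are pooled and mixed across instances."""
    rng = random.Random(seed)
    pool = [sample for inst in instances for sample in lifecycle_samples(inst)]
    rng.shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


# Invented example data.
instances = [
    ChartInstance("total sales by region", "region:str, sales:int", "{spec A}",
                  "a.png", "region,sales\nEast,10\n", "Which region is highest?", "East"),
    ChartInstance("price over time", "year:int, price:float", "{spec B}",
                  "b.png", "year,price\n2020,1.5\n", "Is the trend rising?", "yes"),
]

for batch in per_instance_batches(instances):
    print([task for task, _, _ in batch])
```

In the per-instance variant every batch shares one table, one specification, and one image, which is the property a generate-parse consistency term needs in order to compare the forward and reverse outputs.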

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CycleChart's per-instance lifecycle with generate-parse consistency is a clean methodological step for chart tasks, but the abstract gives no numbers so the gains are still unproven.

The punchline is that this paper organizes four chart tasks around single data instances and adds an explicit consistency loss between generating a chart spec and parsing it back. That per-instance loop is new relative to the usual separate-task training, and they built CycleChart-Bench to make the full table-to-spec-to-image-to-data cycle available for every sample. The claim is that this produces better cross-task generalization and transfers to external benchmarks. If the consistency term actually drives the improvement, it could be a useful template for other generation-parsing loops like diagrams or code.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CycleChart, a consistency-based framework for joint chart generation (NL2Chart), schema parsing, data parsing, and ChartQA. Tasks are organized around per-instance lifecycles on a newly constructed CycleChart-Bench benchmark: from table and query the model generates a chart specification, renders it, and must recover schema and data from the image, with a generate-parse consistency objective enforcing semantic alignment. The central claims are strong performance across the four tasks and effective transfer to unseen external benchmarks, demonstrating improved cross-task generalization over independent multi-task training.

Significance. If the consistency objective is shown to produce representations that capture invariant data-to-visual semantics rather than pipeline-specific regularities, the work could advance unified chart models that learn bidirectional mappings more robustly than isolated task training. The lifecycle-aligned benchmark construction is a constructive contribution, but its purpose-built nature makes the transfer claims load-bearing and in need of stronger controls.

major comments (2)

§4.2 (Benchmark Construction): CycleChart-Bench is built around a single table→spec→render→image→parse pipeline. The generate-parse consistency objective could therefore be satisfied by learning pipeline-specific artifacts (exact encoding choices, color mappings, annotation conventions) rather than generalizable chart semantics. The transfer experiments to external benchmarks must include explicit ablations or controls for rendering and annotation differences to attribute gains to the proposed mechanism rather than distributional overlap.
§5.1 (Experimental Results): The reported strong results and cross-task generalization claims lack ablations that isolate the contribution of the consistency loss from standard multi-task training on the same CycleChart-Bench data. Without these comparisons (and without quantitative metrics, error analysis, or statistical significance in the main results), it is difficult to confirm that the per-instance lifecycle design drives the improvements.

minor comments (2)

§3: Notation for the consistency objective (e.g., the exact formulation of the cycle loss) should be clarified with an explicit equation in §3 to avoid ambiguity when comparing to standard reconstruction losses.
Figure 2 (lifecycle diagram) would benefit from explicit arrows showing the forward generation and reverse parsing paths with the consistency term highlighted.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing our response and indicating planned revisions to strengthen the presentation of the consistency objective and experimental claims.

Referee: §4.2 (Benchmark Construction): CycleChart-Bench is built around a single table→spec→render→image→parse pipeline. The generate-parse consistency objective could therefore be satisfied by learning pipeline-specific artifacts (exact encoding choices, color mappings, annotation conventions) rather than generalizable chart semantics. The transfer experiments to external benchmarks must include explicit ablations or controls for rendering and annotation differences to attribute gains to the proposed mechanism rather than distributional overlap.

Authors: We agree that the single-pipeline construction of CycleChart-Bench introduces a risk that the consistency objective could exploit rendering-specific regularities rather than invariant data-to-visual semantics. Our transfer results to external benchmarks (which use different rendering libraries and annotation conventions) provide supporting evidence for generalization, but we acknowledge that explicit controls would make this attribution more robust. In the revised manuscript we will add an ablation that systematically varies rendering parameters and annotation styles between CycleChart-Bench and the target external benchmarks while measuring the resulting performance delta.
revision: yes

Referee: §5.1 (Experimental Results): The reported strong results and cross-task generalization claims lack ablations that isolate the contribution of the consistency loss from standard multi-task training on the same CycleChart-Bench data. Without these comparisons (and without quantitative metrics, error analysis, or statistical significance in the main results), it is difficult to confirm that the per-instance lifecycle design drives the improvements.

Authors: We have included an ablation comparing the full CycleChart framework against a standard multi-task baseline trained on identical CycleChart-Bench data; these results appear in Section 5.2 and the appendix and show additional gains attributable to the consistency loss. We agree that the main results would benefit from expanded quantitative support. In the revision we will add statistical significance testing (paired t-tests with bootstrap confidence intervals), a concise error analysis of failure modes, and additional metrics to the primary experimental tables.
revision: partial
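
The significance testing promised in the simulated rebuttal could take roughly the following shape. This is a minimal sketch of a paired percentile bootstrap over per-example score differences, not the authors' protocol: the function name, the resample count, and the score values are hypothetical, and it covers only the bootstrap confidence interval, not the paired t-test.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (A minus B)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = resampled[int((alpha / 2) * n_resamples)]
    upper = resampled[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Hypothetical per-example scores for the full model and a multi-task baseline.
full_model = [0.90, 0.80, 0.85, 0.70, 0.95]
baseline = [0.70, 0.75, 0.60, 0.65, 0.80]

observed, lower, upper = paired_bootstrap_ci(full_model, baseline)
print(f"mean difference {observed:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Resampling the differences rather than the two score lists separately keeps each example paired with itself, which is what makes the comparison a paired one.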

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper introduces a consistency-based training framework organized around per-instance lifecycles and constructs CycleChart-Bench to support it, but the central claims rest on empirical results across tasks plus explicit transfer evaluation on unseen external benchmarks. No equations or self-citations are shown that reduce the consistency objective or generalization claim to a fitted parameter or input by construction; the method is a standard bidirectional alignment technique whose outputs are not definitionally equivalent to its training signals. This is the common honest case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that chart rendering is deterministic and invertible enough for consistency training to be meaningful, plus standard supervised learning assumptions. No explicit free parameters or invented physical entities are described.

axioms (2)

domain assumption: Chart rendering from specification to image is a deterministic, lossless-enough mapping for the reverse parsing task to be well-defined.
Invoked when the model is required to recover schema and data from the rendered chart image.
ad hoc to paper: Joint training on the closed generation-parsing loop improves cross-task generalization beyond independent multi-task training.
Central modeling assumption of the consistency objective.

invented entities (1)

CycleChart-Bench (no independent evidence)
purpose: Lifecycle-aligned benchmark providing aligned annotations for generation, schema parsing, data parsing, and QA on the same instances.
New dataset constructed to support the per-instance training loop.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1] Aayush Bansal, Shugao Ma, Deva Ramanan, and Yaser Sheikh. Recycle-gan: Unsupervised video retargeting. In Proceedings of the European conference on computer vision (ECCV), pages 119–135, 2018.

[2] Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, and Xiangyu Zhang. Onechart: Purify the chart structural extraction via one auxiliary token. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 147–155, 2024.

[3] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

[4] Marcella Cornia, Lorenzo Baraldi, Hamed R Tavakoli, and Rita Cucchiara. Towards cycle-consistent models for text and image retrieval. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0,

[5] Dazhen Deng, Aoyu Wu, Huamin Qu, and Yingcai Wu. DashBot: Insight-driven dashboard generation based on deep reinforcement learning. IEEE Transactions on Visualization and Computer Graphics, 29(1):690–700, 2023.

[6] Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, and Hanwang Zhang. Chartllama: A multimodal llm for chart understanding and generation. arXiv preprint arXiv:2311.16483, 2023.

[7] Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. Advances in neural information processing systems, 29, 2016.

[8] Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, et al. Glm-4.5v and glm-4.1v-thinking: Towards versatile multimodal reasoning with scalable reinforcement learning, 2025.

[9] Enamul Hoque and M Saidul Islam. Natural language generation for visualizations: State of the art, challenges and future directions. In Computer Graphics Forum, page e15266. Wiley Online Library, 2025.

[10] Alain Horé and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369, 2010.
[11] Ting-Yao Hsu, C Lee Giles, and Ting-Hao Huang. Scicap: Generating captions for scientific figures. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3258–3264, 2021.

[12] Kung-Hsiang Huang, Hou Pong Chan, Yi R Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, and Heng Ji. From pixels to insights: A survey on automatic chart understanding in the era of large foundation models. IEEE Transactions on Knowledge and Data Engineering, 2024.

[13] Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, and Heng Ji. Do LVLMs understand charts? analyzing and correcting factual errors in chart captioning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 730–749, Bangkok, Thailand, 2024. Association for Computational Linguistics.

[14] Anthropic Inc. Claude 3.5 sonnet news. https://www.anthropic.com/news/claude-3-5-sonnet, 2024.

[15] OpenAI Inc. Introducing gpt-4.1 in the api. https://openai.com/index/gpt-4-1/, 2025.

[16] Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan. Dvqa: Understanding data visualizations via question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5648–5656,

[17] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.

[18] Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. nvbench 2.0: A benchmark for natural language to visualization under ambiguity. arXiv preprint arXiv:2503.12880, 2025.

[19] Yuyu Luo, Jiawei Tang, and Guoliang Li. nvbench: A large-scale synthesized dataset for cross-domain natural language to visualization task. arXiv preprint arXiv:2112.12926,

[20] Ahmed Masry, Xuan Long Do, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. Chartqa: A benchmark for question answering about charts with visual and logical reasoning. In Findings of the association for computational linguistics: ACL 2022, pages 2263–2279, 2022.
[21] Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, and Shafiq Joty. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning. arXiv preprint arXiv:2305.14761, 2023.

[22] Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. Chartinstruct: Instruction tuning for chart comprehension and reasoning. arXiv preprint arXiv:2403.09028, 2024.

[23] Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, et al. Chartqapro: A more diverse and challenging benchmark for chart question answering. arXiv preprint arXiv:2504.05506, 2025.

[24] Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning. arXiv preprint arXiv:2401.02384, 2024.

[25] Nitesh Methani, Pritha Ganguly, Mitesh M Khapra, and Pratyush Kumar. Plotqa: Reasoning over scientific plots. In Proceedings of the ieee/cvf winter conference on applications of computer vision, pages 1527–1536, 2020.

[26] Arpit Narechania, Arjun Srinivasan, and John Stasko. Nl4dv: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics, 27(2):369–379, 2020.

[27] Jorge Poco and Jeffrey Heer. Reverse-engineering visualizations: Recovering visual encodings from chart images. In Computer graphics forum, pages 353–363. Wiley Online Library, 2017.

[28] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.

[29] Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, and Weikai Yang. Deepvis: Bridging natural language and data visualization through step-wise reasoning. arXiv preprint arXiv:2508.01700, 2025.

[30] Benny Tang, Angie Boggust, and Arvind Satyanarayan. Vistext: A benchmark for semantically rich chart captioning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7268–7298, 2023.
[31] Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. ChartGPT: Leveraging llms to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics, 31(3):1731–1745, 2024.

[32] Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, and Rainer Stiefelhagen. Refchartqa: Grounding visual answer on chart images through instruction tuning. In International Conference on Document Analysis and Recognition, pages 523–537. Springer, 2025.

[33] Ning Wang, Jiajun Deng, and Mingbo Jia. Cycle-consistency learning for captioning and grounding. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 2024.

[34] Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pages 1398–1402 Vol.2, 2003.

[35] Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, et al. Charxiv: Charting gaps in realistic chart understanding in multimodal llms. Advances in Neural Information Processing Systems, 37:113569–113697, 2024.

[36] Jingxuan Wei, Nan Xu, Junnan Zhu, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang, et al. Chartmind: A comprehensive benchmark for complex real-world multimodal chart question answering. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4555–4569, 2025.

[37] Leland Wilkinson. The grammar of graphics. In Handbook of computational statistics: Concepts and methods, pages 375–414. Springer, 2011.

[38] Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE transactions on visualization and computer graphics, 22(1):649–658, 2015.

[39] Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 chi conference on human factors in computing systems, pages 2648–2659, 2017.

[40] Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, and Yuyu Luo. Chartinsights: Evaluating multimodal large language models for low-level chart question answering. arXiv preprint arXiv:2405.07001, 2024.
[41] Renqiu Xia, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Botian Shi, Junchi Yan, and Bo Zhang. Chartx & chartvlm: A versatile benchmark and foundation model for complicated chart reasoning. IEEE Transactions on Image Processing, 2025.

[42] Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, and Jian Guo. Chartbench: A benchmark for complex visual reasoning in charts. arXiv preprint arXiv:2312.15915,

[43] Zhengzhuo Xu, Bowen Qu, Yiyan Qi, Sinan Du, Chengjin Xu, Chun Yuan, and Jian Guo. Chartmoe: Mixture of diversely aligned expert connector for chart understanding. arXiv preprint arXiv:2409.03277, 2024.

[44] Zhengzhuo Xu, SiNan Du, Yiyan Qi, Siwen Lu, Chengjin Xu, Chun Yuan, and Jian Guo. Chartpoint: Guiding mllms with grounding reflection for chart reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 426–436, 2025.

[45] Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, and Liang Zheng. Effective training data synthesis for improving mllm chart understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2653–2663,

[46] Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, and Wei Zeng. Generative ai for visualization: State of the art and future directions. Visual Informatics, 8(2):43–66, 2024.

[47] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision, pages 2849–2857, 2017.

[48] Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, and Fei Huang. Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning. arXiv preprint arXiv:2404.16635,

[49] Hanwen Zheng, Sijia Wang, Chris Thomas, and Lifu Huang. Advancing chart question answering with robust chart component recognition. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5741–

[50] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation Supp...

[51] Details of Experiment Setting. 8.1. Hyperparameters. We report the full hyperparameter configuration used for CycleChart-3B and CycleChart-7B fine-tuning:
• Optimizer: AdamW
• Learning rate: 1×10⁻⁵
• LoRA: rank = 16, α = 32, dropout = 0.05
• Batch size: 2
• Training steps: 2000
• Scheduler: constant
• Frozen components: vision encoder (only projector + LLM updated)
These ...

[52] A natural language query describing a charting intent

[53] A table schema with data types and example rows.
Your task is to generate a valid Vega-Lite specification in JSON format that visualizes the requested information. If any filtering or aggregation is implied in the query, include it using the transform field.
Input format:
• Natural language query: str
• Table info: { "columns_with_type": {}, "column_exampl...

[54]

columns_with_type

A table schema with data types and example rows. Your task is to generate a valid Vega-Lite specifica- tion in JSON format. Do not extract or infer any data values from the im- age; only describe its visual structure and encodings using the provided table schema. Input format: • Chart image:img • Table info: { "columns_with_type": {}, "column_examples": [...

work page
[55]

Your task is to extract all visible data values from the chart into a clean CSV table

Its corresponding Vega-Lite specification. Your task is to extract all visible data values from the chart into a clean CSV table. Only include columns that are visually encoded in the chart (from: x, y, color, size, theta, percentage). If the chart contains subplots (usingrowor columnencodings), include these fields as addi- tional columns in the output. ...

work page
[56]

yes” or “no

A natural language question about the chart. Your task is to answer the question using ONLY in- formation that is visible in the chart. Answer rules: • number→digits only; include unit ONLY if shown (e.g., %, $); no commas. • boolean→exactly “yes” or “no”. • category/text→must be a label that appears in the chart. • if not answerable→“unanswerable”. Input...

work page
[57]

column"or

Dataset Construction Details CycleChart-Bench is constructed on top of nvBench 2.0, which provides natural-language queries, raw tables, and Vega-Lite specifications for NL2Chart. However, nvBench 2.0 contains only single-view charts and offers a limited va- riety of visualization types. To better support our unified generate–parse–reason framework, we su...

work page
[58]

Successful Reasoning Cases (Ours vs

Quanlitative Analysis 10.1. Successful Reasoning Cases (Ours vs. Base- line) Table 4 presents representative ChartQA examples that re- veal how generate–parse consistency improves CycleChart- 7B’s reasoning behavior. All examples are taken from the ChartXiv[35] benchmark, whose figures are considerably more complex than those in our training corpus. While...

work page 2006

[1] Aayush Bansal, Shugao Ma, Deva Ramanan, and Yaser Sheikh. Recycle-gan: Unsupervised video retargeting. In Proceedings of the European Conference on Computer Vision (ECCV), pages 119–135, 2018.

[2] Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, and Xiangyu Zhang. Onechart: Purify the chart structural extraction via one auxiliary token. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 147–155, 2024.

[3] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

[4] Marcella Cornia, Lorenzo Baraldi, Hamed R Tavakoli, and Rita Cucchiara. Towards cycle-consistent models for text and image retrieval. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0,

[5] Dazhen Deng, Aoyu Wu, Huamin Qu, and Yingcai Wu. DashBot: Insight-driven dashboard generation based on deep reinforcement learning. IEEE Transactions on Visualization and Computer Graphics, 29(1):690–700, 2023.

[6] Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, and Hanwang Zhang. Chartllama: A multimodal llm for chart understanding and generation. arXiv preprint arXiv:2311.16483, 2023.

[7] Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. Advances in Neural Information Processing Systems, 29, 2016.

[8] Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, et al. Glm-4.5v and glm-4.1v-thinking: Towards versatile multimodal reasoning with scalable reinforcement learning, 2025.

[9] Enamul Hoque and M Saidul Islam. Natural language generation for visualizations: State of the art, challenges and future directions. In Computer Graphics Forum, page e15266. Wiley Online Library, 2025.

[10] Alain Horé and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369, 2010.

[11] Ting-Yao Hsu, C Lee Giles, and Ting-Hao Huang. Scicap: Generating captions for scientific figures. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3258–3264, 2021.

[12] Kung-Hsiang Huang, Hou Pong Chan, Yi R Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, and Heng Ji. From pixels to insights: A survey on automatic chart understanding in the era of large foundation models. IEEE Transactions on Knowledge and Data Engineering, 2024.

[13] Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, and Heng Ji. Do LVLMs understand charts? analyzing and correcting factual errors in chart captioning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 730–749, Bangkok, Thailand, 2024. Association for Computational Linguistics.

[14] Anthropic Inc. Claude 3.5 sonnet news. https://www.anthropic.com/news/claude-3-5-sonnet, 2024.

[15] OpenAI Inc. Introducing gpt-4.1 in the api. https://openai.com/index/gpt-4-1/, 2025.

[16] Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan. Dvqa: Understanding data visualizations via question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5648–5656,

[17] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.

[18] Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. nvbench 2.0: A benchmark for natural language to visualization under ambiguity. arXiv preprint arXiv:2503.12880, 2025.

[19] Yuyu Luo, Jiawei Tang, and Guoliang Li. nvbench: A large-scale synthesized dataset for cross-domain natural language to visualization task. arXiv preprint arXiv:2112.12926,

[20] Ahmed Masry, Xuan Long Do, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. Chartqa: A benchmark for question answering about charts with visual and logical reasoning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2263–2279, 2022.

[21] Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, and Shafiq Joty. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning. arXiv preprint arXiv:2305.14761, 2023.

[22] Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. Chartinstruct: Instruction tuning for chart comprehension and reasoning. arXiv preprint arXiv:2403.09028, 2024.

[23] Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, et al. Chartqapro: A more diverse and challenging benchmark for chart question answering. arXiv preprint arXiv:2504.05506, 2025.

[24] Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning. arXiv preprint arXiv:2401.02384, 2024.

[25] Nitesh Methani, Pritha Ganguly, Mitesh M Khapra, and Pratyush Kumar. Plotqa: Reasoning over scientific plots. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1527–1536, 2020.

[26] Arpit Narechania, Arjun Srinivasan, and John Stasko. Nl4dv: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics, 27(2):369–379, 2020.

[27] Jorge Poco and Jeffrey Heer. Reverse-engineering visualizations: Recovering visual encodings from chart images. In Computer Graphics Forum, pages 353–363. Wiley Online Library, 2017.

[28] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.

[29] Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, and Weikai Yang. Deepvis: Bridging natural language and data visualization through step-wise reasoning. arXiv preprint arXiv:2508.01700, 2025.

[30] Benny Tang, Angie Boggust, and Arvind Satyanarayan. Vistext: A benchmark for semantically rich chart captioning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7268–7298, 2023.

[31] Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. ChartGPT: Leveraging llms to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics, 31(3):1731–1745, 2024.

[32] Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, and Rainer Stiefelhagen. Refchartqa: Grounding visual answer on chart images through instruction tuning. In International Conference on Document Analysis and Recognition, pages 523–537. Springer, 2025.

[33] Ning Wang, Jiajun Deng, and Mingbo Jia. Cycle-consistency learning for captioning and grounding. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 2024.

[34] Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pages 1398–1402 Vol. 2, 2003.

[35] Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, et al. Charxiv: Charting gaps in realistic chart understanding in multimodal llms. Advances in Neural Information Processing Systems, 37:113569–113697, 2024.

[36] Jingxuan Wei, Nan Xu, Junnan Zhu, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang, et al. Chartmind: A comprehensive benchmark for complex real-world multimodal chart question answering. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4555–4569, 2025.

[37] Leland Wilkinson. The grammar of graphics. In Handbook of computational statistics: Concepts and methods, pages 375–414. Springer, 2011.

[38] Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Transactions on Visualization and Computer Graphics, 22(1):649–658, 2015.

[39] Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 2648–2659, 2017.

[40] Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, and Yuyu Luo. Chartinsights: Evaluating multimodal large language models for low-level chart question answering. arXiv preprint arXiv:2405.07001, 2024.

[41] Renqiu Xia, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Botian Shi, Junchi Yan, and Bo Zhang. Chartx & chartvlm: A versatile benchmark and foundation model for complicated chart reasoning. IEEE Transactions on Image Processing, 2025.

[42] Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, and Jian Guo. Chartbench: A benchmark for complex visual reasoning in charts. arXiv preprint arXiv:2312.15915,

[43] Zhengzhuo Xu, Bowen Qu, Yiyan Qi, Sinan Du, Chengjin Xu, Chun Yuan, and Jian Guo. Chartmoe: Mixture of diversely aligned expert connector for chart understanding. arXiv preprint arXiv:2409.03277, 2024.

[44] Zhengzhuo Xu, SiNan Du, Yiyan Qi, Siwen Lu, Chengjin Xu, Chun Yuan, and Jian Guo. Chartpoint: Guiding mllms with grounding reflection for chart reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 426–436, 2025.

[45] Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, and Liang Zheng. Effective training data synthesis for improving mllm chart understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2653–2663,

[46] Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, and Wei Zeng. Generative ai for visualization: State of the art and future directions. Visual Informatics, 8(2):43–66, 2024.

[47] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2849–2857, 2017.

[48] Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, and Fei Huang. Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning. arXiv preprint arXiv:2404.16635,

[49] Hanwen Zheng, Sijia Wang, Chris Thomas, and Lifu Huang. Advancing chart question answering with robust chart component recognition. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5741–

[50] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation Supp...

Details of Experiment Setting

8.1. Hyperparameters

We report the full hyperparameter configuration used for CycleChart-3B and CycleChart-7B fine-tuning:
• Optimizer: AdamW
• Learning rate: 1×10⁻⁵
• LoRA: rank = 16, α = 32, dropout = 0.05
• Batch size: 2
• Training steps: 2000
• Scheduler: constant
• Frozen components: vision encoder (only projector + LLM updated)

These ...
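The hyperparameters listed above can be gathered into a single configuration object. The following is a minimal sketch, not the authors' training code: the class name `FineTuneConfig`, its field names, and the two derived properties are hypothetical. The `samples_seen` property additionally assumes that the batch size is the global batch size and that no gradient accumulation is used, neither of which the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the reported fine-tuning settings."""

    optimizer: str = "AdamW"
    learning_rate: float = 1e-5
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    batch_size: int = 2
    training_steps: int = 2000
    scheduler: str = "constant"
    # Vision encoder is frozen; only the projector and the LLM are updated.
    frozen_modules: tuple = ("vision_encoder",)
    trainable_modules: tuple = ("projector", "llm")

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Assumption: global batch size, no gradient accumulation.
        return self.batch_size * self.training_steps


cfg = FineTuneConfig()
print(cfg.lora_scaling, cfg.samples_seen)
```

With the reported values, the LoRA update is scaled by α / rank = 2, and under the stated assumption the run processes 2 × 2000 = 4000 training samples.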

• A natural language query describing a charting intent
• A table schema with data types and example rows.

Your task is to generate a valid Vega-Lite specification in JSON format that visualizes the requested information. If any filtering or aggregation is implied in the query, include it using the transform field.

Input format:
• Natural language query: str
• Table info: { "columns_with_type": {}, "column_exampl...
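To illustrate the kind of output this prompt asks for, the sketch below builds a small Vega-Lite specification in which a filter implied by a query is expressed through the `transform` field and an aggregation through the `y` encoding. The helper `build_spec`, the column names `region`, `sales`, and `year`, and the example query are all hypothetical and are not taken from the paper's dataset.

```python
import json


def build_spec(x_field, y_field, aggregate, filter_expr=None):
    """Build a minimal Vega-Lite bar chart spec (illustrative only)."""
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {
                "field": y_field,
                "type": "quantitative",
                "aggregate": aggregate,
            },
        },
    }
    if filter_expr is not None:
        # Filtering implied by the query goes into the transform array.
        spec["transform"] = [{"filter": filter_expr}]
    return spec


# Hypothetical query: "Show total sales per region since 2020."
spec = build_spec("region", "sales", "sum", "datum.year >= 2020")
print(json.dumps(spec, indent=2))
```

A query with no implied filter would simply omit the `transform` key, which keeps the specification minimal.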

A table schema with data types and example rows.

Your task is to generate a valid Vega-Lite specification in JSON format. Do not extract or infer any data values from the image; only describe its visual structure and encodings using the provided table schema.

Input format:
• Chart image: img
• Table info: { "columns_with_type": {}, "column_examples": [...

Its corresponding Vega-Lite specification.

Your task is to extract all visible data values from the chart into a clean CSV table. Only include columns that are visually encoded in the chart (from: x, y, color, size, theta, percentage). If the chart contains subplots (using row or column encodings), include these fields as additional columns in the output. ...
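The column-selection rule in this prompt can be sketched as follows. This is an illustrative reading rather than the authors' pipeline: the function names `encoded_fields` and `rows_to_csv` are hypothetical, the sketch only inspects a top-level `encoding` block, and it leaves out `percentage` because that item in the prompt's list does not correspond to a standard Vega-Lite encoding channel.

```python
import csv
import io

# Channels named in the prompt, plus the facet channels used for subplots.
VISUAL_CHANNELS = ("x", "y", "color", "size", "theta")
FACET_CHANNELS = ("row", "column")


def encoded_fields(spec):
    """Return the data fields that are visually encoded, in channel order."""
    encoding = spec.get("encoding", {})
    fields = []
    for channel in VISUAL_CHANNELS + FACET_CHANNELS:
        field = encoding.get(channel, {}).get("field")
        if field and field not in fields:
            fields.append(field)
    return fields


def rows_to_csv(rows, fields):
    """Serialize rows to CSV, keeping only the selected fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
        "row": {"field": "year", "type": "ordinal"},
    },
}
rows = [{"region": "EU", "sales": 10, "year": 2020, "notes": "n/a"}]
print(rows_to_csv(rows, encoded_fields(spec)))
```

In this example the `notes` column is dropped because no channel encodes it, while `year` is kept because it drives the `row` facet.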

A natural language question about the chart.

Your task is to answer the question using ONLY information that is visible in the chart.

Answer rules:
• number → digits only; include unit ONLY if shown (e.g., %, $); no commas.
• boolean → exactly “yes” or “no”.
• category/text → must be a label that appears in the chart.
• if not answerable → “unanswerable”.

Input...
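The answer rules above lend themselves to a simple format check. The sketch below is one possible reading and is not part of the paper: the function `is_valid_answer` and its `kind` labels are hypothetical, and the number pattern assumes that an optional sign and decimal point are acceptable. It cannot verify whether a unit is actually shown in the chart, since that requires the image.

```python
import re

# Assumption: optional sign, optional leading "$", digits with an optional
# decimal part, optional trailing "%"; thousands separators are rejected.
NUMBER_PATTERN = re.compile(r"[-+]?\$?\d+(\.\d+)?%?")


def is_valid_answer(answer, kind, chart_labels=()):
    """Check that an answer string follows the prompt's format rules."""
    if answer == "unanswerable":
        return True
    if kind == "number":
        return NUMBER_PATTERN.fullmatch(answer) is not None
    if kind == "boolean":
        return answer in ("yes", "no")
    if kind == "category":
        # Must be a label that appears in the chart.
        return answer in chart_labels
    return False


print(is_valid_answer("1200", "number"))
print(is_valid_answer("1,200", "number"))
print(is_valid_answer("Asia", "category", ("Asia", "Europe")))
```

A check of this kind only enforces the output format; whether the answer is correct still has to be judged against the chart itself.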

Dataset Construction Details

CycleChart-Bench is constructed on top of nvBench 2.0, which provides natural-language queries, raw tables, and Vega-Lite specifications for NL2Chart. However, nvBench 2.0 contains only single-view charts and offers a limited variety of visualization types. To better support our unified generate–parse–reason framework, we su...

Qualitative Analysis

10.1. Successful Reasoning Cases (Ours vs. Baseline)

Table 4 presents representative ChartQA examples that reveal how generate–parse consistency improves CycleChart-7B’s reasoning behavior. All examples are taken from the CharXiv [35] benchmark, whose figures are considerably more complex than those in our training corpus. While...