
arxiv: 2512.19173 · v2 · submitted 2025-12-22 · 💻 cs.CL · cs.CV

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation

Pith reviewed 2026-05-16 20:51 UTC · model grok-4.3

classification 💻 cs.CL cs.CV
keywords chart understanding · chart generation · consistency learning · bidirectional models · multimodal learning · data visualization

The pith

Enforcing generate-parse consistency on aligned chart data improves cross-task performance and generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CycleChart as a framework that trains models on chart tasks by cycling through generation and parsing for each data instance rather than handling tasks in isolation. From a table and natural-language query the model produces a chart specification, renders it to an image, then recovers the schema and data, with a consistency objective aligning the forward and reverse directions. This lifecycle approach is meant to capture the shared transformations between data, visual encoding, and structured recovery. A sympathetic reader would care because current chart work treats understanding and generation as separate problems, and linking them through consistency could yield models that handle new charts more reliably.

Core claim

CycleChart organizes all tasks around each single data instance in a lifecycle from source table and query through chart generation and rendering to schema and data parsing, with a generate-parse consistency objective that enforces semantic alignment between the forward and reverse directions, yielding strong results on four tasks and improved transfer to external benchmarks.
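The lifecycle in this claim can be made concrete with a toy sketch. Everything below is an assumption for illustration: the function names (`generate_spec`, `render`, `parse_schema`, `parse_data`, `consistency`), the dictionary standing in for a rendered image, and the agreement score are hypothetical and are not the paper's model, renderer, or loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One data instance: a source table (tuple of row dicts) and an NL query."""
    table: tuple
    query: str


def generate_spec(inst):
    """Forward direction (stand-in for the model): table + query -> chart spec.

    Hypothetical rule: first column on x, second column on y, bar mark.
    """
    x_field, y_field = list(inst.table[0])[:2]
    return {"mark": "bar", "encoding": {"x": x_field, "y": y_field}}


def render(spec, table):
    """Stand-in for rendering. A real pipeline would produce an image; here the
    'image' is a dict holding only what a viewer could read off the chart."""
    enc = spec["encoding"]
    return {
        "mark": spec["mark"],
        "axis_titles": {"x": enc["x"], "y": enc["y"]},
        "points": [(row[enc["x"]], row[enc["y"]]) for row in table],
    }


def parse_schema(image):
    """Reverse direction, part 1: recover mark and encodings from the 'image'."""
    return {"mark": image["mark"], "encoding": dict(image["axis_titles"])}


def parse_data(image):
    """Reverse direction, part 2: recover the plotted data values."""
    return list(image["points"])


def consistency(spec, schema):
    """Toy agreement score in [0, 1]: share of mark/encoding entries that match."""
    checks = [spec["mark"] == schema["mark"]]
    for channel, field in spec["encoding"].items():
        checks.append(schema["encoding"].get(channel) == field)
    return sum(checks) / len(checks)


inst = Instance(
    table=({"region": "North", "sales": 14}, {"region": "South", "sales": 9}),
    query="Show sales by region",
)
spec = generate_spec(inst)
image = render(spec, inst.table)
score = consistency(spec, parse_schema(image))
recovered = parse_data(image)
```

In this toy setting a faithful parse agrees with the generated specification on every entry, while a parser that misread the mark type would score lower. That kind of disagreement is what a consistency objective would penalize.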

What carries the argument

The per-instance lifecycle design together with the generate-parse consistency objective that links generation from data to recovery from the rendered image.

If this is right

  • The model captures the full chain of transformations from raw data through visual encoding to structured recovery.
  • Performance improves simultaneously on NL2Chart generation, schema parsing, data parsing, and ChartQA.
  • The approach transfers effectively to unseen external benchmarks.
  • Cross-task generalization increases relative to conventional multi-task training that samples tasks independently.
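The contrast in the last bullet, between sampling tasks independently and organizing them per instance, can be sketched as two batch builders. This is a hedged illustration only: the task labels, function names, and the choice to put all four tasks of one instance into a single batch are assumptions, not the authors' data loader.

```python
import random

TASKS = ("nl2chart", "schema_parsing", "data_parsing", "chartqa")


def independent_batches(instance_ids, n_batches, batch_size, seed=0):
    """Conventional multi-task sampling: every slot draws its own instance
    and its own task, so one chart rarely appears under several tasks."""
    rng = random.Random(seed)
    return [
        [(rng.choice(instance_ids), rng.choice(TASKS)) for _ in range(batch_size)]
        for _ in range(n_batches)
    ]


def lifecycle_batches(instance_ids, seed=0):
    """Per-instance organization: each batch walks one instance through
    every task, so all task views of that chart are seen together."""
    rng = random.Random(seed)
    order = list(instance_ids)
    rng.shuffle(order)
    return [[(instance_id, task) for task in TASKS] for instance_id in order]


ids = ["chart_a", "chart_b", "chart_c"]  # hypothetical instance identifiers
for batch in lifecycle_batches(ids):
    print(batch)
```

Each printed batch holds a single instance paired with all four tasks, whereas `independent_batches` gives no such guarantee.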

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same consistency cycle could be applied to other paired generation-understanding tasks such as diagram or map creation.
  • Training in this manner might reduce errors when charts are later edited or queried in new ways.
  • Extending the framework to charts with interactive elements or multiple linked views remains open for testing.

Load-bearing premise

That enforcing generate-parse consistency on the authors' lifecycle-aligned benchmark will produce models whose improvements generalize beyond the specific chart rendering pipeline and annotation style used.

What would settle it

Evaluating the trained model on charts rendered with a different library or on real-world charts that lack the aligned annotations from CycleChart-Bench.

Figures

Figures reproduced from arXiv: 2512.19173 by Dazhen Deng, Sen Yang, Yingcai Wu, Yuan Tian, Yuchen He.

Figure 1: Overview of the chart creation pipeline, adapted from …
Figure 2: CycleChart-Bench construction. nvBench-2.0 queries …
Figure 3: Overview of the CycleChart training framework. Given a source table and NL query, the model generates a chart specification …
Figure 4: Impact of training steps across benchmarks.
Original abstract

Current chart-related tasks, such as chart generation (NL2Chart), chart schema parsing, chart data parsing, and chart question answering (ChartQA), are typically studied in isolation, preventing models from learning the shared semantics that link chart creation and interpretation. We introduce CycleChart, a consistency-based learning framework for bidirectional chart understanding and generation. Unlike conventional multi-task approaches that draw training samples independently across tasks, CycleChart organizes all tasks around each single data instance. From a source table and natural-language query, the model generates a chart specification, renders and executes it, then learns to recover the schema and underlying data from the resulting chart image. This per-instance lifecycle design lets the model capture the full chain of transformations, from raw data through visual encoding to structured recovery, and a generate-parse consistency objective enforces semantic alignment between the forward generation and reverse parsing directions. To support this framework, we construct CycleChart-Bench, a lifecycle-aligned benchmark where every chart sample carries aligned annotations for generation, schema parsing, data parsing, and question answering. CycleChart achieves strong results across all four tasks and transfers effectively to unseen external benchmarks, demonstrating improved cross-task generalization and marking a step toward more general chart understanding models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CycleChart, a consistency-based framework for joint chart generation (NL2Chart), schema parsing, data parsing, and ChartQA. Tasks are organized around per-instance lifecycles on a newly constructed CycleChart-Bench benchmark: from table and query the model generates a chart specification, renders it, and must recover schema and data from the image, with a generate-parse consistency objective enforcing semantic alignment. The central claims are strong performance across the four tasks and effective transfer to unseen external benchmarks, demonstrating improved cross-task generalization over independent multi-task training.

Significance. If the consistency objective is shown to produce representations that capture invariant data-to-visual semantics rather than pipeline-specific regularities, the work could advance unified chart models that learn bidirectional mappings more robustly than isolated task training. The lifecycle-aligned benchmark construction is a constructive contribution, but its purpose-built nature makes the transfer claims load-bearing and in need of stronger controls.

major comments (2)
  1. [§4.2] §4.2 (Benchmark Construction): CycleChart-Bench is built around a single table→spec→render→image→parse pipeline. The generate-parse consistency objective could therefore be satisfied by learning pipeline-specific artifacts (exact encoding choices, color mappings, annotation conventions) rather than generalizable chart semantics. The transfer experiments to external benchmarks must include explicit ablations or controls for rendering and annotation differences to attribute gains to the proposed mechanism rather than distributional overlap.
  2. [§5.1] §5.1 (Experimental Results): The reported strong results and cross-task generalization claims lack ablations that isolate the contribution of the consistency loss from standard multi-task training on the same CycleChart-Bench data. Without these comparisons (and without quantitative metrics, error analysis, or statistical significance in the main results), it is difficult to confirm that the per-instance lifecycle design drives the improvements.
minor comments (2)
  1. [§3] Notation for the consistency objective (e.g., the exact formulation of the cycle loss) should be clarified with an explicit equation in §3 to avoid ambiguity when comparing to standard reconstruction losses.
  2. [Figure 2] Figure 2 (lifecycle diagram) would benefit from explicit arrows showing the forward generation and reverse parsing paths with the consistency term highlighted.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing our response and indicating planned revisions to strengthen the presentation of the consistency objective and experimental claims.

  1. Referee: [§4.2] §4.2 (Benchmark Construction): CycleChart-Bench is built around a single table→spec→render→image→parse pipeline. The generate-parse consistency objective could therefore be satisfied by learning pipeline-specific artifacts (exact encoding choices, color mappings, annotation conventions) rather than generalizable chart semantics. The transfer experiments to external benchmarks must include explicit ablations or controls for rendering and annotation differences to attribute gains to the proposed mechanism rather than distributional overlap.

    Authors: We agree that the single-pipeline construction of CycleChart-Bench introduces a risk that the consistency objective could exploit rendering-specific regularities rather than invariant data-to-visual semantics. Our transfer results to external benchmarks (which use different rendering libraries and annotation conventions) provide supporting evidence for generalization, but we acknowledge that explicit controls would make this attribution more robust. In the revised manuscript we will add an ablation that systematically varies rendering parameters and annotation styles between CycleChart-Bench and the target external benchmarks while measuring the resulting performance delta.
    revision: yes

  2. Referee: [§5.1] §5.1 (Experimental Results): The reported strong results and cross-task generalization claims lack ablations that isolate the contribution of the consistency loss from standard multi-task training on the same CycleChart-Bench data. Without these comparisons (and without quantitative metrics, error analysis, or statistical significance in the main results), it is difficult to confirm that the per-instance lifecycle design drives the improvements.

    Authors: We have included an ablation comparing the full CycleChart framework against a standard multi-task baseline trained on identical CycleChart-Bench data; these results appear in Section 5.2 and the appendix and show additional gains attributable to the consistency loss. We agree that the main results would benefit from expanded quantitative support. In the revision we will add statistical significance testing (paired t-tests with bootstrap confidence intervals), a concise error analysis of failure modes, and additional metrics to the primary experimental tables.
    revision: partial
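The rebuttal mentions paired tests with bootstrap confidence intervals but does not spell out the procedure. A minimal sketch of one common variant, a percentile bootstrap over per-example score differences, is shown below. The function name, the resample count, and the example scores are all assumptions for illustration, not values or code from the paper.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean paired difference.

    scores_a and scores_b must be per-example scores on the same examples,
    in the same order. Returns (mean_difference, ci_low, ci_high).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    low_idx = int((alpha / 2) * n_resamples)
    high_idx = int((1 - alpha / 2) * n_resamples) - 1
    high_idx = min(n_resamples - 1, max(low_idx, high_idx))
    return statistics.fmean(diffs), means[low_idx], means[high_idx]


# Hypothetical per-example scores for two systems on the same six examples.
system_scores = [0.82, 0.75, 0.91, 0.68, 0.88, 0.79]
baseline_scores = [0.80, 0.70, 0.90, 0.69, 0.83, 0.77]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(system_scores, baseline_scores)
```

If the resulting interval excludes zero, the difference would conventionally be read as significant at roughly the chosen level; with only a handful of examples, as here, the interval is too unstable to support any conclusion.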

Circularity Check

0 steps flagged

No significant circularity in the derivation chain


The paper introduces a consistency-based training framework organized around per-instance lifecycles and constructs CycleChart-Bench to support it, but the central claims rest on empirical results across tasks plus explicit transfer evaluation on unseen external benchmarks. No equations or self-citations are shown that reduce the consistency objective or generalization claim to a fitted parameter or input by construction; the method is a standard bidirectional alignment technique whose outputs are not definitionally equivalent to its training signals. This is the common honest case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that chart rendering is deterministic and invertible enough for consistency training to be meaningful, plus standard supervised learning assumptions. No explicit free parameters or invented physical entities are described.

axioms (2)
  • [domain assumption] Chart rendering from specification to image is a deterministic, lossless-enough mapping for the reverse parsing task to be well-defined.
    Invoked when the model is required to recover schema and data from the rendered chart image.
  • [ad hoc to paper] Joint training on the closed generation-parsing loop improves cross-task generalization beyond independent multi-task training.
    Central modeling assumption of the consistency objective.
invented entities (1)
  • CycleChart-Bench [no independent evidence]
    purpose: Lifecycle-aligned benchmark providing aligned annotations for generation, schema parsing, data parsing, and QA on the same instances.
    New dataset constructed to support the per-instance training loop.



Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    Recycle-gan: Unsupervised video retargeting

    Aayush Bansal, Shugao Ma, Deva Ramanan, and Yaser Sheikh. Recycle-gan: Unsupervised video retargeting. In Proceedings of the European conference on computer vision (ECCV), pages 119–135, 2018.

  2. [2]

    Onechart: Purify the chart structural extraction via one auxiliary token

    Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, and Xiangyu Zhang. Onechart: Purify the chart structural extraction via one auxiliary token. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 147–155, 2024.

  3. [3]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

  4. [4]

    Towards cycle-consistent models for text and image retrieval

    Marcella Cornia, Lorenzo Baraldi, Hamed R Tavakoli, and Rita Cucchiara. Towards cycle-consistent models for text and image retrieval. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0,

  5. [5]

    DashBot: Insight-driven dashboard generation based on deep reinforcement learning

    Dazhen Deng, Aoyu Wu, Huamin Qu, and Yingcai Wu. DashBot: Insight-driven dashboard generation based on deep reinforcement learning. IEEE Transactions on Visualization and Computer Graphics, 29(1):690–700, 2023.

  6. [6]

    Chartllama: A multimodal llm for chart understanding and generation

    Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, and Hanwang Zhang. Chartllama: A multimodal llm for chart understanding and generation. arXiv preprint arXiv:2311.16483, 2023.

  7. [7]

    Dual learning for machine translation

    Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. Advances in neural information processing systems, 29, 2016.

  8. [8]

    Glm-4.5v and glm-4.1v-thinking: Towards versatile multimodal reasoning with scalable reinforcement learning

    Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, et al. Glm-4.5v and glm-4.1v-thinking: Towards versatile multimodal reasoning with scalable reinforcement learning, 2025.

  9. [9]

    Natural language generation for visualizations: State of the art, challenges and future directions

    Enamul Hoque and M Saidul Islam. Natural language generation for visualizations: State of the art, challenges and future directions. In Computer Graphics Forum, page e15266. Wiley Online Library, 2025.

  10. [10]

    Image quality metrics: Psnr vs. ssim

    Alain Horé and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369, 2010.

  11. [11]

    Scicap: Generating captions for scientific figures

    Ting-Yao Hsu, C Lee Giles, and Ting-Hao Huang. Scicap: Generating captions for scientific figures. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3258–3264, 2021.

  12. [12]

    From pixels to insights: A survey on automatic chart understanding in the era of large foundation models

    Kung-Hsiang Huang, Hou Pong Chan, Yi R Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, and Heng Ji. From pixels to insights: A survey on automatic chart understanding in the era of large foundation models. IEEE Transactions on Knowledge and Data Engineering, 2024.

  13. [13]

    Do LVLMs understand charts? analyzing and correcting factual errors in chart captioning

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, and Heng Ji. Do LVLMs understand charts? analyzing and correcting factual errors in chart captioning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 730–749, Bangkok, Thailand, 2024. Association for Computational Linguistics.

  14. [14]

    Claude 3.5 sonnet news

    Anthropic Inc. Claude 3.5 sonnet news. https://www.anthropic.com/news/claude-3-5-sonnet, 2024.

  15. [15]

    Introducing gpt-4.1 in the api

    OpenAI Inc. Introducing gpt-4.1 in the api. https://openai.com/index/gpt-4-1/, 2025.

  16. [16]

    Dvqa: Understanding data visualizations via question answering

    Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan. Dvqa: Understanding data visualizations via question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5648–5656,

  17. [17]

    Rouge: A package for automatic evaluation of summaries

    Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.

  18. [18]

    nvbench 2.0: A benchmark for natural language to visualization under ambiguity

    Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. nvbench 2.0: A benchmark for natural language to visualization under ambiguity. arXiv preprint arXiv:2503.12880, 2025.

  19. [19]

    nvbench: A large-scale synthesized dataset for cross-domain natural language to visualization task

    Yuyu Luo, Jiawei Tang, and Guoliang Li. nvbench: A large-scale synthesized dataset for cross-domain natural language to visualization task. arXiv preprint arXiv:2112.12926,

  20. [20]

    Chartqa: A benchmark for question answering about charts with visual and logical reasoning

    Ahmed Masry, Xuan Long Do, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. Chartqa: A benchmark for question answering about charts with visual and logical reasoning. In Findings of the association for computational linguistics: ACL 2022, pages 2263–2279, 2022.

  21. [21]

    UniChart: A universal vision-language pretrained model for chart comprehension and reasoning

    Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, and Shafiq Joty. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning. arXiv preprint arXiv:2305.14761, 2023.

  22. [22]

    Chartinstruct: Instruction tuning for chart comprehension and reasoning

    Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. Chartinstruct: Instruction tuning for chart comprehension and reasoning. arXiv preprint arXiv:2403.09028, 2024.

  23. [23]

    Chartqapro: A more diverse and challenging benchmark for chart question answering

    Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, et al. Chartqapro: A more diverse and challenging benchmark for chart question answering. arXiv preprint arXiv:2504.05506, 2025.

  24. [24]

    Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning

    Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning. arXiv preprint arXiv:2401.02384, 2024.

  25. [25]

    Plotqa: Reasoning over scientific plots

    Nitesh Methani, Pritha Ganguly, Mitesh M Khapra, and Pratyush Kumar. Plotqa: Reasoning over scientific plots. In Proceedings of the ieee/cvf winter conference on applications of computer vision, pages 1527–1536, 2020.

  26. [26]

    Arpit Narechania, Arjun Srinivasan, and John Stasko. Nl4dv: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics, 27(2):369–379, 2020.

  27. [27]

    Reverse-engineering visualizations: Recovering visual encodings from chart images

    Jorge Poco and Jeffrey Heer. Reverse-engineering visualizations: Recovering visual encodings from chart images. In Computer graphics forum, pages 353–363. Wiley Online Library, 2017.

  28. [28]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.

  29. [29]

    Deepvis: Bridging natural language and data visualization through step-wise reasoning

    Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, and Weikai Yang. Deepvis: Bridging natural language and data visualization through step-wise reasoning. arXiv preprint arXiv:2508.01700, 2025.

  30. [30]

    Vistext: A benchmark for semantically rich chart captioning

    Benny Tang, Angie Boggust, and Arvind Satyanarayan. Vistext: A benchmark for semantically rich chart captioning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7268–7298, 2023.

  31. [31]

    ChartGPT: Leveraging llms to generate charts from abstract natural language

    Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. ChartGPT: Leveraging llms to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics, 31(3):1731–1745, 2024.

  32. [32]

    Refchartqa: Grounding visual answer on chart images through instruction tuning

    Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, and Rainer Stiefelhagen. Refchartqa: Grounding visual answer on chart images through instruction tuning. In International Conference on Document Analysis and Recognition, pages 523–537. Springer, 2025.

  33. [33]

    Cycle-consistency learning for captioning and grounding

    Ning Wang, Jiajun Deng, and Mingbo Jia. Cycle-consistency learning for captioning and grounding. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 2024.

  34. [34]

    Multiscale structural similarity for image quality assessment

    Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pages 1398–1402 Vol.2, 2003.

  35. [35]

    Charxiv: Charting gaps in realistic chart understanding in multimodal llms

    Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, et al. Charxiv: Charting gaps in realistic chart understanding in multimodal llms. Advances in Neural Information Processing Systems, 37:113569–113697, 2024.

  36. [36]

    Chartmind: A comprehensive benchmark for complex real-world multimodal chart question answering

    Jingxuan Wei, Nan Xu, Junnan Zhu, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang, et al. Chartmind: A comprehensive benchmark for complex real-world multimodal chart question answering. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4555–4569, 2025.

  37. [37]

    The grammar of graphics

    Leland Wilkinson. The grammar of graphics. In Handbook of computational statistics: Concepts and methods, pages 375–414. Springer, 2011.

  38. [38]

    Voyager: Exploratory analysis via faceted browsing of visualization recommendations

    Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE transactions on visualization and computer graphics, 22(1):649–658, 2015.

  39. [39]

    Voyager 2: Augmenting visual analysis with partial view specifications

    Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 chi conference on human factors in computing systems, pages 2648–2659, 2017.

  40. [40]

    Chartinsights: Evaluating multimodal large language models for low-level chart question answering

    Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, and Yuyu Luo. Chartinsights: Evaluating multimodal large language models for low-level chart question answering. arXiv preprint arXiv:2405.07001, 2024.

  41. [41]

    Chartx & chartvlm: A versatile benchmark and foundation model for complicated chart reasoning

    Renqiu Xia, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Botian Shi, Junchi Yan, and Bo Zhang. Chartx & chartvlm: A versatile benchmark and foundation model for complicated chart reasoning. IEEE Transactions on Image Processing, 2025.

  42. [42]

    Chartbench: A benchmark for complex visual reasoning in charts

    Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, and Jian Guo. Chartbench: A benchmark for complex visual reasoning in charts. arXiv preprint arXiv:2312.15915,

  43. [43]

    Chartmoe: Mixture of diversely aligned expert connector for chart understanding

    Zhengzhuo Xu, Bowen Qu, Yiyan Qi, Sinan Du, Chengjin Xu, Chun Yuan, and Jian Guo. Chartmoe: Mixture of diversely aligned expert connector for chart understanding. arXiv preprint arXiv:2409.03277, 2024.

  44. [44]

    Chartpoint: Guiding mllms with grounding reflection for chart reasoning

    Zhengzhuo Xu, SiNan Du, Yiyan Qi, Siwen Lu, Chengjin Xu, Chun Yuan, and Jian Guo. Chartpoint: Guiding mllms with grounding reflection for chart reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 426–436, 2025.

  45. [45]

    Effective training data synthesis for improving mllm chart understanding

    Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, and Liang Zheng. Effective training data synthesis for improving mllm chart understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2653–2663,

  46. [46]

    Generative ai for visualization: State of the art and future directions

    Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, and Wei Zeng. Generative ai for visualization: State of the art and future directions. Visual Informatics, 8(2):43–66, 2024.

  47. [47]

    Dualgan: Unsupervised dual learning for image-to-image translation

    Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision, pages 2849–2857, 2017.

  48. [48]

    Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning

    Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, and Fei Huang. Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning. arXiv preprint arXiv:2404.16635,

  49. [49]

    Advancing chart question answering with robust chart component recognition

    Hanwen Zheng, Sijia Wang, Chris Thomas, and Lifu Huang. Advancing chart question answering with robust chart component recognition. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5741–

  50. [50]

    Unpaired image-to-image translation using cycle-consistent adversarial networks

    Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.

    CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation Supp...

  51. [51]

    Details of Experiment Setting

    8.1. Hyperparameters. We report the full hyperparameter configuration used for CycleChart-3B and CycleChart-7B fine-tuning:
      • Optimizer: AdamW
      • Learning rate: 1×10⁻⁵
      • LoRA: rank = 16, α = 32, dropout = 0.05
      • Batch size: 2
      • Training steps: 2000
      • Scheduler: constant
      • Frozen components: vision encoder (only projector + LLM updated)
    These ...

  52. [52]

    A natural language query describing a charting intent

  53. [53]

    A table schema with data types and example rows

    A table schema with data types and example rows. Your task is to generate a valid Vega-Lite specification in JSON format that visualizes the requested information. If any filtering or aggregation is implied in the query, include it using the transform field. Input format:
      • Natural language query: str
      • Table info: { "columns_with_type": {}, "column_exampl...

  54. [54]

    A table schema with data types and example rows

    A table schema with data types and example rows. Your task is to generate a valid Vega-Lite specification in JSON format. Do not extract or infer any data values from the image; only describe its visual structure and encodings using the provided table schema. Input format:
      • Chart image: img
      • Table info: { "columns_with_type": {}, "column_examples": [...

  55. [55]

    Your task is to extract all visible data values from the chart into a clean CSV table

    Its corresponding Vega-Lite specification. Your task is to extract all visible data values from the chart into a clean CSV table. Only include columns that are visually encoded in the chart (from: x, y, color, size, theta, percentage). If the chart contains subplots (using row or column encodings), include these fields as additional columns in the output. ...

  56. [56]

    A natural language question about the chart

    A natural language question about the chart. Your task is to answer the question using ONLY information that is visible in the chart. Answer rules:
      • number → digits only; include unit ONLY if shown (e.g., %, $); no commas.
      • boolean → exactly “yes” or “no”.
      • category/text → must be a label that appears in the chart.
      • if not answerable → “unanswerable”.
    Input...

  57. [57]

    Dataset Construction Details

    Dataset Construction Details. CycleChart-Bench is constructed on top of nvBench 2.0, which provides natural-language queries, raw tables, and Vega-Lite specifications for NL2Chart. However, nvBench 2.0 contains only single-view charts and offers a limited variety of visualization types. To better support our unified generate–parse–reason framework, we su...

  58. [58]

    Successful Reasoning Cases (Ours vs. Baseline)

    Qualitative Analysis. 10.1. Successful Reasoning Cases (Ours vs. Baseline). Table 4 presents representative ChartQA examples that reveal how generate–parse consistency improves CycleChart-7B’s reasoning behavior. All examples are taken from the CharXiv [35] benchmark, whose figures are considerably more complex than those in our training corpus. While...
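The NL2Chart and data-parsing prompts quoted in items 53 and 55 above describe, respectively, a Vega-Lite specification that carries filtering in its `transform` field and a CSV output restricted to visually encoded columns. A hedged sketch of how the two relate is given below. The toy table, the helper names (`encoded_fields`, `apply_gte_filters`, `spec_to_csv`), and the channel list are assumptions for illustration; the sketch handles only a `gte` field-predicate filter and does no aggregation, so it is not a Vega-Lite evaluator and not the authors' pipeline.

```python
import csv
import io

# Hypothetical toy table; not taken from CycleChart-Bench.
ROWS = [
    {"region": "North", "year": 2019, "sales": 10},
    {"region": "North", "year": 2021, "sales": 14},
    {"region": "South", "year": 2021, "sales": 9},
]

# A small Vega-Lite spec: the filter lives in "transform", as the prompt asks.
SPEC = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": ROWS},
    "transform": [{"filter": {"field": "year", "gte": 2020}}],
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
}

# Channels checked for encoded fields (assumed subset, including row/column facets).
CHANNELS = ("x", "y", "color", "size", "theta", "row", "column")


def encoded_fields(spec):
    """Return the field names bound to encoding channels, in channel order."""
    encoding = spec.get("encoding", {})
    fields = []
    for channel in CHANNELS:
        field = encoding.get(channel, {}).get("field")
        if field and field not in fields:
            fields.append(field)
    return fields


def apply_gte_filters(spec):
    """Apply only filters of the form {"field": ..., "gte": ...}; ignore the rest."""
    rows = spec["data"]["values"]
    for transform in spec.get("transform", []):
        predicate = transform.get("filter")
        if isinstance(predicate, dict) and "gte" in predicate:
            rows = [r for r in rows if r[predicate["field"]] >= predicate["gte"]]
    return rows


def spec_to_csv(spec):
    """Emit the filtered rows as CSV, keeping only visually encoded columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=encoded_fields(spec),
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(apply_gte_filters(spec))
    return buffer.getvalue()


print(spec_to_csv(SPEC))
```

The `year` column is used for filtering but is not bound to any channel, so it is dropped from the CSV, which mirrors the rule that only visually encoded columns should be reported.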