pith. machine review for the scientific record.

arxiv: 2604.15781 · v1 · submitted 2026-04-17 · 💻 cs.HC

Recognition: unknown

ReVis: Towards Reusable Image-Based Visualizations with MLLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:13 UTC · model grok-4.3

classification 💻 cs.HC
keywords image-based visualization reuse · multimodal large language models · domain-specific language · human-AI collaboration · visualization reproduction · interactive customization · visual encoding extraction

The pith

ReVis turns static bitmap visualizations into editable, reusable designs by using multimodal AI to parse them into a generic language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many visualizations are shared only as images, so reusing or adapting them to new data requires starting over with significant effort and expertise. ReVis defines a generic domain-specific language that captures the visual structures and data mappings inside a chart. An MLLM pipeline reads the image and populates this language, then the system redraws the visualization from the resulting description. An interactive interface lets users check the output, replace the data, and adjust encodings. The goal is to make image-based visualizations practical to reuse without rebuilding them manually.
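
As a reading aid only, that flow can be sketched in a few lines. Everything in the sketch below is a hypothetical stand-in, not the authors' implementation: `call_mllm`, the prompt text, and the stub renderer are placeholders for whatever the real pipeline uses. Only the "Output must be JSON only" constraint echoes the paper's own prompt appendix.

```python
# Minimal sketch of the parse-then-reproduce loop described above.
# `call_mllm` and the prompt are hypothetical placeholders; the paper's
# actual pipeline, prompts, and full DSL schema are not reproduced here.
import json


def call_mllm(prompt: str, image_path: str) -> str:
    """Stand-in for a multimodal LLM call that returns JSON text."""
    raise NotImplementedError("wire up an MLLM client here")


def parse_image_to_dsl(image_path: str) -> dict:
    """Image -> DSL: ask the MLLM to decompose the chart into the DSL."""
    prompt = (
        "Decompose this visualization into hierarchical containers and "
        "data-to-encoding mappings. Output must be JSON only."
    )
    return json.loads(call_mllm(prompt, image_path))


def render_from_dsl(dsl: dict) -> list[str]:
    """DSL -> drawing: walk containers and emit one command per container."""
    return [f"draw {c.get('id', '?')}" for c in dsl.get("containers", [])]


def reproduce(image_path: str) -> list[str]:
    return render_from_dsl(parse_image_to_dsl(image_path))
```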

Core claim

ReVis introduces a generic DSL that models complex visualizations to support both decomposition and reproduction. An MLLM-based pipeline parses an input image into the DSL by extracting core visual structures and data-to-encoding mappings, then reproduces the visualization from the DSL. An interactive interface allows users to upload images, inspect the reproduced result, update the underlying data, and customize visual encodings. A gallery of 40 visualizations demonstrates the DSL's expressiveness, a quantitative study evaluates reproduction quality, and usage scenarios with interviews of 16 practitioners show the system's effectiveness for flexible reuse.

What carries the argument

The generic Domain-Specific Language (DSL) that represents visualizations through their core visual structures and data-to-encoding mappings, which the MLLM pipeline populates from images and the interface uses for reproduction and edits.
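
Based on the container model described in the figure captions (hierarchical containers, each with a unique identifier and coordinates) and the top-level schema named in the paper's prompt appendix (data_structure, mark_specification, layout_specification, non_layout_specification, with template containers carrying a letter suffix like "_a"), a parsed instance for a simple bar chart might look like the following. The concrete values and the coordinate layout are illustrative guesses, not taken from the paper.

```python
# Hedged illustration of a parsed DSL instance for a simple bar chart.
# Top-level keys and enum values (1D_LIST, stacking_direction, 'fix' /
# 'ordinal_primary' scales, the "_a" template suffix) follow the paper's
# prompt appendix; everything else is an illustrative guess.
example_dsl = {
    "data_structure": {
        "data_type": "1D_LIST",  # one repeated sequence along a dimension
        "data_size": {"primary": {"number": 5, "dimension": "x"}},
    },
    "mark_specification": {"mark_type": "rect"},
    "layout_specification": {
        "x": {"stacking": False, "anchor": "min"},
        "y": {"stacking": True, "stacking_direction": "min"},  # bars grow up
    },
    "non_layout_specification": {
        "fill": {"scale": "ordinal_primary"},       # color keyed to a category
        "opacity": {"scale": "fix", "value": 1.0},  # fixed value, no mapping
    },
    "containers": [
        # Template container: the "_a" suffix marks one representative
        # instance; siblings repeat once per data item.
        {"id": "bar_a", "coordinates": [0.0, 0.0, 0.2, 1.0]},
    ],
}
```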

If this is right

  • The DSL can express a wide variety of complex visualizations, as shown by successful modeling of a 40-example gallery.
  • An MLLM pipeline can parse images into the DSL well enough to support accurate reproduction of the original visuals.
  • Users can update the data driving a visualization and customize its encodings through an interactive interface without redrawing from scratch (see the sketch after this list).
  • Human-AI collaboration via this pipeline and interface reduces the time and expertise needed to reuse image-based visualizations compared with prior SVG or specification-based tools.
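
Under the same assumptions as the sketches above, the data-update and customization claims amount to edits on the parsed dictionary followed by a re-render; `example_dsl` and `render_from_dsl` are the hypothetical names from the earlier sketches, not the system's API.

```python
import copy


def reuse_with_new_data(dsl: dict, new_count: int, fill_color: str) -> dict:
    """Clone a parsed DSL, point it at new data, and restyle one encoding."""
    edited = copy.deepcopy(dsl)
    # Replace the data: change how many instances the template repeats for.
    edited["data_structure"]["data_size"]["primary"]["number"] = new_count
    # Customize an encoding: pin fill to a fixed color instead of a scale.
    edited["non_layout_specification"]["fill"] = {"scale": "fix", "value": fill_color}
    return edited
```

Calling, say, render_from_dsl(reuse_with_new_data(example_dsl, 12, "#4C78A8")) would then redraw the chart without ever touching the original bitmap, which is the reuse loop these bullets describe.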

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The parsed DSL could serve as an intermediate representation for automatically refreshing published charts when source data changes.
  • Collecting many parsed examples might reveal common design patterns that could inform visualization recommendation systems.
  • Extending the DSL and pipeline to handle animated or multi-view visualizations would increase the range of reusable images.

Load-bearing premise

Multimodal large language models can reliably extract complex visual structures, data mappings, and hierarchies from arbitrary visualization images with little manual correction.

What would settle it

A test collection of visualization images that include dense hierarchies, unusual encodings, or edge-case layouts where the MLLM pipeline produces incorrect DSL structures or failed reproductions in more than a small fraction of cases.

Figures

Figures reproduced from arXiv: 2604.15781 by Can Liu, Changlin Li, Fangzhuo Jin, Manusha Karunathilaka, Xiaolin Wen, Yong Wang.

Figure 1 (p. 3): The details of our proposed DSL are described below.
Figure 1 (p. 4): The domain-specific language (DSL) adopts a hierarchical container model, in which each container has its unique identifier and coordinates. Leaf …
Figure 2 (p. 5): The MLLM-based pipeline for visualization reproduction first parses an input visualization image into a structured DSL representation using a DSL …
Figure 3: The ReVis interface consists of five panels: (A) the Image Preview Panel, which allows users to upload and preview visualization images; (B) the DSL Visualization Panel, which provides an overview of how the DSL decomposes the visual design into hierarchical containers; (C) the DSL Editor, which enables users to edit the DSL to modify the visual design; (D) the Visualization Result Panel, which displays th…
Figure 4: With ReVis, a visualization designer can redesign a visualization from a static image. She first uploads a reference image (A4) to obtain the extracted DSL and the regenerated visualization (A). By interactively editing the DSL in the DSL Editor (B-E), she explores different design variations, creates a customized design, and finally applies her own data to it.
Figure 5: With ReVis, a data analyst can flexibly restructure an existing image-based visualization (A) and apply his own data to the redesigned result. By editing the data specification of the template container, he modifies the structure and layout of the repetitive bar–line components (D-F), and by following the exemplar data format, he interactively customizes bar colors and curve linking patterns (G).
Figure 6 (p. 11): The user interview questionnaire results. Q1-Q12 are closed-ended questions rated on a 7-point Likert scale. Q13, Q14 are open-ended questions to …
Figure 7 (p. 15): The 20 composite visualizations were used for the Quantitative Evaluation. The original input image-based visualizations are shown on the left, and …
Original abstract

Many expressive visualizations are shared online only as bitmap images, making them difficult to redesign or adapt to new data. Reusing such image-based visualizations requires substantial expertise and is often time-consuming, even for experienced visualization practitioners. Existing work on reproducing visualizations often relies on structured SVG or specifications, supports limited visualization types, and offers limited flexibility for customization. To address these challenges, we present ReVis, a human-AI collaboration approach that enables flexible reuse of image-based visualizations. First, a generic Domain-Specific language (DSL) is proposed to model complex visualizations and support both visualization decomposition and reproduction. Then, ReVis employs an MLLM-based pipeline to parse an image-based visualization into the DSL, delineating its core visual structures and data-to-encoding mappings, and further reproduces the visualization from the DSL. Finally, ReVis includes an interactive interface to allow users to upload visualization images, inspect reproduced results, update the underlying data, and customize visual encodings. A gallery of 40 visualizations demonstrates the expressiveness of the DSL, and a quantitative study evaluates the reproduction quality of ReVis on these examples. Two usage scenarios and user interviews with 16 visualization practitioners demonstrate the effectiveness of ReVis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents ReVis, a human-AI collaboration system for reusing bitmap visualization images. It introduces a generic DSL to model complex visualizations, employs an MLLM-based pipeline to parse images into the DSL by extracting visual structures and data-to-encoding mappings, reproduces the visualization from the DSL, and provides an interactive interface for inspection, data updates, and customization. Support comes from a gallery of 40 examples demonstrating DSL expressiveness, a quantitative study of reproduction quality, two usage scenarios, and interviews with 16 visualization practitioners.

Significance. If the MLLM pipeline reliably parses arbitrary images into the DSL with limited manual intervention, ReVis would meaningfully lower barriers to reusing online visualizations, advancing practical human-AI tools in visualization design. The work explicitly credits the breadth of the 40-example gallery for showing DSL coverage and the practitioner interviews for grounding usability claims.

major comments (2)
  1. [Evaluation / Quantitative Study] Quantitative study (evaluation section): the abstract and manuscript describe a 'quantitative study' evaluating reproduction quality on the 40 examples but supply no concrete metrics (e.g., structure extraction accuracy, data-value fidelity, or fraction of cases requiring manual correction), no baselines, and no error analysis or test-set diversity details. This is load-bearing for the central claim because the promise of 'flexible reuse without substantial expertise' rests on the pipeline producing usable DSL outputs for arbitrary images.
  2. [§4 (MLLM Pipeline)] MLLM-based pipeline (§4): the description does not demonstrate systematic mitigation of documented MLLM failure modes (axis misreading, layered-encoding confusion, data hallucination) via an adversarial or diverse test set beyond the curated gallery. If a non-trivial fraction of cases still require substantial human fixes, the 'minimal manual correction' assumption underlying flexible reuse does not hold.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by a one-sentence summary of the quantitative findings (e.g., average reproduction accuracy or correction effort) rather than only listing the study.
  2. [§3 (Generic DSL)] Notation for the generic DSL components could be introduced with a small table or diagram in §3 to improve readability for readers unfamiliar with the specific modeling choices.
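
Major comment 1 names metrics only abstractly. For concreteness, two of them admit minimal definitions once ground-truth DSLs exist for a test set; the sketch below is our construction under the DSL assumptions sketched earlier, not the paper's evaluation protocol.

```python
def structure_accuracy(pred: dict, truth: dict) -> float:
    """Fraction of ground-truth container ids recovered by the parsed DSL."""
    pred_ids = {c["id"] for c in pred.get("containers", [])}
    truth_ids = {c["id"] for c in truth.get("containers", [])}
    return len(pred_ids & truth_ids) / max(len(truth_ids), 1)


def manual_correction_rate(needed_fix: list[bool]) -> float:
    """Fraction of test cases whose DSL required any manual correction."""
    return sum(needed_fix) / max(len(needed_fix), 1)
```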

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify key areas where additional rigor in reporting will strengthen the manuscript. We address each major comment below and will revise the paper accordingly.

Point-by-point responses
  1. Referee: [Evaluation / Quantitative Study] Quantitative study (evaluation section): the abstract and manuscript describe a 'quantitative study' evaluating reproduction quality on the 40 examples but supply no concrete metrics (e.g., structure extraction accuracy, data-value fidelity, or fraction of cases requiring manual correction), no baselines, and no error analysis or test-set diversity details. This is load-bearing for the central claim because the promise of 'flexible reuse without substantial expertise' rests on the pipeline producing usable DSL outputs for arbitrary images.

    Authors: We agree that the quantitative study section requires substantially more detail to support the central claims. The current manuscript describes the study at a high level but does not report the specific metrics, baselines, or analyses noted by the referee. In the revised manuscript we will expand the evaluation section to include concrete metrics (structure extraction accuracy, data-value fidelity, fraction of cases requiring manual correction), baselines where feasible, error analysis, and explicit details on test-set diversity and selection criteria. These additions will directly address the load-bearing nature of the evaluation for the flexible-reuse claim. revision: yes

  2. Referee: [§4 (MLLM Pipeline)] MLLM-based pipeline (§4): the description does not demonstrate systematic mitigation of documented MLLM failure modes (axis misreading, layered-encoding confusion, data hallucination) via an adversarial or diverse test set beyond the curated gallery. If a non-trivial fraction of cases still require substantial human fixes, the 'minimal manual correction' assumption underlying flexible reuse does not hold.

    Authors: We acknowledge that §4 currently emphasizes pipeline design and the curated gallery without a dedicated, systematic treatment of MLLM failure modes. While the 40 examples illustrate coverage, they do not constitute an adversarial or explicitly diverse test set for the failure modes listed. In the revision we will augment §4 with an explicit analysis of axis misreading, layered-encoding confusion, and data hallucination, describing the prompting and post-processing steps used to mitigate each, together with observed correction rates from the gallery. We will also note the limitations of the current test set and outline how future work could incorporate a more diverse adversarial set. This will clarify the extent to which minimal manual intervention is achieved. revision: yes

Circularity Check

0 steps flagged

No circularity: system architecture and empirical evaluation are self-contained

Full rationale

The paper presents a descriptive system (DSL definition, MLLM parsing pipeline, reproduction, and interactive interface) plus empirical support via a 40-example gallery, quantitative reproduction metrics, usage scenarios, and 16-practitioner interviews. No equations, fitted parameters, or predictions appear in the provided text. Central claims rest on component descriptions and external user-facing evaluations rather than any reduction to self-inputs, self-citations, or ansatzes. The load-bearing assumption (MLLM reliability) is acknowledged as an empirical risk but is not circular within the paper's own logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the unproven capability of MLLMs to parse visualizations and on the expressiveness of the newly introduced DSL; no numerical free parameters are mentioned.

axioms (1)
  • domain assumption: Multimodal LLMs can extract core visual structures and data-to-encoding mappings from bitmap visualization images with sufficient accuracy for reproduction.
    This assumption underpins the entire MLLM-based pipeline described in the abstract.
invented entities (1)
  • Generic Domain-Specific Language (DSL) for modeling visualizations (no independent evidence)
    purpose: To support decomposition, reproduction, and customization of complex visualizations beyond limited existing formats.
    The DSL is proposed as a new modeling layer; no independent evidence of its expressiveness outside the paper's gallery is provided.

pith-pipeline@v0.9.0 · 5526 in / 1296 out tokens · 34447 ms · 2026-05-10T08:13:31.333050+00:00 · methodology


    FINAL OUTPUT FORMAT JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 30 Returnonly JSONfollowing: { data_structure: { data_type: ’1D_LIST’ | ’2D_MATRIX’ | ’2D_LIST’, data_size: { primary: { number: number, dimension: ’x’ | ’y’ | ’radius’ | ’angle’ | [dimension], explanation: "Explain the reason briefly" }, secondary?: { number: number | number[],...