From Static to Interactive: Authoring Interactive Visualizations via Natural Language

Can Liu; Jaeuk Lee; Tianhe Chen; Xiaolin Wen; Yong Wang; Zhibang Jiang

arxiv: 2601.17736 · v2 · submitted 2026-01-25 · 💻 cs.HC · cs.AI

Pith reviewed 2026-05-16 11:32 UTC · model grok-4.3

classification 💻 cs.HC cs.AI

keywords interactive visualizations · natural language authoring · multimodal LLMs · static to interactive · visualization design · human-AI collaboration · data exploration

The pith

Users convert static chart images into interactive visualizations by describing changes in plain English.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Athanor, a system that accepts a screenshot of any static visualization along with natural language instructions and produces a working interactive version. It does this by mapping user requests to a defined set of actions, analyzing requirements through multiple agents, and transforming the visualization into an abstract form that supports modifications. The goal is to remove the need for original code or programming expertise when adding features such as clicking for details, hovering for tooltips, or filtering data. A sympathetic reader would care because many published charts exist only as images, yet interactivity is essential for exploration and insight. The approach therefore opens visualization authoring to non-programmers and speeds up iteration on existing views.

Core claim

Athanor maps visualization interactions to user actions and adjustments, employs a multi-agent analyzer to convert natural language into executable steps, and uses a visualization abstraction transformer to turn static images into modifiable interactive representations, all powered by multimodal large language models.
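
One way to picture the action-modification design space is as a small data model in which each interaction pairs one or more user actions with an ordered list of chart modifications. The Python sketch below is illustrative only: the class names, field names, and vocabulary strings such as "hover" and "show_tooltip" are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    """A user action that triggers an interaction (hypothetical vocabulary)."""
    kind: str    # e.g. "hover", "click", "select"
    target: str  # role of the element acted on, e.g. "bar", "line"


@dataclass(frozen=True)
class Modification:
    """An adjustment applied to the chart when an action fires."""
    kind: str                                   # e.g. "show_tooltip", "highlight"
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class Interaction:
    """One authored interaction: triggering actions plus ordered modifications."""
    actions: List[Action]
    modifications: List[Modification]


def describe(interaction: Interaction) -> str:
    """Render an interaction as a short human-readable rule."""
    triggers = " or ".join(f"{a.kind} on {a.target}" for a in interaction.actions)
    steps = ", then ".join(m.kind for m in interaction.modifications)
    return f"when {triggers}: {steps}"


# Illustrative instance: a tooltip shown on hover (field names are made up).
tooltip = Interaction(
    actions=[Action("hover", "bar")],
    modifications=[Modification("show_tooltip", {"fields": ["category", "value"]})],
)
print(describe(tooltip))
```

Allowing several actions and several modifications per interaction mirrors the Figure 8 caption, which mentions an interaction built from two actions and four sequential modifications.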

What carries the argument

The visualization abstraction transformer that converts any static visualization image into a flexible, implementation-independent structure that supports runtime adjustments based on parsed user instructions.

If this is right

Non-programmers can add standard interactions such as selection, filtering, and detail views to existing static charts.
The same static image can support multiple rounds of natural language refinements to adjust or combine interactions.
Conversion works independently of whether the original visualization was built in D3, matplotlib, Tableau, or any other tool.
Evaluation through case studies and user interviews indicates the generated interactions are usable for typical data exploration tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be extended to accept multiple chart images at once to create linked interactive dashboards from a set of static figures.
If the underlying models improve, the approach might support domain-specific instructions such as 'add a time slider for this temporal data' with less user clarification needed.
Integration into existing visualization platforms could allow one-click conversion of exported images back into editable interactive forms.

Load-bearing premise

Multimodal large language models can correctly read the structure of any chart image and translate casual instructions into accurate, working interaction code without errors or hallucinations.

What would settle it

Present the system with a screenshot of a multi-series line chart and the instruction 'let me click a line to highlight it and show its data table', then verify that the generated output highlights the clicked line and displays its data table correctly.
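
The expected behaviour in this test can be pinned down with a small reference model against which a generated chart could be compared. The sketch below is a hypothetical oracle, not part of Athanor: the class name, the dimming opacity of 0.3, and the sample series are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class LineChartModel:
    """Reference behaviour for 'click a line to highlight it and show its table'."""
    series: Dict[str, List[Point]]
    highlighted: Optional[str] = None

    def click(self, name: str) -> List[Point]:
        """Highlight the clicked series and return the rows for its data table."""
        if name not in self.series:
            raise KeyError(f"unknown series: {name}")
        self.highlighted = name
        return list(self.series[name])

    def opacity(self, name: str) -> float:
        """Full opacity for the highlighted line, dimmed for all others."""
        if self.highlighted is None or name == self.highlighted:
            return 1.0
        return 0.3  # assumed dimming level, purely illustrative


# Made-up two-series chart used only to exercise the model.
model = LineChartModel({"A": [(0, 1), (1, 2)], "B": [(0, 3), (1, 1)]})
table_rows = model.click("B")
print(table_rows, model.opacity("A"), model.opacity("B"))
```

A check of the generated chart would then compare its visible state after a click (which line is emphasised, which rows appear in the table) with what this model returns.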

Figures

Figures reproduced from arXiv: 2601.17736 by Can Liu, Jaeuk Lee, Tianhe Chen, Xiaolin Wen, Yong Wang, Zhibang Jiang.

**Figure 1.** Given a static chart (a), Athanor supports enabling interactions for it. The static chart has been enhanced with buttons to modify chart types (transforming (a) to (b) and (d) to (e)), the ability to rearrange the visual elements to stack or group ((b) to (c) and (e) to (f)), rescaling on the x-axis ((c) to (d)), and hover functionality to display a detailed tooltip (f). After authoring, the static line ch…

**Figure 2.** Users can upload existing visualizations in the chart view and present their authoring requirements (a) in the dialogue view. The…

**Figure 3.** The design space of visualization modifications.

**Figure 5.** The constraint model [27] is employed as the representation.

… format. A representative example of a polygon in SimVec format is: polygon: points[(x1,y1),(x2,y2),...],color: (h,s,l).

6.2.2 Element Identification. We employ an MLLM to classify the role of elements in a chart, for example, the axis tick, the title, and the legend. The SimVec data, combined with the corresponding bitmap representation of the SVG…

**Figure 6.** The interface of the visualization interaction authoring system.

**Figure 7.** A creator defines an interaction: when hovering over a visual…

**Figure 8.** A creator defines a comparison interaction with two actions and four sequential modifications. After deployment, when users select multiple…

**Figure 9.** User interview results. Horizontal bar charts on the left show the actions, modifications, and potential commands mentioned by users…
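
The Figure 5 excerpt quotes a single SimVec polygon line. As a rough illustration of how such a line could be read into a structure, the sketch below parses that one pattern with regular expressions. The full SimVec grammar is not given here, so the function name, the accepted whitespace, and the numeric sample values are assumptions.

```python
import re
from typing import Dict

# One "(x,y)" coordinate pair; integers or decimals, optional sign.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")

# Pattern inferred from the single quoted example:
#   polygon: points[(x1,y1),(x2,y2),...],color: (h,s,l)
_LINE = re.compile(
    r"^\s*(?P<shape>\w+)\s*:\s*points\[(?P<points>.*)\]\s*,\s*"
    r"color\s*:\s*\((?P<color>[^)]*)\)\s*$"
)


def parse_simvec_polygon(line: str) -> Dict[str, object]:
    """Parse one SimVec-style shape line into shape name, points, and HSL color."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a recognised SimVec-style line: {line!r}")
    points = [(float(x), float(y)) for x, y in _PAIR.findall(match["points"])]
    color = tuple(float(c) for c in match["color"].split(","))
    if len(color) != 3:
        raise ValueError("expected three color components (h, s, l)")
    return {"shape": match["shape"], "points": points, "color": color}


# Sample values are invented; only the layout follows the quoted example.
parsed = parse_simvec_polygon("polygon: points[(0,0),(10,0),(10,5)],color: (210,50,40)")
print(parsed)
```

Keeping the parsed form as plain tuples makes it easy to hand the geometry to a later step, such as the element identification the excerpt goes on to describe.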

Original abstract

Interactivity is crucial for effective data visualizations. However, it is often challenging to implement interactions for existing static visualizations, since the underlying code and data for existing static visualizations are often not available, and it also takes significant time and effort to enable interactions for them even if the original code and data are available. To fill this gap, we propose Athanor, a novel approach to transform existing static visualizations into interactive ones using multimodal large language models (MLLMs) and natural language instructions. Our approach introduces three key innovations: (1) an action-modification interaction design space that maps visualization interactions into user actions and corresponding adjustments, (2) a multi-agent requirement analyzer that translates natural language instructions into an actionable operational space, and (3) a visualization abstraction transformer that converts static visualizations into flexible and interactive representations regardless of their underlying implementation. Athanor allows users to effortlessly author interactions through natural language instructions, eliminating the need for programming. We conducted two case studies and in-depth interviews with target users to evaluate our approach. The results demonstrate the effectiveness and usability of our approach in allowing users to conveniently enable flexible interactions for static visualizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Athanor sketches a workable MLLM pipeline for NL-driven viz interaction authoring but leaves the reliability claims untested.

This paper describes Athanor, a system that takes a static visualization image, accepts natural language instructions, and outputs an interactive version. It does this through three pieces: an action-modification design space that breaks interactions into user actions and adjustments, a multi-agent requirement analyzer that turns the instructions into steps, and a visualization abstraction transformer that handles the image-to-representation conversion. The approach is aimed at users who have only the rendered chart and no code or data underneath it.

That framing is useful and the component breakdown looks like a reasonable way to structure the problem. The two case studies and user interviews give some indication that the workflow can feel natural in practice. The paper also avoids overclaiming prior results and presents the architecture cleanly.

The soft spot is the evaluation. There are no accuracy numbers, no baseline comparisons, no tests across chart types or ambiguous instructions, and no failure analysis. The central claim rests on current MLLMs correctly reading arbitrary images and mapping instructions without hallucinations, yet nothing in the reported work measures that. This makes the usability story suggestive rather than demonstrated.

The work is aimed at HCI and visualization researchers who want to explore AI-assisted authoring tools. A reader looking for concrete system designs and early user feedback would find material here. It is worth sending for peer review so the implementation details and the MLLM assumptions can be examined directly.

Referee Report

2 major / 2 minor

Summary. The paper introduces Athanor, a system that leverages multimodal large language models (MLLMs) to transform static visualizations into interactive ones via natural language instructions. It defines an action-modification interaction design space, a multi-agent requirement analyzer for translating instructions into operations, and a visualization abstraction transformer to create flexible representations independent of original code. Evaluation consists of two case studies and user interviews demonstrating usability for non-programmers.

Significance. If the core MLLM components prove reliable, the work could meaningfully advance HCI and visualization authoring by removing the need for programming or access to source code when adding interactions to existing charts. The proposed design space and multi-agent architecture offer concrete, reusable abstractions that could influence future tools for natural-language-driven visualization editing.

major comments (2)

[Evaluation] Evaluation section: The abstract and evaluation report positive outcomes from two case studies and interviews, yet provide no quantitative metrics (e.g., success rate, error rate, or latency), baseline comparisons, or systematic failure-mode analysis across chart types, rendering styles, or ambiguous instructions. This leaves the central claim—that the MLLM-based analyzer and transformer reliably map arbitrary static images and NL instructions to correct executable adjustments—only weakly supported.
[§3] §3 (System Architecture): The multi-agent requirement analyzer and visualization abstraction transformer are presented as key innovations, but the manuscript does not describe or measure how these components handle edge cases such as non-standard chart encodings, low-resolution images, or instructions requiring inference beyond the visible marks. Without such analysis, the assumption that current MLLMs can perform the required image-to-abstraction and NL-to-action mapping without hallucinations remains untested.

minor comments (2)

[§2] The action-modification design space is introduced without a formal definition or exhaustive enumeration of supported actions; a table or figure listing the full space with examples would improve clarity.
[Figures] Figure captions and system diagrams would benefit from explicit callouts to the three key components (design space, analyzer, transformer) to help readers trace the pipeline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the recognition of Athanor's potential to advance natural-language visualization authoring in HCI. We address each major comment below, indicating planned revisions to strengthen the manuscript.

Referee: [Evaluation] Evaluation section: The abstract and evaluation report positive outcomes from two case studies and interviews, yet provide no quantitative metrics (e.g., success rate, error rate, or latency), baseline comparisons, or systematic failure-mode analysis across chart types, rendering styles, or ambiguous instructions. This leaves the central claim—that the MLLM-based analyzer and transformer reliably map arbitrary static images and NL instructions to correct executable adjustments—only weakly supported.

Authors: We agree that the current evaluation, limited to two case studies and user interviews, provides only qualitative support and leaves the reliability claims weakly evidenced. In the revised manuscript we will add a new quantitative evaluation subsection reporting success rates, error rates, and latency across a test set of 80 visualization-instruction pairs spanning multiple chart types and instruction ambiguities. We will also include a failure-mode analysis based on additional systematic testing we have performed since submission. These additions will directly address the central claim. revision: yes
Referee: [§3] §3 (System Architecture): The multi-agent requirement analyzer and visualization abstraction transformer are presented as key innovations, but the manuscript does not describe or measure how these components handle edge cases such as non-standard chart encodings, low-resolution images, or instructions requiring inference beyond the visible marks. Without such analysis, the assumption that current MLLMs can perform the required image-to-abstraction and NL-to-action mapping without hallucinations remains untested.

Authors: We acknowledge that §3 focuses on the core architecture without explicit treatment of edge cases. We will expand §3 with a new robustness subsection that describes how the multi-agent analyzer uses verification steps to reduce hallucinations, how the abstraction transformer normalizes non-standard encodings, and how low-resolution inputs are handled via MLLM preprocessing. We will also report observed hallucination rates and mitigation outcomes from our internal development tests. This revision will make the handling of edge cases explicit. revision: yes
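
The metrics the authors promise (success rate, error rate, latency, broken down by chart type) are simple to compute once each visualization-instruction pair is logged as a trial. The sketch below shows one possible bookkeeping scheme; the `Trial` fields and the four sample records are placeholders invented for this example, not results from the paper.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Trial:
    """One visualization-instruction pair and its outcome (hypothetical schema)."""
    chart_type: str
    succeeded: bool
    latency_s: float


def summarize(trials: List[Trial]) -> Dict[str, float]:
    """Compute success rate, error rate, and median latency for a set of trials."""
    if not trials:
        raise ValueError("no trials to summarize")
    n = len(trials)
    successes = sum(1 for t in trials if t.succeeded)
    return {
        "n": n,
        "success_rate": successes / n,
        "error_rate": (n - successes) / n,
        "median_latency_s": median(t.latency_s for t in trials),
    }


def by_chart_type(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Group trials by chart type and summarize each group separately."""
    groups: Dict[str, List[Trial]] = {}
    for t in trials:
        groups.setdefault(t.chart_type, []).append(t)
    return {chart: summarize(group) for chart, group in groups.items()}


# Placeholder records only; they do not reflect any measured behaviour.
sample = [
    Trial("bar", True, 10.0),
    Trial("bar", False, 20.0),
    Trial("line", True, 30.0),
    Trial("line", True, 40.0),
]
print(summarize(sample))
print(by_chart_type(sample))
```

Reporting the per-type breakdown alongside the overall numbers would show whether failures cluster in particular chart types, which is the kind of failure-mode evidence the referee asks for.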

Circularity Check

0 steps flagged

No circularity: system architecture presents independent components without self-referential derivations or fitted predictions

The paper describes a new system (Athanor) built around three explicitly introduced components—an action-modification design space, a multi-agent requirement analyzer, and a visualization abstraction transformer—each defined directly by the authors rather than derived from prior equations or self-citations. No mathematical derivations, parameter-fitting steps, or predictions appear in the abstract or described structure; the work is a systems contribution evaluated via case studies and interviews. The central claim (NL-driven transformation of static visualizations) does not reduce to any input by construction, and no uniqueness theorems or ansatzes are imported from the authors' own prior work. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the unproven reliability of current multimodal LLMs for visualization understanding and action mapping, with no free parameters or new physical entities introduced.

axioms (1)

Domain assumption: Multimodal large language models can accurately parse images of static visualizations and map natural language instructions to precise interaction actions and adjustments.
This assumption underpins the multi-agent requirement analyzer and visualization abstraction transformer.

invented entities (1)

Athanor system · no independent evidence
purpose: End-to-end pipeline for converting static visualizations to interactive ones via natural language
The proposed software architecture itself.

pith-pipeline@v0.9.0 · 5509 in / 1245 out tokens · 78800 ms · 2026-05-16T11:32:16.060356+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Proteus: Shapeshifting Desktop Visualizations for Mobile via Multi-level Intelligent Adaptation
cs.HC · 2026-04 · unverdicted · novelty 7.0

Proteus uses a multi-level design space and LLM multi-agents to automatically convert desktop visualizations into equivalent mobile versions that preserve readability.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] L. Battle, P. Duan, Z. Miranda, D. Mukusheva, R. Chang, and M. Stonebraker. Beagle: Automated extraction and interpretation of visualizations from the web. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–8, 2018.
[2] D. Baur, B. Lee, and S. Carpendale. TouchWave: Kinetic multi-touch manipulation for hierarchical stacked graphs. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces, pp. 255–264,
[3] M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.
[4] M. Brehmer and T. Munzner. A multi-level typology of abstract visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12):2376–2385, 2013.
[5] C. Bu, Q. Zhang, Q. Wang, J. Zhang, M. Sedlmair, O. Deussen, and Y. Wang. Sinestream: Improving the readability of streamgraphs by minimizing sine illusion effects. IEEE Transactions on Visualization and Computer Graphics, 27(2):1634–1643, 2021.
[6] S. K. Card, J. Mackinlay, and B. Shneiderman. Readings in information visualization: using vision to think. Morgan Kaufmann, 1999.
[7] J. Choi, D. G. Park, Y. L. Wong, E. Fisher, and N. Elmqvist. VisDock: A toolkit for cross-cutting interactions in visualization. IEEE Transactions on Visualization and Computer Graphics, 21(9):1087–1100, 2015.
[8] K. Cox, R. E. Grinter, S. L. Hibino, L. J. Jagadeesan, and D. Mantilla. A multi-modal natural language interface to an information visualization environment. International Journal of Speech Technology, 4(3-4):297–314,
[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, vol. 1, pp. 4171–4186, 2019.
[10] T. Gao, M. Dontcheva, E. Adar, Z. Liu, and K. G. Karahalios. DataTone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 489–500, 2015.
[11] J. Harper and M. Agrawala. Deconstructing and restyling D3 visualizations. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 253–262, 2014.
[12] J. Harper and M. Agrawala. Converting basic D3 charts into reusable style templates. IEEE Transactions on Visualization and Computer Graphics, 24(3):1274–1286, 2018.
[13] J. Heer and B. Shneiderman. Interactive dynamics for visual analysis: A taxonomy of tools that support the fluent and flexible use of visualizations. Queue, 10(2):30–55, 2012.
[14] E. Hoque, V. Setlur, M. Tory, and I. Dykeman. Applying pragmatics principles for interaction with visual analytics. IEEE Transactions on Visualization and Computer Graphics, 24(1):309–318, 2017.
[15] A. Joshi, S. Kale, S. Chandel, and D. K. Pal. Likert scale: Explored and explained. British Journal of Applied Science & Technology, 7(4):396, 2015.
[16] S. E. Kahou, V. Michalski, A. Atkinson, Á. Kádár, A. Trischler, and Y. Bengio. FigureQA: An annotated figure dataset for visual reasoning. arXiv preprint arXiv:1710.07300, 2017.
[17] E. Kavaz, A. Puig, and I. Rodríguez. Chatbot-based natural language interfaces for data visualisation: A scoping review. Applied Sciences, 13(12):7025, 2023.
[18] D. H. Kim, E. Hoque, and M. Agrawala. Answering questions about charts and generating visual explanations. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–13,
[19] T. Kim, B. Saket, A. Endert, and B. MacIntyre. Visar: Bringing interactivity to static data visualizations through augmented reality. arXiv preprint arXiv:1708.01377, 2017.
[20] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan. Automatic annotation synchronizing with textual description for visualization. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems,
[21] D. Li, H. Mei, Y. Shen, S. Su, W. Zhang, J. Wang, M. Zu, and W. Chen. ECharts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018.
[22] G. Li, X. Wang, G. Aodeng, S. Zheng, Y. Zhang, C. Ou, S. Wang, and C. H. Liu. Visualization generation with large language models: An evaluation. arXiv preprint arXiv:2401.11255, 2024.
[23] C. Liu, C. Da, X. Long, Y. Yang, Y. Zhang, and Y. Wang. SimVecVis: A dataset for enhancing MLLMs in visualization understanding. In Proceedings of the IEEE Visualization and Visual Analytics, pp. 26–30, 2025.
[24] C. Liu, Y. Guo, and X. Yuan. AutoTitle: An interactive title generator for visualizations. IEEE Transactions on Visualization and Computer Graphics, 30(8):5276–5288, 2024.
[25] C. Liu, Y. Han, R. Jiang, and X. Yuan. ADVISor: Automatic visualization answer for natural-language question on tabular data. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 6–15, 2021.
[26] C. Liu, L. Xie, Y. Han, and X. Yuan. AutoCaption: An approach to generate natural language description from visualization automatically. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 191–195,
[27] C. Liu, Y. Zhang, C. Wu, C. Li, and X. Yuan. A spatial constraint model for manipulating static visualizations. ACM Transactions on Interactive Intelligent Systems, 14(2):1–29, 2024.
[28] M. Lu, J. Liang, Y. Zhang, G. Li, S. Chen, Z. Li, and X. Yuan. Interaction+: Interaction enhancement for web-based visualizations. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 61–70, 2017.
[29] Y. Luo, X. Qin, N. Tang, and G. Li. Deepeye: Towards automatic data visualization. In Proceedings of International Conference on Data Engineering, pp. 101–112, 2018.
[30] Y. Luo, N. Tang, G. Li, C. Chai, W. Li, and X. Qin. Synthesizing natural language to visualization (nl2vis) benchmarks from nl2sql benchmarks. In Proceedings of the ACM SIGMOD, pp. 1235–1247, 2021.
[31] Y. Luo, N. Tang, G. Li, J. Tang, C. Chai, and X. Qin. Natural language to visualization by neural machine translation. IEEE Transactions on Visualization and Computer Graphics, 28(1):217–226, 2021.
[32] J. Ma and M. Agrawala. Mover: Motion verification for motion graphics animations. ACM Transactions on Graphics, 44(4):Article 33, Aug. 2025.
[33] P. Maddigan and T. Susnjak. Chat2vis: Generating data visualizations via natural language using chatgpt, codex and gpt-3 large language models. IEEE Access, 11:45181–45193, 2023.
[34] T. Munzner. Visualization Analysis and Design. 2014.
[35] OpenAI. ChatGPT API. https://beta.openai.com/docs/api-reference/introduction, 2023. Accessed: April 1, 2025.
[36] Our World in Data. Sulfur dioxide emissions by sector, world. https://ourworldindata.org/explorers/air-pollution?country=~OWID_WRL&Pollutant=Sulfur+dioxide&Sector=Breakdown+by+sector, 2025. Accessed: 2025-04-07.
[37] J. Poco and J. Heer. Reverse-engineering visualizations: Recovering visual encodings from chart images. Computer Graphics Forum, 36(3):353–363,
[38] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.
[39] R. Sadana, M. Agnihotri, and J. T. Stasko. Touching data: A discoverability-based evaluation of a visualization interface for tablet computers. CoRR, abs/1806.06084, 2018.
[40] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1):341–350, 2017.
[41] M. Savva, N. Kong, A. Chhajta, F.-F. Li, M. Agrawala, and J. Heer. ReVision: Automated classification, analysis and redesign of chart images. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 393–402, 2011.
[42] K. Sedig and P. Parsons. Interaction design for complex cognitive activities with visual representations: A pattern-based approach. AIS Transactions on Human-Computer Interaction, 5(2):84–133, 2013.
[43] V. Setlur, S. E. Battersby, M. Tory, R. Gossweiler, and A. X. Chang. Eviza: A natural language interface for visual analysis. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 365–377,
[44] V. Setlur and M. Tory. How do you converse with an analytical chatbot? Revisiting Gricean maxims for designing analytical conversational behavior. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–17, 2022.
[45] L. Shen, E. Shen, Y. Luo, X. Yang, X. Hu, X. Zhang, Z. Tai, and J. Wang. Towards natural language interfaces for data visualization: A survey. IEEE Transactions on Visualization and Computer Graphics, 29(6):3121–3144,
[46] D. Shi, Y. Guo, M. Guo, Y. Wu, Q. Chen, and N. Cao. Talk2Data: High-level question decomposition for data-oriented question and answering. CoRR, abs/2107.14420, 2021.
[47] B. Shneiderman. The eyes have it: a task by data type taxonomy for information visualizations. In Proceedings of IEEE Symposium on Visual Languages, pp. 336–343, 1996.
[48] L. S. Snyder and J. Heer. DIVI: Dynamically interactive visualization. IEEE Transactions on Visualization and Computer Graphics, 30(1):403–413, 2023.
[49] C. Stolte, D. Tang, and P. Hanrahan. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1):52–65, 2002.
[50] Y. Sun, J. Leigh, A. Johnson, and S. Lee. Articulate: A semi-automated model for translating natural language queries into meaningful visualizations. In Proceedings of SG, pp. 184–195, 2010.
[51] P. Vaithilingam, E. L. Glassman, J. P. Inala, and C. Wang. Dynavis: Dynamically synthesized UI widgets for visualization editing, 2024.
[52] S. VanderPlas and H. Hofmann. Signs of the sine illusion - Why we need to care. Journal of Computational and Graphical Statistics, 24(4):1170–1190, 2015.
[53] C. Wang, J. Thompson, and B. Lee. Data formulator: AI-powered concept-driven visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 30(1):1128–1138, 2023.
[54] Y. Wang, Z. Hou, L. Shen, T. Wu, J. Wang, H. Huang, H. Zhang, and D. Zhang. Towards natural language-based visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 29(1):1222–1232,
[55] M. O. Ward, G. Grinstein, and D. Keim. Interactive data visualization: foundations, techniques, and applications. AK Peters/CRC Press, 2010.
[56] T. L. Weissgerber, V. D. Garovic, M. Savic, S. J. Winham, and N. M. Milic. From static to interactive: transforming data visualization to improve transparency. PLoS Biology, 14(6):e1002484, 2016.
[57] T. Wolf, J. Chaumond, et al. Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP: System Demonstrations, pp. 38–45, 2020.
[58] L. Xie, Y. Lin, C. Liu, H. Qu, and X. Shu. Datawink: Reusing and adapting SVG-based visualization examples with large multimodal models. arXiv preprint arXiv:2507.17734, 2025.
[59] J.-S. Yi, Y. ah Kang, J. Stasko, and J. A. Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6):1224–1231,
[60] C. Zhu-Tian, W. Tong, Q. Wang, B. Bach, and H. Qu. Augmenting static visualizations with PapARVis Designer. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–12, 2020.

[1] [1]

Battle, P

L. Battle, P. Duan, Z. Miranda, D. Mukusheva, R. Chang, and M. Stone- braker. Beagle: Automated extraction and interpretation of visualizations from the web. InProceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–8, 2018. 9

work page 2018

[2] [2]

D. Baur, B. Lee, and S. Carpendale. TouchWave: Kinetic multi-touch manipulation for hierarchical stacked graphs. InProceedings of the ACM Conference on Interactive Tabletops and Surfaces, 10 pages, p. 255264,

work page

[3] [3]

Bostock, V

M. Bostock, V . Ogievetsky, and J. Heer. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301– 2309, 9 pages, 2011. 2, 5

work page 2011

[4] [4]

Brehmer and T

M. Brehmer and T. Munzner. A multi-level typology of abstract visualiza- tion tasks.IEEE Transactions on Visualization and Computer Graphics, 19(12):2376–2385, 2013. 2

work page 2013

[5] [5]

C. Bu, Q. Zhang, Q. Wang, J. Zhang, M. Sedlmair, O. Deussen, and Y . Wang. Sinestream: Improving the readability of streamgraphs by minimizing sine illusion effects.IEEE Transactions on Visualization and Computer Graphics, 27(2):1634–1643, 2021. 7

work page 2021

[6] [6]

S. K. Card, J. Mackinlay, and B. Shneiderman.Readings in information visualization: using vision to think. Morgan Kaufmann, 1999. 3

work page 1999

[7] [7]

J. Choi, D. G. Park, Y . L. Wong, E. Fisher, and N. Elmqvist. VisDock: A toolkit for cross-cutting interactions in visualization.IEEE Transactions on Visualization and Computer Graphics, 21(9):1087–1100, 2015. 3

work page 2015

[8] K. Cox, R. E. Grinter, S. L. Hibino, L. J. Jagadeesan, and D. Mantilla. A multi-modal natural language interface to an information visualization environment. International Journal of Speech Technology, 4(3-4):297–314,

[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, vol. 1, pp. 4171–4186, 2019.

[10] T. Gao, M. Dontcheva, E. Adar, Z. Liu, and K. G. Karahalios. DataTone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 489–500, 2015.

[11] J. Harper and M. Agrawala. Deconstructing and restyling D3 visualizations. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 253–262, 2014.

[12] J. Harper and M. Agrawala. Converting basic D3 charts into reusable style templates. IEEE Transactions on Visualization and Computer Graphics, 24(3):1274–1286, 2018.

[13] J. Heer and B. Shneiderman. Interactive dynamics for visual analysis: A taxonomy of tools that support the fluent and flexible use of visualizations. Queue, 10(2):30–55, 2012.

[14] E. Hoque, V. Setlur, M. Tory, and I. Dykeman. Applying pragmatics principles for interaction with visual analytics. IEEE Transactions on Visualization and Computer Graphics, 24(1):309–318, 2017.

[15] A. Joshi, S. Kale, S. Chandel, and D. K. Pal. Likert scale: Explored and explained. British Journal of Applied Science & Technology, 7(4):396, 2015.

[16] S. E. Kahou, V. Michalski, A. Atkinson, Á. Kádár, A. Trischler, and Y. Bengio. FigureQA: An annotated figure dataset for visual reasoning. arXiv preprint arXiv:1710.07300, 2017.

[17] E. Kavaz, A. Puig, and I. Rodríguez. Chatbot-based natural language interfaces for data visualisation: A scoping review. Applied Sciences, 13(12):7025, 2023.

[18] D. H. Kim, E. Hoque, and M. Agrawala. Answering questions about charts and generating visual explanations. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, 13 pages, pp. 1–13,

[19] T. Kim, B. Saket, A. Endert, and B. MacIntyre. VisAR: Bringing interactivity to static data visualizations through augmented reality. arXiv preprint arXiv:1708.01377, 2017.

[20] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan. Automatic annotation synchronizing with textual description for visualization. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems,

[21] D. Li, H. Mei, Y. Shen, S. Su, W. Zhang, J. Wang, M. Zu, and W. Chen. ECharts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018.

[22] G. Li, X. Wang, G. Aodeng, S. Zheng, Y. Zhang, C. Ou, S. Wang, and C. H. Liu. Visualization generation with large language models: An evaluation. arXiv preprint arXiv:2401.11255, 2024.

[23] C. Liu, C. Da, X. Long, Y. Yang, Y. Zhang, and Y. Wang. SimVecVis: A dataset for enhancing MLLMs in visualization understanding. In Proceedings of the IEEE Visualization and Visual Analytics, pp. 26–30, 2025.

[24] C. Liu, Y. Guo, and X. Yuan. AutoTitle: An interactive title generator for visualizations. IEEE Transactions on Visualization and Computer Graphics, 30(8):5276–5288, 2024.

[25] C. Liu, Y. Han, R. Jiang, and X. Yuan. ADVISor: Automatic visualization answer for natural-language question on tabular data. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 6–15, 2021.

[26] C. Liu, L. Xie, Y. Han, and X. Yuan. AutoCaption: An approach to generate natural language description from visualization automatically. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 191–195,

[27] C. Liu, Y. Zhang, C. Wu, C. Li, and X. Yuan. A spatial constraint model for manipulating static visualizations. ACM Transactions on Interactive Intelligent Systems, 14(2):1–29, 2024.

[28] M. Lu, J. Liang, Y. Zhang, G. Li, S. Chen, Z. Li, and X. Yuan. Interaction+: Interaction enhancement for web-based visualizations. In Proceedings of the IEEE Pacific Visualization Symposium, pp. 61–70, 2017.

[29] Y. Luo, X. Qin, N. Tang, and G. Li. DeepEye: Towards automatic data visualization. In Proceedings of International Conference on Data Engineering, pp. 101–112, 2018.

[30] Y. Luo, N. Tang, G. Li, C. Chai, W. Li, and X. Qin. Synthesizing natural language to visualization (NL2VIS) benchmarks from NL2SQL benchmarks. In Proceedings of the ACM SIGMOD, 13 pages, pp. 1235–1247, 2021.

[31] Y. Luo, N. Tang, G. Li, J. Tang, C. Chai, and X. Qin. Natural language to visualization by neural machine translation. IEEE Transactions on Visualization and Computer Graphics, 28(1):217–226, 2021.

[32] J. Ma and M. Agrawala. MoVer: Motion verification for motion graphics animations. ACM Transactions on Graphics, 44(4): Article 33, Aug. 2025.

[33] P. Maddigan and T. Susnjak. Chat2VIS: Generating data visualizations via natural language using ChatGPT, Codex and GPT-3 large language models. IEEE Access, 11:45181–45193, 2023.

[34] T. Munzner. Visualization Analysis and Design. 2014.

[35] OpenAI. ChatGPT API. https://beta.openai.com/docs/api-reference/introduction, 2023. Accessed: April 1, 2025.

[36] Our World in Data. Sulfur dioxide emissions by sector, world. https://ourworldindata.org/explorers/air-pollution?country=~OWID_WRL&Pollutant=Sulfur+dioxide&Sector=Breakdown+by+sector, 2025. Accessed: 2025-04-07.

[37] J. Poco and J. Heer. Reverse-engineering visualizations: Recovering visual encodings from chart images. Computer Graphics Forum, 36(3):353–363,

[38] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.

[39] R. Sadana, M. Agnihotri, and J. T. Stasko. Touching data: A discoverability-based evaluation of a visualization interface for tablet computers. CoRR, abs/1806.06084, 2018.

[40] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1):341–350, 2017.

[41] M. Savva, N. Kong, A. Chhajta, F.-F. Li, M. Agrawala, and J. Heer. ReVision: Automated classification, analysis and redesign of chart images. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 393–402, 2011.

[42] K. Sedig and P. Parsons. Interaction design for complex cognitive activities with visual representations: A pattern-based approach. AIS Transactions on Human-Computer Interaction, 5(2):84–133, 2013.

[43] V. Setlur, S. E. Battersby, M. Tory, R. Gossweiler, and A. X. Chang. Eviza: A natural language interface for visual analysis. In Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 365–377,

[44] V. Setlur and M. Tory. How do you converse with an analytical chatbot? Revisiting Gricean maxims for designing analytical conversational behavior. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–17, 2022.

[45] L. Shen, E. Shen, Y. Luo, X. Yang, X. Hu, X. Zhang, Z. Tai, and J. Wang. Towards natural language interfaces for data visualization: A survey. IEEE Transactions on Visualization and Computer Graphics, 29(6):3121–3144,

[46] D. Shi, Y. Guo, M. Guo, Y. Wu, Q. Chen, and N. Cao. Talk2Data: High-level question decomposition for data-oriented question and answering. CoRR, abs/2107.14420, 2021.

[47] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of IEEE Symposium on Visual Languages, pp. 336–343, 1996.

[48] L. S. Snyder and J. Heer. DIVI: Dynamically interactive visualization. IEEE Transactions on Visualization and Computer Graphics, 30(1):403–413, 2023.

[49] C. Stolte, D. Tang, and P. Hanrahan. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1):52–65, 2002.

[50] Y. Sun, J. Leigh, A. Johnson, and S. Lee. Articulate: A semi-automated model for translating natural language queries into meaningful visualizations. In Proceedings of SG, pp. 184–195, 2010.

[51] P. Vaithilingam, E. L. Glassman, J. P. Inala, and C. Wang. DynaVis: Dynamically synthesized UI widgets for visualization editing, 2024.

[52] S. VanderPlas and H. Hofmann. Signs of the sine illusion - Why we need to care. Journal of Computational and Graphical Statistics, 24(4):1170–1190, 2015.

[53] C. Wang, J. Thompson, and B. Lee. Data Formulator: AI-powered concept-driven visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 30(1):1128–1138, 2023.

[54] Y. Wang, Z. Hou, L. Shen, T. Wu, J. Wang, H. Huang, H. Zhang, and D. Zhang. Towards natural language-based visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 29(1):1222–1232,

[55] M. O. Ward, G. Grinstein, and D. Keim. Interactive data visualization: foundations, techniques, and applications. AK Peters/CRC Press, 2010.

[56] T. L. Weissgerber, V. D. Garovic, M. Savic, S. J. Winham, and N. M. Milic. From static to interactive: Transforming data visualization to improve transparency. PLoS Biology, 14(6):e1002484, 2016.

[57] T. Wolf, J. Chaumond, et al. Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP: System Demonstrations, pp. 38–45, 2020.

[58] L. Xie, Y. Lin, C. Liu, H. Qu, and X. Shu. DataWink: Reusing and adapting SVG-based visualization examples with large multimodal models. arXiv preprint arXiv:2507.17734, 2025.

[59] J.-S. Yi, Y. ah Kang, J. Stasko, and J. A. Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6):1224–1231,

[60] C. Zhu-Tian, W. Tong, Q. Wang, B. Bach, and H. Qu. Augmenting static visualizations with PapARVis Designer. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, pp. 1–12, 2020.