From Static to Interactive: Authoring Interactive Visualizations via Natural Language
Pith reviewed 2026-05-16 11:32 UTC · model grok-4.3
The pith
Users convert static chart images into interactive visualizations by describing changes in plain English.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Athanor maps visualization interactions to user actions and adjustments, employs a multi-agent analyzer to convert natural language into executable steps, and uses a visualization abstraction transformer to turn static images into modifiable interactive representations, all powered by multimodal large language models.
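One way to picture the action-modification pairing is as a small record: a user action on some target mark, paired with the adjustment applied in response. The sketch below is illustrative only. The names `Mark`, `Interaction`, and `apply_interaction`, and the fields they carry, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Mark:
    """One visual element in an implementation-independent chart description (hypothetical schema)."""
    mark_id: str
    series: str
    opacity: float = 1.0


@dataclass(frozen=True)
class Interaction:
    """A user action paired with the adjustment it triggers (hypothetical schema)."""
    action: str                            # e.g. "click" or "hover"
    selects: Callable[[Mark, Mark], bool]  # (event target, candidate) -> is the candidate affected?
    modify: Callable[[Mark], Mark]         # adjustment applied to each affected mark


def apply_interaction(marks, interaction, action, target):
    """Return new marks after `action` fires on `target`; other actions leave marks unchanged."""
    if action != interaction.action:
        return list(marks)
    return [interaction.modify(m) if interaction.selects(target, m) else m for m in marks]


# "Click a mark to dim every other series."
dim_others = Interaction(
    action="click",
    selects=lambda target, m: m.series != target.series,
    modify=lambda m: replace(m, opacity=0.2),
)

marks = [Mark("a1", "A"), Mark("a2", "A"), Mark("b1", "B")]
after = apply_interaction(marks, dim_others, "click", marks[0])
print([(m.mark_id, m.opacity) for m in after])
```

Keeping the marks immutable and returning a new list makes each adjustment easy to undo or stack, which fits the multi-round refinement the summary describes.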
What carries the argument
The visualization abstraction transformer that converts any static visualization image into a flexible, implementation-independent structure that supports runtime adjustments based on parsed user instructions.
If this is right
- Non-programmers can add standard interactions such as selection, filtering, and detail views to existing static charts.
- The same static image can support multiple rounds of natural language refinements to adjust or combine interactions.
- Conversion works independently of whether the original visualization was built in D3, matplotlib, Tableau, or any other tool.
- Evaluation through case studies and user interviews indicates the generated interactions are usable for typical data exploration tasks.
Where Pith is reading between the lines
- The method could be extended to accept multiple chart images at once to create linked interactive dashboards from a set of static figures.
- If the underlying models improve, the approach might support domain-specific instructions such as 'add a time slider for this temporal data' with less user clarification needed.
- Integration into existing visualization platforms could allow one-click conversion of exported images back into editable interactive forms.
Load-bearing premise
Multimodal large language models can correctly read the structure of any chart image and translate casual instructions into accurate, working interaction code without errors or hallucinations.
What would settle it
Present the system with a screenshot of a multi-series line chart and the instruction 'let me click a line to highlight it and show its data table', then verify that clicking a line in the generated output highlights that line and displays its data table correctly.
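The check described above becomes mechanical once the expected behavior is written as a predicate. The following sketch assumes, hypothetically, that the generated interaction can be driven through a function `on_click(series_name)` that returns the highlighted series and the rows shown in the table. That interface, the function `check_click_highlight`, and the toy data are all assumptions for illustration and do not describe Athanor's actual output format.

```python
def check_click_highlight(on_click, data):
    """Return a list of failure messages; an empty list means the behavior matches the instruction.

    `on_click` is a hypothetical handle on the generated interaction: it takes a
    series name and returns (highlighted_series, table_rows).
    `data` maps each series name to its ground-truth list of (x, y) rows.
    """
    failures = []
    for name, rows in data.items():
        highlighted, table = on_click(name)
        if set(highlighted) != {name}:
            failures.append(f"{name}: highlighted {sorted(highlighted)}")
        if list(table) != list(rows):
            failures.append(f"{name}: table does not match the series data")
    return failures


# Toy ground truth for a two-series line chart (made-up values).
data = {
    "north": [(2020, 1.0), (2021, 1.5)],
    "south": [(2020, 2.0), (2021, 1.8)],
}


def correct_handler(name):
    return [name], data[name]


def buggy_handler(name):
    # Highlights the right line but always shows the first series' rows.
    return [name], data["north"]


print(check_click_highlight(correct_handler, data))
print(check_click_highlight(buggy_handler, data))
```

Checking every series, rather than a single click, is what separates a working interaction from one that only happens to be right for the first line.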
Original abstract
Interactivity is crucial for effective data visualizations. However, it is often challenging to implement interactions for existing static visualizations, since the underlying code and data for existing static visualizations are often not available, and it also takes significant time and effort to enable interactions for them even if the original code and data are available. To fill this gap, we propose Athanor, a novel approach to transform existing static visualizations into interactive ones using multimodal large language models (MLLMs) and natural language instructions. Our approach introduces three key innovations: (1) an action-modification interaction design space that maps visualization interactions into user actions and corresponding adjustments, (2) a multi-agent requirement analyzer that translates natural language instructions into an actionable operational space, and (3) a visualization abstraction transformer that converts static visualizations into flexible and interactive representations regardless of their underlying implementation. Athanor allows users to effortlessly author interactions through natural language instructions, eliminating the need for programming. We conducted two case studies and in-depth interviews with target users to evaluate our approach. The results demonstrate the effectiveness and usability of our approach in allowing users to conveniently enable flexible interactions for static visualizations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Athanor, a system that leverages multimodal large language models (MLLMs) to transform static visualizations into interactive ones via natural language instructions. It defines an action-modification interaction design space, a multi-agent requirement analyzer for translating instructions into operations, and a visualization abstraction transformer to create flexible representations independent of original code. Evaluation consists of two case studies and user interviews demonstrating usability for non-programmers.
Significance. If the core MLLM components prove reliable, the work could meaningfully advance HCI and visualization authoring by removing the need for programming or access to source code when adding interactions to existing charts. The proposed design space and multi-agent architecture offer concrete, reusable abstractions that could influence future tools for natural-language-driven visualization editing.
major comments (2)
- [Evaluation] Evaluation section: The abstract and evaluation report positive outcomes from two case studies and interviews, yet provide no quantitative metrics (e.g., success rate, error rate, or latency), baseline comparisons, or systematic failure-mode analysis across chart types, rendering styles, or ambiguous instructions. This leaves the central claim—that the MLLM-based analyzer and transformer reliably map arbitrary static images and NL instructions to correct executable adjustments—only weakly supported.
- [§3] §3 (System Architecture): The multi-agent requirement analyzer and visualization abstraction transformer are presented as key innovations, but the manuscript does not describe or measure how these components handle edge cases such as non-standard chart encodings, low-resolution images, or instructions requiring inference beyond the visible marks. Without such analysis, the assumption that current MLLMs can perform the required image-to-abstraction and NL-to-action mapping without hallucinations remains untested.
minor comments (2)
- [§2] The action-modification design space is introduced without a formal definition or exhaustive enumeration of supported actions; a table or figure listing the full space with examples would improve clarity.
- [Figures] Figure captions and system diagrams would benefit from explicit callouts to the three key components (design space, analyzer, transformer) to help readers trace the pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We appreciate the recognition of Athanor's potential to advance natural-language visualization authoring in HCI. We address each major comment below, indicating planned revisions to strengthen the manuscript.
Referee: [Evaluation] Evaluation section: The abstract and evaluation report positive outcomes from two case studies and interviews, yet provide no quantitative metrics (e.g., success rate, error rate, or latency), baseline comparisons, or systematic failure-mode analysis across chart types, rendering styles, or ambiguous instructions. This leaves the central claim—that the MLLM-based analyzer and transformer reliably map arbitrary static images and NL instructions to correct executable adjustments—only weakly supported.
Authors: We agree that the current evaluation, limited to two case studies and user interviews, provides only qualitative support and leaves the reliability claims weakly evidenced. In the revised manuscript we will add a new quantitative evaluation subsection reporting success rates, error rates, and latency across a test set of 80 visualization-instruction pairs spanning multiple chart types and instruction ambiguities. We will also include a failure-mode analysis based on additional systematic testing we have performed since submission. These additions will directly address the central claim. revision: yes
Referee: [§3] §3 (System Architecture): The multi-agent requirement analyzer and visualization abstraction transformer are presented as key innovations, but the manuscript does not describe or measure how these components handle edge cases such as non-standard chart encodings, low-resolution images, or instructions requiring inference beyond the visible marks. Without such analysis, the assumption that current MLLMs can perform the required image-to-abstraction and NL-to-action mapping without hallucinations remains untested.
Authors: We acknowledge that §3 focuses on the core architecture without explicit treatment of edge cases. We will expand §3 with a new robustness subsection that describes how the multi-agent analyzer uses verification steps to reduce hallucinations, how the abstraction transformer normalizes non-standard encodings, and how low-resolution inputs are handled via MLLM preprocessing. We will also report observed hallucination rates and mitigation outcomes from our internal development tests. This revision will make the handling of edge cases explicit. revision: yes
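The quantitative evaluation promised in the first response amounts to aggregating per-trial outcomes. A minimal sketch of that bookkeeping follows, assuming each trial records a chart type, whether the generated interaction worked, and a latency. The `Trial` record, the `summarize` function, and the sample rows are placeholders invented for this example. They are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    chart_type: str
    success: bool
    latency_s: float


def summarize(trials):
    """Group trials by chart type and report count, success rate, error rate, and median latency."""
    groups = defaultdict(list)
    for t in trials:
        groups[t.chart_type].append(t)
    summary = {}
    for chart_type, ts in groups.items():
        rate = sum(t.success for t in ts) / len(ts)
        summary[chart_type] = {
            "n": len(ts),
            "success_rate": rate,
            "error_rate": 1 - rate,
            "median_latency_s": median(t.latency_s for t in ts),
        }
    return summary


# Placeholder trials only; these are not measurements.
trials = [
    Trial("bar", True, 4.0),
    Trial("bar", False, 6.0),
    Trial("line", True, 5.0),
]
for chart_type, stats in summarize(trials).items():
    print(chart_type, stats)
```

Reporting per chart type rather than one pooled number is what exposes the failure modes the referee asks about, since a high overall success rate can hide a chart type that fails most of the time.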
Circularity Check
No circularity: system architecture presents independent components without self-referential derivations or fitted predictions
The paper describes a new system (Athanor) built around three explicitly introduced components—an action-modification design space, a multi-agent requirement analyzer, and a visualization abstraction transformer—each defined directly by the authors rather than derived from prior equations or self-citations. No mathematical derivations, parameter-fitting steps, or predictions appear in the abstract or described structure; the work is a systems contribution evaluated via case studies and interviews. The central claim (NL-driven transformation of static visualizations) does not reduce to any input by construction, and no uniqueness theorems or ansatzes are imported from the authors' own prior work. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multimodal large language models can accurately parse images of static visualizations and map natural language instructions to precise interaction actions and adjustments.
invented entities (1)
- Athanor system: no independent evidence
Forward citations
Cited by 1 Pith paper
- Proteus: Shapeshifting Desktop Visualizations for Mobile via Multi-level Intelligent Adaptation
  Proteus uses a multi-level design space and LLM multi-agents to automatically convert desktop visualizations into equivalent mobile versions that preserve readability.