Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
Pith reviewed 2026-05-10 17:11 UTC · model grok-4.3
The pith
LLM-native figures embed full provenance to let models and users trace and extend scientific analyses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-native figures are data-driven artifacts that are simultaneously human-legible and machine-addressable. Each one embeds complete provenance: the data subset, the analytical operations and code, and the visualization specification used to generate it. This lets an LLM trace selections back to their sources, generate code to extend analyses, and orchestrate new visualizations through natural-language instructions or direct manipulation, via a hybrid language-visual interface with a bidirectional mapping. In the science-of-science domain, the paper claims this accelerates discovery, improves reproducibility, and makes reasoning transparent across agents and users, redefining the figure as an interface for discovery.
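To make the claim concrete, here is a minimal Python sketch of what a provenance-carrying figure record might look like. It is not the paper's format: the class name `FigureProvenance`, its field names, and the example values are all hypothetical, chosen only to mirror the three parts the claim lists (data subset, operations and code, visualization specification).

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass(frozen=True)
class FigureProvenance:
    """Hypothetical container for the provenance parts named in the core claim."""
    data_subset: List[Dict[str, Any]]   # rows actually plotted
    operations: List[str]               # human-readable analytical steps
    code: str                           # code that produced the plotted values
    vis_spec: Dict[str, Any]            # declarative visualization specification

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for hashing and diffing.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Content hash: changes whenever any provenance part changes.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


# Illustrative values only; not taken from the paper.
example = FigureProvenance(
    data_subset=[{"year": 2020, "papers": 12}, {"year": 2021, "papers": 15}],
    operations=["filter year >= 2020", "count papers per year"],
    code="df[df.year >= 2020].groupby('year').size()",
    vis_spec={"mark": "bar", "x": "year", "y": "papers"},
)
restored = FigureProvenance(**json.loads(example.to_json()))
```

Keeping the record serializable is the design choice that matters here: a model can read the JSON directly instead of re-interpreting pixels, and the fingerprint gives a cheap way to detect that any part of the provenance has changed.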
What carries the argument
The hybrid language-visual interface integrating LLM agents with a bidirectional mapping between figures and their underlying data, code, and specifications.
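One way to picture a bidirectional mapping is as a two-way index between rendered marks and the data rows behind them. The sketch below is an assumption for illustration, not the paper's implementation; `MarkRowIndex`, the mark identifiers, and the row numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class MarkRowIndex:
    """Hypothetical two-way index between figure marks and data row ids."""

    def __init__(self) -> None:
        self._mark_to_rows: Dict[str, Set[int]] = defaultdict(set)
        self._row_to_marks: Dict[int, Set[str]] = defaultdict(set)

    def link(self, mark_id: str, row_id: int) -> None:
        # Record the association in both directions at once so the
        # two lookups can never disagree.
        self._mark_to_rows[mark_id].add(row_id)
        self._row_to_marks[row_id].add(mark_id)

    def rows_for(self, mark_ids: Iterable[str]) -> Set[int]:
        # Figure -> data: which rows sit behind a visual selection?
        rows: Set[int] = set()
        for mark_id in mark_ids:
            rows |= self._mark_to_rows.get(mark_id, set())
        return rows

    def marks_for(self, row_ids: Iterable[int]) -> Set[str]:
        # Data -> figure: which marks should be highlighted for these rows?
        marks: Set[str] = set()
        for row_id in row_ids:
            marks |= self._row_to_marks.get(row_id, set())
        return marks


index = MarkRowIndex()
index.link("bar-2020", 0)
index.link("bar-2020", 1)
index.link("bar-2021", 2)
selected_rows = index.rows_for(["bar-2020"])
highlighted = index.marks_for([2])
```

Writing both directions inside a single `link` call is what keeps the mapping consistent; the load-bearing premise discussed further down is essentially that this consistency survives LLM-driven edits.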
If this is right
- Figures can serve as starting points for new analyses instead of endpoints.
- Provenance embedding makes all analytical steps explicitly accessible to LLMs.
- Natural-language instructions suffice to modify or extend visualizations.
- Multi-agent systems benefit from shared, context-rich artifacts.
- Reproducibility gains follow from direct links to executable code and data.
Where Pith is reading between the lines
- Similar native artifacts could be developed for other research outputs such as tables or simulation results.
- Adoption might require new standards for figure formats that support provenance embedding.
- This framework could integrate with existing visualization libraries to make the transition easier.
- Long-term, it may influence how scientific papers are structured around interactive rather than static elements.
Load-bearing premise
A reliable bidirectional mapping between the rendered figure and its underlying data and code can be maintained without errors or constant human correction during LLM interactions.
What would settle it
A test case in which an LLM is asked to modify a figure's analysis and the resulting output is checked for fidelity to the original data and code without external fixes.
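A fidelity check of this kind could start from something as simple as the sketch below: after an LLM edit, every row shown in the figure should still be traceable to the original source data. This is a hypothetical illustration, assuming rows are JSON-serializable dictionaries; `untraceable_rows` and the sample data are not from the paper.

```python
import json
from typing import Any, Dict, List


def row_key(row: Dict[str, Any]) -> str:
    # Canonical string form so rows compare equal regardless of key order.
    return json.dumps(row, sort_keys=True)


def untraceable_rows(
    source_rows: List[Dict[str, Any]],
    figure_rows: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return figure rows that do not appear in the source data."""
    known = {row_key(row) for row in source_rows}
    return [row for row in figure_rows if row_key(row) not in known]


# Illustrative data only.
source = [
    {"year": 2019, "papers": 9},
    {"year": 2020, "papers": 12},
    {"year": 2021, "papers": 15},
]
after_valid_edit = [{"papers": 12, "year": 2020}, {"year": 2021, "papers": 15}]
after_bad_edit = [{"year": 2020, "papers": 12}, {"year": 2022, "papers": 40}]

clean = untraceable_rows(source, after_valid_edit)
drifted = untraceable_rows(source, after_bad_edit)
```

This only covers figures that plot source rows directly. Figures built from aggregated values would need the check to re-run the recorded code and compare outputs instead.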
Original abstract
Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as static visual summaries: once rendered, they are handled by both humans and multimodal LLMs as images to be re-interpreted from pixels or captions. The emergent capabilities of LLMs open an opportunity to fundamentally rethink this paradigm. In this paper, we introduce the concept of LLM-native figures: data-driven artifacts that are simultaneously human-legible and machine-addressable. Unlike traditional plots, each artifact embeds complete provenance: the data subset, analytical operations and code, and visualization specification used to generate it. As a result, an LLM can "see through" the figure--tracing selections back to their sources, generating code to extend analyses, and orchestrating new visualizations through natural-language instructions or direct manipulation. We implement this concept through a hybrid language-visual interface that integrates LLM agents with a bidirectional mapping between figures and underlying data. Using the science of science domain as a testbed, we demonstrate that LLM-native figures can accelerate discovery, improve reproducibility, and make reasoning transparent across agents and users. More broadly, this work establishes a general framework for embedding provenance, interactivity, and explainability into the artifacts of modern research, redefining the figure not as an end product, but as an interface for discovery. For more details, please refer to the demo video available at www.llm-native-figure.com.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LLM-native figures as data-driven artifacts that embed complete provenance (data subsets, code, analytical operations, and visualization specs) to enable bidirectional interaction between humans, LLMs, and the underlying data. Unlike static plots, these figures support tracing, code generation, and orchestration via natural-language instructions through a hybrid language-visual interface. Using a science-of-science testbed, the work claims to demonstrate accelerated discovery, improved reproducibility, and transparent reasoning across agents and users, establishing a general framework for provenance-embedded research artifacts.
Significance. If the bidirectional mapping can be shown to be robust, the framework could meaningfully advance human-AI scientific collaboration by redefining figures as active interfaces rather than passive outputs, with potential benefits for reproducibility and multi-agent workflows in data-intensive fields.
major comments (1)
- [Abstract / testbed demonstration] Abstract and science-of-science testbed demonstration: the central claim that LLM-native figures accelerate discovery, improve reproducibility, and make reasoning transparent rests on an asserted demonstration, yet the manuscript provides no quantitative metrics, error rates, or systematic analysis of failure modes (e.g., ambiguous instructions, data-subset drift after LLM edits, or visualization-spec inconsistencies). This leaves the reliability of the bidirectional mapping unverified and is load-bearing for the primary contribution.
minor comments (1)
- [Abstract] The demo video link is referenced but the manuscript text does not include sufficient standalone description of the interface mechanics or example interaction traces to allow readers to assess the hybrid mapping without external resources.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address the single major comment below, acknowledging where the current manuscript falls short and outlining specific revisions.
- Referee: [Abstract / testbed demonstration] Abstract and science-of-science testbed demonstration: the central claim that LLM-native figures accelerate discovery, improve reproducibility, and make reasoning transparent rests on an asserted demonstration, yet the manuscript provides no quantitative metrics, error rates, or systematic analysis of failure modes (e.g., ambiguous instructions, data-subset drift after LLM edits, or visualization-spec inconsistencies). This leaves the reliability of the bidirectional mapping unverified and is load-bearing for the primary contribution.
  Authors: We agree that the science-of-science testbed demonstration is primarily qualitative and illustrative, consisting of worked examples that show provenance tracing, code generation, and natural-language orchestration rather than controlled quantitative evaluation. The manuscript does not report metrics such as task success rates, error rates, or systematic failure-mode analysis for issues like ambiguous instructions or post-edit data drift. This is a substantive limitation for claims about acceleration, reproducibility, and transparency. In the revised version we will add a dedicated evaluation subsection that (1) defines a set of representative tasks with success/failure criteria, (2) reports quantitative results (e.g., success rates over repeated trials, latency, and reproducibility scores), and (3) provides a categorized analysis of observed failure modes together with mitigation strategies. We will also make the testbed artifacts and query logs available to support reproducibility of the evaluation itself.
  revision: yes
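The promised quantitative results could be tabulated along the lines of the sketch below, which computes a per-task success rate over repeated trials. It is an illustration only: the task names, the trial outcomes, and the `success_rates` helper are hypothetical, and no such numbers are reported in the manuscript.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def success_rates(trials: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (task, succeeded) trial records into per-task success rates."""
    outcomes: Dict[str, List[bool]] = defaultdict(list)
    for task, succeeded in trials:
        outcomes[task].append(succeeded)
    # Fraction of successful trials for each task.
    return {task: sum(results) / len(results) for task, results in outcomes.items()}


# Made-up trial log for illustration; not data from the paper.
trial_log = [
    ("trace-selection", True),
    ("trace-selection", True),
    ("trace-selection", False),
    ("trace-selection", True),
    ("extend-analysis", True),
    ("extend-analysis", False),
]
rates = success_rates(trial_log)
```

Reporting rates per task rather than as one pooled number would make it easier to line results up with the categorized failure-mode analysis the authors describe.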
Circularity Check
No significant circularity: conceptual framework with independent implementation and demonstration.
The paper proposes LLM-native figures as a new class of artifacts embedding provenance for bidirectional LLM-human interaction. Its central claims rest on a definitional introduction of the concept, description of a hybrid language-visual interface, and empirical demonstration within a science-of-science testbed rather than any derivation chain. No equations, fitted parameters, or load-bearing self-citations appear that would reduce the claims to inputs by construction; the bidirectional mapping is presented as an implemented capability rather than a result derived from prior self-referential assumptions. The work is therefore self-contained against external benchmarks of reproducibility and transparency.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can effectively use bidirectional mappings between rendered figures and underlying data/code for analysis extension
invented entities (1)
- LLM-native figure: no independent evidence