pith.

arXiv: 2605.21825 · v1 · pith:JG72U7PZ · submitted 2026-05-20 · 💻 cs.AI · cs.HC

Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

Pith reviewed 2026-05-22 08:22 UTC · model grok-4.3

classification 💻 cs.AI cs.HC
keywords: multi-agent systems · data visualization · scientific visualization · autonomous agents · visual analysis applications · end-to-end pipeline · SciVis contests

The pith

A collection of specialized AI agents can autonomously design and build functional single-page visualization applications from raw data and high-level task descriptions alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how the AI agents in an end-to-end harness work together to handle the full pipeline of scientific visualization: exploring data, planning views, setting up code environments, writing the implementation, checking the interface, and judging whether the result meets the stated goals. Each stage creates documents and instructions that the next agents use, allowing refinement without external fixes. If the approach holds, domain scientists could receive customized interactive tools for complex datasets directly from their questions rather than needing separate visualization experts. The system is demonstrated on IEEE SciVis Contest problems that include ambiguous requirements and varied data types across science and engineering fields.

Core claim

Given only the data and target tasks, the agent harness produces functional single-page VIS Apps with verified linked-view behavior that are highly customized to the domain experts' specified tasks and needs. The harness coordinates agents across exploratory analysis, planning, environment configuration, implementation, interface validation, and task evaluation, with each stage emitting artifacts that guide downstream agents and support iterative refinement.
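What "verified linked-view behavior" means can be made concrete with a small check. The sketch below is an in-memory stand-in under stated assumptions: the names `View`, `LinkedViews`, and `verify_linked` are hypothetical, not the paper's code, and the paper's own validation reportedly runs in a browser. It only illustrates the property such a validator would test: after a brush on one attribute, every linked view displays exactly the rows inside the brushed range.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical view: it records which row ids it currently displays."""
    name: str
    shown: set[int] = field(default_factory=set)

    def render(self, selected: set[int]) -> None:
        self.shown = set(selected)


@dataclass
class LinkedViews:
    """Hypothetical shared selection model: one brush updates every registered view."""
    rows: list[dict[str, float]]
    views: list[View] = field(default_factory=list)

    def brush(self, column: str, lo: float, hi: float) -> set[int]:
        selected = {i for i, row in enumerate(self.rows) if lo <= row[column] <= hi}
        for view in self.views:
            view.render(selected)
        return selected


def verify_linked(app: LinkedViews, column: str, lo: float, hi: float) -> bool:
    """Brush a range, then check that every view shows exactly the expected rows."""
    expected = {i for i, row in enumerate(app.rows) if lo <= row[column] <= hi}
    app.brush(column, lo, hi)
    return all(view.shown == expected for view in app.views)


rows = [{"x": 0.1, "y": 2.0}, {"x": 0.5, "y": 1.0}, {"x": 0.9, "y": 3.0}]
app = LinkedViews(rows, [View("scatter"), View("histogram")])
print(verify_linked(app, "x", 0.2, 1.0))  # True: both views show rows {1, 2}
```

A view that ignores the shared selection (for example, one whose `render` does nothing) makes `verify_linked` return `False`, which is the kind of failure a validation agent would report back.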

What carries the argument

The end-to-end agentic harness, a coordinated collection of agents with specialized skills that generate document and instruction artifacts at each stage to enable autonomous progression from data and task description to validated visualization application.
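The artifact-handoff idea can be sketched as an ordered list of stages, each of which reads named artifacts written by earlier stages and writes one new artifact. This is a minimal sketch, assuming one artifact per stage; the stage names follow the summary above, but the artifact file names (`eda_report.md`, `design_plan.md`, and so on), the `Stage` record, and `run_pipeline` are hypothetical, and `stub` stands in for a real agent call.

```python
from dataclasses import dataclass
from typing import Callable

Artifacts = dict[str, str]


@dataclass(frozen=True)
class Stage:
    name: str                         # stage label from the summary
    needs: tuple[str, ...]            # artifact names this stage reads (hypothetical)
    produces: str                     # artifact name this stage writes (hypothetical)
    run: Callable[[Artifacts], str]   # stand-in for an agent call


def run_pipeline(stages: list[Stage], inputs: Artifacts) -> Artifacts:
    """Run stages in order; each reads earlier artifacts and adds one new artifact."""
    artifacts = dict(inputs)
    for stage in stages:
        missing = [name for name in stage.needs if name not in artifacts]
        if missing:
            raise KeyError(f"{stage.name}: missing artifacts {missing}")
        artifacts[stage.produces] = stage.run({k: artifacts[k] for k in stage.needs})
    return artifacts


def stub(label: str) -> Callable[[Artifacts], str]:
    """Placeholder agent that just records which artifacts it was handed."""
    return lambda seen: f"{label}({', '.join(sorted(seen))})"


STAGES = [
    Stage("exploratory analysis", ("data", "tasks"), "eda_report.md", stub("eda")),
    Stage("planning", ("eda_report.md", "tasks"), "design_plan.md", stub("plan")),
    Stage("environment configuration", ("design_plan.md",), "env_setup.md", stub("env")),
    Stage("implementation", ("design_plan.md", "env_setup.md"), "app.html", stub("impl")),
    Stage("interface validation", ("app.html",), "ui_report.md", stub("ui")),
    Stage("task evaluation", ("app.html", "ui_report.md", "tasks"), "eval_report.md", stub("eval")),
]

out = run_pipeline(STAGES, {"data": "data.csv", "tasks": "tasks.md"})
print(out["eval_report.md"])  # eval(app.html, tasks, ui_report.md)
```

Declaring `needs` explicitly makes a broken handoff fail loudly at the stage that lacks its input rather than producing a silently degraded app.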

If this is right

  • Domain scientists receive ready-to-use interactive visualization tools tailored to their exact questions without writing code or consulting visualization specialists.
  • The same harness structure can process the ambiguous requirements and mixed data types that appear in real contest problems across multiple scientific fields.
  • Each agent stage outputs reusable artifacts that allow the system to iterate on the visualization design internally.
  • The harness forms a concrete building block for larger autonomous AI systems that execute long-horizon scientific workflows from high-level directions.
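The internal iteration in the third point can be pictured as a bounded implement-and-evaluate loop in which the evaluator's critique accumulates into an artifact that the next implementation round reads. This is a minimal sketch, assuming the evaluator returns a list of unmet requirements; `refine`, `toy_implement`, `toy_evaluate`, and the requirement names are hypothetical illustrations, not the paper's interfaces.

```python
from typing import Callable, TypeVar

App = TypeVar("App")


def refine(
    implement: Callable[[list[str]], App],
    evaluate: Callable[[App], list[str]],
    max_rounds: int = 5,
) -> tuple[App, int, list[str]]:
    """Rebuild until the evaluator reports no open issues or the round budget runs out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        app = implement(feedback)   # implementation agent reads accumulated critique
        issues = evaluate(app)      # evaluation agent lists unmet requirements
        if not issues:
            return app, round_no, []
        feedback.extend(issues)     # critique becomes an artifact for the next round
    return app, max_rounds, issues


# Toy stand-ins: the "app" is just the set of features it implements.
REQUIRED = {"scatter", "histogram", "linked brushing"}


def toy_implement(feedback: list[str]) -> frozenset:
    return frozenset({"scatter", *feedback})


def toy_evaluate(app: frozenset) -> list[str]:
    return sorted(REQUIRED - app)


app, rounds, open_issues = refine(toy_implement, toy_evaluate)
print(rounds, sorted(app))  # 2 ['histogram', 'linked brushing', 'scatter']
```

The round budget matters: without it, an evaluator that never passes would keep the loop running indefinitely, which is one way the "no human intervention" premise can fail in practice.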

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Extending the harness to output multi-page or web-deployable apps would require only adding new validation agents for layout and performance checks.
  • Combining this visualization harness with separate agents for experiment design or simulation could close the loop from hypothesis to visual result without human handoff.
  • The artifact-passing mechanism between agents suggests a general pattern for other complex creative tasks such as report generation or model prototyping.

Load-bearing premise

A team of AI agents can reliably divide and complete the full sequence of data exploration, planning, coding, validation, and evaluation for ambiguous scientific tasks without human intervention or later manual corrections.

What would settle it

Deploy the system on an unseen IEEE SciVis Contest dataset with new data modalities and observe whether it produces a working single-page app with correct linked views and full task completion with zero post-hoc code changes.
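The proposed test combines several pass/fail criteria, and "zero post-hoc code changes" can be measured as a line diff between what the harness emitted and what was actually evaluated. This is a minimal sketch under those assumptions: `TrialRun`, `settles`, and `post_hoc_changes` are hypothetical names, and the diff uses Python's standard `difflib.ndiff`, so a single edited line counts as two changes (one removal, one addition).

```python
import difflib
from dataclasses import dataclass


def post_hoc_changes(generated: str, final: str) -> int:
    """Count lines added or removed between the agent's output and the code actually run."""
    return sum(
        1
        for line in difflib.ndiff(generated.splitlines(), final.splitlines())
        if line.startswith(("- ", "+ "))  # skip unchanged ("  ") and hint ("? ") lines
    )


@dataclass
class TrialRun:
    app_loads: bool          # the single-page app opens without errors
    linked_views_ok: bool    # e.g. the outcome of a brushing check
    tasks_completed: int
    tasks_total: int
    generated_code: str      # what the harness emitted
    final_code: str          # what was actually evaluated


def settles(run: TrialRun) -> bool:
    """All criteria of the proposed test must hold at once."""
    return (
        run.app_loads
        and run.linked_views_ok
        and run.tasks_total > 0
        and run.tasks_completed == run.tasks_total
        and post_hoc_changes(run.generated_code, run.final_code) == 0
    )


code = "<html>\n<script>draw()</script>\n</html>"
print(settles(TrialRun(True, True, 4, 4, code, code)))  # True
print(post_hoc_changes(code, code.replace("draw()", "draw(data)")))  # 2
```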

Figures

Figures reproduced from arXiv: 2605.21825 by Chaoli Wang, Haichao Miao, Kaiyuan Tang, Kuangshi Ai, Peer-Timo Bremer, Shusen Liu, Zhimin Li.

Figure 1: VIS co-scientist agent harness diagram and generated visualization applications (VIS Apps). Left: Given a dataset and task description, the main code agent orchestrates a multi-stage workflow: specialized subagents, evaluation procedures, and custom skills. The harness enables closed-loop design with browser-based validation. A hierarchical memory system stores insights and lessons learned across sessions.…
Figure 2: The key capabilities for an AI VIS co-scientist. We define a VIS co-scientist as a research assistant who can independently: 1) obtain domain understanding and translate tasks into concrete requirements; 2) carry out self-directed exploratory data analysis and analytical reasoning; 3) design and implement a complete, fully featured interactive visualization tool; and 4) reliably assess the quality an…
Figure 3: VIS co-scientist output for the 2025 materials-discovery case.
Figure 1: VIS App screenshot for the 2021 SciVis Contest case.
Figure 2: VIS App screenshot for the 2023 SciVis Contest case.
Figure 3: VIS App screenshot for the 2024 SciVis Contest case.
Figure 4: VIS App screenshot for the 2025 SciVis Contest case.
Figure 5: Baseline coding-agent plots for the 2025 SciVis Contest materials-discovery case. Top left: correlation overview for Challenge 1, Task … Top right: PCA embedding for Challenge 1, Task 1. Bottom left: parallel-coordinates view for Challenge 1, Task 1. Bottom right: candidate-family summary for Challenge 1, Task 2.
Figure 6: Additional baseline coding-agent plots for the 2025 SciVis Contest materials-discovery case. Top left: composition–phase–property pathway view for Challenge 1, Task 2…
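The Figure 1 caption mentions a hierarchical memory that stores insights and lessons across sessions, but the page gives no details of its structure. The sketch below is therefore one plausible two-level design, not the paper's: per-session notes, plus global lessons promoted once they recur in a threshold number of distinct sessions. The class name, the `promote_after` threshold, the session labels, and the lesson strings are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Two-level sketch: per-session notes, plus global lessons promoted on recurrence."""
    promote_after: int = 2   # hypothetical: distinct sessions before a lesson goes global
    global_lessons: set[str] = field(default_factory=set)
    session_notes: dict[str, list[str]] = field(default_factory=dict)
    seen_in: Counter = field(default_factory=Counter)

    def record(self, session: str, lesson: str) -> None:
        notes = self.session_notes.setdefault(session, [])
        if lesson in notes:
            return                      # count each lesson once per session
        notes.append(lesson)
        self.seen_in[lesson] += 1
        if self.seen_in[lesson] >= self.promote_after:
            self.global_lessons.add(lesson)

    def context_for(self, session: str) -> list[str]:
        """Global lessons first (sorted for determinism), then this session's own notes."""
        local = [n for n in self.session_notes.get(session, []) if n not in self.global_lessons]
        return sorted(self.global_lessons) + local


memory = HierarchicalMemory()
memory.record("2021-case", "check units before choosing color scales")
memory.record("2021-case", "prefer small multiples for time steps")
memory.record("2024-case", "check units before choosing color scales")
print(memory.context_for("2025-case"))  # ['check units before choosing color scales']
```

Promoting only recurring lessons keeps the global layer small, so a new session starts with cross-case guidance without inheriting every one-off note.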
Original abstract

The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but often requires significant expertise outside the core domain, ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, based on only the data and a high-level description of the tasks, independently designs custom visual analysis applications (VIS apps). This represents an important step towards a general AI co-scientist, envisioned by many as a system that can autonomously execute long-horizon tasks based on high-level directions. Our proposed VIS co-scientist is an essential component of this broader AI co-scientist vision: a harness that can autonomously analyze data and design visualization solutions using a collection of agents and specialized skills that coordinate exploratory analysis, planning, environment configuration, implementation, interface validation, and, most importantly, evaluation of overall task completion. Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement. We validate this approach on IEEE SciVis Contests spanning multiple science and engineering fields. These contests serve as ideal proving grounds because they encode real-world complexity: ambiguous requirements, diverse data modalities, design trade-offs, and task-driven validation. Given only the data and target tasks, our system autonomously produces functional single-page VIS Apps with verified linked-view behavior, highly customized to domain experts' specified tasks and needs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents an end-to-end agentic harness for autonomously designing custom single-page visual analysis applications (VIS apps) from input data and high-level task descriptions. A collection of specialized AI agents coordinates stages including exploratory analysis, planning, environment configuration, implementation, interface validation, and task evaluation, with each stage producing document and instruction artifacts to enable iterative refinement. The system is claimed to produce functional VIS apps with verified linked-view behavior, highly customized to domain experts' needs. Validation is described on IEEE SciVis Contests spanning multiple science and engineering fields, which involve ambiguous requirements, diverse data modalities, and design trade-offs.

Significance. If substantiated, the work would constitute a meaningful step toward general AI co-scientists for long-horizon scientific tasks, specifically by automating the expertise-intensive pipeline of data analysis and visualization design. The artifact-handoff architecture for multi-agent coordination is a plausible mechanism for managing ambiguity and trade-offs without human intervention. The choice of SciVis contests as a testbed is appropriate for demonstrating real-world applicability. Credit is due for framing the system as a general, end-to-end harness rather than a narrow tool.

major comments (1)
  1. [Abstract] Abstract: the statement that the approach 'was validated on IEEE SciVis Contests' supplies no performance metrics, success rates, failure modes, or comparison baselines. This omission leaves the central claim—that the harness autonomously produces functional, verified VIS apps on ambiguous tasks—without visible quantitative or qualitative supporting evidence.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the work's significance and for the constructive comment on the abstract. We address the point below and will make the corresponding revision.

  1. Referee: [Abstract] Abstract: the statement that the approach 'was validated on IEEE SciVis Contests' supplies no performance metrics, success rates, failure modes, or comparison baselines. This omission leaves the central claim—that the harness autonomously produces functional, verified VIS apps on ambiguous tasks—without visible quantitative or qualitative supporting evidence.

    Authors: We agree that the abstract would be strengthened by including a concise summary of the validation outcomes. The full manuscript reports quantitative results (task completion rates and verified linked-view success across multiple SciVis contest entries), qualitative case studies, and analysis of failure modes and design trade-offs in Sections 5 and 6. We will revise the abstract to incorporate key metrics and a brief statement of the evidence supporting the central claim, ensuring the validation is visible at the highest level of the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity; system description is self-contained


The paper describes an end-to-end multi-agent architecture for generating visualization applications from input data and task descriptions. No equations, fitted parameters, or mathematical derivations are present in the provided text. Claims rest on the implemented harness producing functional outputs, with validation on external SciVis contests rather than any self-referential reduction or self-citation chain that defines the result by construction. This is a standard systems paper whose central contribution is the working pipeline itself, independent of the circularity patterns listed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the untested premise that current AI agents possess sufficient reliability and coordination to autonomously manage the full visualization pipeline for complex, ambiguous scientific tasks.

axioms (1)
  • domain assumption: AI agents can autonomously perform and coordinate exploratory analysis, planning, environment configuration, implementation, validation, and task evaluation for visualization design.
    This premise is invoked throughout the abstract as the basis for the harness producing functional apps from data and high-level descriptions alone.
invented entities (1)
  • VIS co-scientist harness (no independent evidence)
    purpose: Autonomous system that designs and implements custom visualization applications
    Introduced as the proposed framework without external falsifiable predictions beyond the contest validation mentioned.

pith-pipeline@v0.9.0 · 5803 in / 1314 out tokens · 48007 ms · 2026-05-22T08:22:12.184027+00:00 · methodology



Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 5 internal anchors

  [1] K. Ai, H. Miao, Z. Li, C. Wang, and S. Liu. An evaluation-centric paradigm for scientific visualization agents. arXiv preprint arXiv:2509.15160, 2025.

  [2] K. Ai, H. Miao, K. Tang, N. Gorski, J. Sun, G. Liu, H. I. Ingolfsson, D. Lenz, H. Guo, H. Yu, et al. SciVisAgentBench: A benchmark for evaluating scientific data analysis and visualization agents. arXiv preprint arXiv:2603.29139, 2026.

  [3] K. Ai, K. Tang, and C. Wang. NLI4VolVis: Natural language interaction for volume visualization via multi-LLM agents and editable 3D Gaussian splatting. IEEE Transactions on Visualization and Computer Graphics, 32(1):46–56, 2026.

  [4] Anthropic. Announcements: Introducing the model context protocol. https://www.anthropic.com/news/model-context-protocol.

  [5] Anthropic. Equipping agents for the real world with agent skills. https://claude.com/blog/equipping-agents-for-the-real-world-with-agent-skills,

  [6] Anthropic. Harness design for long-running apps. https://www.anthropic.com/engineering/harness-design-long-running-apps, 2026.

  [7] A. Bendeck and J. Stasko. An empirical evaluation of the GPT-4 multimodal language model on visualization literacy tasks. IEEE Transactions on Visualization and Computer Graphics, 31(1):1105–1115, 2025.

  [8] D. Castelvecchi. Researchers built an ‘AI scientist’—what can it do. Nature, 633(8029):266–266, 2024.

  [9] N. Chen, Y. Zhang, J. Xu, K. Ren, and Y. Yang. VisEval: A benchmark for data visualization in the era of large language models. IEEE Transactions on Visualization and Computer Graphics, 31(1):1301–1311, 2025.

  [10] P. P. Do, K. Tang, K. Ai, and C. Wang. SVLAT: Scientific visualization literacy assessment test. arXiv preprint arXiv:2603.19000, 2026.

  [11] S. Esmaeili, S. Kabir, A. M. Colas, R. P. Linder, and E. D. Ragan. Evaluating graphical perception of visual motion for quantitative data encoding. IEEE Transactions on Visualization and Computer Graphics, 29(12):4845–4857, 2022.

  [12] J. Gottweis, W.-H. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weissenberger, K. Rong, R. Tanno, et al. Towards an AI co-scientist. arXiv preprint arXiv:2502.18864, 2025.

  [13] J. Hong, C. Seto, A. Fan, and R. Maciejewski. Do LLMs have visualization literacy? An evaluation on modified visualizations to test generalization in data interpretation. IEEE Transactions on Visualization and Computer Graphics, 31(10):7004–7018, 2025.

  [14] A. Karpathy. LLM Wiki. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f, 2024. Accessed: 2026-04-

  [15] S. Lee, S.-H. Kim, and B. C. Kwon. VLAT: Development of a visualization literacy assessment test. IEEE Transactions on Visualization and Computer Graphics, 23(1):551–560, 2016.

  [16] Y. Li, E. D. Berger, M. Kahng, and C. X. Bearfield. From perception to decision: Assessing the role of chart type affordances in high-level decision tasks. In Proceedings of IEEE VIS Conference (Short Papers), pp. 366–370, 2025.

  [17] Z. Li, H. Miao, X. Yan, V. Pascucci, M. Berger, and S. Liu. See or recall: A sanity check for the role of vision in solving visualization question answer tasks with multimodal LLMs. arXiv preprint arXiv:2504.09809,

  [18] S. Liu, H. Miao, and P.-T. Bremer. ParaView-MCP: An autonomous visualization agent with direct tool use. In Proceedings of IEEE VIS Conference (Short Papers), pp. 61–65, 2025.

  [19] S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, and P.-T. Bremer. AVA: Towards autonomous visualization agents through visual perception-driven decision-making. Computer Graphics Forum, 43(3):e15093,

  [20] C. Lu, C. Lu, R. T. Lange, J. Foerster, J. Clune, and D. Ha. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024.

  [21] Y. Lyu, X. Zhang, X. Yi, Y. Zhao, S. Guo, W. Hu, J. Piotrowski, J. Kaliski, J. Urbani, Z. Meng, et al. EvoScientist: Towards multi-agent evolving AI scientists for end-to-end scientific discovery. arXiv preprint arXiv:2603.08127, 2026.

  [22] Microsoft. Playwright MCP documentation. https://playwright.dev/docs/getting-started-mcp, 2024. Accessed: 2026-04-29.

  [23] L. Mitchener, A. Yiu, B. Chang, M. Bourdenx, T. Nadolski, A. Sulovari, E. C. Landsness, D. L. Barabasi, S. Narayanan, N. Evans, et al. Kosmos: An AI scientist for autonomous discovery. arXiv preprint arXiv:2511.02824, 2025.

  [24] S. Pandey and A. Ottley. Mini-VLAT: A short and effective measure of visualization literacy. Computer Graphics Forum, 42(3):1–11, 2023.

  [25] T. Peterka, T. Mallick, O. Yildiz, D. Lenz, C. Quammen, and B. Geveci. ChatVis: Large language model agent for generating scientific visualizations. In Proceedings of IEEE Workshop on Large Data Analysis and Visualization, pp. 22–32, 2025.

  [26] C. Shao, D. Huang, Y. Li, K. Zhao, W. Lin, Y. Zhang, Q. Zeng, Z. Chen, T. Li, Y. Huang, et al. OmniScientist: Toward a co-evolving ecosystem of human and AI scientists. arXiv preprint arXiv:2511.16931, 2025.

  [27] J. Sun, D. Lenz, T. Peterka, and H. Yu. SASAV: Self-directed agent for scientific analysis and visualization. arXiv preprint arXiv:2604.03406,

  [28] J. Tang, L. Xia, Z. Li, and C. Huang. AI-researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025.

  [29] Y. Weng, M. Zhu, Q. Xie, Q. Sun, Z. Lin, S. Liu, and Y. Zhang. DeepScientist: Advancing frontier-pushing scientific findings progressively. arXiv preprint arXiv:2509.26603, 2025.

  [30] Y. Yamada, R. T. Lange, C. Lu, S. Hu, C. Lu, J. Foerster, J. Clune, and D. Ha. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search. arXiv preprint arXiv:2504.08066, 2025.
