Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks
Pith reviewed 2026-05-22 08:22 UTC · model grok-4.3
The pith
A collection of specialized AI agents can autonomously design and build functional single-page visualization applications from raw data and high-level task descriptions alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given only the data and target tasks, the agent harness produces functional single-page VIS Apps with verified linked-view behavior that are highly customized to the domain experts' specified tasks and needs. The harness coordinates agents across exploratory analysis, planning, environment configuration, implementation, interface validation, and task evaluation, with each stage emitting artifacts that guide downstream agents and support iterative refinement.
What carries the argument
The end-to-end agentic harness, a coordinated collection of agents with specialized skills that generate document and instruction artifacts at each stage to enable autonomous progression from data and task description to validated visualization application.
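The artifact-handoff pattern can be sketched minimally, assuming each stage is a function that reads all artifacts produced so far and returns one new document artifact. The stage names follow the six stages listed in the core claim. `Artifact`, `HarnessState`, `STAGES`, and `run_harness` are hypothetical, and the lambdas stand in for LLM-backed agents. This is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """A document or instruction artifact emitted by one stage (hypothetical schema)."""
    stage: str
    content: str


@dataclass
class HarnessState:
    """Inputs plus every artifact produced so far, visible to all downstream stages."""
    data_path: str
    task_description: str
    artifacts: List[Artifact] = field(default_factory=list)

    def latest(self, stage: str) -> str:
        """Return the content of the most recent artifact emitted by `stage`."""
        for artifact in reversed(self.artifacts):
            if artifact.stage == stage:
                return artifact.content
        raise KeyError(f"no artifact from stage {stage!r}")


# A stage reads the shared state (including earlier artifacts) and returns new content.
Stage = Callable[[HarnessState], str]

# Placeholder stages in the order listed in the core claim. In a real harness each
# would prompt a specialized agent; here they only show how artifacts flow downstream.
STAGES: Dict[str, Stage] = {
    "exploratory_analysis": lambda s: f"profile of {s.data_path}",
    "planning": lambda s: (
        f"plan for '{s.task_description}' using {s.latest('exploratory_analysis')}"
    ),
    "environment_configuration": lambda s: "dependency manifest",
    "implementation": lambda s: f"app code following: {s.latest('planning')}",
    "interface_validation": lambda s: f"checks for: {s.latest('implementation')}",
    "task_evaluation": lambda s: f"coverage of '{s.task_description}'",
}


def run_harness(data_path: str, task_description: str) -> HarnessState:
    """Run the stages in order, appending each stage's artifact for later stages."""
    state = HarnessState(data_path, task_description)
    for name, stage in STAGES.items():  # dicts preserve insertion order (Python 3.7+)
        state.artifacts.append(Artifact(name, stage(state)))
    return state


if __name__ == "__main__":
    result = run_harness("contest_data.csv", "compare candidate materials")
    for artifact in result.artifacts:
        print(f"{artifact.stage}: {artifact.content}")
```

Each stage receives the whole state rather than only the previous artifact. This lets a later stage, such as interface validation, read any earlier document, in line with the description of artifacts that guide all downstream work.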
If this is right
- Domain scientists receive ready-to-use interactive visualization tools tailored to their exact questions without writing code or consulting visualization specialists.
- The same harness structure can process the ambiguous requirements and mixed data types that appear in real contest problems across multiple scientific fields.
- Each agent stage outputs reusable artifacts that allow the system to iterate on the visualization design internally.
- The harness forms a concrete building block for larger autonomous AI systems that execute long-horizon scientific workflows from high-level directions.
Where Pith is reading between the lines
- Extending the harness to output multi-page or web-deployable apps would require only adding new validation agents for layout and performance checks.
- Combining this visualization harness with separate agents for experiment design or simulation could close the loop from hypothesis to visual result without human handoff.
- The artifact-passing mechanism between agents suggests a general pattern for other complex creative tasks such as report generation or model prototyping.
Load-bearing premise
A team of AI agents can reliably divide and complete the full sequence of data exploration, planning, coding, validation, and evaluation for ambiguous scientific tasks without human intervention or later manual corrections.
What would settle it
Deploy the system on an unseen IEEE SciVis Contest dataset with new data modalities and observe whether it produces a working single-page app with correct linked views and full task completion with zero post-hoc code changes.
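"Correct linked views" can be made testable by brushing one view and comparing what the linked view shows against an independent recomputation from the data. The sketch below applies that idea to a toy in-memory model. `LinkedApp` and `check_linked_views` are hypothetical names. A check on a real generated app would more likely drive the rendered page in a browser than call Python methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LinkedApp:
    """Toy model of a single-page app whose scatter and table views share one selection."""
    rows: List[Dict[str, float]]
    selection: Set[int] = field(default_factory=set)

    def brush_scatter(self, key: str, lo: float, hi: float) -> None:
        """Brushing the scatter view selects rows whose `key` value lies in [lo, hi]."""
        self.selection = {i for i, row in enumerate(self.rows) if lo <= row[key] <= hi}

    def table_view(self) -> List[int]:
        """The linked table lists the indices of the currently selected rows."""
        return sorted(self.selection)


def check_linked_views(app: LinkedApp, key: str, lo: float, hi: float) -> bool:
    """Brush one view, then compare the linked view with an independent recomputation."""
    expected = [i for i, row in enumerate(app.rows) if lo <= row[key] <= hi]
    app.brush_scatter(key, lo, hi)
    return app.table_view() == expected


if __name__ == "__main__":
    demo = LinkedApp([{"x": 1.0}, {"x": 5.0}, {"x": 3.0}])
    print(check_linked_views(demo, "x", 2.0, 6.0), demo.table_view())
```

Running the demo prints `True [1, 2]`: the brush over [2, 6] selects the rows with x = 5.0 and x = 3.0, and the table shows exactly those rows.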
Original abstract
The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but it often requires significant expertise outside the core domain, ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, based only on the data and a high-level description of the tasks, independently designs custom visual analysis applications (VIS apps). This represents an important step towards a general AI co-scientist, envisioned by many as a system that can autonomously execute long-horizon tasks based on high-level directions. Our proposed VIS co-scientist is an essential component of this broader AI co-scientist vision: a harness that can autonomously analyze data and design visualization solutions using a collection of agents and specialized skills that coordinate exploratory analysis, planning, environment configuration, implementation, interface validation, and, most importantly, evaluation of overall task completion. Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement. We validate this approach on IEEE SciVis Contests spanning multiple science and engineering fields. These contests serve as ideal proving grounds because they encode real-world complexity: ambiguous requirements, diverse data modalities, design trade-offs, and task-driven validation. Given only the data and target tasks, our system autonomously produces functional single-page VIS apps with verified linked-view behavior, highly customized to domain experts' specified tasks and needs.
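The paper describes its main design loop as consisting of three roles: Planner, VIS Designer, and Evaluator. A minimal sketch of such an evaluate-and-refine loop follows, assuming three simplifications. First, a "task" is a string. Second, the app is modeled as the set of tasks it has views for. Third, the Evaluator's feedback is the list of unmet tasks. `planner`, `vis_designer`, `evaluator`, and `design_loop` are hypothetical stand-ins for agents, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Evaluation:
    """Evaluator verdict: the target tasks the current app does not yet satisfy."""
    unmet_tasks: List[str]

    @property
    def passed(self) -> bool:
        return not self.unmet_tasks


def planner(tasks: List[str], feedback: Optional[Evaluation]) -> List[str]:
    """Hypothetical planner: target every task first, then only the unmet ones."""
    return list(tasks) if feedback is None else list(feedback.unmet_tasks)


def vis_designer(plan: List[str], app: Set[str], per_round: int = 2) -> Set[str]:
    """Hypothetical designer: adds a view for at most `per_round` planned tasks."""
    return app | set(plan[:per_round])


def evaluator(tasks: List[str], app: Set[str]) -> Evaluation:
    """Hypothetical evaluator: a task counts as met if the app has a view for it."""
    return Evaluation([t for t in tasks if t not in app])


def design_loop(tasks: List[str], max_rounds: int = 5) -> Tuple[Set[str], Evaluation, int]:
    """Plan, build, and evaluate until every task passes or the round budget runs out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    app: Set[str] = set()
    feedback: Optional[Evaluation] = None
    for round_no in range(1, max_rounds + 1):
        plan = planner(tasks, feedback)
        app = vis_designer(plan, app)
        feedback = evaluator(tasks, app)
        if feedback.passed:
            break
    assert feedback is not None  # guaranteed because max_rounds >= 1
    return app, feedback, round_no
```

With three tasks and two views per round, `design_loop(["t1", "t2", "t3"])` converges in two rounds. The first round covers t1 and t2, and the Evaluator's feedback then steers the second round to t3. The round budget stops the loop if evaluation never passes.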
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an end-to-end agentic harness for autonomously designing custom single-page visual analysis applications (VIS apps) from input data and high-level task descriptions. A collection of specialized AI agents coordinates stages including exploratory analysis, planning, environment configuration, implementation, interface validation, and task evaluation, with each stage producing document and instruction artifacts to enable iterative refinement. The system is claimed to produce functional VIS apps with verified linked-view behavior, highly customized to domain experts' needs. Validation is described on IEEE SciVis Contests spanning multiple science and engineering fields, which involve ambiguous requirements, diverse data modalities, and design trade-offs.
Significance. If substantiated, the work would constitute a meaningful step toward general AI co-scientists for long-horizon scientific tasks, specifically by automating the expertise-intensive pipeline of data analysis and visualization design. The artifact-handoff architecture for multi-agent coordination is a plausible mechanism for managing ambiguity and trade-offs without human intervention. The choice of SciVis contests as a testbed is appropriate for demonstrating real-world applicability. Credit is due for framing the system as a general, end-to-end harness rather than a narrow tool.
major comments (1)
- [Abstract] The sentence 'We validate this approach on IEEE SciVis Contests' supplies no performance metrics, success rates, failure modes, or comparison baselines. This omission leaves the central claim (that the harness autonomously produces functional, verified VIS apps on ambiguous tasks) without visible quantitative or qualitative supporting evidence.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the work's significance and for the constructive comment on the abstract. We address the point below and will make the corresponding revision.
Point-by-point responses
Referee: [Abstract] The sentence 'We validate this approach on IEEE SciVis Contests' supplies no performance metrics, success rates, failure modes, or comparison baselines. This omission leaves the central claim (that the harness autonomously produces functional, verified VIS apps on ambiguous tasks) without visible quantitative or qualitative supporting evidence.
Authors: We agree that the abstract would be strengthened by including a concise summary of the validation outcomes. The full manuscript reports quantitative results (task completion rates and verified linked-view success across multiple SciVis contest entries), qualitative case studies, and analysis of failure modes and design trade-offs in Sections 5 and 6. We will revise the abstract to incorporate key metrics and a brief statement of the evidence supporting the central claim, ensuring the validation is visible at the highest level of the paper. (Revision: yes.)
Circularity Check
No significant circularity; system description is self-contained
Full rationale
The paper describes an end-to-end multi-agent architecture for generating visualization applications from input data and task descriptions. No equations, fitted parameters, or mathematical derivations are present in the provided text. Claims rest on the implemented harness producing functional outputs, with validation on external SciVis contests rather than any self-referential reduction or self-citation chain that defines the result by construction. This is a standard systems paper whose central contribution is the working pipeline itself, so none of the circularity patterns checked here applies.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AI agents can autonomously perform and coordinate exploratory analysis, planning, environment configuration, implementation, validation, and task evaluation for visualization design.
invented entities (1)
- VIS co-scientist harness (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We present an end-to-end agentic harness that, based on only the data and a high level description of the tasks, independently designs custom visual analysis applications (VIS apps)."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "The main design loop consists of three roles: Planner, VIS Designer, and Evaluator... Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix figure captions (excerpt): ... Top right: PCA embedding for Challenge 1, Task 1. Bottom left: parallel-coordinates view for Challenge 1, Task 1. Bottom right: candidate-family summary for Challenge 1, Task 2. Figure 6: Additional baseline coding-agent plots for the 2025 SciVis Contest materials-discovery case. Top left: composition–phase–property pathway view for Challenge 1, Task 2...