SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

Chaoli Wang; Haichao Miao; Kaiyuan Tang; Kuangshi Ai; Shusen Liu

arxiv: 2606.05525 · v1 · pith:LBGZTPO3new · submitted 2026-06-04 · 💻 cs.AI · cs.HC


Kuangshi Ai, Haichao Miao, Kaiyuan Tang, Shusen Liu, Chaoli Wang

Pith reviewed 2026-06-28 02:17 UTC · model grok-4.3


keywords agent skills · scientific visualization · coding agents · SciVisAgentBench · ParaView · long-horizon workflows · tool usage patterns · domain heuristics


The pith

Reusable agent skills encoding SciVis tool patterns raise coding agents' mean task scores on multi-step workflows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SciVisAgentSkills as a collection of reusable skills that add structured knowledge about scientific visualization tools to general coding agents. These skills capture environment assumptions, usage patterns, and domain heuristics for packages including ParaView, napari, VMD, and TTK. When loaded into Codex and Claude Code and tested on SciVisAgentBench's 108 expert-designed tasks, the skills produce higher mean task scores, with token savings that vary by agent harness and tool configuration. A reader would care because long-horizon SciVis tasks have been difficult for unaugmented agents to complete reliably without such procedural scaffolding.

Core claim

SciVisAgentSkills is a collection of reusable agent skills that augment coding agents for scientific data analysis and visualization by encoding environment assumptions, tool usage patterns, and domain heuristics across scientific tools such as ParaView, napari, VMD, and TTK. Evaluated on Codex and Claude Code using SciVisAgentBench, a benchmark of 108 expert-designed multi-step tasks, the skills improve mean task scores across the evaluated suites, with token-efficiency benefits that depend on the agent harness and tool setting. The findings indicate that structured procedural knowledge supports more reliable long-horizon SciVis workflows and that skills must be examined together with the e

What carries the argument

SciVisAgentSkills, a collection of reusable agent skills that encode environment assumptions, tool usage patterns, and domain heuristics for SciVis tools.
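To make the three kinds of encoded knowledge concrete, here is a minimal sketch of how one skill might be represented and rendered into instruction text for an agent. The `Skill` class, its field names, and the rendered layout are all hypothetical; nothing here reflects the format actually used in the SciVisAgentSkills repository.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical container for one reusable agent skill."""
    name: str
    environment_assumptions: list[str] = field(default_factory=list)
    usage_patterns: list[str] = field(default_factory=list)
    domain_heuristics: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the skill as plain text an agent harness could load."""
        sections = [
            ("Environment assumptions", self.environment_assumptions),
            ("Tool usage patterns", self.usage_patterns),
            ("Domain heuristics", self.domain_heuristics),
        ]
        lines = [f"Skill: {self.name}"]
        for title, items in sections:
            if items:  # skip sections that have no entries
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Placeholder content only; real entries would come from tool expertise.
example = Skill(
    name="example-tool",
    environment_assumptions=["<how the tool is installed and launched>"],
    usage_patterns=["<typical call sequence for a common workflow>"],
)
print(example.render())
```

Keeping the three categories as separate fields is one design choice among many; it simply mirrors the way the paper describes what a skill encodes.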

If this is right

Agent skills improve mean task scores on the 108 SciVis tasks.
Token-efficiency benefits depend on the specific agent harness and tool setting.
Structured procedural knowledge supports reliable long-horizon SciVis workflows.
Skills should be studied together with the execution harness that loads them.
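The harness dependence noted above suggests reporting score and token effects per harness and tool rather than as a single pooled number. The sketch below shows one way to do that; the record layout, harness names, and all numbers are invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records:
# (harness, tool, skills_loaded, task_score, tokens_used)
runs = [
    ("harness_a", "paraview", False, 0.50, 12000),
    ("harness_a", "paraview", True, 0.70, 9000),
    ("harness_b", "paraview", False, 0.60, 10000),
    ("harness_b", "paraview", True, 0.75, 11000),
]


def summarize(records):
    """Per (harness, tool): mean score gain and relative token change."""
    grouped = defaultdict(lambda: {True: [], False: []})
    for harness, tool, skills, score, tokens in records:
        grouped[(harness, tool)][skills].append((score, tokens))

    summary = {}
    for key, arms in grouped.items():
        if not arms[True] or not arms[False]:
            continue  # need both conditions to compare
        base_score = mean(s for s, _ in arms[False])
        skill_score = mean(s for s, _ in arms[True])
        base_tokens = mean(t for _, t in arms[False])
        skill_tokens = mean(t for _, t in arms[True])
        summary[key] = {
            "score_gain": skill_score - base_score,
            # negative means fewer tokens with skills loaded
            "token_change": (skill_tokens - base_tokens) / base_tokens,
        }
    return summary


for key, stats in summarize(runs).items():
    print(key, stats)
```

In this made-up data both harnesses gain in score, but only one uses fewer tokens, which is the kind of split a pooled average would hide.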

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same skill design pattern could be applied to other agent toolkits that control different scientific software stacks.
Skills tuned for one set of visualization packages may require updates when new versions or entirely new tools appear.
Integrating the skills into additional agent frameworks beyond the two tested here could reveal further performance differences.
Real deployment would need mechanisms to keep the encoded heuristics current as the underlying tools evolve.

Load-bearing premise

The 108 expert-designed tasks in SciVisAgentBench represent real-world long-horizon SciVis workflows and the skills will transfer to agents and tools beyond the two tested harnesses.

What would settle it

A follow-up evaluation on a fresh collection of tasks drawn directly from actual user sessions in ParaView or napari that shows no score improvement or worse token use when the skills are applied.

Figures

Figures reproduced from arXiv: 2606.05525 by Chaoli Wang, Haichao Miao, Kaiyuan Tang, Kuangshi Ai, Shusen Liu.

**Figure 2.** pass@{1,2,3} (i.e., success in at least one of the first k trials) and pass^{1,2,3} (i.e., success in all k trials) results of coding agents with and without SciVisAgentSkills across five SciVisAgentBench task suites.
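The two metrics in the Figure 2 caption can be sketched directly from their stated definitions. The function names and the sample outcomes below are hypothetical, and this follows the caption's literal "first k trials" wording rather than the combinatorial pass@k estimator used in some code-generation benchmarks.

```python
from typing import Callable, Sequence


def pass_at_k(trials: Sequence[bool], k: int) -> bool:
    """pass@k: at least one of the first k trials succeeded."""
    if not 1 <= k <= len(trials):
        raise ValueError("k must be between 1 and the number of trials")
    return any(trials[:k])


def pass_hat_k(trials: Sequence[bool], k: int) -> bool:
    """pass^k: all of the first k trials succeeded."""
    if not 1 <= k <= len(trials):
        raise ValueError("k must be between 1 and the number of trials")
    return all(trials[:k])


def suite_rate(
    results: Sequence[Sequence[bool]],
    metric: Callable[[Sequence[bool], int], bool],
    k: int,
) -> float:
    """Fraction of tasks in a suite that satisfy the metric at k."""
    return sum(metric(trials, k) for trials in results) / len(results)


# Hypothetical outcomes: one row per task, three trials each.
suite = [
    [True, True, True],
    [False, True, False],
    [False, False, False],
    [True, False, True],
]

for k in (1, 2, 3):
    print(k, suite_rate(suite, pass_at_k, k), suite_rate(suite, pass_hat_k, k))
```

As k grows, pass@k can only stay the same or rise while pass^k can only stay the same or fall, so the gap between them gives a rough view of run-to-run consistency.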

Original abstract

Recent advances in agentic visualization have enabled the translation of natural language into executable scientific visualization (SciVis) workflows. While general-purpose coding agents show strong capabilities, they often lack the tool-specific expertise required for SciVis tasks. In this work, we present SciVisAgentSkills, a collection of reusable agent skills that augment coding agents for scientific data analysis and visualization by encoding environment assumptions, tool usage patterns, and domain heuristics across scientific tools such as ParaView, napari, VMD, and TTK. We evaluate these skills on Codex and Claude Code using SciVisAgentBench, a benchmark of 108 expert-designed multi-step tasks. Results show that agent skills improve mean task scores across the evaluated suites, with token-efficiency benefits that depend on the agent harness and tool setting. These findings highlight the importance of structured procedural knowledge for enabling reliable, long-horizon SciVis workflows, while also showing that skills should be studied alongside the execution harness that loads and applies them. The skills are available at https://github.com/KuangshiAi/SciVisAgentSkills.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces reusable skills for SciVis agents and reports mean-score gains on the 108-task SciVisAgentBench under two harnesses, but the abstract supplies almost no evaluation details.


The core contribution is SciVisAgentSkills, a set of skills that encode tool-specific patterns and heuristics for ParaView, napari, VMD, and TTK. The authors evaluate them on SciVisAgentBench, an existing benchmark of 108 expert-designed multi-step tasks (arXiv:2603.29139). Loaded into Codex and Claude Code, the skills raise mean task scores and sometimes reduce token use, with the size of the efficiency effect depending on the harness and tool.

That is a straightforward, practical move. Packaging domain knowledge into reusable skills rather than leaving it to the base model is a reasonable way to make agents more reliable on long-horizon SciVis work, and releasing the skills on GitHub lets others try them directly.

The main weakness is that the abstract gives no numbers, no baseline descriptions, no error bars, and no statistical tests. We therefore cannot tell how large the gains actually are or whether they survive reasonable controls. The tasks are all expert-designed, so it is also unclear how well they represent real workflows or whether the skills transfer beyond the two harnesses tested. The authors themselves flag the harness-dependence, which is honest but leaves the broader claim thin.

The work is aimed at people already building or evaluating agents for scientific computing and visualization. Anyone in that niche can extract the skills and the benchmark and run their own checks. It is not a foundational result, but the artifacts are concrete and the claim is modest.

I would send it to peer review. The idea is clear enough and the code is public, so referees can verify the implementation even if the current write-up needs more experimental detail.

Referee Report

1 major / 0 minor

Summary. The paper introduces SciVisAgentSkills, a collection of reusable agent skills for coding agents performing scientific data analysis and visualization. The skills encode environment assumptions, tool usage patterns, and domain heuristics for tools such as ParaView, napari, VMD, and TTK. They are evaluated on Codex and Claude Code using SciVisAgentBench, a benchmark of 108 expert-designed multi-step tasks. Results show that the skills improve mean task scores across suites, with token-efficiency benefits that depend on the agent harness and tool setting. The skills are released on GitHub.

Significance. If the empirical results hold with proper controls, this work would demonstrate the value of domain-specific procedural knowledge for reliable long-horizon SciVis agent workflows and the necessity of evaluating skills together with their execution harness. The open release of the skills supports reproducibility and extension by the community.

major comments (1)

[Evaluation] The abstract asserts performance improvements but supplies no information on baselines, statistical tests, task selection criteria, or error bars, so the data cannot be checked against the claim from the given text. The evaluation section must supply these details to support the central claim that skills improve mean task scores.
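One common way to attach error bars to a mean score difference is a paired percentile bootstrap over tasks. The sketch below is illustrative only: the function name, the per-task scores, and the choice of bootstrap are assumptions, not the analysis the authors performed or have committed to.

```python
import random
from statistics import mean


def paired_bootstrap_ci(baseline, with_skills, n_resamples=2000,
                        alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-task score difference.

    baseline[i] and with_skills[i] must be scores for the same task.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(baseline) != len(with_skills) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [w - b for b, w in zip(baseline, with_skills)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(diffs)
    # Resample tasks with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(diffs), means[lo_idx], means[hi_idx]


# Hypothetical per-task scores for one suite, without and with skills.
without = [0.2, 0.4, 0.6, 0.5, 0.3, 0.7]
with_sk = [0.5, 0.4, 0.9, 0.7, 0.6, 0.6]
point, lower, upper = paired_bootstrap_ci(without, with_sk)
print(f"mean gain {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Pairing by task matters here because task difficulty varies widely; resampling the per-task differences removes that shared variation from the interval.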

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment and the recommendation of minor revision. We address the evaluation concern below and will update the manuscript accordingly.


Referee: [Evaluation] The abstract asserts performance improvements but supplies no information on baselines, statistical tests, task selection criteria, or error bars, so the data cannot be checked against the claim from the given text. The evaluation section must supply these details to support the central claim that skills improve mean task scores.

Authors: We agree that the abstract and evaluation section require additional detail to make the performance claims verifiable. The revised manuscript will expand the evaluation section to describe the baselines (agents without the skills), the statistical tests applied to the mean task score differences, the expert criteria used to design and select the 108 tasks, and the error bars or variance measures reported for each suite. The abstract will be updated to briefly reference the baseline comparisons and the nature of the reported improvements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The paper reports an empirical evaluation of newly designed agent skills on the 108-task SciVisAgentBench benchmark using two agent harnesses. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text or abstract. The central claim (skills improve mean task scores with harness-dependent efficiency effects) is a direct report of experimental outcomes on that benchmark and does not reduce to any definitional or fitted equivalence by construction. This is a standard self-contained empirical study with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, fitted parameters, or new postulated entities are referenced in the abstract; the paper is an empirical study of agent skills and a benchmark.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization
cs.HC 2026-06 unverdicted novelty 4.0

HiLSVA introduces a plan-first multi-agent LLM system for scientific visualization that incorporates explicit human oversight, stepwise provenance, and learn-at-test-time adaptation, evaluated via case studies and a 1...

Reference graph

Works this paper leans on

46 extracted references · 30 canonical work pages · cited by 1 Pith paper · 9 internal anchors


[1] [1]

J. P. Ahrens, B. Geveci, and C. C. Law. ParaView: An end-user tool for large-data visualization. In C. D. Hansen and C. R. Johnson, eds., The Visualization Handbook, chap. 36, pp. 717–731. Academic Press,

[2] [2]

doi:10.1016/B978-012387582-2/50038-11, 2

work page doi:10.1016/b978-012387582-2/50038-11

[3] [3]

K. Ai, H. Miao, Z. Li, C. Wang, and S. Liu. An evaluation-centric paradigm for scientific visualization agents. InProceedings of IEEE Workshop on GenAI, Agents, and the Future of VIS, 2025. doi:10. 48550/arXiv.2509.151602

arXiv 2025

[4] [4]

K. Ai, H. Miao, K. Tang, N. Gorski, J. Sun, G. Liu, H. I. Ing ´olfsson, D. Lenz, H. Guo, H. Yu, et al. SciVisAgentBench: A benchmark for evaluating scientific data analysis and visualization agents.arXiv preprint arXiv:2603.29139, 2026. doi:10.48550/arXiv.2603.291391, 2, 3

work page doi:10.48550/arxiv.2603.291391 2026

[5] [5]

K. Ai, K. Tang, and C. Wang. NLI4V olVis: Natural language interac- tion for volume visualization via multi-LLM agents and editable 3D Gaussian splatting.IEEE Transactions on Visualization and Computer Graphics, 32(1):46–56, 2026. doi:10.1109/TVCG.2025.36338881, 2

work page doi:10.1109/tvcg.2025.36338881 2026

[6] [6]

Announcements: Introducing the model con- text protocol.https://www.anthropic.com/news/ model-context-protocol, 2024

Anthropic. Announcements: Introducing the model con- text protocol.https://www.anthropic.com/news/ model-context-protocol, 2024. 1

2024

[7] [7]

Claude Code: An agentic coding tool.https://github

Anthropic. Claude Code: An agentic coding tool.https://github. com/anthropics/claude-code, 2025. 1

2025

[8] [8]

Effective harnesses for long-running agents.https://www.anthropic.com/engineering/ effective-harnesses-for-long-running-agents, 2025

Anthropic. Effective harnesses for long-running agents.https://www.anthropic.com/engineering/ effective-harnesses-for-long-running-agents, 2025. 4

2025

[9] [9]

Equipping agents for the real world with agent skills.https://claude.com/blog/ equipping-agents-for-the-real-world-with-agent-skills,

Anthropic. Equipping agents for the real world with agent skills.https://claude.com/blog/ equipping-agents-for-the-real-world-with-agent-skills,

[10] [10]

Harness design for long-running application de- velopment.https://www.anthropic.com/engineering/ harness-design-long-running-apps, 2026

Anthropic. Harness design for long-running application de- velopment.https://www.anthropic.com/engineering/ harness-design-long-running-apps, 2026. 4

2026

[11] [11]

Biswas, T

A. Biswas, T. L. Turton, N. R. Ranasinghe, S. Jones, B. Love, W. Jones, A. Hagberg, H.-W. Shen, N. DeBardeleben, and E. Lawrence. VizGenie: Toward self-refining, domain-aware work- flows for next-generation scientific visualization.IEEE Transactions on Visualization and Computer Graphics, 32(1):1021–1031, 2026. doi:10.1109/TVCG.2025.36346551, 2

work page doi:10.1109/tvcg.2025.36346551 2026

[12] [12]

N. Chen, Y . Zhang, J. Xu, K. Ren, and Y . Yang. VisEval: A bench- mark for data visualization in the era of large language models.IEEE Transactions on Visualization and Computer Graphics, 31(1):1301– 1311, 2025. doi:10.1109/TVCG.2024.34563202

work page doi:10.1109/tvcg.2024.34563202 2025

[13] [13]

Z. Chen, J. Chen, S. ¨O. Arik, M. Sra, T. Pfister, and J. Yoon. CoDA: Agentic systems for collaborative data visualization.arXiv preprint arXiv:2510.03194, 2025. doi:10.48550/arXiv.2510.031942

work page doi:10.48550/arxiv.2510.031942 2025

[14] [14]

Dhanoa, A

V . Dhanoa, A. Wolter, G. M. Le ´on, H.-J. Schulz, and N. Elmqvist. Agentic visualization: Extracting agent-based design patterns from visualization systems.IEEE Computer Graphics and Applications, 45(6):89–90, 2025. doi:10.1109/MCG.2025.36077411, 2

work page doi:10.1109/mcg.2025.36077411 2025

[15] [15]

V . Dibia. LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. InPro- ceedings of Annual Meeting of the Association for Computational Lin- guistics: System Demonstrations, pp. 113–126, 2023. doi:10.18653/ v1/2023.acl-demo.112

2023

[16] [16]

P. P. Do, K. Tang, K. Ai, and C. Wang. SVLAT: Scientific visualiza- tion literacy assessment test.arXiv preprint arXiv:2603.19000, 2026. doi:10.48550/arXiv.2603.190002

work page doi:10.48550/arxiv.2603.190002 2026

[17] [17]

GMX-VMD-MCP: MCP service for GROMACS and VMD molecular dynamics simulations and visualization.https:// github.com/egtai/gmx-vmd-mcp, 2025

EgT. GMX-VMD-MCP: MCP service for GROMACS and VMD molecular dynamics simulations and visualization.https:// github.com/egtai/gmx-vmd-mcp, 2025. 1, 2

2025

[18] [18]

Galimzyanov, S

T. Galimzyanov, S. Titov, Y . Golubev, and E. Bogomolov. Drawing Pandas: A benchmark for LLMs in generating plotting code. InPro- ceedings of IEEE/ACM International Conference on Mining Software Repositories, pp. 503–507, 2025. doi:10.48550/arXiv.2412.027642

work page doi:10.48550/arxiv.2412.027642 2025

[19] [19]

Gemini CLI: An open-source ai agent that brings the power of gemini directly into your terminal.https://github.com/ google-gemini/gemini-cli, 2025

Google. Gemini CLI: An open-source ai agent that brings the power of gemini directly into your terminal.https://github.com/ google-gemini/gemini-cli, 2025. 1

2025

[20] [20]

Gorski, S

N. Gorski, S. Liu, and B. Wang. TopoPilot: Reliable conversational workflow automation for topological data analysis and visualization. arXiv preprint arXiv:2603.25063, 2026. doi:10.48550/arXiv.2603.25063 1, 2

work page doi:10.48550/arxiv.2603.25063 2026

[21] [21]

Humphrey, A

W. Humphrey, A. Dalke, and K. Schulten. VMD: Visual molecular dynamics.Journal of Molecular Graphics, 14:33–38, 1996. doi:10. 1016/0263-7855(96)00018-51, 2

1996

[22] [22]

D. Jia, A. Irger, L. Besanc ¸on, O. Strnad, D. Luo, J. Bj ¨orklund, A. Kouyoumdjian, A. Ynnerman, and I. Viola. VOICE: Visual ora- cle for interaction, conversation, and explanation.IEEE Transactions on Visualization and Computer Graphics, 31(10):8828–8845, 2025. doi:10.1109/TVCG.2025.35799562

work page doi:10.1109/tvcg.2025.35799562 2025

[23] [23]

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Y . Jiang, D. Li, H. Deng, B. Ma, X. Wang, Q. Wang, and G. Yu. SoK: Agentic skills–beyond tool use in LLM agents.arXiv preprint arXiv:2602.20867, 2026. doi:10.48550/arXiv.2602.208672

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.208672 2026

[24] [24]

X. Li, W. Chen, Y . Liu, S. Zheng, X. Chen, Y . He, Y . Li, B. You, H. Shen, J. Sun, et al. SkillsBench: Benchmarking how well agent skills work across diverse tasks.arXiv preprint arXiv:2602.12670,

Pith/arXiv arXiv

[25] [25]

doi:10.48550/arXiv.2602.126702

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.126702

[26] [26]

G. Ling, S. Zhong, and R. Huang. Agent skills: A data-driven anal- ysis of claude skills for extending large language model functionality. arXiv preprint arXiv:2602.08004, 2026. doi:10.48550/arXiv.2602.08004 2

work page doi:10.48550/arxiv.2602.08004 2026

[27] [27]

S. Liu, H. Miao, and P.-T. Bremer. ParaView-MCP: An autonomous visualization agent with direct tool use. InProceedings of IEEE VIS Conference (Short Papers), pp. 61–65, 2025. doi:10.48550/arXiv.2505. 070641, 2

work page doi:10.48550/arxiv.2505 2025

[28] S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, and P.-T. Bremer. AVA: Towards autonomous visualization agents through visual perception-driven decision-making. Computer Graphics Forum, 43(3):e15093, 2024. doi:10.1111/cgf.15093

[30] X. Liu, H. Yu, H. Zhang, Y. Xu, X. Lei, H. Lai, Y. Gu, H. Ding, K. Men, K. Yang, et al. AgentBench: Evaluating LLMs as agents. In Proceedings of International Conference on Learning Representations, 2023. doi:10.48550/arXiv.2308.03688

[31] M. Mathai, M. Han, J. Knowles, V. A. Mateevitsi, S. Rizzi, and H. Childs. NL2SciVis: A benchmark for natural language to scientific visualization. In Proceedings of Eurographics Conference on Visualization (Short Papers), 2026. doi:10.2312/evs.20261017

[32] G. Mialon, C. Fourrier, T. Wolf, Y. LeCun, and T. Scialom. GAIA: A benchmark for general AI assistants. In Proceedings of International Conference on Learning Representations, 2023. doi:10.48550/arXiv.2311.12983

[33] H. Miao, Z. Li, K. Ai, K. Tang, C. Wang, P.-T. Bremer, and S. Liu. Toward AI VIS co-scientists: A general and end-to-end agent harness for solving complex data visualization tasks. arXiv preprint arXiv:2605.21825, 2026. doi:10.48550/arXiv.2605.21825

[34] H. Miao and S. Liu. BioImage-Agent. https://github.com/LLNL/bioimage-agent, 2025.

[35] OpenAI. OpenAI Codex: Lightweight coding agent that runs in your terminal. https://github.com/openai/codex, 2025.

[36] T. Peterka, T. Mallick, O. Yildiz, D. Lenz, C. Quammen, and B. Geveci. ChatVis: Large language model agent for generating scientific visualizations. In Proceedings of IEEE Workshop on Large Data Analysis and Visualization, pp. 22–32, 2025. doi:10.1109/LDAV68558.2025.00007

[37] N. Sofroniew, T. Lambert, G. Bokota, J. Nunez-Iglesias, P. Sobolewski, A. Sweet, L. Gaifas, K. Evans, A. Burt, D. Doncila Pop, et al. napari: A multi-dimensional image viewer for Python. doi:10.5281/zenodo.3555620

[39] J. Sun, D. Lenz, T. Peterka, and H. Yu. SASAV: Self-directed agent for scientific analysis and visualization. arXiv preprint arXiv:2604.03406, 2026. doi:10.48550/arXiv.2604.03406

[40] J. Z. Tam, P. Grosset, D. Banesh, N. Ramachandra, T. L. Turton, and J. P. Ahrens. InferA: A smart assistant for cosmological ensemble data. In Proceedings of ACM/IEEE SC Workshops, pp. 20–28, 2025. doi:10.1145/3731599.3767342

[41] K. Tang, K. Ai, J. Han, and C. Wang. TexGS-VolVis: Expressive scene editing for volume visualization via textured Gaussian splatting. IEEE Transactions on Visualization and Computer Graphics, 32(1):933–943, 2026. doi:10.1109/TVCG.2025.3634643

[42] J. Tierny, G. Favelier, J. A. Levine, C. Gueunet, and M. Michaux. The topology toolkit. IEEE Transactions on Visualization and Computer Graphics, 24(1):832–842, 2018. doi:10.1109/TVCG.2017.2743938

[43] J. Vonderhorst, K. Ai, H. Miao, S. Liu, and C. Wang. Exploring interaction paradigms for LLM agents in scientific visualization. arXiv preprint arXiv:2604.27996, 2026. doi:10.48550/arXiv.2604.27996

[44] Y. Wang, B. Pan, K. Wang, H. Liu, J. Mao, Y. Liu, M. Zhu, B. Zhang, W. Chen, X. Huang, et al. IntuiTF: MLLM-guided transfer function optimization for direct volume rendering. arXiv preprint arXiv:2506.18407, 2025. doi:10.48550/arXiv.2506.18407

[45] R. Xu and Y. Yan. Agent skills for large language models: Architecture, acquisition, security, and the path forward. arXiv preprint arXiv:2602.12430, 2026. doi:10.48550/arXiv.2602.12430

[46] S. Yao, N. Shinn, P. Razavi, and K. Narasimhan. τ-bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024. doi:10.48550/arXiv.2406.12045