VESTA: Visual Exploration with Statistical Tool Agents

Abhishek Divekar; Greg Durrett; Junyi Jessy Li; Kanishk Jain; Kyle Mahowald; Matthew Lease; Sebastian Joseph; Stella S. R. Offner; William Rudman

arXiv: 2606.00384 · v2 · pith:JRVRNJPGnew · submitted 2026-05-29 · cs.AI · cs.CL · cs.CV · cs.LG · stat.CO

William Rudman, Abhishek Divekar, Kanishk Jain, Sebastian Joseph, Stella S. R. Offner, Matthew Lease, Kyle Mahowald, Greg Durrett, Junyi Jessy Li

Pith reviewed 2026-06-28 21:56 UTC · model grok-4.3

keywords: VESTA · dynamic tool creation · vision-language models · statistical model fitting · DAWN benchmark · agentic systems · data visualization · astronomy modeling

The pith

VESTA lets vision-language models create and reuse their own diagnostic tools during statistical model fitting, outperforming fixed-tool agent systems especially on complex tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents VESTA as a way to improve automated fitting of quantitative models to data by giving vision-language models a growing library of tools they generate themselves. These tools handle data transformations, hypothesis-driven visualizations, and statistical tests, and they stay available in the model's context for later reuse instead of relying only on iterative critique. The authors test this against baselines on the new DAWN benchmark, which includes distribution fitting, time series modeling, and real astronomy problems such as initial mass functions and gravitational-wave signals. Dynamic tool creation produces larger gains than static expert tools or no tools at all, with the biggest differences appearing on harder and more specialized tasks. The generated tools also turn out more sophisticated than those from prior visual tool-creation methods, favoring outputs the model can inspect directly.

Core claim

VESTA demonstrates that endowing VLMs with the ability to dynamically select or write diagnostic tools for data exploration and model refinement leads to better performance than prior agentic pipelines that use only critique loops or fixed tool sets. The largest improvements occur on complex and domain-specific tasks in the DAWN benchmark. Dynamically generated tools cover more diagnostic categories per function and show a strong preference for visual outputs that support direct reasoning by the VLM critic.

What carries the argument

The dynamic tool creation and accumulation mechanism, in which the VLM writes or selects new functions for transformations, visualizations, and tests that persist in context for reuse across refinement steps.
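The select-or-create behavior described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToolLibrary`, `select_or_create`, and `make_skew_check` are hypothetical names, and the skewness diagnostic stands in for code that the VLM would write itself.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical type for a diagnostic tool: takes data, returns a small report.
Tool = Callable[[Sequence[float]], dict]


@dataclass
class ToolLibrary:
    """Hypothetical registry of tools that persists across refinement steps."""
    tools: Dict[str, Tool] = field(default_factory=dict)
    created: List[str] = field(default_factory=list)
    reused: List[str] = field(default_factory=list)

    def select_or_create(self, name: str, factory: Callable[[], Tool]) -> Tool:
        """Reuse a stored tool if one exists under this name, else build and store it."""
        if name in self.tools:
            self.reused.append(name)
        else:
            self.tools[name] = factory()
            self.created.append(name)
        return self.tools[name]


def make_skew_check() -> Tool:
    """Stand-in for model-written code: a moment-based skewness diagnostic."""
    def skew_check(data: Sequence[float]) -> dict:
        mean = statistics.fmean(data)
        sd = statistics.pstdev(data)
        skew = sum(((x - mean) / sd) ** 3 for x in data) / len(data)
        return {"skewness": skew, "suggest_log_transform": skew > 1.0}
    return skew_check


library = ToolLibrary()
data = [math.exp(0.5 * k) for k in range(10)]  # deterministic right-skewed toy data

# Refinement step 1: the tool does not exist yet, so it is created and stored.
report_1 = library.select_or_create("skew_check", make_skew_check)(data)

# Refinement step 2: the same tool is found in the library and reused
# on the log-transformed data.
log_data = [math.log(x) for x in data]
report_2 = library.select_or_create("skew_check", make_skew_check)(log_data)
```

In this toy run the first call creates the tool and the second call reuses it, which is the accumulation property the paragraph describes. How VESTA actually names, stores, and retrieves tools is not specified in the material here.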

If this is right

VESTA with dynamic tools outperforms no-tool and static-expert-tool baselines across the evaluated tasks.
The performance gap widens on complex and domain-specific problems such as astronomy modeling.
Dynamically generated tools are more sophisticated than those from existing visual tool-creation systems, spanning more diagnostic categories and favoring visual outputs.
Tools accumulate in context and remain available for later reuse during iterative refinement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the tool-creation process scales, the same framework could be applied to other iterative scientific workflows that currently require extensive human-written diagnostics.
The preference for visual outputs suggests the method may integrate naturally with existing vision-language capabilities rather than requiring separate text-only pipelines.
Over repeated use on similar data types, the growing tool library might reduce the need for fresh tool invention on each new modeling problem.

Load-bearing premise

The specific modeling tasks in the DAWN benchmark, including the astronomy examples, represent the kinds of challenges where dynamic tools give a real advantage over static or no-tool approaches.

What would settle it

Running the same three toolkit configurations on a fresh collection of distribution-fitting or time-series tasks drawn from a different domain, such as particle physics or financial data, and finding that the dynamic-tool version loses its performance edge.

Figures

Figures reproduced from arXiv: 2606.00384 by Abhishek Divekar, Greg Durrett, Junyi Jessy Li, Kanishk Jain, Kyle Mahowald, Matthew Lease, Sebastian Joseph, Stella S. R. Offner, William Rudman.

**Figure 1.** Overview of VESTA. By effectively using and creating tools, VESTA produces a probabilistic PyMC program that models the input data. … model definition and fitting to data using an efficient Markov Chain Monte Carlo (MCMC) method. Model-building with PyMC code is compositional, allowing for complex distributions to be constructed by combining simpler components. VESTA instantiates a loop of proposing models,…

**Figure 3.** DAWN's Astro distribution fitting tasks: example initial mass functions, which become visually distinct only when projected into log-log space. Section 4, The DAWN Benchmark: Distribution fitting and time series modeling are two key data science modeling tasks that appear consistently across scientific disciplines. We select these domains because they allow us to benchmark AI …

**Figure 2.** Sample inputs from both domains and all dataset splits in DAWN. Easy splits contain easily recognizable forms. Hard tasks contain mixtures of distinct forms, and Astro tasks reflect real-world astronomy challenges that require additional analysis beyond simple visualization to solve. The Easy tasks in distribution fitting consist of identifying the family and associated parameters for a unimodal distrib…

**Figure 4.** [Top] Average Jensen-Shannon divergence (↓ better) between the ground-truth distribution and the probability density function of the proposed PyMC model on the Distribution Fitting task of DAWN. [Bottom] Average ELPD-LOO (↑ better) for the Time Series Modeling task of DAWN, computed via leave-one-out cross-validation. Error bars denote ±1 standard error of the mean.

**Figure 5.** Example of the output from a VESTA-generated tool. This tool composes multiple functions to analyze a heavy-tailed distribution. This multi-panel visualization output is fed back into VESTA to generate better hypotheses. Panel titles are enlarged for clarity and panel numbers are added manually.

**Figure 6.** Jensen–Shannon divergence on the Hard distribution fitting split (lower is better), comparing three accumulated toolkit conditions with Claude Sonnet 4.6. … presents JS divergence scores across the three toolkit conditions on the Hard distribution fitting split. Overall, performance differences across conditions are modest, with all three variants achieving mean JS divergence between 0.106 and 0.124, and o…

**Figure 7.** Critique-stage prompt used by VESTA for time series modeling.

**Figure 8.** Critique-stage prompt used by VESTA for distribution fitting.

**Figure 9.** Tool-selection prompt used by the Generate-Tools stage of V…

**Figure 10.** Tool-creation prompt used by the Generate-Tools stage of V…

**Figure 11.** Prompt used by the Summarize stage of VESTA.

**Figure 12.** Proposal prompt used by the BoxLM baseline for time series modeling.

**Figure 13.** Proposal prompt used by the BoxLM baseline for distribution fitting.

**Figure 14.** Critic prompt used by the BoxLM baseline for both distribution fitting and time series…

**Figure 15.** Agent prompt used by the PyVision baseline for both distribution fitting and time series…
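Figure 4 scores distribution fitting by the Jensen-Shannon divergence between the ground-truth distribution and the density of the proposed model. As a rough illustration of that metric (not the paper's evaluation code), the sketch below discretizes two densities on a shared grid and computes the divergence with base-2 logarithms, so values fall between 0 and 1. The grid, the Gaussian stand-ins, and all function names are assumptions made for this example; the paper's exact discretization and logarithm base are not stated in the material above.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, used here as a stand-in model."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def discretize(pdf, grid):
    """Evaluate a density on a grid and normalize into a probability vector."""
    weights = [pdf(x) for x in grid]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Shared evaluation grid on [-10, 10] (an assumption for this example).
grid = [-10.0 + 0.01 * i for i in range(2001)]

truth = discretize(lambda x: normal_pdf(x, 0.0, 1.0), grid)  # "ground truth"
near = discretize(lambda x: normal_pdf(x, 0.5, 1.0), grid)   # slightly off candidate
far = discretize(lambda x: normal_pdf(x, 3.0, 1.0), grid)    # badly off candidate

js_same = js_divergence(truth, truth)
js_near = js_divergence(truth, near)
js_far = js_divergence(truth, far)
```

A candidate identical to the truth scores zero, and the score grows as the candidate moves away, which is why lower is better in Figures 4 and 6.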

Original abstract

Fitting quantitative models to data is a central step in scientific workflows, yet it remains one of the least automated. Recent agent-based systems leverage language and vision-language models (VLMs) to iteratively propose and refine statistical models, but these systems struggle on more challenging modeling tasks. To address these limitations, we introduce VESTA: Visual Exploration with Statistical Tool Agents, a framework that equips VLMs with a dynamically growing exploration toolkit to guide model refinement through data transformations, hypothesis-driven visualizations, and robust statistical tests. Unlike prior systems that rely on iterative critique alone, VESTA actively explores data before and during refinement by selecting or creating diagnostic tools, which accumulate in the model's context and can be reused later. We evaluate VESTA against established baselines in three toolkit configurations: no tools, static expert-written tools, and dynamic model-written tools. To support this evaluation, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark targeting distribution fitting and time series modeling with varying difficulty tiers, and culminating in real-world astronomy tasks including modeling initial mass functions and gravitational-wave chirp signals. We find that VESTA's dynamic tool creation outperforms prior agentic pipelines, with the largest gains on complex and domain-specific tasks. We further show that dynamically generated tools are substantially more sophisticated than those produced by existing visual tool-creation systems, covering more diagnostic categories per function and strongly preferring visual outputs that the VLM critic can reason over directly.
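The abstract names initial mass functions among DAWN's astronomy tasks, and the Figure 3 caption notes that such distributions become visually distinct only in log-log space. One way to see why: a power law has a constant slope in log-log coordinates, while a lognormal curves. The sketch below checks this numerically. The exponent 2.35 is the classic Salpeter slope; the lognormal parameters and the mass grid are illustrative placeholders, and none of this is taken from the DAWN task definitions.

```python
import math


def power_law(m, alpha=2.35):
    """Unnormalized power-law density m^(-alpha); 2.35 is the Salpeter slope."""
    return m ** (-alpha)


def lognormal(m, mu=math.log(0.2), sigma=0.6):
    """Unnormalized lognormal density in m; mu and sigma are placeholders."""
    z = (math.log(m) - mu) / sigma
    return math.exp(-0.5 * z * z) / m


def loglog_slopes(density, masses):
    """Finite-difference slope d(log p) / d(log m) between consecutive grid points."""
    points = [(math.log(m), math.log(density(m))) for m in masses]
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# Logarithmically spaced masses from 0.1 to 12.8 (arbitrary units for the sketch).
masses = [0.1 * 2 ** k for k in range(8)]

power_slopes = loglog_slopes(power_law, masses)
lognormal_slopes = loglog_slopes(lognormal, masses)
```

Here `power_slopes` is flat at the chosen exponent, while `lognormal_slopes` falls steadily with mass. A diagnostic of this kind is one plausible example of a data transformation that separates candidate families; whether VESTA generates this particular check is not stated.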

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VESTA adds dynamic tool creation to VLM agents for statistical modeling and reports gains on a new DAWN benchmark with astronomy tasks.

VESTA equips VLMs with a toolkit that grows through model-generated tools for data transformations, visualizations, and tests during statistical model fitting. The central result is that the dynamic-tool version beats no-tool and static-tool baselines, with the biggest edges on harder and domain-specific cases.

The work is clearest on the distinction from prior iterative-critique agents: it adds active exploration before and during refinement, lets tools accumulate in context, and evaluates three controlled configurations on the new DAWN benchmark. That benchmark includes distribution fitting, time series, and real astronomy examples such as initial mass functions and gravitational-wave chirps, which gives the evaluation some grounding in actual scientific use. The claim that the generated tools are more sophisticated (more diagnostic categories, preference for visual outputs) is also stated directly.

The soft spot is the lack of visible methods, tables, error bars, or code, so it is not yet possible to check how fairly the baselines were run or how tool sophistication was scored. The abstract alone leaves open whether the reported outperformance would hold under closer inspection of implementation details.

This is for people working on agent systems that automate parts of scientific data analysis. A reader who needs concrete examples of tool-using VLMs in modeling workflows would find the setup and benchmark worth examining.

It should go to peer review because the framework and benchmark are concrete and address a stated gap, even though the current version will need fuller methods and results to stand up to scrutiny.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces VESTA, a framework that augments vision-language models with a dynamically growing toolkit of statistical tools (data transformations, visualizations, and tests) for iterative model fitting and refinement. It introduces the DAWN benchmark covering distribution fitting, time series, and real-world astronomy tasks (initial mass functions, gravitational-wave signals), and reports results from three controlled configurations (no tools, static expert tools, dynamic model-written tools) showing that dynamic tool creation yields the largest gains on complex and domain-specific tasks while producing more sophisticated tools than prior visual tool-creation systems.

Significance. If the empirical results hold under full scrutiny of the methods and error analysis, the work would represent a meaningful step toward more adaptive agentic systems for scientific modeling. The introduction of a new benchmark with tiered difficulty and domain-specific astronomy examples, together with the explicit comparison of tool-creation strategies, provides a concrete testbed that future systems can build upon.

major comments (2)

[Abstract (evaluation description)] The central empirical claim rests on the DAWN benchmark tasks being representative of the modeling challenges where static tools fail; however, the abstract provides no quantitative breakdown of task difficulty tiers or failure modes of baselines on the astronomy subset, making it difficult to assess whether the reported gains generalize beyond the chosen examples.
[Abstract (tool sophistication result)] The claim that dynamically generated tools are 'substantially more sophisticated' is load-bearing for the contribution, yet the abstract does not specify the rubric or inter-rater protocol used to rate diagnostic categories and visual-output preference; without this, the comparison to existing visual tool-creation systems cannot be independently verified.

minor comments (1)

[Abstract] The three toolkit configurations are described at a high level; adding a table that explicitly lists the tool inventory size, reuse frequency, and example tool signatures for each configuration would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each major point below and will revise the abstract accordingly to improve clarity and verifiability while preserving its length.

Referee: [Abstract (evaluation description)] The central empirical claim rests on the DAWN benchmark tasks being representative of the modeling challenges where static tools fail; however, the abstract provides no quantitative breakdown of task difficulty tiers or failure modes of baselines on the astronomy subset, making it difficult to assess whether the reported gains generalize beyond the chosen examples.

Authors: We agree that the abstract would benefit from a concise reference to the benchmark structure. The revised abstract will note the tiered design of DAWN (distribution fitting, time series, and domain-specific astronomy tasks) and state that dynamic tool creation yields the largest gains on the astronomy subset. Full quantitative results, including baseline failure rates and error analysis by tier, are already provided in Section 4 and Appendix B; we will ensure the abstract points readers to these sections.

Revision: yes

Referee: [Abstract (tool sophistication result)] The claim that dynamically generated tools are 'substantially more sophisticated' is load-bearing for the contribution, yet the abstract does not specify the rubric or inter-rater protocol used to rate diagnostic categories and visual-output preference; without this, the comparison to existing visual tool-creation systems cannot be independently verified.

Authors: The rubric (counting diagnostic categories such as distribution shape, outliers, and correlations, plus preference for visual outputs) and inter-rater protocol are described in Section 5.2. We will revise the abstract to briefly indicate that tool sophistication was assessed by human raters using these criteria. This addition will allow the claim to be evaluated from the abstract while directing readers to the full protocol in the main text.

Revision: yes
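The two quantities named in this response, diagnostic categories per function and the share of tools with visual output, are simple to compute once raters have labeled each tool. The sketch below shows one possible bookkeeping scheme. Everything in it is an assumption for illustration: the `ToolRecord` structure, the tool names, and the labels are placeholders, not data or code from the paper.

```python
import statistics
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical rater annotation for one generated tool."""
    name: str
    categories: FrozenSet[str]  # diagnostic categories assigned by a rater
    output_kind: str            # "visual" or "text"


def categories_per_function(tools: Sequence[ToolRecord]) -> float:
    """Mean number of distinct diagnostic categories covered per tool."""
    return statistics.fmean(len(t.categories) for t in tools)


def visual_fraction(tools: Sequence[ToolRecord]) -> float:
    """Share of tools whose primary output is visual."""
    return sum(t.output_kind == "visual" for t in tools) / len(tools)


# Placeholder annotations, invented only to exercise the functions.
records = [
    ToolRecord("tail_panel", frozenset({"distribution shape", "outliers", "correlations"}), "visual"),
    ToolRecord("outlier_count", frozenset({"outliers"}), "text"),
    ToolRecord("shape_overlay", frozenset({"distribution shape", "outliers"}), "visual"),
]

mean_categories = categories_per_function(records)
share_visual = visual_fraction(records)
```

Reporting both numbers per system, together with rater agreement, would let a reader check the sophistication comparison directly.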

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation on new benchmark

The paper introduces VESTA as an empirical framework for agentic model fitting and evaluates it via controlled comparisons (no-tool, static tools, dynamic tools) on the newly introduced DAWN benchmark, including astronomy tasks. The abstract and described claims rest on direct performance reporting and tool-sophistication ratings rather than any derivation chain, equations, fitted parameters, or self-citation load-bearing premises. No load-bearing step reduces to its own inputs by construction, and the central results are externally falsifiable via the benchmark tasks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no information on free parameters, axioms, or invented entities is available.

pith-pipeline@v0.9.1-grok · 5833 in / 897 out tokens · 19160 ms · 2026-06-28T21:56:25.469991+00:00 · methodology

Reference graph

Works this paper leans on

96 extracted references · 11 canonical work pages · 1 internal anchor

[1]

Pymc: a modern, and comprehensive probabilistic programming framework in python.PeerJ Computer Science, 9:e1516, 2023

Oriol Abril-Pla, Virgile Andreani, Colin Carroll, Larry Dong, Christopher J Fonnesbeck, Maxim Kochurov, Ravin Kumar, Junpeng Lao, Christian C Luhmann, Osvaldo A Martin, et al. Pymc: a modern, and comprehensive probabilistic programming framework in python.PeerJ Computer Science, 9:e1516, 2023

2023
[2]

Evoskill: Automated skill discovery for multi-agent systems, 2026

Salaheddin Alzubi, Noah Provenzano, Jaydon Bingham, Weiyuan Chen, and Tu Vu. Evoskill: Automated skill discovery for multi-agent systems, 2026. URL https://arxiv.org/abs/ 2603.02766

Pith/arXiv arXiv 2026
[3]

Speech signal modeling using multivariate distributions.EURASIP Journal on Audio Speech and Music Processing, 2015: 1–14, 12 2015

Ali Aroudi, Hadi Veisi, Hossein Sameti, and Zahra Mafakheri. Speech signal modeling using multivariate distributions.EURASIP Journal on Audio Speech and Music Processing, 2015: 1–14, 12 2015. doi: 10.1186/s13636-015-0078-1

work page doi:10.1186/s13636-015-0078-1 2015
[4]

Covey, and Michael R

Nate Bastian, Kevin R. Covey, and Michael R. Meyer. A universal stellar initial mass function? a critical look at variations.Annual Review of Astronomy and Astro- physics, 48(V olume 48, 2010):339–389, 2010. ISSN 1545-4282. doi: https://doi.org/ 10 10.1146/annurev-astro-082708-101642. URL https://www.annualreviews.org/content/ journals/10.1146/annurev-ast...

work page doi:10.1146/annurev-astro-082708-101642 2010
[5]

Automated reverse engineering of nonlinear dynamical systems

Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 104(24): 9943–9948, Jun 2007. doi: 10.1073/pnas.0609476104

work page doi:10.1073/pnas.0609476104 2007
[6]

Probabilistic grammars for equation discovery.CoRR, abs/2012.00428, 2020

Jure Brence, Ljupco Todorovski, and Saso Dzeroski. Probabilistic grammars for equation discovery.CoRR, abs/2012.00428, 2020. URLhttps://arxiv.org/abs/2012.00428

arXiv 2012
[7]

Large language models as tool makers, 2024

Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, and Denny Zhou. Large language models as tool makers, 2024. URLhttps://arxiv.org/abs/2305.17126

arXiv 2024
[8]

Adaevolve: Adaptive llm driven zeroth-order optimization, 2026

Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, and Ion Stoica. Adaevolve: Adaptive llm driven zeroth-order optimization, 2026. URL https://arxiv.org/ abs/2602.20133

arXiv 2026
[9]

2003, Publications of the Astronomical Society of the Pacific, 115, 763, doi: 10.1086/376392

Gilles Chabrier. Galactic stellar and substellar initial mass function.Publications of the Astronomical Society of the Pacific, 115(809):763–795, July 2003. ISSN 1538-3873. doi: 10.1086/376392. URLhttp://dx.doi.org/10.1086/376392

work page internal anchor Pith review doi:10.1086/376392 2003
[10]

Mle-bench: Evaluating machine learning agents on machine learning engineering, 2025

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, and Aleksander M ˛ adry. Mle-bench: Evaluating machine learning agents on machine learning engineering, 2025. URL https://arxiv.org/abs/2410.07095

Pith/arXiv arXiv 2025
[11]

Evoclaw: Evaluating ai agents on continuous software evolution, 2026

Gangda Deng, Zhaoling Chen, Zhongming Yu, Haoyang Fan, Yuhong Liu, Yuxin Yang, Dhruv Parikh, Rajgopal Kannan, Le Cong, Mengdi Wang, Qian Zhang, Viktor Prasanna, Xiangru Tang, and Xingyao Wang. Evoclaw: Evaluating ai agents on continuous software evolution, 2026. URLhttps://arxiv.org/abs/2603.13428

Pith/arXiv arXiv 2026
[12]

Tenenbaum, and Zoubin Ghahramani

David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. Structure discovery in nonparametric regression through compositional kernel search, 2013. URLhttps://arxiv.org/abs/1302.4922

Pith/arXiv arXiv 2013
[13]

Dabstep: Data agent benchmark for multi-step reasoning, 2025

Alex Egg, Martin Iglesias Goyanes, Friso Kingma, Andreu Mora, Leandro von Werra, and Thomas Wolf. Dabstep: Data agent benchmark for multi-step reasoning, 2025. URL https: //arxiv.org/abs/2506.23719

arXiv 2025
[14]

Time-series fore- casting of seasonal items sales using machine learning – a comparative analysis.International Journal of Information Management Data Insights, 2(1):100058, 2022

Yasaman Ensafi, Saman Hassanzadeh Amin, Guoqing Zhang, and Bharat Shah. Time-series fore- casting of seasonal items sales using machine learning – a comparative analysis.International Journal of Information Management Data Insights, 2(1):100058, 2022. ISSN 2667-0968. doi: 10.1016/j.jjimei.2022.100058. URL https://www.sciencedirect.com/science/article/ pii...

work page doi:10.1016/j.jjimei.2022.100058 2022
[15]

Li, Lyle Goodyear, Agam Bhatia, Louise Li, Aditi Bhaskar, Mohammed Zaman, and Noah D

Kanishk Gandhi, Michael Y . Li, Lyle Goodyear, Agam Bhatia, Louise Li, Aditi Bhaskar, Mohammed Zaman, and Noah D. Goodman. Boxinggym: Benchmarking progress in automated experimental design and model discovery, 2025. URL https://arxiv.org/abs/2501.01540

arXiv 2025
[16]

Large language models are zero-shot time series forecasters, 2024

Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large language models are zero-shot time series forecasters, 2024. URLhttps://arxiv.org/abs/2310.07820

arXiv 2024
[17]

Visual programming: Compositional visual reasoning without training, 2022

Tanmay Gupta and Aniruddha Kembhavi. Visual programming: Compositional visual reasoning without training, 2022. URLhttps://arxiv.org/abs/2211.11559

arXiv 2022
[18]

Deepeyesv2: Toward agentic multimodal model, 2026

Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, and Xing Yu. Deepeyesv2: Toward agentic multimodal model, 2026. URLhttps://arxiv.org/abs/2511.05271

Pith/arXiv arXiv 2026
[19]

Hollon, and Bryan Wang

Xinhai Hou, Shaoyuan Xu, Manan Biyani, Moyan Li, Jia Liu, Todd C. Hollon, and Bryan Wang. Codev: Code with images for faithful visual reasoning via tool-aware policy optimization, 2026. URLhttps://arxiv.org/abs/2511.19661. 11

arXiv 2026
[20]

Visual sketchpad: Sketching as a visual chain of thought for multimodal language models, 2024

Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, and Ranjay Krishna. Visual sketchpad: Sketching as a visual chain of thought for multimodal language models, 2024. URLhttps://arxiv.org/abs/2406.09403

arXiv 2024
[21]

Toolace-dev: Self-improving tool learning via decomposition and evolution, 2025

Xu Huang, Weiwen Liu, Xingshan Zeng, Yuefeng Huang, Xinlong Hao, Yuxian Wang, Yirong Zeng, Chuhan Wu, Yasheng Wang, Ruiming Tang, and Defu Lian. Toolace-dev: Self-improving tool learning via decomposition and evolution, 2025. URL https://arxiv.org/abs/2505. 07512

2025
[22]

Jordan, Song Mei, Jason E Weston, Weijie J

Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I. Jordan, Song Mei, Jason E Weston, Weijie J. Su, Jing Xu, and Linjun Zhang. An overview of large language models for statisticians, 2025. URLhttps://arxiv.org/abs/2502.17814

arXiv 2025
[23]

Astro- visbench: A code benchmark for scientific computing and visualization in astronomy.arXiv preprint arXiv:2505.20538, 2025

Sebastian Antony Joseph, Syed Murtaza Husain, Stella SR Offner, StÃŠphanie Juneau, Paul Torrey, Adam S Bolton, Juan P Farias, Niall Gaffney, Greg Durrett, and Junyi Jessy Li. Astro- visbench: A code benchmark for scientific computing and visualization in astronomy.arXiv preprint arXiv:2505.20538, 2025

arXiv 2025
[24]

Automated model discovery via multi-modal & multi-step pipeline, 2025

Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin, Junhyun Nam, and Tae-Hyun Oh. Automated model discovery via multi-modal & multi-step pipeline, 2025. URL https://arxiv.org/abs/ 2509.25946

arXiv 2025
[25]

P. Kroupa. On the variation of the initial mass function.Monthly Notices of the Royal Astronomical Society, 322(2):231–246, April 2001. ISSN 1365-2966. doi: 10.1046/j.1365-8711. 2001.04022.x. URLhttp://dx.doi.org/10.1046/j.1365-8711.2001.04022.x

work page doi:10.1046/j.1365-8711 2001
[26]

Opensage: Self-programming agent generation engine, 2026

Hongwei Li, Zhun Wang, Qinrun Dai, Yuzhou Nie, Jinjun Peng, Ruitong Liu, Jingyang Zhang, Kaijie Zhu, Jingxuan He, Lun Wang, Yangruibo Ding, Yueqi Chen, Wenbo Guo, and Dawn Song. Opensage: Self-programming agent generation engine, 2026. URL https: //arxiv.org/abs/2602.16891

arXiv 2026
[27]

Li, Emily B

Michael Y . Li, Emily B. Fox, and Noah D. Goodman. Automated statistical model discovery with language models, 2024. URLhttps://arxiv.org/abs/2402.17879

arXiv 2024
[28]

Li, Vivek Vajipey, Noah D

Michael Y . Li, Vivek Vajipey, Noah D. Goodman, and Emily B. Fox. Critical: Critic automation with language models, 2024. URLhttps://arxiv.org/abs/2411.06590

arXiv 2024
[29]

Tenenbaum, and Zoubin Ghahramani

James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. Automatic construction and natural-language description of nonparametric regres- sion models, 2014. URLhttps://arxiv.org/abs/1402.4304

Pith/arXiv arXiv 2014
[30]

Beyond static tools: Test-time tool evolution for scientific reasoning, 2026

Jiaxuan Lu, Ziyu Kong, Yemin Wang, Rong Fu, Haiyuan Wan, Cheng Yang, Wenjie Lou, Haoran Sun, Lilong Wang, Yankai Jiang, Xiaosong Wang, Xiao Sun, and Dongzhan Zhou. Beyond static tools: Test-time tool evolution for scientific reasoning, 2026. URL https: //arxiv.org/abs/2601.07641

arXiv 2026
[31]

Mixture cure model methodology in survival analysis: Some recent results for the one-sample case.Statistics Surveys, 18, 01 2024

Ross Maller, Sidney Resnick, Soudabeh Shemehsavar, and Muzhi Zhao. Mixture cure model methodology in survival analysis: Some recent results for the one-sample case.Statistics Surveys, 18, 01 2024. doi: 10.1214/24-SS147

work page doi:10.1214/24-ss147 2024
[32]

Vesta: In depth

NASA Science. Vesta: In depth. https://science.nasa.gov/solar-system/asteroids/ 4-vesta/, . Accessed: May 2, 2026

2026
[33]

Dawn mission overview

NASA Science. Dawn mission overview. https://science.nasa.gov/mission/dawn/, . Accessed: May 2, 2026

2026
[34]

Harnessing vision models for time series analysis: A survey, 2025

Jingchao Ni, Ziming Zhao, ChengAo Shen, Hanghang Tong, Dongjin Song, Wei Cheng, Dongsheng Luo, and Haifeng Chen. Harnessing vision models for time series analysis: A survey, 2025. URLhttps://arxiv.org/abs/2502.08869

arXiv 2025
[35]

S. S. R. Offner, P. C. Clark, P. Hennebelle, N. Bastian, M. R. Bate, P. F. Hopkins, E. Moreaux, and A. P. Whitworth.The Origin and Universality of the Stellar Initial Mass Function. University of Arizona Press, 2014. ISBN 9780816531240. doi: 10.2458/azu_uapress_9780816531240-ch003. URLhttp://dx.doi.org/10.2458/azu_uapress_9780816531240-ch003. 12

work page doi:10.2458/azu_uapress_9780816531240-ch003 2014
[36]

Fung, Yujia Qin, Zhiyuan Liu, and Heng Ji

Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, and Heng Ji. Creator: Tool creation for disentangling abstract and concrete reasoning of large language models, 2024. URL https://arxiv.org/abs/2305.14318

arXiv 2024
[37]

Turner, and David Duvenaud

James Requeima, John Bronskill, Dami Choi, Richard E. Turner, and David Duvenaud. Llm processes: Numerical predictive distributions conditioned on natural language, 2024. URL https://arxiv.org/abs/2405.12856

arXiv 2024
[38]

Forgotten polygons: Multimodal large language models are shape-blind

William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, and Ritambhara Singh. Forgotten polygons: Multimodal large language models are shape-blind. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics: ACL 2025, pages 11983–1199...

work page doi:10.18653/v1/2025.findings-acl.620 2025
[39]

ApJ , year = 1955, month = jan, volume =

Edwin E. Salpeter. The Luminosity Function and Stellar Evolution.apj, 121:161, January 1955. doi: 10.1086/145971

work page doi:10.1086/145971 1955
[40]

Towards execution-grounded automated ai research, 2026

Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candès, Diyi Yang, and Tatsunori Hashimoto. Towards execution-grounded automated ai research, 2026. URL https://arxiv.org/abs/ 2601.14525

arXiv 2026
[41]

Restgpt: Connecting large language models with real-world restful apis, 2023

Yifan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, and Sujian Li. Restgpt: Connecting large language models with real-world restful apis, 2023. URLhttps://arxiv.org/abs/2306.06624

arXiv 2023
[42]

A survey on large language model-based agents for statistics and data science.The American Statistician, page 1–14, October 2025

Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, and Jian Huang. A survey on large language model-based agents for statistics and data science.The American Statistician, page 1–14, October 2025. ISSN 1537-2731. doi: 10.1080/00031305. 2025.2561140. URLhttp://dx.doi.org/10.1080/00031305.2025.2561140

work page doi:10.1080/00031305 2025
[43]

Seagent: Self-evolving computer use agent with autonomous learning from experience,

Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, and Jiaqi Wang. Seagent: Self-evolving computer use agent with autonomous learning from experience,
[44]

URLhttps://arxiv.org/abs/2508.04700

arXiv
[45]

Dídac Surís, Sachit Menon, and Carl Vondrick. ViperGPT: Visual inference via Python execution for reasoning, 2023. URL https://arxiv.org/abs/2303.08128
[46]

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, and Saining Xie. Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs, 2024. URL https://arxiv.org/abs/2401.06209
[47]

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. URL https://arxiv.org/abs/2305.16291
[49]

Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, and Liang Sun. Transformers in time series: A survey, 2023. URL https://arxiv.org/abs/2202.07125
[50]

Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual ChatGPT: Talking, drawing and editing with visual foundation models, 2023. URL https://arxiv.org/abs/2303.04671
[51]

Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, and Jakob Nikolas Kather. LLM agents making agent tools, 2025. URL https://arxiv.org/abs/2502.11705
[52]

Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, and Yixiong Zou. Act wisely: Cultivating meta-cognitive tool use in agentic multimodal models, 2026. URL https://arxiv.org/abs/2604.08545
[53]

Xinlei Yu, Chengming Xu, Guibin Zhang, Zhangquan Chen, Yudong Zhang, Yongbo He, Peng-Tao Jiang, Jiangning Zhang, Xiaobin Hu, and Shuicheng Yan. VisMem: Latent vision memory unlocks potential of vision-language models, 2026. URL https://arxiv.org/abs/2511.11007
[54]

George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, and Carsten Eickhoff. A transformer-based framework for multivariate time series representation learning. URL https://arxiv.org/abs/2010.02803
[56]

Yifan Zhang, Liang Hu, Haofeng Sun, Peiyu Wang, Yichen Wei, Shukang Yin, Jiangbo Pei, Wei Shen, Peng Xia, Yi Peng, Tianyidan Xie, Eric Li, Yang Liu, Xuchen Song, and Yahui Zhou. Skywork-R1V4: Toward agentic multimodal intelligence through interleaved thinking with images and DeepResearch, 2025. URL https://arxiv.org/abs/2512.02395
[57]

Zhehao Zhang, Ryan Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, and Nedim Lipka. VipAct: Visual-perception enhancement via specialized VLM agent collaboration and tool-use, 2025. URL https://arxiv.org/abs/2410.16400
[58]

Shitian Zhao, Haoquan Zhang, Shaoheng Lin, Ming Li, Qilong Wu, Kaipeng Zhang, and Chen Wei. PyVision: Agentic vision with dynamic tooling, 2025. URL https://arxiv.org/abs/2507.07998
[59]

Siru Zhong, Weilin Ruan, Ming Jin, Huan Li, Qingsong Wen, and Yuxuan Liang. Time-VLM: Exploring multimodal vision-language models for augmented time series forecasting, 2025. URL https://arxiv.org/abs/2502.04395
[60]

Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, and Yue Zhang. Image-of-thought prompting for visual reasoning refinement in multimodal large language models, 2024. URL https://arxiv.org/abs/2405.13872
[61]

Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, and Ranjay Krishna. Reinforced visual perception with tools, 2025. URL https://arxiv.org/abs/2509.01656
[62]

Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, and Nan Tang. Are large language models good statisticians?, 2024. URL https://arxiv.org/abs/2406.07815

A VESTA Details

Algorithm 2 Visual Exploration Agents (Detailed)
Require: Data D, iteration limit N, proposals per iteration p, metric R, registry E (initial state: generate_new_tool only)
Ensure: M_best, θ_best ...
CalculateMoments: Computes the mean, variance, skewness, and excess kurtosis of the input data. Returns a JSON artifact with a plain-language interpretation to guide distribution selection, including symmetry hints (e.g., right-skewed data suggests Gamma, Lognormal, or Weibull families) and tail-weight hints (e.g., leptokurtic data suggests Student-t, Cau...
Histogram: Plots a histogram of the empirical data with the fitted distribution’s probability density function (PDF) overlaid. Handles both single distributions and mixtures by summing component PDFs weighted by their mixture weights. Provides an immediate visual check of whether the model captures the overall shape, modality, and spread of the data. When...
SegmentDistributionsAndCalculateMoments: Segments the data into a specified number of mixture components using a Gaussian Mixture Model (GMM), then computes per-component moments (mean, variance, skewness, kurtosis). Produces both a segmentation image with a total mixture overlay and a JSON summary of per-component statistics with distribution family hin...
QQPlot: Generates a Quantile-Quantile (Q-Q) plot comparing empirical data quantiles to theoretical quantiles from the currently fitted distribution. Linearity indicates a good fit; S-shaped curvature signals tail mismatch; one-sided curvature suggests skew; and sharp tail departures may indicate outliers or heavier tails than the model captures.
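The Q-Q comparison described above can be illustrated without any plotting. The following is a minimal sketch, assuming a moment-matched Normal as the "currently fitted distribution"; the function name `qq_points` and the toy data are hypothetical and are not taken from the VESTA toolkit.

```python
from statistics import NormalDist, fmean, pstdev

def qq_points(data):
    """Pair each sorted observation with the matching quantile of a fitted Normal."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: the fitted model is a Normal matched to the sample mean and std.
    fitted = NormalDist(fmean(xs), pstdev(xs))
    # Plotting positions (i + 0.5) / n avoid the undefined quantiles at 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))

# Toy data: exact quantiles of Normal(10, 2), so the points should hug the line y = x.
sample = [NormalDist(10, 2).inv_cdf((i + 0.5) / 100) for i in range(100)]
points = qq_points(sample)
max_gap = max(abs(t - x) for t, x in points)
print(f"largest departure from the y = x line: {max_gap:.3f}")
```

Plotting the returned pairs as a scatter would give the Q-Q plot itself; a large `max_gap` concentrated in the extremes corresponds to the tail departures mentioned in the description.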
PlotTailsTransform: Produces log-log and semi-log complementary CDF (CCDF) plots to diagnose tail behavior. A straight line on the log-log plot indicates power-law or Pareto-type heavy tails, while a straight line on the semi-log plot indicates exponential decay. Useful for distinguishing heavy-tailed from light-tailed distributions when the histogram alo...
ProbabilityPlot: Generates a probability plot comparing the empirical CDF to the fitted distribution’s theoretical CDF. A consistent horizontal shift indicates a mis-specified location parameter; a slope mismatch indicates a scale misfit; and systematic tail deviations suggest distributional misfit. Also reports a Kolmogorov-Smirnov (KS) statistic for qua...
GetDominantPeriod: Extracts the dominant period from the time series using Fast Fourier Transform (FFT) analysis. Returns a plain-text summary of the detected period. Most useful when Periodic or PeriodicComplex kernels are under consideration and the period has not yet been numerically determined. The result is available in the subsequent feedback iteration.
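As a rough illustration of dominant-period extraction, the sketch below picks the frequency with the largest spectral power and converts it to a period. It is an assumption-laden stand-in: it uses a naive O(n²) discrete Fourier transform in place of the FFT so that it needs only the standard library, and the name `dominant_period` and the `dt` parameter are hypothetical rather than the tool's actual interface.

```python
import cmath
import math

def dominant_period(series, dt=1.0):
    """Return the period (in units of dt) of the strongest non-zero frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (DC) component
    best_k, best_power = None, -1.0
    # For a real-valued signal only frequency indices 1..n//2 are distinct.
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dt / best_k

# Toy series: a sine wave that repeats every 12 samples, observed for 120 samples.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
period = dominant_period(series)
print(f"Detected dominant period: {period:.1f} samples")
```

For long series an FFT routine would replace the inner loop; the selection of the peak frequency and the conversion `n * dt / k` stay the same.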
FitVsActuals: Produces a visual overlay of the Gaussian Process (GP) fit on the raw time series data. Essential for visually assessing whether the model adequately captures the underlying trend and seasonality while appropriately discounting noise. Falls back to a raw series plot if no model has been fitted yet.
FitVsActualsWithResidualsDistribution: Generates a combined plot showing the GP fit overlaid on the observed time series alongside the distribution of residuals. Used to assess whether residuals resemble white noise; a broadly normal residual distribution is indicative of a well-specified model. Falls back to a raw series plot if no model has been fitted yet.
ResidualsAutoCorrelationPlot: Produces an Autocorrelation Function (ACF) plot of the model residuals to check for temporal independence. Significant spikes above the confidence band indicate that the model is failing to capture some latent structure in the data. Falls back to a raw series plot if no model has been fitted yet.

5. ResidualsAutoCorrelationSco...
diagnostic_fit_checks: Naming a concrete model family (Gaussian, Gamma, Lognormal, Pareto, Weibull, etc.) and trying it on the data. Each family encodes different assumptions. These tools typically allow for a visual comparison of multiple model families at once. Occasionally, we observe some single-use model fitting.
information_criteria: Numerical scores that rank competing fits while penalizing model complexity. Beyond simply fitting and visualizing models, AIC and BIC provide quantitative fit metrics.
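To make the ranking concrete, here is a minimal sketch that scores two candidate families with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The choice of families (Normal and Exponential), the closed-form maximum-likelihood fits, and the toy data are illustrative assumptions, not the tools VESTA generated.

```python
import math

def normal_loglik(data):
    """Maximized log-likelihood of a Normal fit (mean and variance by MLE)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE variance divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def exponential_loglik(data):
    """Maximized log-likelihood of an Exponential fit (rate by MLE)."""
    n = len(data)
    total = sum(data)
    rate = n / total  # valid for strictly positive data
    return n * math.log(rate) - rate * total

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Toy data: evenly spaced quantiles of an Exponential(1) distribution.
data = [-math.log(1 - (i + 0.5) / 200) for i in range(200)]
n = len(data)
# Each entry holds (maximized log-likelihood, number of fitted parameters).
scores = {
    "normal": (normal_loglik(data), 2),
    "exponential": (exponential_loglik(data), 1),
}
ranking = sorted(scores, key=lambda name: aic(*scores[name]))
for name in ranking:
    ll, k = scores[name]
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n), 1))
```

Lower scores are better under both criteria; BIC penalizes each extra parameter by ln n rather than 2, so it favors simpler families more strongly as the sample grows.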
mle_fitting: Maximum likelihood estimation: choosing the parameter values (µ, σ, shape, scale, ...) that maximize the probability of observing the data under the chosen family. This is how models are actually fit, as distinct from which models we want to test in diagnostic_fit_checks. MLE gives you the canonical “best” parameters under a given famil...
mean_excess_plot: Plots the conditional expectation E[X − u | X > u] against threshold u. For the Generalized Pareto distribution this function is linear in u, so a straight line in the upper tail signals a GPD-like tail and tells you where the “extreme regime” begins. This is a tail diagnostic that complements diagnostic_fit_checks. These tests help VESTA deci...
hill_estimator: Estimates the tail index α of a heavy-tailed distribution from the largest k order statistics, giving a concrete number for “how heavy” the tail is. A Hill plot (α̂ vs. k) lets you check stability and pick a sensible threshold. This refines a Pareto/power-law fit by pinning down its single most important parameter, and serves as a sanity che...
shapiro_wilk: A formal hypothesis test for whether data come from a normal distribution. If Shapiro-Wilk rejects normality strongly, that rules out the normal family in diagnostic_fit_checks.
box_cox: A parametric family of power transforms y = (x^λ − 1)/λ that searches for the λ making the transformed data closest to normal and can be useful when working with exotic, heavy-tailed distributions.

F.2 Time Series
density_visualization: Overlays a histogram with a kernel density estimate (KDE) to give a non-parametric picture of the marginal distribution of a time series. This is typically

Table 11: Analysis of functions in VESTA-generated tools that are not contained in the expert toolkit for Distribution Fitting.
Function Easy Hard Astro All
Diagnostic Fit Chec...


[38] [38]

Forgotten polygons: Multimodal large language models are shape-blind

William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, and Ritambhara Singh. Forgotten polygons: Multimodal large language models are shape-blind. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics: ACL 2025, pages 11983–1199...

work page doi:10.18653/v1/2025.findings-acl.620 2025

[39] [39]

ApJ , year = 1955, month = jan, volume =

Edwin E. Salpeter. The Luminosity Function and Stellar Evolution.apj, 121:161, January 1955. doi: 10.1086/145971

work page doi:10.1086/145971 1955

[40] [40]

Towards execution-grounded automated ai research, 2026

Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candès, Diyi Yang, and Tatsunori Hashimoto. Towards execution-grounded automated ai research, 2026. URL https://arxiv.org/abs/ 2601.14525

arXiv 2026

[41] [41]

Restgpt: Connecting large language models with real-world restful apis, 2023

Yifan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, and Sujian Li. Restgpt: Connecting large language models with real-world restful apis, 2023. URLhttps://arxiv.org/abs/2306.06624

arXiv 2023

[42] [42]

A survey on large language model-based agents for statistics and data science.The American Statistician, page 1–14, October 2025

Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, and Jian Huang. A survey on large language model-based agents for statistics and data science.The American Statistician, page 1–14, October 2025. ISSN 1537-2731. doi: 10.1080/00031305. 2025.2561140. URLhttp://dx.doi.org/10.1080/00031305.2025.2561140

work page doi:10.1080/00031305 2025

[43] [43]

Seagent: Self-evolving computer use agent with autonomous learning from experience,

Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, and Jiaqi Wang. Seagent: Self-evolving computer use agent with autonomous learning from experience,

[44] [44]

URLhttps://arxiv.org/abs/2508.04700

arXiv

[45] [45]

Vipergpt: Visual inference via python execution for reasoning, 2023

Dídac Surís, Sachit Menon, and Carl V ondrick. Vipergpt: Visual inference via python execution for reasoning, 2023. URLhttps://arxiv.org/abs/2303.08128

Pith/arXiv arXiv 2023

[46] [46]

Eyes wide shut? exploring the visual shortcomings of multimodal llms, 2024

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, and Saining Xie. Eyes wide shut? exploring the visual shortcomings of multimodal llms, 2024. URL https://arxiv. org/abs/2401.06209

arXiv 2024

[47] [47]

V oyager: An open-ended embodied agent with large language models,

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. V oyager: An open-ended embodied agent with large language models,

[48] [48]

URLhttps://arxiv.org/abs/2305.16291

Pith/arXiv arXiv

[49] [49]

Transformers in time series: A survey, 2023

Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, and Liang Sun. Transformers in time series: A survey, 2023. URLhttps://arxiv.org/abs/2202.07125

arXiv 2023

[50] [50]

Visual chatgpt: Talking, drawing and editing with visual foundation models, 2023

Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual chatgpt: Talking, drawing and editing with visual foundation models, 2023. URL https://arxiv.org/abs/2303.04671

Pith/arXiv arXiv 2023

[51] [51]

Llm agents making agent tools, 2025

Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelovi´c, and Jakob Nikolas Kather. Llm agents making agent tools, 2025. URLhttps://arxiv.org/abs/2502.11705

arXiv 2025

[52] [52]

Act wisely: Cultivating meta-cognitive tool use in agentic multimodal models, 2026

Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, and Yixiong Zou. Act wisely: Cultivating meta-cognitive tool use in agentic multimodal models, 2026. URLhttps://arxiv.org/abs/2604.08545

Pith/arXiv arXiv 2026

[53] [53]

Vismem: Latent vision memory unlocks potential of vision-language models, 2026

Xinlei Yu, Chengming Xu, Guibin Zhang, Zhangquan Chen, Yudong Zhang, Yongbo He, Peng-Tao Jiang, Jiangning Zhang, Xiaobin Hu, and Shuicheng Yan. Vismem: Latent vision memory unlocks potential of vision-language models, 2026. URL https://arxiv.org/abs/ 2511.11007. 13

arXiv 2026

[54] [54]

A transformer-based framework for multivariate time series representation learning,

George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, and Carsten Eickhoff. A transformer-based framework for multivariate time series representation learning,

[55] [55]

URLhttps://arxiv.org/abs/2010.02803

arXiv 2010

[56] [56]

Skywork-r1v4: Toward agentic multimodal intelligence through interleaved thinking with images and deepresearch, 2025

Yifan Zhang, Liang Hu, Haofeng Sun, Peiyu Wang, Yichen Wei, Shukang Yin, Jiangbo Pei, Wei Shen, Peng Xia, Yi Peng, Tianyidan Xie, Eric Li, Yang Liu, Xuchen Song, and Yahui Zhou. Skywork-r1v4: Toward agentic multimodal intelligence through interleaved thinking with images and deepresearch, 2025. URLhttps://arxiv.org/abs/2512.02395

arXiv 2025

[57] [57]

Vipact: Visual-perception enhancement via specialized vlm agent collaboration and tool-use, 2025

Zhehao Zhang, Ryan Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, and Nedim Lipka. Vipact: Visual-perception enhancement via specialized vlm agent collaboration and tool-use, 2025. URL https://arxiv.org/abs/2410. 16400

2025

[58] [58]

Pyvision: Agentic vision with dynamic tooling, 2025

Shitian Zhao, Haoquan Zhang, Shaoheng Lin, Ming Li, Qilong Wu, Kaipeng Zhang, and Chen Wei. Pyvision: Agentic vision with dynamic tooling, 2025. URL https://arxiv.org/abs/ 2507.07998

arXiv 2025

[59] [59]

Time-vlm: Exploring multimodal vision-language models for augmented time series forecasting, 2025

Siru Zhong, Weilin Ruan, Ming Jin, Huan Li, Qingsong Wen, and Yuxuan Liang. Time-vlm: Exploring multimodal vision-language models for augmented time series forecasting, 2025. URLhttps://arxiv.org/abs/2502.04395

arXiv 2025

[60] [60]

Image-of- thought prompting for visual reasoning refinement in multimodal large language models, 2024

Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, and Yue Zhang. Image-of- thought prompting for visual reasoning refinement in multimodal large language models, 2024. URLhttps://arxiv.org/abs/2405.13872

arXiv 2024

[61] [61]

Reinforced visual perception with tools, 2025

Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, and Ranjay Krishna. Reinforced visual perception with tools, 2025. URL https: //arxiv.org/abs/2509.01656

arXiv 2025

[62] [62]

VESTAbeats baseline

Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, and Nan Tang. Are large language models good statisticians?, 2024. URLhttps://arxiv.org/abs/2406.07815. 14 AVESTADetails Algorithm 2Visual Exploration Agents (Detailed) Require: Data D, iteration limit N, proposals per iteration p, metric R, registry E (initial state: generate_new_toolonly) Ensure:M best, θbest ...

arXiv 2024

[63] [63]

CalculateMoments: Computes the mean, variance, skewness, and excess kurtosis of the input data. Returns a JSON artifact with a plain-language interpretation to guide distribution selection, including symmetry hints (e.g., right-skewed data suggests Gamma, Lognormal, or Weibull families) and tail-weight hints (e.g., leptokurtic data suggests Student-t, Cau...

[64] [64]


Histogram: Plots a histogram of the empirical data with the fitted distribution’s probability density function (PDF) overlaid. Handles both single distributions and mixtures by summing component PDFs weighted by their mixture weights. Provides an immediate visual check of whether the model captures the overall shape, modality, and spread of the data. When...

[65] [65]

SegmentDistributionsAndCalculateMoments: Segments the data into a specified number of mixture components using a Gaussian Mixture Model (GMM), then computes per-component moments (mean, variance, skewness, kurtosis). Produces both a segmentation image with a total mixture overlay and a JSON summary of per-component statistics with distribution family hin...

[66] [66]

QQPlot: Generates a Quantile-Quantile (Q-Q) plot comparing empirical data quantiles to theoretical quantiles from the currently fitted distribution. Linearity indicates a good fit; S-shaped curvature signals tail mismatch; one-sided curvature suggests skew; and sharp tail departures may indicate outliers or heavier tails than the model captures
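The numbers behind a Q-Q plot can be sketched as follows. This is an illustrative example under stated assumptions, not the QQPlot tool itself: the fitted distribution is taken to be a normal fitted by moments, the plotting positions (i - 0.5)/n are one common convention, and the helper names `qq_points` and `qq_correlation` are hypothetical. The correlation of the pairs is used here as a simple numerical stand-in for visual linearity.

```python
import math
from statistics import NormalDist, fmean, pstdev


def qq_points(data):
    """Pair theoretical quantiles of a moment-fitted normal with empirical ones."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mu=fmean(xs), sigma=pstdev(xs))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; values near 1 indicate linearity."""
    t = [p[0] for p in points]
    e = [p[1] for p in points]
    mt, me = fmean(t), fmean(e)
    cov = sum((a - mt) * (b - me) for a, b in points)
    var_t = sum((a - mt) ** 2 for a in t)
    var_e = sum((b - me) ** 2 for b in e)
    return cov / math.sqrt(var_t * var_e)
```

A skewed sample bends away from the reference line, so its Q-Q correlation against a fitted normal is lower than that of a genuinely normal sample.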

[67] [67]


PlotTailsTransform: Produces log-log and semi-log complementary CDF (CCDF) plots to diagnose tail behavior. A straight line on the log-log plot indicates power-law or Pareto-type heavy tails, while a straight line on the semi-log plot indicates exponential decay. Useful for distinguishing heavy-tailed from light-tailed distributions when the histogram alo...
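The "straight line" criterion for CCDF plots can be checked numerically. The sketch below is an assumption-laden illustration rather than the PlotTailsTransform tool: it uses the fraction of samples greater than or equal to each value as the empirical CCDF, fits an ordinary least-squares line in both coordinate systems, and compares R² values. All function names are hypothetical.

```python
import math


def ccdf(data):
    """Empirical CCDF as (x, fraction of samples >= x), so every value is > 0."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]


def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def tail_diagnostics(data):
    """Fit lines to the log-log and semi-log CCDF of positive data."""
    pts = [(x, p) for x, p in ccdf(data) if x > 0]
    log_x = [math.log(x) for x, _ in pts]
    raw_x = [x for x, _ in pts]
    log_p = [math.log(p) for _, p in pts]
    return {
        "loglog": linear_fit(log_x, log_p),    # straight => power-law tail
        "semilog": linear_fit(raw_x, log_p),   # straight => exponential decay
    }
```

For Pareto-distributed data the log-log fit should have the higher R², with a slope near the negative tail index; for exponential data the semi-log fit should win.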

[68] [68]


ProbabilityPlot: Generates a probability plot comparing the empirical CDF to the fitted distribution’s theoretical CDF. A consistent horizontal shift indicates a mis-specified location parameter; a slope mismatch indicates a scale misfit; and systematic tail deviations suggest distributional misfit. Also reports a Kolmogorov-Smirnov (KS) statistic for qua...
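The Kolmogorov-Smirnov statistic mentioned above is the largest vertical gap between the empirical CDF and the fitted CDF. A minimal sketch, assuming the fitted CDF is supplied as a plain Python callable (the name `ks_statistic` is hypothetical):

```python
def ks_statistic(data, cdf):
    """Largest absolute gap between the empirical CDF and a fitted CDF.

    `cdf` is any callable mapping a value to a probability in [0, 1].
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

A mis-specified location shifts the fitted CDF horizontally, which shows up as a larger statistic than the same data evaluated against the correctly located CDF.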

[69] [69]


GetDominantPeriod: Extracts the dominant period from the time series using Fast Fourier Transform (FFT) analysis. Returns a plain-text summary of the detected period. Most useful when Periodic or PeriodicComplex kernels are under consideration and the period has not yet been numerically determined. The result is available in the subsequent feedback iteration
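The idea of reading a period off the frequency spectrum can be illustrated with a naive discrete Fourier transform. This sketch is not the GetDominantPeriod tool: it uses an O(n²) DFT from the standard library in place of a real FFT, assumes evenly spaced samples with spacing `dt`, and the function name is hypothetical.

```python
import cmath
import math


def dominant_period(series, dt=1.0):
    """Return the period of the strongest non-constant frequency component."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the zero-frequency term
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Frequency bin k corresponds to k cycles over the whole record.
    return n * dt / best_k
```

The resolution is limited to periods of the form n·dt/k, so a period that does not divide the record length is only approximated.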

[70] [70]


FitVsActuals: Produces a visual overlay of the Gaussian Process (GP) fit on the raw time series data. Essential for visually assessing whether the model adequately captures the underlying trend and seasonality while appropriately discounting noise. Falls back to a raw series plot if no model has been fitted yet

[71] [71]


FitVsActualsWithResidualsDistribution: Generates a combined plot showing the GP fit overlaid on the observed time series alongside the distribution of residuals. Used to assess whether residuals resemble white noise; a broadly normal residual distribution is indicative of a well-specified model. Falls back to a raw series plot if no model has been fitted yet

[72] [72]

ResidualsAutoCorrelationPlot: Produces an Autocorrelation Function (ACF) plot of the model residuals to check for temporal independence. Significant spikes above the confidence band indicate that the model is failing to capture some latent structure in the data. Falls back to a raw series plot if no model has been fitted yet.

5. ResidualsAutoCorrelationSco...

[73] [73]

diagnostic_fit_checks: Naming a concrete model family (Gaussian, gamma, lognormal, Pareto, Weibull, etc.) and trying it on the data. Each family encodes different assumptions. These tools typically allow a visual comparison of multiple model families at once. Occasionally, we observe single-use model fitting.

[74] [74]

information_criteria: Numerical scores that rank competing fits while penalizing model complexity. Beyond simply fitting and visualizing models, AIC and BIC provide quantitative fit metrics.
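As a worked illustration of how these scores rank fits, the sketch below computes AIC = 2k − 2·ln L and BIC = k·ln n − 2·ln L for two families whose maximum-likelihood fits have closed forms. The choice of normal and exponential families and the function names are assumptions for the example; they are not taken from VESTA's generated tools.

```python
import math


def normal_loglik(data):
    """Maximized log-likelihood of a normal fit (MLE mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential fit (MLE rate = 1 / mean)."""
    n = len(data)
    mean = sum(data) / n
    return -n * (math.log(mean) + 1.0)


def aic(loglik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik
```

Lower is better for both criteria. On exponentially distributed data, the one-parameter exponential fit should score below the two-parameter normal fit.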

[75] [75]

mle_fitting: Maximum likelihood estimation: choosing the parameter values (µ, σ, shape, scale, ...) that maximize the probability of observing the data under the chosen family. This is how you actually fit models, distinct from which models we want to test in diagnostic_fit_checks. MLE gives you the canonical “best” parameters under a given famil...

[76] [76]

mean_excess_plot: Plots the conditional expectation E[X − u | X > u] against threshold u. For the Generalized Pareto distribution this function is linear in u, so a straight line in the upper tail signals a GPD-like tail and tells you where the “extreme regime” begins. This is a tail diagnostic that complements diagnostic_fit_checks. These tests help VESTA deci...
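The empirical version of E[X − u | X > u] is the average overshoot of the samples that exceed u. A minimal sketch of the values behind such a plot, with a hypothetical function name and with thresholds that no sample exceeds simply skipped:

```python
def mean_excess(data, thresholds):
    """Return (u, mean of x - u over samples with x > u) for each threshold u."""
    points = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        if excesses:  # skip thresholds that no sample exceeds
            points.append((u, sum(excesses) / len(excesses)))
    return points
```

For instance, `mean_excess([1, 2, 3, 4, 5], [0, 2, 5])` returns `[(0, 3.0), (2, 2.0)]`: the threshold 5 is dropped because nothing lies above it. Estimates at high thresholds rest on few samples and are correspondingly noisy.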

[77] [77]

hill_estimator: Estimates the tail index α of a heavy-tailed distribution from the largest k order statistics, giving a concrete number for “how heavy” the tail is. A Hill plot (α̂ vs. k) lets you check stability and pick a sensible threshold. This refines a Pareto/power-law fit by pinning down its single most important parameter, and serves as a sanity che...
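In its standard form, the Hill estimator averages the log-ratios of the k largest observations to the (k+1)-th largest and inverts the result. The sketch below follows that textbook definition; the function names are hypothetical and the data are assumed to be strictly positive.

```python
import math


def hill_estimator(data, k):
    """Estimate the tail index alpha from the k largest order statistics."""
    xs = sorted(data, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    # Mean log-excess of the top k values over the (k+1)-th largest value.
    gamma = sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
    return 1.0 / gamma


def hill_plot(data, ks):
    """Return (k, alpha_hat) pairs, the values shown in a Hill plot."""
    return [(k, hill_estimator(data, k)) for k in ks]
```

Plotting the pairs from `hill_plot` over a range of k and looking for a flat region is the stability check described above.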

[78] [78]

shapiro_wilk: A formal hypothesis test for whether data come from a normal distribution. If Shapiro-Wilk rejects normality strongly, that rules out the normal family in diagnostic_fit_checks.

[79] [79]

box_cox: A parametric family of power transforms y = (x^λ − 1)/λ that searches for the λ making the transformed data closest to normal and can be useful when working with exotic, heavy-tailed distributions.

F.2 Time Series

[80] [80]

density_visualization: Overlays a histogram with a kernel density estimate (KDE) to give a non-parametric picture of the marginal distribution of a time series. This is typically

Table 11: Analysis of functions in VESTA-generated tools that are not contained in the expert toolkit for Distribution Fitting.

Function Easy Hard Astro All
Diagnostic Fit Chec...