
arxiv: 1907.02889 · v1 · pith:XS2OMCZ6new · submitted 2019-07-05 · 💻 cs.LG · cs.HC

Visus: An Interactive System for Automatic Machine Learning Model Building and Curation

Pith reviewed 2026-05-25 02:17 UTC · model grok-4.3

classification 💻 cs.LG cs.HC
keywords AutoML · interactive visualization · pipeline curation · machine learning interfaces · domain expert tools · model refinement

The pith

Visus is an interactive system that supports domain experts in building and curating AutoML-generated machine learning pipelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Visus as a response to the scarcity of data scientists by giving domain experts tools to refine the end-to-end pipelines that AutoML systems produce. It grounds design choices in an explicit framework, shows a concrete usage scenario, and reports feedback from testing sessions with domain experts. The central claim is that such an interface makes the model-building process accessible when users have little machine-learning background. If the claim holds, AutoML outputs move from best-effort artifacts to objects that non-experts can actively improve.

Core claim

Visus is a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. The work describes the framework used to ground design choices, illustrates a usage scenario enabled by the system, and discusses feedback received in user testing sessions with domain experts.

What carries the argument

Visus, the interactive interface that guides users through inspection, modification, and refinement of AutoML pipelines.

Load-bearing premise

Domain experts lack machine-learning expertise and therefore need dedicated interactive interfaces to curate AutoML outputs effectively.

What would settle it

A controlled comparison in which domain experts using Visus produce no measurable improvement in pipeline quality or usability over experts working without the system.
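One way such a controlled comparison could be analysed is sketched below as a one-sided permutation test on a pipeline-quality score. This is an illustration under stated assumptions, not a protocol from the paper: the function name `permutation_test`, the choice of accuracy as the quality measure, and every score in the example are hypothetical.

```python
import random
from statistics import mean


def permutation_test(with_tool, without_tool, n_resamples=5000, seed=0):
    """One-sided permutation test for mean(with_tool) > mean(without_tool)."""
    rng = random.Random(seed)
    observed = mean(with_tool) - mean(without_tool)
    pooled = list(with_tool) + list(without_tool)
    n = len(with_tool)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical pipeline accuracy scores, invented for illustration only.
with_tool = [0.80, 0.82, 0.85, 0.88, 0.90]
without_tool = [0.60, 0.62, 0.65, 0.70, 0.71]

observed, p_value = permutation_test(with_tool, without_tool)
print(f"mean difference = {observed:.3f}, one-sided p = {p_value:.4f}")
```

A large p-value in a study of this shape would be the "no measurable improvement" outcome described above. A small one would count as evidence for the claim only if the two groups were otherwise comparable.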

Figures

Figures reproduced from arXiv: 1907.02889 by Aécio Santos, Bowen Yu, Cláudio T. Silva, Cristian Felix, Enrico Bertini, Jorge Piazentin Ono, Juliana Freire, Sonia Castelo, Sungsoo Hong.

Figure 1: Our proposed framework encompasses problem …
Figure 2: Visus’s data selection and problem definition screens: (A) select or load dataset view, (B) select task view (create a …
Figure 3: Explore models view. Examine Model Explanations. To better understand how a given pipeline performs, Visus generates more detailed explanations. For classification problems, Visus currently supports two visualizations: a standard confusion matrix and rule matrix [15]. The confusion matrix shows the predicted classes as columns and the true classes as rows as shown in …
Figure 5: Data augmentation support in Visus. (G) Dataset …
Figure 6: User-generated visualizations: (1) initial confusion …
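The confusion matrix layout described in the Figure 3 excerpt (true classes as rows, predicted classes as columns) can be sketched in a few lines. This is a minimal illustration, not Visus's implementation: the helper `confusion_matrix` and the example labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return counts with true classes as rows and predicted classes as columns."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, invented for illustration only.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    # Each printed row is one true class; each column is one predicted class.
    print(label, row)
```

Diagonal cells count correct predictions. Off-diagonal cells show which true class was mistaken for which predicted class.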
Original abstract

While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed that help with this problem by synthesizing end-to-end ML data processing pipelines. However, these follow a best-effort approach and a user in the loop is necessary to curate and refine the derived pipelines. Since domain experts often have little or no expertise in machine learning, easy-to-use interactive interfaces that guide them throughout the model building process are necessary. In this paper, we present Visus, a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. We describe the framework used to ground our design choices and a usage scenario enabled by Visus. Finally, we discuss the feedback received in user testing sessions with domain experts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents Visus, an interactive system to support domain experts (with limited ML expertise) in curating and refining end-to-end ML data processing pipelines generated by AutoML systems. It grounds the system design in a stated framework, describes a usage scenario, and reports qualitative feedback from user testing sessions with domain experts.

Significance. If the system and its design choices function as described, the work addresses a practical gap in making AutoML outputs usable by non-experts through interactive curation interfaces. The explicit design framework and reported user sessions provide concrete examples of interface features for pipeline inspection and refinement, which could inform future HCI-for-AutoML efforts. The contribution is primarily descriptive and system-oriented rather than a new algorithmic or theoretical result.

major comments (1)
  1. [user testing / evaluation] User testing section: The validation rests entirely on qualitative feedback from domain-expert sessions, with no reported participant count, task protocol, success metrics, or comparison to a baseline interface. This leaves the central claim that Visus 'supports the model building process and curation' supported only by narrative description rather than observable outcomes.
minor comments (2)
  1. [abstract] Abstract: The motivation sentence on domain experts having 'little or no expertise in machine learning' is repeated from the introduction without additional grounding; a brief citation to prior studies on AutoML user barriers would strengthen it.
  2. [usage scenario] The usage scenario is presented narratively; adding a short table or figure summarizing the sequence of user actions and system responses would improve clarity and reproducibility of the demonstrated workflow.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive suggestion regarding the user testing section. We agree that additional specifics will strengthen the manuscript and will revise accordingly.

point-by-point responses
  1. Referee: [user testing / evaluation] User testing section: The validation rests entirely on qualitative feedback from domain-expert sessions, with no reported participant count, task protocol, success metrics, or comparison to a baseline interface. This leaves the central claim that Visus 'supports the model building process and curation' supported only by narrative description rather than observable outcomes.

    Authors: We acknowledge that the current user testing description is primarily narrative. In the revised version we will explicitly report the number of domain-expert participants, outline the session protocol (including tasks performed and questions asked), and provide more concrete examples of the feedback received and how it informed design decisions. Because the study was designed as a qualitative validation of the proposed design framework rather than a controlled experiment, we did not collect quantitative success metrics or run a baseline comparison; we will make this scope explicit so readers understand the nature of the evidence. revision: partial

Circularity Check

0 steps flagged

No significant circularity: system description with independent user validation

full rationale

The paper presents a descriptive system (Visus) for curating AutoML pipelines, grounded in an explicitly stated design framework, illustrated via a usage scenario, and evaluated through reported user testing sessions with domain experts. No equations, fitted parameters, predictions, or derivations exist that could reduce to inputs by construction. No self-citation chains are invoked as load-bearing uniqueness theorems or ansatzes. The central claims are self-contained in the paper's own construction and external user feedback, qualifying for the default non-circularity outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No mathematical free parameters or invented physical entities. The design framework is presented as grounding the choices, but its specific axioms are not enumerated in the abstract. The central assumption that interactive interfaces are necessary for domain experts is treated as a domain_assumption rather than derived.

axioms (1)
  • domain assumption: Domain experts require guided interactive interfaces because they lack ML expertise
    This premise is stated directly in the abstract as the reason an interactive system is needed.



Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    A User-based Visual Analytics Workflow for Exploratory Model Analysis

    Dylan Cashman, Shah Rukh Humayoun, Florian Heimerl, Kendall Park, Subhajit Das, John Thompson, Bahador Saket, Abigail Mosca, John T. Stasko, Alex Endert, Michael Gleicher, and Remco Chang. 2018. Visual Analytics for Automated Model Discovery. CoRR abs/1809.10782 (2018). arXiv:1809.10782 http://arxiv.org/abs/1809.10782

  2. [2]

    Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis Daniel Ibáñez-Gonzalez, Emilia Kacprzak, and Paul T. Groth. 2019. Dataset search: a survey. CoRR abs/1901.00735 (2019). arXiv:1901.00735 http://arxiv.org/abs/1901.00735

  3. [3]

    Louis Columbus. [n. d.]. IBM Predicts Demand For Data Scientists Will Soar 28% By 2020. https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/

  4. [4]

    Iddo Drori, Yamuna Krishnamurthy, Remi Rampin, Raoni Lourenço, Jorge Ono, Kyunghyun Cho, Claudio Silva, and Juliana Freire. 2018. AlphaD3M: Machine Learning Pipeline Synthesis. In Proceedings of Machine Learning Research, ICML 2018 AutoML Workshop

  5. [5]

    Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. Efficient and robust automated machine learning. In Advances in neural information processing systems. 2962–2970

  6. [6]

    Yolanda Gil, James Honaker, Shikhar Gupta, Yibo Ma, Vito D’Orazio, Daniel Garijo, Shruti Gadewar, Qifan Yang, and Neda Jahanshad. 2019. Towards Human-guided Machine Learning. In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI ’19). ACM, New York, NY, USA, 614–624. https://doi.org/10.1145/3301275.3302324

  7. [7]

    Yolanda Gil, Ke-Thia Yao, Varun Ratnakar, Daniel Garijo, Greg Ver Steeg, Rob Brekelmans, Mayank Kejriwal, Fanghao Luo, and I-De Huang. 2018. P4ML: A Phased Performance-Based Pipeline Planner for Automated Machine Learning. In Proceedings of Machine Learning Research, ICML 2018 AutoML Workshop

  8. [8]

    Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 3315–3323. http://papers.nips.cc/paper/6374-equality-of-opportunity-in-supervised-learning.pdf

  9. [9]

    T. Hastie, R. Tibshirani, and J.H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. https://books.google.com/books?id=eBSgoAEACAAJ

  10. [10]

    James Honaker and Vito D’Orazio. 2014. Statistical Modeling by Gesture: A graphical, Browser-based Statistical Interface for Data Repositories. In HT (Doctoral Consortium/Late-breaking Results/Workshops)

  11. [11]

    Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. 2019. Automatic machine learning: methods, systems, challenges. Challenges in Machine Learning (2019)

  12. [12]

    J. J. Thomas, K. A. Cook, and Institute of Electrical and Electronics Engineers. 2005. Illuminating the path: The research and development agenda for visual analytics

  13. [13]

    Fei-Fei Li and Jia Li. [n. d.]. Cloud AutoML: Making AI accessible to every business. https://www.blog.google/products/google-cloud/cloud-automl-making-ai-accessible-every-business/

  14. [14]

    Microsoft. [n. d.]. Microsoft Azure Machine Learning Studio. https://studio.azureml.net/

  15. [15]

    Y. Ming, H. Qu, and E. Bertini. 2019. RuleMatrix: Visualizing and Understanding Classifiers with Rules. IEEE Transactions on Visualization and Computer Graphics 25, 1 (Jan 2019), 342–352. https://doi.org/10.1109/TVCG.2018.2864812

  16. [16]

    nyc-vision-zero [n. d.]. NYC Vision Zero. https://www1.nyc.gov/site/visionzero/index.page

  17. [17]

    Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore. 2016. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 (GECCO ’16). ACM, New York, NY, USA, 485–492. https://doi.org/10.1145/2908812.2908918

  18. [18]

    Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification Without Disparate Mistreatment. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, Republic...

  19. [19]

    Indre Zliobaite. 2015. A survey on measuring indirect discrimination in machine learning. CoRR abs/1511.00148 (2015). arXiv:1511.00148 http://arxiv.org/abs/1511.00148