Gypscie: A Cross-Platform AI Artifact Management System

Augusto Fonseca; Eduardo Ogasawara; Esther Pacitti; Fabio Porto; Gabriela Moraes Botaro; Julia Neumann Bastos; Patrick Valduriez

arxiv: 2604.10311 · v1 · submitted 2026-04-11 · 💻 cs.AI · cs.DB

Fabio Porto, Eduardo Ogasawara, Gabriela Moraes Botaro, Julia Neumann Bastos, Augusto Fonseca, Esther Pacitti, Patrick Valduriez

Pith reviewed 2026-05-10 15:16 UTC · model grok-4.3

keywords: AI artifact management, knowledge graph, cross-platform scheduling, AI lifecycle, dataflow optimization, provenance tracking, rule-based reasoning

The pith

Gypscie uses a knowledge graph to give a single unified view of AI artifacts and schedule their workflows across platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Gypscie as a system that isolates AI applications from the complexity of heterogeneous services by maintaining one consistent picture of datasets, models, and dataflows. This picture lives in a knowledge graph whose semantics are queried with rules, allowing the system to reason about what the artifacts mean and what operations they support. Lifecycle steps become high-level dataflows that Gypscie can automatically optimize and send to whatever servers, clouds, or supercomputers are available. The same graph also stores provenance so every produced artifact carries a traceable history. The authors report that this design covers more of the AI lifecycle than existing tools and that real scheduling from abstract descriptions succeeds in their tests.

Core claim

Gypscie is a cross-platform AI artifact management system realized through a knowledge graph that captures application semantics and a rule-based query language that supports reasoning over data and models. Model lifecycle activities are represented as high-level dataflows that can be scheduled across multiple platforms such as servers, cloud platforms, or supercomputers. Gypscie also records provenance information about the artifacts it produces, thereby enabling explainability. Its qualitative comparison with representative AI systems shows broader functionality across the AI artifact lifecycle, and its experimental evaluation demonstrates successful optimization and scheduling of dataflows on AI platforms from an abstract specification.

What carries the argument

The knowledge graph that encodes AI artifact semantics together with the rule-based query language used for reasoning and dataflow scheduling.
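
To make the interplay between graph edges and rules more tangible, here is a minimal sketch of Datalog-style reasoning over a toy graph (the paper excerpt under Figure 7 mentions Datalog rules that define derived predicates over base relations). The predicate names `trained_on`, `derived_from`, and `depends_on`, and all artifact names, are assumptions made for illustration; they are not taken from Gypscie's schema or query language.

```python
# Hypothetical base relations: edges of a toy knowledge graph.
# Predicate and artifact names are illustrative, not Gypscie's schema.
facts = {
    ("trained_on", "model_a", "dataset_clean"),
    ("derived_from", "dataset_clean", "dataset_raw"),
    ("derived_from", "dataset_raw", "sensor_feed"),
}


def derive_depends_on(facts):
    """Naive bottom-up evaluation of three Datalog-style rules:

    depends_on(X, Y) :- trained_on(X, Y).
    depends_on(X, Y) :- derived_from(X, Y).
    depends_on(X, Z) :- depends_on(X, Y), depends_on(Y, Z).
    """
    # The first two rules copy base edges into the derived predicate.
    derived = {(x, y) for (p, x, y) in facts if p in ("trained_on", "derived_from")}
    changed = True
    while changed:
        # The third rule joins the derived relation with itself until no new pair appears.
        new_pairs = {
            (x, z)
            for (x, y1) in derived
            for (y2, z) in derived
            if y1 == y2
        } - derived
        changed = bool(new_pairs)
        derived |= new_pairs
    return derived


depends_on = derive_depends_on(facts)
upstream_of_model = sorted(y for (x, y) in depends_on if x == "model_a")
print(upstream_of_model)  # ['dataset_clean', 'dataset_raw', 'sensor_feed']
```

The fixpoint loop is the simplest possible evaluation strategy; a real engine would likely use semi-naive evaluation and cost-based optimization, which this sketch does not attempt.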

If this is right

Developers can write AI workflows once at a high level and let the system choose and optimize the execution platforms.
The same artifacts and dataflows become portable across servers, clouds, and supercomputers without rewriting platform-specific code.
Provenance stored in the graph supplies an auditable record of how each model or dataset was produced.
A single system can cover more stages of the AI lifecycle than tools specialized for only training, only deployment, or only monitoring.
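
The auditable-record point can be pictured with a small sketch of lineage lookup. It assumes, hypothetically, that each produced artifact has one provenance record naming the activity that produced it and the artifacts it consumed; the record layout and all identifiers are invented for illustration and say nothing about how Gypscie stores provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    artifact: str   # identifier of the produced artifact (hypothetical)
    activity: str   # dataflow step that produced it
    inputs: tuple   # identifiers of the artifacts it consumed


def lineage(artifact, records):
    """Collect the records that led to `artifact`, starting from the artifact itself."""
    by_artifact = {r.artifact: r for r in records}
    history, stack, seen = [], [artifact], set()
    while stack:
        current = stack.pop()
        record = by_artifact.get(current)
        if record is None or current in seen:
            continue  # source artifact with no recorded producer, or already visited
        seen.add(current)
        history.append(record)
        stack.extend(record.inputs)
    return history


records = [
    ProvenanceRecord("rain_model_v2", "train", ("radar_clean",)),
    ProvenanceRecord("radar_clean", "preprocess", ("radar_raw",)),
]
steps = [r.activity for r in lineage("rain_model_v2", records)]
print(steps)  # ['train', 'preprocess']
```

An artifact with no record, such as `radar_raw` here, yields an empty history, which is one way an audit could surface untracked inputs.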

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The semantic layer could reduce the cost of moving AI work between research groups or between commercial providers.
If the graph grows large, query performance and maintenance effort become practical limits on adoption.
Integration hooks to popular ML frameworks would be needed before teams can use Gypscie without changing their existing pipelines.

Load-bearing premise

A knowledge graph plus rule-based query language can capture the semantics of diverse AI artifacts and platforms sufficiently to enable effective cross-platform scheduling and reasoning without major information loss or performance penalties.

What would settle it

A concrete multi-platform dataflow that Gypscie either cannot schedule at all or schedules incorrectly because required semantic details about an artifact or platform are missing from the knowledge graph.
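
One way to picture such a falsifying case is a pre-scheduling check that reports which facts are absent, so that a task is flagged instead of silently misplaced. The sketch below is an assumption-laden illustration: the required property names (`kind`, `input_location`, `location`, `has_gpu`) are hypothetical and do not come from the paper.

```python
# Hypothetical sets of properties a scheduler would need before placing a task.
REQUIRED_TASK_PROPS = {"kind", "input_location"}
REQUIRED_PLATFORM_PROPS = {"location", "has_gpu"}


def missing_semantics(task, platforms):
    """Return {entry name: sorted list of missing property names} for incomplete entries."""
    gaps = {}
    entries = [(task, REQUIRED_TASK_PROPS)]
    entries += [(p, REQUIRED_PLATFORM_PROPS) for p in platforms]
    for props, required in entries:
        absent = sorted(required - set(props))
        if absent:
            gaps[props["name"]] = absent
    return gaps


task = {"name": "train_rain_model", "kind": "training"}
platforms = [
    {"name": "cluster_a", "location": "site_1", "has_gpu": True},
    {"name": "cloud_b", "location": "region_x"},
]
gaps = missing_semantics(task, platforms)
print(gaps)  # {'train_rain_model': ['input_location'], 'cloud_b': ['has_gpu']}
```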

Figures

Figures reproduced from arXiv: 2604.10311 by Augusto Fonseca, Eduardo Ogasawara, Esther Pacitti, Fabio Porto, Gabriela Moraes Botaro, Julia Neumann Bastos, Patrick Valduriez.

**Figure 1.** Dataflow for data preparation and model building. In the Rionowcast project, the Gypscie platform is used to provide data engineers and model developers with a unified view of all AI artifacts across their entire lifecycle, spanning heterogeneous platforms. It also offers meteorologists a high-level, web-based interface to this unified view, supporting both operational needs and scientific validation …

**Figure 2.** Data preparation for inference dataflow. Section 3, "Gypscie Architecture": After providing an overview of the Gypscie platform and the AI artifacts it supports, we describe the platform’s interface and system architecture. Then, we discuss how data management and provenance management are supported. Model management, knowledge graph management, and dataflow processing are described in the subsequent sections. 3.1 Plat…

**Figure 3.** Overview of the Gypscie Platform.

**Figure 4.** Gypscie Web Interface. … more complex applications. For instance, in a meteorology application, front-end pipelines that consume data from different streaming sources can use the Gypscie dataset registration API to register data windows. Then, the meteorology panel used by meteorologists can consume the predictions produced by AI models in Gypscie, using the prediction query API. For more complex services inv…

**Figure 5.** Gypscie Architecture. The architecture is divided into two main components, as shown in …

**Figure 6.** MLflow online training loss. Section 4.4, "Model Selection": Model selection enables users to search for model artifacts of interest for inference. Searches can be performed based on various criteria, such as scientific domain or subdomain, metadata, format, tools, and keywords. This capability relies on the artifact catalog. A key functionality of the Gypscie model manager is the automatic selection of models for a …

**Figure 7.** Gypscie Knowledge Graph. … a unified representation for expressing queries and dataflows, the ability to define domain rules as extensions of domain knowledge, and cost-based optimization of queries and dataflows. In this setting, the explicit graph structure provides the base relations, whereas Datalog rules define derived predicates that capture higher-level semantics needed by AI applications. The code sni…

**Figure 8.** Figure 8 shows a UML representation of the dataflow language data model. An artifact is a generic concept that represents data, processes, and dataflows. Each artifact object is identified by a unique GID, as described in Section 3 …

**Figure 9.** Figure 9 complements …

**Figure 10.** Dataflow split into fragments. This complex activity is typically performed manually by the dataflow developer, which is error-prone and suboptimal. In Gypscie, we automate this process using a scheduling approach that prioritizes data locality for data preprocessing and model inference and GPU availability for model training. The computation of dataflow fragments proceeds as follows. It starts by analyzi…

**Figure 11.** Figure 11 shows the average execution time and standard deviation for Pandas and Spark as the data size increases, with significant differences in performance behavior. For the data sample of size 10, both approaches exhibit higher variability in execution time, reflected by a larger standard deviation. Such behavior is explained by the presence of an initial overhead associated with the first execution, related…

**Figure 12.** Figure 12 provides a complementary view of the results by considering only the fifth execution of each sample, since execution times become consistent after the initial stabilization phase. Thus, the analysis focuses on the approximate absolute execution time, expressed in minutes, providing a direct comparison of computational cost between the approaches. These results further highlight the scalability difference …

**Figure 13.** Comparison of RAM usage over time between the original and optimized dataflows, considering a sample of 70 files processed using Pandas. Section 8, "Related Work": Spurred by the growing adoption of AI across various applications, numerous new systems have been proposed to support the entire AI model lifecycle. Unlike traditional software engineering, the development of AI applications is more iterative and explorat…
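
The scheduling priorities quoted under Figure 10 (data locality for preprocessing and inference, GPU availability for training) can be sketched roughly as follows. This is not the paper's algorithm, whose steps are truncated in the excerpt: it assumes a strictly linear dataflow, a hypothetical platform catalog, and a name-based tie-break, and it simply groups consecutive tasks placed on the same platform into one fragment.

```python
from itertools import groupby

# Hypothetical platform catalog; names and properties are illustrative only.
PLATFORMS = {
    "edge_server": {"location": "site_a", "has_gpu": False},
    "hpc_cluster": {"location": "site_b", "has_gpu": True},
}


def choose_platform(task, platforms):
    """Training goes to a GPU platform; other tasks go where their data lives."""
    if task["kind"] == "training":
        candidates = [n for n, p in platforms.items() if p["has_gpu"]]
    else:
        candidates = [
            n for n, p in platforms.items() if p["location"] == task["data_location"]
        ]
    if not candidates:
        raise ValueError(f"no suitable platform for task {task['name']!r}")
    return sorted(candidates)[0]  # deterministic tie-break by platform name


def split_into_fragments(tasks, platforms):
    """Group consecutive tasks assigned to the same platform into fragments."""
    placed = [(choose_platform(t, platforms), t["name"]) for t in tasks]
    return [
        (platform, [name for _, name in group])
        for platform, group in groupby(placed, key=lambda pair: pair[0])
    ]


dataflow = [
    {"name": "clean", "kind": "preprocessing", "data_location": "site_a"},
    {"name": "window", "kind": "preprocessing", "data_location": "site_a"},
    {"name": "train", "kind": "training", "data_location": "site_a"},
    {"name": "predict", "kind": "inference", "data_location": "site_a"},
]
fragments = split_into_fragments(dataflow, PLATFORMS)
for platform, names in fragments:
    print(platform, names)
```

With this input the dataflow splits into three fragments: two preprocessing steps near the data, training on the GPU platform, and inference back near the data. Data transfer cost between fragments is ignored here, which a real scheduler would need to weigh.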

Original abstract

Artificial Intelligence (AI) models, encompassing both traditional machine learning (ML) and more advanced approaches such as deep learning and large language models (LLMs), play a central role in modern applications. AI model lifecycle management involves the end-to-end process of managing these models, from data collection and preparation to model building, evaluation, deployment, and continuous monitoring. This process is inherently complex, as it requires the coordination of diverse services that manage AI artifacts such as datasets, dataflows, and models, all orchestrated to operate seamlessly. In this context, it is essential to isolate applications from the complexity of interacting with heterogeneous services, datasets, and AI platforms. In this paper, we introduce Gypscie, a cross-platform AI artifact management system. By providing a unified view of all AI artifacts, the Gypscie platform simplifies the development and deployment of AI applications. This unified view is realized through a knowledge graph that captures application semantics and a rule-based query language that supports reasoning over data and models. Model lifecycle activities are represented as high-level dataflows that can be scheduled across multiple platforms, such as servers, cloud platforms, or supercomputers. Finally, Gypscie records provenance information about the artifacts it produces, thereby enabling explainability. Our qualitative comparison with representative AI systems shows that Gypscie supports a broader range of functionalities across the AI artifact lifecycle. Our experimental evaluation demonstrates that Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Gypscie describes a practical system for unifying AI artifact management via knowledge graphs and cross-platform dataflow scheduling, but the abstract supplies almost no evidence that the claims about broader coverage and successful scheduling actually hold.

Gypscie is a system that gives a single view of AI artifacts (datasets, models, dataflows) through a knowledge graph, uses rules for reasoning over them, turns lifecycle steps into abstract dataflows that get scheduled on different platforms, and keeps provenance for explainability. The core engineering move—hiding platform differences behind a high-level specification—is sensible for teams that already juggle servers, clouds, and supercomputers.

Referee Report

2 major / 2 minor

Summary. The paper introduces Gypscie, a cross-platform AI artifact management system that provides a unified view of AI artifacts (datasets, dataflows, models) via a knowledge graph capturing application semantics and a rule-based query language supporting reasoning. Lifecycle activities are represented as high-level dataflows schedulable across heterogeneous platforms (servers, cloud, supercomputers), with provenance recording for explainability. The authors claim, via qualitative comparison, that Gypscie supports a broader range of functionalities across the AI artifact lifecycle than representative systems, and via experimental evaluation, that it can successfully optimize and schedule dataflows from an abstract specification.

Significance. If the core claims hold, the work addresses a genuine need for interoperability in AI artifact management by abstracting away platform heterogeneity through semantic modeling and automated scheduling. The combination of knowledge graphs with rule-based reasoning and provenance tracking could improve explainability and reduce development overhead for complex, multi-platform AI applications. However, the absence of concrete evaluation details limits assessment of whether the approach delivers practical gains without substantial information loss or performance overhead.

major comments (2)

[Experimental Evaluation] Experimental Evaluation section: The claim that 'Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification' is load-bearing for the central contribution, yet the section provides no information on the platforms tested, the concrete dataflow specifications used, optimization metrics (e.g., makespan, resource usage, success rate), baselines, or failure modes. This prevents verification that the results support the claim.
[Qualitative Comparison] Qualitative Comparison section: The assertion that Gypscie 'supports a broader range of functionalities across the AI artifact lifecycle' is central to the significance argument, but the section does not define the comparison criteria, name the representative AI systems, or detail which lifecycle stages (data collection, model building, deployment, monitoring) were assessed and how. This leaves the broader-functionality claim unsubstantiated.
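
The metrics named in the Experimental Evaluation comment can be made concrete with a short sketch. It assumes a hypothetical log format in which each task run records a start time, an end time, and a success flag; nothing here reflects how the paper actually measured its experiments.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Compute makespan (seconds) and success rate from per-task run records."""
    makespan = max(r["end"] for r in runs) - min(r["start"] for r in runs)
    success_rate = sum(r["ok"] for r in runs) / len(runs)
    return makespan, success_rate


def timing_stats(durations):
    """Mean and sample standard deviation of repeated execution times."""
    return mean(durations), stdev(durations)


# Hypothetical run log: start/end in seconds, ok marks successful completion.
runs = [
    {"start": 0.0, "end": 4.0, "ok": True},
    {"start": 1.0, "end": 7.5, "ok": True},
    {"start": 2.0, "end": 6.0, "ok": False},
]
makespan, success_rate = summarize_runs(runs)
print(makespan)  # 7.5

# Five repetitions where the first carries a warm-up overhead.
avg, spread = timing_stats([12.0, 10.0, 10.0, 10.0, 10.0])
```

Reporting the spread alongside the mean matters when the first execution carries start-up overhead, as the excerpt under Figure 11 describes.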

minor comments (2)

The abstract and introduction refer to 'representative AI systems' without naming them; explicitly list the systems and the functionality matrix in the comparison section.
Provide at least one concrete example of the knowledge-graph schema and a sample rule from the query language to illustrate how artifact semantics are captured and reasoned over.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript on Gypscie. The comments highlight important areas where additional clarity and detail will strengthen the paper, and we address each point below.

Referee: [Experimental Evaluation] Experimental Evaluation section: The claim that 'Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification' is load-bearing for the central contribution, yet the section provides no information on the platforms tested, the concrete dataflow specifications used, optimization metrics (e.g., makespan, resource usage, success rate), baselines, or failure modes. This prevents verification that the results support the claim.

Authors: We agree that the Experimental Evaluation section requires substantially more detail to allow verification of the central claim. In the revised manuscript, we will expand the section to specify the platforms tested (including local servers, cloud instances on AWS, and supercomputers), the concrete dataflow specifications used as input, the optimization metrics evaluated (makespan, resource usage, and success rate), the baseline systems employed for comparison, and any observed failure modes with corresponding mitigation strategies. These additions will directly support the claim that Gypscie can optimize and schedule dataflows from an abstract specification.
Revision: yes

Referee: [Qualitative Comparison] Qualitative Comparison section: The assertion that Gypscie 'supports a broader range of functionalities across the AI artifact lifecycle' is central to the significance argument, but the section does not define the comparison criteria, name the representative AI systems, or detail which lifecycle stages (data collection, model building, deployment, monitoring) were assessed and how. This leaves the broader-functionality claim unsubstantiated.

Authors: We concur that the Qualitative Comparison section needs explicit definitions and details to substantiate the claim. In the revised version, we will define the comparison criteria (such as coverage of lifecycle stages, cross-platform scheduling support, provenance capabilities, and reasoning features), name the specific representative AI systems evaluated (e.g., MLflow, Kubeflow, and DVC), and provide a stage-by-stage breakdown of the AI artifact lifecycle (data collection, model building, deployment, and monitoring) with explanations of how Gypscie offers broader functionality in each area compared to the baselines.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes the design and implementation of the Gypscie system, including its use of a knowledge graph and rule-based query language for AI artifact management and dataflow scheduling. It supports claims via qualitative comparison to other systems and experimental evaluation of scheduling from abstract specifications. No mathematical derivations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central functionality claims rest on external evaluation rather than reducing to inputs by construction, making the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on the domain assumption that AI artifacts and their relationships can be adequately modeled in a knowledge graph and that high-level dataflows can be automatically optimized and scheduled across platforms without loss of correctness or efficiency.

axioms (2)

Domain assumption: AI artifacts (datasets, dataflows, models) and their lifecycle relationships can be represented and reasoned over using a knowledge graph and rule-based query language.
This is the foundational premise stated in the abstract for achieving a unified view and explainability.
Domain assumption: High-level abstract dataflow specifications can be successfully optimized and scheduled on heterogeneous AI platforms.
Directly invoked by the experimental evaluation claim.

invented entities (1)

Gypscie platform (no independent evidence)
purpose: Cross-platform unified management of AI artifacts via knowledge graph and dataflow scheduling
The new system introduced by the paper; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5585 in / 1322 out tokens · 33415 ms · 2026-05-10T15:16:44.176624+00:00 · methodology

[1] [1]

2.ArangoDB

1.Akbarinia, R., Botella, C., Joly, A., Masseglia, F., Mattoso, M., Oga- sa w ara, E., de Oliveira, D., Pacitti, E., Porto, F., Pradal, C., Shasha, D., and V alduriez, P.Life Science Workflow Services (LifeSWS): Motivations and Architecture.Transactions on Large Scale Data and Knowledge Centered Sys- tems 55(2023), 1–24. 2.ArangoDB. ArangoML Pipeline. Tec...

work page 2023

[2] [2]

InBDA 2023 - 39.Conférence sur la Gestion de Données – Principes, Technologies et Applications(Oct

3.Baget, J.-F., Bisquert, P., Leclère, M., Mugnier, M.-L., Pérution-Kihli, G., Tornil, F., and Ulliana, F.InteGraal: a Tool for Data-Integration and Reasoning on Heterogeneous and Federated Sources. InBDA 2023 - 39.Conférence sur la Gestion de Données – Principes, Technologies et Applications(Oct. 2023). 36 F. Porto et al. 4.Bala, B., and Behal, S.A Brief...

work page 2023

[3] [3]

Y., Haque, Z., Haykal, S., Ispir, M., Jain, V., Koc, L., Koo, C

7.Baylor, D., Breck, E., Cheng, H.-T., Fiedel, N., Foo, C. Y., Haque, Z., Haykal, S., Ispir, M., Jain, V., Koc, L., Koo, C. Y., Lew, L., Mew ald, C., Modi, A. N., Polyzotis, N., Ramesh, S., Roy, S., Whang, S. E., Wicke, M., Wilkiewicz, J., Zhang, X., and Zinkevich, M.TFX: A TensorFlow- Based Production-Scale Machine Learning Platform. InProceedings of the...

work page 2017

[4] [4]

B.SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle

9.Boehm, M., Antonov, I., Baunsgaard, S., Dokter, M., Ginthör, R., In- nerebner, K., Klezin, F., Lindstaedt, S., Phani, A., Rath, B., Reinw ald, B., Siddiqi, S., and Wrede, S. B.SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle. InAnnu. Conf. Innovative Data Syst. Res., CIDR(2020), Conference on Innovative Data Syst...

work page 2020

[5] [5]

M., Ogasa w ara, E

11.Castro, R., Souto, Y. M., Ogasa w ara, E. S., Porto, F. A. M., and Bez- erra, E.STConvS2S: Spatiotemporal Convolutional Sequence to Sequence Net- work for weather forecasting.Neurocomputing 426(2021), 285 –

work page 2021

[6] [6]

12.Chen, A., Chow, A., Da vidson, A., DCunha, A., Ghodsi, A., Hong, S. A., Konwinski, A., Mew ald, C., Murching, S., Nykodym, T., Ogil vie, P., Parkhe, M., Singh, A., Xie, F., Zaharia, M., Zang, R., Zheng, J., and Zu- mar, C.Developments in MLflow: A System to Accelerate the Machine Learning Lifecycle. InProceedings of the Fourth International Workshop on...

work page 2020

[7] [7]

N., Simões, A., Cardoso, C

Gypscie: A Cross-Platform AI Artifact Management System 37 14.da Sil v a, D. N., Simões, A., Cardoso, C. L. S., de Oliveira, D. E. M., Rittmeyer, J. G., Wehmuth, K., Lustosa, H., Pereira, R. S., Souto, Y. M., and Vignoli, L. E.A conceptual vision toward the management of machine learning models. InER Forum and Poster & Demos Session(2019), vol. 2469 ofCEU...

work page 2019

[8] [8]

P.Automated capture of experiment context for easier reproducibil- ity in computational research.Computing in Science and Engineering 14, 4 (2012), 48 –

15.Da vison, A. P.Automated capture of experiment context for easier reproducibil- ity in computational research.Computing in Science and Engineering 14, 4 (2012), 48 –

work page 2012

[9] [9]

M., and Stonebraker, M

18.Hellerstein, J. M., and Stonebraker, M. R.Predicate Migration: Opti- mizing Queries with Expensive Predicates.SIGMOD Record 22, 2 (1993), 267 –

work page 1993

[10] [10]

R., Polikar, R., and Cha wla, N

19.Hoens, T. R., Polikar, R., and Cha wla, N. V.Learning from streaming data with concept drift and imbalance: An overview.Progress in Artificial Intelligence 1, 1 (2012), 89 –

20. Hogan, A., Blomqvist, E., Cochez, M., D’amato, C., Melo, G. D., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., Ngomo, A.-C. N., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., and Zimmermann, A. Knowledge Graphs. ACM Comput. Surv. 54, 4 (July 2021), 71:1–71:37.

21. Hopsworks. Hopsworks: The AI Fact...

23. Idowu, S. O., Strüber, D. G., and Berger, T. Asset Management in Machine Learning: State-of-research and State-of-practice. ACM Computing Surveys 55, 7 (2023).

24. IDSIA. Sacred: a tool to help configure, organize, log and reproduce experiments. Tech. rep., https://github.com/IDSIA/sacred/,

25. Ismail, M., Gebremeskel, E., Kakantousis, T., Berthou, G., and Dowling, J. Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata. In Proceedings - International Conference on Distributed Computing Systems (2017), pp. 2525 –

26. Korontanis, I., Zacharia, A., Makris, A., Pateraki, M., and Tserpes, K. Streamlining ML Training in Kubernetes: An MLOps Architecture with Kubeflow. In IOT 2025 - Proceedings of the 15th International Conference on the Internet of Things 2025 (New York, NY, USA, 2025), IOT 2025, Association for Computing Machinery, pp. 267 –

29. Miao, H., Li, A., Davis, L. S., and Deshpande, A. ModelHub: Deep Learning Lifecycle Management. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (Apr. 2017), pp. 1393–1394.

30. Miao, X., Wu, Y., Chen, L., Gao, Y., and Yin, J. An Experimental Survey of Missing Data Imputation Algorithms. IEEE Transactions on Knowledge and Data Engineer...

31. Moraes, G., Porto, F., Ulliana, F., Baget, J.-F., Leclère, M., Bisquert, P., Gonçalves, B., and Valduriez, P. Gypscie-KG: Building a Logic-Based Approach for Knowledge Graph Data Integration View in ML Systems | Anais Estendidos do Simpósio Brasileiro de Banco de Dados (SBBD). In Simpósio Brasileiro de Banco de Dados (SBBD) (Sept. 2025), SB...

33. Novella, J. A., Khoonsari, P. E., Herman, S., Whitenack, D., Capuccini, M., Burman, J., Kultima, K., and Spjuth, O. Container-based bioinformatics with Pachyderm. Bioinformatics 35, 5 (2019), 839 –

34. Ogasawara, E., de Oliveira, D., Valduriez, P., Dias, J., Porto, F., and Mattoso, M. An algebraic approach for data-centric scientific workflows. Proceedings of the VLDB Endowment 4, 12 (2011), 1328 –

36. Pereira, R. S., Souto, Y. M., Chaves, A., Zorilla, R., Tsan, B., Rusu, F., Ogasawara, E. S., Ziviani, A., and Porto, F. A. M. DJEnsemble: A Cost-Based Selection and Allocation of a Disjoint Ensemble of Spatio-Temporal Models. In ACM International Conference Proceeding Series (2021), pp. 226 –

37. Ramusat, Y., Maniu, S., and Senellart, P. Efficient provenance-aware querying of graph databases with datalog. In ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (2022).

38. Schlegel, M., and Sattler, K. U. Management of Machine Learning Lifecycle Artifacts: A Survey. SIGMOD...

39. Souza, R., Azevedo, L. G., Lourenco, V. N., Soares, E. F. D. S., Thiago, R. M., Brandão, R. R. M., Civitarese, D. S., Vital-Brazil, E. A., Moreno, M. F., and Valduriez, P. Workflow provenance in the lifecycle of scientific machine learning. Concurrency and Computation: Practice and Experience 34, 14 (2022).

40. Sparks, E. R., Venkataraman, S., Kaftan...

41. Sur, S., Jin, H.-W., Chai, L., and Panda, D. K. RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits. In Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming (New York, NY, USA, Mar. 2006), PPoPP ’06, Association for Computing Machinery, pp. 32–39.

42. Vartak, M., Subr...

45. Xin, D., Macke, S., Ma, L., Liu, J., Song, S., and Parameswaran, A. HELIX: Holistic optimization for accelerating iterative machine learning. In Proceedings of the VLDB Endowment (2018), vol. 12, VLDB Endowment, pp. 446 –

47. Zhang, H., Li, Y., Huang, Y., Wen, Y., Yin, J., and Guan, K. MLModelCI: An Automatic Cloud Platform for Efficient MLaaS. In Proceedings of the 28th ACM International Conference on Multimedia (New York, NY, USA, Oct. 2020), MM ’20, Association for Computing Machinery, pp. 4453–4456.
