Gypscie: A Cross-Platform AI Artifact Management System
Pith reviewed 2026-05-10 15:16 UTC · model grok-4.3
The pith
Gypscie uses a knowledge graph to give a single unified view of AI artifacts and schedule their workflows across platforms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Gypscie is a cross-platform AI artifact management system realized through a knowledge graph that captures application semantics and a rule-based query language that supports reasoning over data and models. Model lifecycle activities are represented as high-level dataflows that can be scheduled across multiple platforms such as servers, cloud platforms, or supercomputers. Gypscie also records provenance information about the artifacts it produces, thereby enabling explainability. Its qualitative comparison with representative AI systems shows broader functionality across the AI artifact lifecycle, and its experimental evaluation demonstrates successful optimization and scheduling of dataflows on AI platforms from an abstract specification.
What carries the argument
The knowledge graph that encodes AI artifact semantics together with the rule-based query language used for reasoning and dataflow scheduling.
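To make the pairing of a knowledge graph with rule-based reasoning concrete, here is a minimal sketch in Python. It is not Gypscie's schema or query language, which the text reviewed here does not show. The predicate names (`trainedOn`, `derivedFrom`, `dependsOn`), the artifact names, and the function `infer_depends_on` are all assumptions chosen for illustration. The sketch stores artifacts as subject-predicate-object triples and forward-chains three Datalog-style rules until no new facts appear.

```python
# Hypothetical sketch: a tiny in-memory knowledge graph of AI artifacts.
# Predicates and artifact names are illustrative assumptions, not Gypscie's schema.

def infer_depends_on(triples):
    """Forward-chain these rules to a fixpoint:
         dependsOn(X, Y) :- trainedOn(X, Y).
         dependsOn(X, Y) :- derivedFrom(X, Y).
         dependsOn(X, Z) :- dependsOn(X, Y), dependsOn(Y, Z).
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        # Base rules: direct training or derivation implies dependency.
        for (s, p, o) in facts:
            if p in ("trainedOn", "derivedFrom"):
                new.add((s, "dependsOn", o))
        # Recursive rule: dependency is transitive.
        deps = [(s, o) for (s, p, o) in facts if p == "dependsOn"]
        for (x, y) in deps:
            for (y2, z) in deps:
                if y == y2:
                    new.add((x, "dependsOn", z))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

graph = {
    ("model_v2", "derivedFrom", "model_v1"),
    ("model_v1", "trainedOn", "clean_data"),
    ("clean_data", "derivedFrom", "raw_data"),
}
closure = infer_depends_on(graph)
# Everything model_v2 transitively depends on, i.e. a simple lineage query.
lineage = sorted(o for (s, p, o) in closure if s == "model_v2" and p == "dependsOn")
print(lineage)
```

The same derived `dependsOn` facts double as a simple provenance trail: asking what `model_v2` depends on returns every upstream model and dataset.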
If this is right
- Developers can write AI workflows once at a high level and let the system choose and optimize the execution platforms.
- The same artifacts and dataflows become portable across servers, clouds, and supercomputers without rewriting platform-specific code.
- Provenance stored in the graph supplies an auditable record of how each model or dataset was produced.
- A single system can cover more stages of the AI lifecycle than tools specialized for only training, only deployment, or only monitoring.
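The "write once, let the system pick the platform" idea in the list above can be sketched as a greedy earliest-finish-time scheduler. This is a swapped-in textbook heuristic, not the optimizer described in the paper. The task names, work estimates, platform speeds, and the `needs_gpu` flag are hypothetical, and data-transfer costs between platforms are ignored for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract dataflow: task -> (work units, needs_gpu, upstream tasks).
DATAFLOW = {
    "prepare_a": (4.0, False, []),
    "prepare_b": (4.0, False, []),
    "train":     (12.0, True, ["prepare_a", "prepare_b"]),
    "evaluate":  (2.0, False, ["train"]),
}
# Hypothetical platforms: name -> (speed in work units per second, has_gpu).
PLATFORMS = {
    "server":        (2.0, False),
    "supercomputer": (4.0, True),
}

def schedule(dataflow, platforms):
    """Place each task on the platform where it finishes earliest.

    Returns (plan, makespan): plan maps task -> platform name,
    makespan is the finish time of the last task.
    """
    graph = {task: set(spec[2]) for task, spec in dataflow.items()}
    free_at = {name: 0.0 for name in platforms}  # when each platform is idle
    finish = {}
    plan = {}
    for task in TopologicalSorter(graph).static_order():
        work, needs_gpu, deps = dataflow[task]
        ready = max((finish[d] for d in deps), default=0.0)
        best = None
        for name, (speed, has_gpu) in platforms.items():
            if needs_gpu and not has_gpu:
                continue  # platform cannot run this task
            end = max(ready, free_at[name]) + work / speed
            if best is None or end < best[1]:
                best = (name, end)
        if best is None:
            raise ValueError(f"no platform can run task {task!r}")
        plan[task] = best[0]
        finish[task] = free_at[best[0]] = best[1]
    return plan, max(finish.values())

plan, makespan = schedule(DATAFLOW, PLATFORMS)
print(plan, makespan)
```

The dataflow never names a platform. Placement falls out of the platform descriptions, so changing `PLATFORMS` changes the plan without touching `DATAFLOW`.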
Where Pith is reading between the lines
- The semantic layer could reduce the cost of moving AI work between research groups or between commercial providers.
- If the graph grows large, query performance and maintenance effort become practical limits on adoption.
- Integration hooks to popular ML frameworks would be needed before teams can use Gypscie without changing their existing pipelines.
Load-bearing premise
A knowledge graph plus rule-based query language can capture the semantics of diverse AI artifacts and platforms sufficiently to enable effective cross-platform scheduling and reasoning without major information loss or performance penalties.
What would settle it
A concrete multi-platform dataflow that Gypscie either cannot schedule at all or schedules incorrectly because required semantic details about an artifact or platform are missing from the knowledge graph.
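One way to look for such a counterexample is to check, before scheduling, whether the recorded platform facts are enough to decide every task's requirements. The sketch below assumes a hypothetical flat key-value description of tasks and platforms. The names `find_gaps`, `gpu`, and `region` are illustrative and are not taken from the paper.

```python
def find_gaps(task_requirements, platform_facts):
    """Return {task: reason} for every task that no platform is known to satisfy.

    A requirement whose key is absent from a platform's facts counts as
    unknown, which is different from a recorded value that does not match.
    """
    gaps = {}
    for task, required in task_requirements.items():
        satisfied = False
        unknown = set()
        for facts in platform_facts.values():
            missing = {key for key in required if key not in facts}
            if missing:
                unknown |= missing
                continue
            if all(facts[key] == value for key, value in required.items()):
                satisfied = True
                break
        if not satisfied:
            if unknown:
                gaps[task] = "missing facts: " + ", ".join(sorted(unknown))
            else:
                gaps[task] = "no matching platform"
    return gaps

# Hypothetical inputs: the graph knows about GPUs but not about regions.
tasks = {
    "train": {"gpu": True},
    "serve": {"gpu": False, "region": "eu"},
}
platforms = {
    "cluster": {"gpu": True},
    "cloud": {"gpu": False},
}
gaps = find_gaps(tasks, platforms)
print(gaps)
```

A non-empty result with a "missing facts" reason is the kind of evidence the test above asks for: the dataflow is well-formed, but the graph lacks the detail needed to place it.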
Original abstract
Artificial Intelligence (AI) models, encompassing both traditional machine learning (ML) and more advanced approaches such as deep learning and large language models (LLMs), play a central role in modern applications. AI model lifecycle management involves the end-to-end process of managing these models, from data collection and preparation to model building, evaluation, deployment, and continuous monitoring. This process is inherently complex, as it requires the coordination of diverse services that manage AI artifacts such as datasets, dataflows, and models, all orchestrated to operate seamlessly. In this context, it is essential to isolate applications from the complexity of interacting with heterogeneous services, datasets, and AI platforms. In this paper, we introduce Gypscie, a cross-platform AI artifact management system. By providing a unified view of all AI artifacts, the Gypscie platform simplifies the development and deployment of AI applications. This unified view is realized through a knowledge graph that captures application semantics and a rule-based query language that supports reasoning over data and models. Model lifecycle activities are represented as high-level dataflows that can be scheduled across multiple platforms, such as servers, cloud platforms, or supercomputers. Finally, Gypscie records provenance information about the artifacts it produces, thereby enabling explainability. Our qualitative comparison with representative AI systems shows that Gypscie supports a broader range of functionalities across the AI artifact lifecycle. Our experimental evaluation demonstrates that Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Gypscie, a cross-platform AI artifact management system that provides a unified view of AI artifacts (datasets, dataflows, models) via a knowledge graph capturing application semantics and a rule-based query language supporting reasoning. Lifecycle activities are represented as high-level dataflows schedulable across heterogeneous platforms (servers, cloud, supercomputers), with provenance recording for explainability. The authors claim, via qualitative comparison, that Gypscie supports a broader range of functionalities across the AI artifact lifecycle than representative systems, and via experimental evaluation, that it can successfully optimize and schedule dataflows from an abstract specification.
Significance. If the core claims hold, the work addresses a genuine need for interoperability in AI artifact management by abstracting away platform heterogeneity through semantic modeling and automated scheduling. The combination of knowledge graphs with rule-based reasoning and provenance tracking could improve explainability and reduce development overhead for complex, multi-platform AI applications. However, the absence of concrete evaluation details limits assessment of whether the approach delivers practical gains without substantial information loss or performance overhead.
major comments (2)
- [Experimental Evaluation] Experimental Evaluation section: The claim that 'Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification' is load-bearing for the central contribution, yet the section provides no information on the platforms tested, the concrete dataflow specifications used, optimization metrics (e.g., makespan, resource usage, success rate), baselines, or failure modes. This prevents verification that the results support the claim.
- [Qualitative Comparison] Qualitative Comparison section: The assertion that Gypscie 'supports a broader range of functionalities across the AI artifact lifecycle' is central to the significance argument, but the section does not define the comparison criteria, name the representative AI systems, or detail which lifecycle stages (data collection, model building, deployment, monitoring) were assessed and how. This leaves the broader-functionality claim unsubstantiated.
minor comments (2)
- The abstract and introduction refer to 'representative AI systems' without naming them; explicitly list the systems and the functionality matrix in the comparison section.
- Provide at least one concrete example of the knowledge-graph schema and a sample rule from the query language to illustrate how artifact semantics are captured and reasoned over.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review of our manuscript on Gypscie. The comments highlight important areas where additional clarity and detail will strengthen the paper, and we address each point below.
- Referee: [Experimental Evaluation] Experimental Evaluation section: The claim that 'Gypscie can successfully optimize and schedule dataflows on AI platforms from an abstract specification' is load-bearing for the central contribution, yet the section provides no information on the platforms tested, the concrete dataflow specifications used, optimization metrics (e.g., makespan, resource usage, success rate), baselines, or failure modes. This prevents verification that the results support the claim.
Authors: We agree that the Experimental Evaluation section requires substantially more detail to allow verification of the central claim. In the revised manuscript, we will expand the section to specify the platforms tested (including local servers, cloud instances on AWS, and supercomputers), the concrete dataflow specifications used as input, the optimization metrics evaluated (makespan, resource usage, and success rate), the baseline systems employed for comparison, and any observed failure modes with corresponding mitigation strategies. These additions will directly support the claim that Gypscie can optimize and schedule dataflows from an abstract specification.
Revision: yes
- Referee: [Qualitative Comparison] Qualitative Comparison section: The assertion that Gypscie 'supports a broader range of functionalities across the AI artifact lifecycle' is central to the significance argument, but the section does not define the comparison criteria, name the representative AI systems, or detail which lifecycle stages (data collection, model building, deployment, monitoring) were assessed and how. This leaves the broader-functionality claim unsubstantiated.
Authors: We concur that the Qualitative Comparison section needs explicit definitions and details to substantiate the claim. In the revised version, we will define the comparison criteria (such as coverage of lifecycle stages, cross-platform scheduling support, provenance capabilities, and reasoning features), name the specific representative AI systems evaluated (e.g., MLflow, Kubeflow, and DVC), and provide a stage-by-stage breakdown of the AI artifact lifecycle (data collection, model building, deployment, and monitoring) with explanations of how Gypscie offers broader functionality in each area compared to the baselines.
Revision: yes
Circularity Check
No significant circularity
The paper describes the design and implementation of the Gypscie system, including its use of a knowledge graph and rule-based query language for AI artifact management and dataflow scheduling. It supports claims via qualitative comparison to other systems and experimental evaluation of scheduling from abstract specifications. No mathematical derivations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central functionality claims rest on external evaluation rather than reducing to inputs by construction, making the work self-contained against the listed circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: AI artifacts (datasets, dataflows, models) and their lifecycle relationships can be represented and reasoned over using a knowledge graph and rule-based query language.
- Domain assumption: High-level abstract dataflow specifications can be successfully optimized and scheduled on heterogeneous AI platforms.
invented entities (1)
- Gypscie platform: no independent evidence