AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
Pith reviewed 2026-05-16 21:29 UTC · model grok-4.3
The pith
AI4EOSC is a federated open-source platform that runs the full AI/ML lifecycle inside the European Open Science Cloud using modular architecture and built-in FAIR metadata.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI4EOSC is a federated, open-source platform that operationalizes the full AI/ML lifecycle within the EOSC ecosystem. It uses a modular and distributed architecture that includes an AI development platform, a serverless AI-as-a-Service layer, and a federated orchestration model to integrate heterogeneous compute and storage resources. The platform enforces FAIR principles by standardizing metadata with MLDCAT-AP and tracking provenance according to W3C PROV through an integrated CI/CD pipeline. Its value is demonstrated by consistent deployments across heterogeneous cloud providers and validation through scientific cases that show reduced manual burden and improved reproducibility.
What carries the argument
Modular distributed architecture with federated orchestration that pulls together heterogeneous resources while enforcing MLDCAT-AP metadata and W3C PROV provenance tracking.
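To make the provenance side of this mechanism concrete, here is a minimal sketch of the kind of record a CI/CD step could emit, laid out in the style of the W3C PROV-JSON serialization. It is not taken from the paper: the `ex:` namespace, the function name `build_provenance`, and all identifiers are hypothetical, and MLDCAT-AP metadata is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_provenance(model_id, dataset_id, run_id, started, ended):
    """Return a PROV-JSON-style document linking a model to its training run.

    All identifiers live in a made-up "ex:" namespace (an assumption for
    illustration, not the platform's real vocabulary).
    """
    return {
        "prefix": {"ex": "https://example.org/"},
        # Entities: the things that exist before and after the run.
        "entity": {
            f"ex:{model_id}": {"prov:type": "ex:TrainedModel"},
            f"ex:{dataset_id}": {"prov:type": "ex:Dataset"},
        },
        # Activity: the training run itself, with its time bounds.
        "activity": {
            f"ex:{run_id}": {
                "prov:startTime": started.isoformat(),
                "prov:endTime": ended.isoformat(),
            }
        },
        # Relations: the run used the dataset and generated the model.
        "used": {
            "_:u1": {"prov:activity": f"ex:{run_id}", "prov:entity": f"ex:{dataset_id}"}
        },
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": f"ex:{model_id}", "prov:activity": f"ex:{run_id}"}
        },
    }


doc = build_provenance(
    "model-v1",
    "dataset-2024",
    "train-run-42",
    datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 11, 30, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

The point of the sketch is only that provenance reduces to a small graph of entities, activities, and relations, which is what makes it cheap to generate automatically inside a pipeline.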
Load-bearing premise
The assumption that the modular architecture can integrate compute and storage resources from different providers in a consistent and seamless way without major compatibility or performance problems.
What would settle it
A documented case where an AI workflow fails to deploy or shows large performance differences when run on resources from two or more distinct cloud providers in the same installation.
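The test described above can be phrased as a small check. The sketch below is an assumption-laden illustration, not something from the paper: the function name `settles_against`, the input shape, and the `max_ratio` threshold of 2.0 for a "large" performance difference are all hypothetical choices.

```python
def settles_against(results, max_ratio=2.0):
    """Decide whether one workflow's cross-provider results form a counterexample.

    results maps a provider name to the workflow runtime in seconds, or to
    None when deployment failed on that provider. max_ratio is an assumed
    cutoff for what counts as a large performance difference.
    """
    failed = sorted(p for p, t in results.items() if t is None)
    times = [t for t in results.values() if t is not None]
    # A runtime ratio is only meaningful with at least two successful providers.
    ratio = max(times) / min(times) if len(times) >= 2 else None
    diverges = ratio is not None and ratio > max_ratio
    return {
        "failed": failed,
        "ratio": ratio,
        "counterexample": bool(failed) or diverges,
    }


consistent = settles_against({"provider-a": 100.0, "provider-b": 120.0})
slow = settles_against({"provider-a": 100.0, "provider-b": 350.0})
broken = settles_against({"provider-a": 100.0, "provider-b": None})
print(consistent, slow, broken, sep="\n")
```

Any real evaluation would need to fix the threshold in advance and repeat runs to separate provider effects from noise.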
Original abstract
The rapid growth of Artificial Intelligence and Machine Learning in scientific research has highlighted a gap between industry-standard MLOps tools and platforms, and the unique requirements of modern and Open Science, particularly regarding the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This paper presents AI4EOSC, a federated, open-source platform designed to operationalize the full AI/ML lifecycle within the European Open Science Cloud (EOSC) ecosystem. Our methodology tackles the fragmentation of distributed research infrastructures by integrating a modular and distributed architecture comprising an AI development platform, a serverless AI-as-a-Service layer, and a federated orchestration model that is able to integrate heterogeneous compute and storage resources from distributed e-Infrastructures. AI4EOSC also introduces a "FAIR-by-design" approach that enforces metadata standardization (via MLDCAT-AP) and W3C PROV-compliant provenance tracking through a platform-integrated CI/CD pipeline. AI4EOSC's added value is demonstrated through the delivery of a diverse set of community installations, showing consistent and seamless deployment across heterogeneous cloud providers. These installations are validated by a set of scientific cases, showing how our work reduces the manual burden on researchers while ensuring high levels of reproducibility and interoperability and providing a unified environment for development, training, and production of AI/ML models in the EOSC.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents AI4EOSC, a federated open-source platform for operationalizing the full AI/ML lifecycle within the European Open Science Cloud (EOSC). It integrates a modular distributed architecture (AI development platform, serverless AI-as-a-Service layer, federated orchestration for heterogeneous compute/storage resources), a FAIR-by-design approach enforcing MLDCAT-AP metadata and W3C PROV provenance via CI/CD pipelines, and demonstrates value through community installations and scientific use cases that reduce researcher burden while improving reproducibility and interoperability.
Significance. If the seamless integration of heterogeneous resources holds, the work would address fragmentation in distributed research infrastructures and provide a standards-compliant environment for AI in open science, advancing reproducibility and FAIR compliance within EOSC. The open-source nature and emphasis on provenance tracking represent practical strengths for community adoption.
major comments (1)
- [Abstract and validation description] The central claim of 'consistent and seamless deployment across heterogeneous cloud providers' (Abstract) rests on descriptive accounts of installations and cases without quantitative metrics such as deployment success rates, performance overhead, compatibility failure modes, or cross-provider variance. This leaves the assumption of robust federated orchestration unverified by measurement.
minor comments (1)
- [Abstract] The abstract is information-dense; separating the architecture description from the validation claims would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of AI4EOSC's significance and for the constructive feedback on validation. We address the major comment below and describe the revisions planned for the manuscript.
- Referee: [Abstract and validation description] The central claim of 'consistent and seamless deployment across heterogeneous cloud providers' (Abstract) rests on descriptive accounts of installations and cases without quantitative metrics such as deployment success rates, performance overhead, compatibility failure modes, or cross-provider variance. This leaves the assumption of robust federated orchestration unverified by measurement.
  Authors: We acknowledge that the current manuscript validates the federated orchestration primarily through descriptive accounts of successful community installations across heterogeneous providers and their use in scientific cases. These demonstrate operational functionality and seamlessness in practice, but we agree that the absence of explicit quantitative metrics (e.g., success rates, overhead, or variance) leaves the robustness claim less strongly evidenced than it could be. In the revised manuscript we will add a dedicated subsection under validation reporting available quantitative indicators from our deployment logs and CI/CD pipelines, including deployment success rates, average provisioning times, and any observed compatibility notes across providers. This will provide measurable support while preserving the paper's primary focus on architecture, serverless integration, and FAIR-by-design provenance.
  Revision: yes
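The quantitative indicators discussed in this exchange (deployment success rates and average provisioning times per provider) could be derived from deployment logs along the following lines. This is a hedged sketch only: the record fields `provider`, `success`, and `provision_s`, the function name `summarize`, and the sample data are invented for illustration and do not reflect the platform's actual log format or results.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Aggregate per-provider success rate and mean provisioning time.

    Each record is a dict with hypothetical keys:
      provider    - provider name
      success     - True if the deployment completed
      provision_s - provisioning time in seconds (ignored for failures)
    """
    by_provider = defaultdict(list)
    for record in records:
        by_provider[record["provider"]].append(record)

    summary = {}
    for provider, rows in by_provider.items():
        ok_times = [r["provision_s"] for r in rows if r["success"]]
        summary[provider] = {
            "deployments": len(rows),
            "success_rate": len(ok_times) / len(rows),
            # Mean over successful deployments only; None if none succeeded.
            "mean_provision_s": mean(ok_times) if ok_times else None,
        }
    return summary


# Made-up sample data, not measurements from the paper.
logs = [
    {"provider": "provider-a", "success": True, "provision_s": 60},
    {"provider": "provider-a", "success": True, "provision_s": 90},
    {"provider": "provider-a", "success": False, "provision_s": 0},
    {"provider": "provider-b", "success": True, "provision_s": 120},
    {"provider": "provider-b", "success": True, "provision_s": 100},
]
report = summarize(logs)
for provider, stats in sorted(report.items()):
    print(provider, stats)
```

Reporting the per-provider rows side by side, rather than a single pooled average, is what would expose the cross-provider variance the referee asks about.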
Circularity Check
No circularity: platform description and empirical validation are self-contained
The paper presents a descriptive account of a federated platform architecture, its modular components, serverless layer, orchestration model, and FAIR-by-design metadata approach. Validation rests on reported community installations and scientific use cases rather than any mathematical derivation, fitted parameters, or predictions. No equations, self-definitional constructs, or load-bearing self-citations appear in the derivation chain; the central claims concern design choices and deployment feasibility, which do not reduce to their own inputs by construction. This is the expected non-finding for an engineering/platform paper without quantitative modeling.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "federated orchestration model that is able to integrate heterogeneous compute and storage resources"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.