DECICE: AI-Driven Scheduling and Digital Twin Integration for the Cloud-HPC-Edge Compute Continuum
Pith reviewed 2026-06-29 23:18 UTC · model grok-4.3
The pith
DECICE combines an RNN-based AI scheduler with a digital twin to map workloads across cloud, HPC, and edge systems while respecting constraints and tracking carbon intensity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The DECICE framework supplies an Integrated AI Scheduler (IAIS) that employs RNN-based prediction and formal workflow modeling for constraint-aware workload mapping, paired with a Digital Twin that aggregates real-time metrics with carbon intensity and anomaly prediction to support energy-aware scheduling; the system operates in Kubernetes environments, accepts unified workflow input from several formats, and bridges cloud-native and HPC orchestration through a Slurm integration layer.
What carries the argument
The Integrated AI Scheduler (IAIS) with RNN prediction plus formal workflow modeling, together with the Digital Twin that folds in carbon intensity and anomaly data for energy-aware decisions.
If this is right
- Workloads are placed with explicit respect for constraints through the combination of RNN forecasts and formal models.
- Scheduling decisions incorporate carbon intensity to favor lower-emission placements.
- A single Kubernetes-based system ingests workflows from multiple formats and routes them across cloud, HPC, and edge resources.
- HPC clusters become reachable from cloud orchestration via the Slurm integration layer.
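The first two consequences above (constraint-respecting placement and carbon-aware choice) can be illustrated with a minimal sketch. This is not the DECICE scheduler: the `Node` and `Task` types, their fields, and the fixed `est_runtime_h` value (standing in for whatever the RNN predictor would supply) are all assumptions made for illustration. Hard constraints filter the candidate nodes first, and estimated emissions rank what remains.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical node description; field names are illustrative only.
    name: str
    kind: str                # "cloud", "hpc", or "edge"
    free_cpus: int
    free_mem_gb: float
    carbon_g_per_kwh: float  # grid carbon intensity at the node's site
    power_kw: float          # assumed average draw while running the task


@dataclass(frozen=True)
class Task:
    # Hypothetical task description; est_runtime_h stands in for a prediction.
    name: str
    cpus: int
    mem_gb: float
    allowed_kinds: FrozenSet[str]
    est_runtime_h: float


def feasible(task: Task, node: Node) -> bool:
    """Hard constraints: resource kind, CPU count, and memory must all fit."""
    return (
        node.kind in task.allowed_kinds
        and node.free_cpus >= task.cpus
        and node.free_mem_gb >= task.mem_gb
    )


def estimated_emissions_g(task: Task, node: Node) -> float:
    """Energy (kWh) times carbon intensity (g/kWh) gives grams of CO2."""
    return node.power_kw * task.est_runtime_h * node.carbon_g_per_kwh


def place(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Return the feasible node with the lowest estimated emissions, if any."""
    candidates = [n for n in nodes if feasible(task, n)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: estimated_emissions_g(task, n))


# Made-up example data, not measurements from the paper.
nodes = [
    Node("cloud-a", "cloud", 16, 64.0, 400.0, 0.2),
    Node("hpc-b", "hpc", 128, 512.0, 120.0, 0.5),
    Node("edge-c", "edge", 4, 8.0, 50.0, 0.05),
]
task = Task("train", 8, 32.0, frozenset({"cloud", "hpc"}), 2.0)
chosen = place(task, nodes)
print(chosen.name if chosen else "unschedulable")
```

With these made-up numbers the edge node is excluded by the constraints, and the HPC node wins on estimated emissions (120 g against 160 g for the cloud node). A real scheduler would also weigh completion time and data movement; the sketch ranks on emissions alone to keep the idea visible.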
Where Pith is reading between the lines
- Live carbon data inside the twin could let operators set explicit emission budgets rather than simple performance goals.
- Anomaly prediction may trigger proactive migration before performance degrades across the continuum.
- Open release of the code allows third parties to test the same scheduler logic on additional hardware mixes or prediction models.
Load-bearing premise
RNN-based prediction together with the digital twin will produce practically better scheduling outcomes than existing methods in real deployments across the compute continuum.
What would settle it
A controlled multi-site deployment that measures task completion time, energy consumption, and constraint violations when using DECICE versus standard Kubernetes or Slurm schedulers on the same workload mix.
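One way such a comparison could be summarized is sketched below, assuming each workload is run once under a baseline scheduler and once under the candidate, so the measurements are paired. The function name `paired_summary` and the sample numbers are hypothetical, and the exact sign-flip test is one reasonable choice of statistic rather than anything the paper prescribes. The same function applies to makespan, energy, or constraint-violation counts.

```python
from itertools import product
from statistics import mean
from typing import Dict, Sequence


def paired_summary(baseline: Sequence[float],
                   candidate: Sequence[float]) -> Dict[str, float]:
    """Summarize paired per-workload measurements (lower is better).

    Returns the mean difference (candidate minus baseline), the change
    relative to the baseline mean, and a two-sided p-value from an exact
    sign-flip test. The test enumerates 2**n sign patterns, so it is only
    practical for a small number of workloads (roughly n <= 20).
    """
    if len(baseline) != len(candidate) or len(baseline) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = mean(diffs)

    # Under the null hypothesis the schedulers are interchangeable, so the
    # sign of each paired difference is equally likely to be + or -.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        if abs(flipped) >= abs(observed) - 1e-12:  # tolerance for float noise
            extreme += 1

    return {
        "mean_diff": observed,
        "relative_change": observed / mean(baseline),
        "p_value": extreme / total,
    }


# Made-up makespans in minutes for four workloads, not results from the paper.
baseline_makespan = [10, 12, 11, 13]
candidate_makespan = [9, 10, 10, 11]
print(paired_summary(baseline_makespan, candidate_makespan))
```

Pairing by workload matters here: it removes the variation between workloads from the comparison, which is usually larger than the variation between schedulers.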
Original abstract
This paper presents the DECICE project (Device Edge Cloud Intelligent Collaboration framEwork), a Horizon Europe Research and Innovation Action (Grant No. 101092582, December 2022 to November 2025) that developed an open-source framework for intelligent workload scheduling across the cloud-HPC-edge compute continuum. A consortium of 12 partners across 6 European countries organized the work into six work packages covering AI-driven scheduling, digital twin infrastructure, system architecture and integration, monitoring, use case validation, and dissemination. The two core technical contributions are an Integrated AI Scheduler (IAIS) employing RNN-based prediction and formal workflow modeling for constraint-aware workload mapping, and a Digital Twin aggregating real-time metrics with carbon intensity and anomaly prediction for energy-aware scheduling. The framework operates within Kubernetes environments, supports unified workflow ingestion from multiple formats, and bridges cloud-native and HPC orchestration through a Slurm integration layer. We present the project vision, the overall architecture, contributions from each work package, quantitative evaluation results, and the open-source release.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the DECICE project (Horizon Europe Grant 101092582), an open-source framework for AI-driven workload scheduling across the cloud-HPC-edge continuum. It outlines an Integrated AI Scheduler (IAIS) that combines RNN-based prediction with formal workflow modeling for constraint-aware mapping, a Digital Twin that aggregates real-time metrics, carbon intensity, and anomaly prediction for energy-aware decisions, Kubernetes-native operation with Slurm integration, unified workflow ingestion, and contributions from six work packages. The manuscript covers project vision, architecture, work-package results, quantitative evaluation, and open-source release.
Significance. Scheduling across heterogeneous compute continua with energy and constraint awareness is a relevant problem in distributed systems. A working implementation that demonstrably improves on baselines in makespan, energy, or constraint satisfaction could be useful for practitioners. The manuscript, however, functions primarily as a project overview rather than a self-contained technical contribution with novel algorithms or detailed empirical validation.
major comments (1)
- [Abstract and evaluation sections] Abstract and sections describing IAIS and the Digital Twin: the central claim that IAIS (RNN prediction + formal modeling) and the Digital Twin produce practically superior scheduling outcomes is unsupported because no RNN architecture, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, baselines (Kubernetes, Slurm, prior AI schedulers), or quantitative metrics (accuracy, makespan reduction, energy savings, statistical tests) are supplied.
minor comments (1)
- The manuscript would benefit from an explicit section that separates project-level description from the specific technical contributions and their evaluation.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for major revision. The manuscript is intended as a high-level overview of the DECICE project, its architecture, and work-package contributions rather than a self-contained algorithmic paper. We address the single major comment below.
- Referee: [Abstract and evaluation sections] Abstract and sections describing IAIS and the Digital Twin: the central claim that IAIS (RNN prediction + formal modeling) and the Digital Twin produce practically superior scheduling outcomes is unsupported because no RNN architecture, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, baselines (Kubernetes, Slurm, prior AI schedulers), or quantitative metrics (accuracy, makespan reduction, energy savings, statistical tests) are supplied.
Authors: We agree that the current manuscript does not supply the requested low-level specifications (RNN architecture details, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, explicit baselines, or statistical quantitative metrics). The paper frames itself as a project overview that describes the IAIS and Digital Twin at the architectural level and states that quantitative evaluation results exist from the work packages; it does not advance a central claim of practical superiority with concrete numbers. Because the requested details are absent, the evaluation sections are underspecified for a reader seeking reproducible technical validation. We will revise the manuscript to either (a) incorporate additional technical descriptions and high-level metrics drawn from the project deliverables where they can be released without violating consortium agreements, or (b) explicitly qualify the evaluation claims as high-level outcomes and point to companion technical reports or future publications for the missing details.
Revision: yes
Circularity Check
No circularity: high-level project description with no derivations or fitted predictions.
The paper describes a Horizon Europe project framework (DECICE) at the architectural level, covering work packages, IAIS (RNN-based prediction + formal modeling), and Digital Twin without any equations, derivations, parameter fitting, or quantitative prediction claims that could reduce to inputs by construction. No self-citations are load-bearing for a mathematical result, and the text contains no self-definitional steps or renamed known results. This matches the default expectation for non-mathematical project papers.
Axiom & Free-Parameter Ledger
invented entities (2)
- Integrated AI Scheduler (IAIS): no independent evidence
- Digital Twin: no independent evidence