DECICE: AI-Driven Scheduling and Digital Twin Integration for the Cloud-HPC-Edge Compute Continuum

Aasish Kumar Sharma; Felix Stein; Giorgi Mamulashvili; Jonathan Decker; Julian M. Kunkel; Michael Bidollahkhani; Mirac Aydin; Mohsen Seyedkazemi Ardebili; Mojtaba Akbari; Sachin P. Nanavati; Zoya Masih

arxiv: 2605.25292 · v1 · pith:PTFOQXAYnew · submitted 2026-05-24 · 💻 cs.DC


Pith reviewed 2026-06-29 23:18 UTC · model grok-4.3


keywords: AI scheduling · digital twin · cloud-HPC-edge continuum · Kubernetes · energy-aware scheduling · RNN prediction · workload mapping · Slurm integration


The pith

DECICE combines an RNN-based AI scheduler with a digital twin to map workloads across cloud, HPC, and edge systems while respecting constraints and tracking carbon intensity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the DECICE project, which built an open-source framework for scheduling workloads across the heterogeneous cloud-HPC-edge compute continuum. Its Integrated AI Scheduler uses recurrent neural network predictions together with formal workflow models to place tasks under given constraints. A Digital Twin layer gathers live metrics, carbon-intensity signals, and anomaly forecasts to steer energy-aware decisions. The implementation runs inside Kubernetes and adds a Slurm bridge for HPC resources, with workflow ingestion from multiple formats. A sympathetic reader would care because coordinated scheduling in mixed environments can improve resource use and lower overall energy demand.

Core claim

The DECICE framework supplies an Integrated AI Scheduler (IAIS) that employs RNN-based prediction and formal workflow modeling for constraint-aware workload mapping, paired with a Digital Twin that aggregates real-time metrics with carbon intensity and anomaly prediction to support energy-aware scheduling; the system operates in Kubernetes environments, accepts unified workflow input from several formats, and bridges cloud-native and HPC orchestration through a Slurm integration layer.

What carries the argument

The Integrated AI Scheduler (IAIS) with RNN prediction plus formal workflow modeling, together with the Digital Twin that folds in carbon intensity and anomaly data for energy-aware decisions.

If this is right

Workloads are placed with explicit respect for constraints through the combination of RNN forecasts and formal models.
Scheduling decisions incorporate carbon intensity to favor lower-emission placements.
A single Kubernetes-based system ingests workflows from multiple formats and routes them across cloud, HPC, and edge resources.
HPC clusters become reachable from cloud orchestration via the Slurm integration layer.
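
The first two consequences can be illustrated with a minimal sketch. It assumes a hypothetical node model and scoring rule that do not come from the paper: nodes that violate a hard constraint are filtered out, and the remaining candidates are ranked by estimated emissions (predicted runtime × node power × grid carbon intensity). The names `Node`, `Task`, `feasible`, `estimated_emissions_g`, and `place`, and all numbers, are illustrative assumptions. In DECICE the runtime estimate would presumably come from the IAIS prediction and the carbon signal from the Digital Twin, but their actual interfaces are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    tier: str                 # "cloud", "hpc", or "edge"
    free_cpus: int
    free_mem_gb: float
    power_kw: float           # assumed average draw while running the task
    carbon_g_per_kwh: float   # assumed grid carbon intensity at the node's site


@dataclass
class Task:
    name: str
    cpus: int
    mem_gb: float
    allowed_tiers: frozenset
    predicted_runtime_h: dict  # tier -> hours; stand-in for a model forecast


def feasible(task: Task, node: Node) -> bool:
    """Hard constraints: tier allowed, forecast available, enough CPU and memory."""
    return (
        node.tier in task.allowed_tiers
        and node.tier in task.predicted_runtime_h
        and node.free_cpus >= task.cpus
        and node.free_mem_gb >= task.mem_gb
    )


def estimated_emissions_g(task: Task, node: Node) -> float:
    """Estimated grams of CO2: runtime (h) * power (kW) * intensity (g/kWh)."""
    return task.predicted_runtime_h[node.tier] * node.power_kw * node.carbon_g_per_kwh


def place(task: Task, nodes: list) -> Optional[Node]:
    """Return the feasible node with the lowest estimated emissions, or None."""
    candidates = [n for n in nodes if feasible(task, n)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: estimated_emissions_g(task, n))


# Made-up inventory and task, for illustration only.
NODES = [
    Node("cloud-a", "cloud", free_cpus=16, free_mem_gb=64, power_kw=0.4, carbon_g_per_kwh=300.0),
    Node("hpc-b", "hpc", free_cpus=64, free_mem_gb=256, power_kw=1.0, carbon_g_per_kwh=100.0),
    Node("edge-c", "edge", free_cpus=4, free_mem_gb=8, power_kw=0.05, carbon_g_per_kwh=400.0),
]

TASK = Task(
    name="train-model",
    cpus=8,
    mem_gb=32,
    allowed_tiers=frozenset({"cloud", "hpc"}),
    predicted_runtime_h={"cloud": 2.0, "hpc": 0.5},
)

chosen = place(TASK, NODES)
print(chosen.name if chosen else "no feasible node")  # prints: hpc-b
```

In this toy inventory the edge node is excluded by the tier and CPU constraints, and the HPC node wins because its shorter predicted runtime and cleaner grid outweigh its higher power draw. A real scheduler would weigh emissions against deadlines and data locality rather than minimize a single score.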

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Live carbon data inside the twin could let operators set explicit emission budgets rather than simple performance goals.
Anomaly prediction may trigger proactive migration before performance degrades across the continuum.
Open release of the code allows third parties to test the same scheduler logic on additional hardware mixes or prediction models.
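
As a rough illustration of the emission-budget idea, the sketch below admits queued tasks in submission order while their cumulative estimated emissions stay within a budget, and defers the rest. This is an editorial example, not part of DECICE: the function `admit_within_budget`, the greedy policy, and the figures are all assumptions.

```python
def admit_within_budget(estimates_g, budget_g):
    """Admit tasks in order while cumulative estimated emissions fit the budget.

    estimates_g: list of (task_name, estimated_grams_co2) pairs.
    Returns (admitted_names, deferred_names, used_grams).
    """
    admitted, deferred, used = [], [], 0.0
    for name, grams in estimates_g:
        if used + grams <= budget_g:
            admitted.append(name)
            used += grams
        else:
            # Over budget: defer, e.g. until carbon intensity drops.
            deferred.append(name)
    return admitted, deferred, used


# Made-up per-task estimates in grams of CO2.
queue = [("etl", 50.0), ("train", 240.0), ("report", 30.0)]
admitted, deferred, used = admit_within_budget(queue, budget_g=100.0)
print(admitted, deferred, used)  # prints: ['etl', 'report'] ['train'] 80.0
```

A greedy first-fit pass is the simplest possible policy. An operator who wanted to maximize admitted work under the budget would need a knapsack-style formulation instead.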

Load-bearing premise

RNN-based prediction together with the digital twin will produce practically better scheduling outcomes than existing methods in real deployments across the compute continuum.

What would settle it

A controlled multi-site deployment that measures task completion time, energy consumption, and constraint violations when using DECICE versus standard Kubernetes or Slurm schedulers on the same workload mix.
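
The comparison described above could be tabulated along these lines. The sketch assumes per-task run records with hypothetical fields (`start_s`, `end_s`, `energy_kwh`, `constraint_violated`) and uses made-up numbers. It shows only how the three metrics would be aggregated and compared against a baseline, not how DECICE or any baseline actually performs.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    task: str
    start_s: float
    end_s: float
    energy_kwh: float
    constraint_violated: bool


def summarize(runs):
    """Aggregate one scheduler's runs into makespan, energy, and violations."""
    return {
        "makespan_s": max(r.end_s for r in runs) - min(r.start_s for r in runs),
        "energy_kwh": sum(r.energy_kwh for r in runs),
        "violations": sum(1 for r in runs if r.constraint_violated),
    }


def relative_change(baseline, candidate):
    """Per-metric change of candidate vs. baseline; negative means lower."""
    return {
        k: (candidate[k] - baseline[k]) / baseline[k] if baseline[k] else None
        for k in baseline
    }


# Made-up records for the same three-task workload under two schedulers.
baseline_runs = [
    TaskRun("a", 0.0, 100.0, 2.0, False),
    TaskRun("b", 0.0, 200.0, 4.0, True),
    TaskRun("c", 100.0, 400.0, 4.0, False),
]
candidate_runs = [
    TaskRun("a", 0.0, 100.0, 2.0, False),
    TaskRun("b", 0.0, 150.0, 3.0, False),
    TaskRun("c", 100.0, 300.0, 3.0, False),
]

base = summarize(baseline_runs)
cand = summarize(candidate_runs)
print(base)
print(cand)
print(relative_change(base, cand))
```

With these invented records the candidate shows a 25% shorter makespan, 20% less energy, and no constraint violations. A real study would repeat each run several times and report variance or a statistical test, since a single run per scheduler cannot separate scheduler effects from noise.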

Figures

Figures reproduced from arXiv: 2605.25292 by Aasish Kumar Sharma, Felix Stein, Giorgi Mamulashvili, Jonathan Decker, Julian M. Kunkel, Michael Bidollahkhani, Mirac Aydin, Mohsen Seyedkazemi Ardebili, Mojtaba Akbari, Sachin P. Nanavati, Zoya Masih.

**Figure 2.** WP2: IAIS data flow. Heterogeneous node and job attributes enter …

**Figure 4.** IAIS scalability: training and validation times across problem sizes

**Figure 5.** Runtime comparison of 10 scheduling tools across three workflow …

Original abstract

This paper presents the DECICE project (Device Edge Cloud Intelligent Collaboration framEwork), a Horizon Europe Research and Innovation Action (Grant No. 101092582, December 2022 to November 2025) that developed an open-source framework for intelligent workload scheduling across the cloud-HPC-edge compute continuum. A consortium of 12 partners across 6 European countries organized the work into six work packages covering AI-driven scheduling, digital twin infrastructure, system architecture and integration, monitoring, use case validation, and dissemination. The two core technical contributions are an Integrated AI Scheduler (IAIS) employing RNN-based prediction and formal workflow modeling for constraint-aware workload mapping, and a Digital Twin aggregating real-time metrics with carbon intensity and anomaly prediction for energy-aware scheduling. The framework operates within Kubernetes environments, supports unified workflow ingestion from multiple formats, and bridges cloud-native and HPC orchestration through a Slurm integration layer. We present the project vision, the overall architecture, contributions from each work package, quantitative evaluation results, and the open-source release.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DECICE describes a Kubernetes-based framework combining RNN scheduling with a digital twin for cloud-HPC-edge workloads but presents the performance claims without enough methodological or numerical detail to assess them.


The one thing to know is that this is a project report on the DECICE framework for scheduling across cloud, HPC, and edge using AI and digital twins, and it does not introduce a standalone new scientific result.

The new element is the specific combination of an Integrated AI Scheduler, which uses recurrent neural networks for workload prediction alongside formal workflow models to handle constraints, with a Digital Twin that pulls in real-time metrics, carbon intensity, and anomaly prediction to guide energy-aware decisions. The framework runs on Kubernetes, accepts workflows in several formats, and adds a Slurm integration layer for the HPC side. The project organized this work into six work packages across 12 partners in six countries, and the paper presents the open-source release.

This setup does a reasonable job of addressing the practical challenges of managing workloads across very different compute environments while trying to account for energy use. The mention of use case validation suggests they tested it in real scenarios.

The soft spot is the lack of detail around the results. The text says quantitative evaluation results are presented, but the high-level description does not include the RNN architecture, training data, prediction performance, how the formal models encode constraints, the comparison methods against standard schedulers, or any reported numbers on improvements. This makes it difficult to judge whether the approach delivers better outcomes than existing tools.

This paper is aimed at researchers and practitioners working on compute continuum orchestration, digital twins for systems, or energy-efficient scheduling. Someone already familiar with Kubernetes and Slurm might find the integration points useful as a reference.

It deserves a serious referee because the topic is timely and the open-source component adds value that others can use, even if the current write-up needs more on the evaluation to stand on its own.

I would recommend sending it out for peer review with the expectation that the authors will need to expand the results section substantially.

Referee Report

1 major / 1 minor

Summary. The paper describes the DECICE project (Horizon Europe Grant 101092582), an open-source framework for AI-driven workload scheduling across the cloud-HPC-edge continuum. It outlines an Integrated AI Scheduler (IAIS) that combines RNN-based prediction with formal workflow modeling for constraint-aware mapping, a Digital Twin that aggregates real-time metrics, carbon intensity, and anomaly prediction for energy-aware decisions, Kubernetes-native operation with Slurm integration, unified workflow ingestion, and contributions from six work packages. The manuscript covers project vision, architecture, work-package results, quantitative evaluation, and open-source release.

Significance. Scheduling across heterogeneous compute continua with energy and constraint awareness is a relevant problem in distributed systems. A working implementation that demonstrably improves on baselines in makespan, energy, or constraint satisfaction could be useful for practitioners. The manuscript, however, functions primarily as a project overview rather than a self-contained technical contribution with novel algorithms or detailed empirical validation.

major comments (1)

[Abstract and evaluation sections] Abstract and sections describing IAIS and the Digital Twin: the central claim that IAIS (RNN prediction + formal modeling) and the Digital Twin produce practically superior scheduling outcomes is unsupported because no RNN architecture, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, baselines (Kubernetes, Slurm, prior AI schedulers), or quantitative metrics (accuracy, makespan reduction, energy savings, statistical tests) are supplied.

minor comments (1)

The manuscript would benefit from an explicit section that separates project-level description from the specific technical contributions and their evaluation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and the recommendation for major revision. The manuscript is intended as a high-level overview of the DECICE project, its architecture, and work-package contributions rather than a self-contained algorithmic paper. We address the single major comment below.


Referee: [Abstract and evaluation sections] Abstract and sections describing IAIS and the Digital Twin: the central claim that IAIS (RNN prediction + formal modeling) and the Digital Twin produce practically superior scheduling outcomes is unsupported because no RNN architecture, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, baselines (Kubernetes, Slurm, prior AI schedulers), or quantitative metrics (accuracy, makespan reduction, energy savings, statistical tests) are supplied.

Authors: We agree that the current manuscript does not supply the requested low-level specifications (RNN architecture details, training corpus, prediction horizon, constraint-encoding method, anomaly-prediction technique, explicit baselines, or statistical quantitative metrics). The paper frames itself as a project overview that describes the IAIS and Digital Twin at the architectural level and states that quantitative evaluation results exist from the work packages; it does not advance a central claim of practical superiority with concrete numbers. Because the requested details are absent, the evaluation sections are underspecified for a reader seeking reproducible technical validation. We will revise the manuscript to either (a) incorporate additional technical descriptions and high-level metrics drawn from the project deliverables where they can be released without violating consortium agreements, or (b) explicitly qualify the evaluation claims as high-level outcomes and point to companion technical reports or future publications for the missing details.

revision: yes

Circularity Check

0 steps flagged

No circularity: high-level project description with no derivations or fitted predictions.


The paper describes a Horizon Europe project framework (DECICE) at the architectural level, covering work packages, IAIS (RNN-based prediction + formal modeling), and Digital Twin without any equations, derivations, parameter fitting, or quantitative prediction claims that could reduce to inputs by construction. No self-citations are load-bearing for a mathematical result, and the text contains no self-definitional steps or renamed known results. This matches the default expectation for non-mathematical project papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claims rest on the unverified effectiveness of the two introduced components; no free parameters, axioms, or external evidence are detailed in the abstract.

invented entities (2)

Integrated AI Scheduler (IAIS) · no independent evidence
purpose: RNN-based prediction and formal workflow modeling for constraint-aware mapping
Presented as a core technical contribution without independent validation data.

Digital Twin · no independent evidence
purpose: Aggregating metrics including carbon intensity and anomaly prediction for energy-aware decisions
Introduced as the second core contribution without external benchmarks.



Reference graph

Works this paper leans on

11 extracted references · 7 canonical work pages · 1 internal anchor

[1] A. K. Sharma, C. Boehme, P. Gelß, R. Yahyapour, and J. Kunkel, “Workflow-driven modeling for the compute continuum: An optimization approach to automated system and workload scheduling,” in Proc. IEEE COMPSAC, 2025. doi: https://doi.org/10.1109/COMPSAC65507.2025.00343

[2] M. Bidollahkhani, A. K. Sharma, and J. Kunkel, “HOSHMAND: Accelerated AI-driven scheduler emulating conventional task distribution techniques for cloud workloads,” in Proc. IEEE COMPSAC, 2024. doi: https://doi.org/10.1109/COMPSAC61105.2024.00372

[3] A. K. Sharma and J. M. Kunkel, “Grapheon RL: A Graph Neural Network and Reinforcement Learning Framework for Constraint and Data-Aware Workflow Mapping and Scheduling in Heterogeneous HPC Systems,” in Proc. IEEE COMPSAC, 2025. doi: https://doi.org/10.1109/COMPSAC65507.2025.00341

[4] J. Kunkel et al., “DECICE: Device-Edge-Cloud Intelligent Collaboration Framework,” in Proc. ACM Computing Frontiers (CF’23), 2023. doi: https://doi.org/10.1145/3587135.3592179

[5] J. Decker et al., “Enabling Kubernetes workload execution on rootless HPC systems with KSI,” Int. J. Advances in Intelligent Systems, vol. 18, no. 3&4, pp. 126–136, 2025

[6] M. Aydin, M. Bidollahkhani, and J. Kunkel, “Comparing fault-tolerance in Kubernetes and Slurm in HPC infrastructure,” in Proc. ADVCOMP, ISBN 978-1-68558-184-8, 2024, pp. 40–48

[7] M. Bidollahkhani et al., “Design and implementation of integrated AI scheduler for dynamic cloud workloads allocation in Kubernetes environments,” in Proc. FTC, Springer LNNS, 2025. doi: https://doi.org/10.1007/978-3-032-07986-2_25

[8] M. Molan et al., “GRAAFE: GRaph Anomaly Anticipation Framework for Exascale HPC systems,” Future Gener. Comput. Syst., vol. 160, 2024. doi: https://doi.org/10.1016/j.future.2024.06.032

[9] M. S. Ardebili, A. Acquaviva, L. Benini, and A. Bartolini, “HazardNet: A thermal hazard prediction framework for datacenters,” Future Gener. Comput. Syst., vol. 155, pp. 340–353, 2024. doi: https://doi.org/10.1016/j.future.2024.01.031

[10] J. Decker and J. M. Kunkel, “Ephemeral Kubernetes: dynamically deleting and recreating clusters using Warewulf,” J. Supercomput., vol. 81, doi: https://doi.org/10.1007/s11227-025-07668-y
