Hybrid Edge-HPC Systems for Low-Latency Data-Driven Inference

Alan Subedi; Andre Merzky; Avhishek Biswas; Benjamin Carter; Chandra Krintz; Douglas Thain; Memet Can Vuran; Liubov Kurafeeva; Michael Fay; Rich Wolski; Ryan Hartung; Shantenu Jha

arxiv: 2605.20532 · v1 · pith:PITKQT6Dnew · submitted 2026-05-19 · 💻 cs.DC

Liubov Kurafeeva, Ryan Hartung, Benjamin Carter, Alan Subedi, Avhishek Biswas, Michael Fay, Shantenu Jha, Chandra Krintz, Andre Merzky, Douglas Thain, Memet Can Vuran, Rich Wolski

Pith reviewed 2026-05-21 06:11 UTC · model grok-4.3

classification 💻 cs.DC

keywords hybrid edge-HPC · low-latency inference · surrogate models · asynchronous updates · computational fluid dynamics · digital agriculture · model fidelity · reverse backfill

The pith

RBF enables continuous low-latency inference at the edge while model accuracy improves asynchronously from HPC simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a hybrid system for applications that need fast responses from live sensor data but depend on slow high-performance simulations to build accurate models. RBF (Reverse Backfill) deploys lightweight surrogate models on edge devices to handle immediate inference and swaps in better versions as remote HPC simulation and training jobs complete. The authors demonstrate this in a digital agriculture deployment where airflow in a screenhouse is inferred by surrogates built from CFD simulations. The approach keeps the system responsive and lets accuracy rise over time even when updates arrive irregularly because of batch scheduling. A sympathetic reader would care because many cyber-physical systems face this exact tension between speed and model quality.

Core claim

RBF (Reverse Backfill) decouples low-latency inference from simulation-driven training by running lightweight surrogate models at the edge and asynchronously incorporating improved models from HPC backfill computations, allowing continuous operation and progressive fidelity gains despite delayed and irregular model updates in simulation-bounded settings.
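The decoupling in this claim can be illustrated with a minimal sketch. It is not the paper's implementation: the `ModelSlot` class, the version counter, and the two toy models are assumptions made for illustration only. The idea shown is that inference always uses whichever model is currently installed, while a separate thread (standing in for the HPC side) installs a newer one whenever it happens to arrive.

```python
from __future__ import annotations

import threading
from typing import Callable, Sequence

# Hypothetical surrogate signature: sensor reading in, scalar prediction out.
Model = Callable[[Sequence[float]], float]


class ModelSlot:
    """Holds the surrogate currently used for edge inference."""

    def __init__(self, initial: Model, version: int = 0) -> None:
        self._lock = threading.Lock()
        self._model = initial
        self._version = version

    def publish(self, model: Model, version: int) -> bool:
        """Install a newer model; ignore stale or out-of-order arrivals."""
        with self._lock:
            if version <= self._version:
                return False
            self._model, self._version = model, version
            return True

    def infer(self, reading: Sequence[float]) -> tuple[float, int]:
        """Predict with whichever model is installed right now."""
        with self._lock:
            model, version = self._model, self._version
        # The model runs outside the lock so a publish never waits on inference.
        return model(reading), version


def coarse(reading: Sequence[float]) -> float:
    """Toy stand-in for the initial lightweight surrogate."""
    return sum(reading) / len(reading)


def refined(reading: Sequence[float]) -> float:
    """Toy stand-in for an improved model produced later."""
    return 0.9 * sum(reading) / len(reading)


slot = ModelSlot(coarse)
before = slot.infer([1.0, 3.0])

# A background thread stands in for an HPC job finishing at an arbitrary time.
publisher = threading.Thread(target=slot.publish, args=(refined, 1))
publisher.start()
publisher.join()

after = slot.infer([1.0, 3.0])
print(before, after)
```

Inference never blocks on the publisher beyond the brief lock needed to read the current model, which is the property the core claim depends on.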

What carries the argument

The Reverse Backfill (RBF) architecture, which reinterprets opportunistic HPC backfilling as a means of improving model accuracy rather than system utilization, using pluggable surrogate models and asynchronous updates across infrastructure that spans edge devices to HPC.

If this is right

Continuous low-latency inference persists even with HPC scheduling delays and irregular model updates.
Model fidelity improves over time as higher-accuracy versions from simulations become available.
The system orchestrates computation across edge devices, private 5G, cloud, and HPC resources.
Pluggable surrogate models support adaptation to different simulation-driven inference tasks.
Evaluation quantifies the impact of delayed updates on prediction accuracy in a CFD-based airflow inference application.
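One way to picture the pluggable-surrogate point is a shared interface that every model type satisfies, so the inference path does not care which one is installed. The sketch below is an assumption about how such an interface might look, not the paper's design: `Surrogate`, `Registry`, and both toy models are hypothetical names.

```python
from typing import Dict, Protocol, Sequence


class Surrogate(Protocol):
    """Anything with a name and a predict method can be plugged in."""

    name: str

    def predict(self, reading: Sequence[float]) -> float: ...


class MeanSurrogate:
    """Toy model: average of the sensor readings."""

    name = "mean"

    def predict(self, reading: Sequence[float]) -> float:
        return sum(reading) / len(reading)


class LinearSurrogate:
    """Toy model: weighted sum of the readings plus a bias."""

    name = "linear"

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights, self.bias = list(weights), bias

    def predict(self, reading: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, reading)) + self.bias


class Registry:
    """Maps model names to installed surrogates."""

    def __init__(self) -> None:
        self._models: Dict[str, Surrogate] = {}

    def register(self, model: Surrogate) -> None:
        self._models[model.name] = model

    def predict(self, name: str, reading: Sequence[float]) -> float:
        return self._models[name].predict(reading)


registry = Registry()
registry.register(MeanSurrogate())
registry.register(LinearSurrogate(weights=[0.5, 0.25], bias=0.5))

readings = [2.0, 4.0]
results = {n: registry.predict(n, readings) for n in ("mean", "linear")}
print(results)
```

Swapping in a different model family then only requires a new class with the same two members.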

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could extend to other domains requiring real-time physical process modeling with slow update cycles, such as environmental monitoring or industrial control.
It offers a practical way to leverage backfill opportunities on HPC systems specifically for enhancing model quality instead of just filling idle time.
Testable extensions include measuring the optimal frequency of surrogate refreshes based on application tolerance for accuracy drift.

Load-bearing premise

Lightweight surrogate models at the edge maintain usable accuracy for the application while waiting for asynchronous updates from HPC simulations.

What would settle it

If the prediction accuracy for airflow patterns in the agricultural screenhouse drops below acceptable levels for the duration of the longest delays between model updates, the central claim would be falsified.
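The falsification test described here can be sketched as a simple check: group error measurements by the model update that preceded them, take the worst error in each interval, and compare against a tolerance. The function names, the tolerance values, and the sample numbers below are illustrative assumptions; the paper does not state a threshold.

```python
from bisect import bisect_right
from typing import Dict, Iterable, Sequence, Tuple


def worst_error_per_gap(
    update_times: Sequence[float],
    samples: Iterable[Tuple[float, float]],
) -> Dict[float, float]:
    """Largest observed error in each interval between model updates.

    update_times: sorted times at which a model was installed.
    samples: (time, error) pairs measured at the edge.
    """
    worst: Dict[float, float] = {}
    for t, err in samples:
        i = bisect_right(update_times, t) - 1
        if i < 0:
            continue  # measured before any model was installed
        start = update_times[i]
        worst[start] = max(worst.get(start, 0.0), err)
    return worst


def claim_survives(update_times, samples, tolerance: float) -> bool:
    """True if the error never exceeds the tolerance in any update gap."""
    gaps = worst_error_per_gap(update_times, samples)
    return all(err <= tolerance for err in gaps.values())


# Made-up numbers purely to exercise the functions (not data from the paper).
updates = [0, 10, 40]
errors = [(1, 0.2), (9, 0.5), (12, 0.1), (39, 0.8), (41, 0.1)]

print(worst_error_per_gap(updates, errors))
print(claim_survives(updates, errors, tolerance=1.0))
print(claim_survives(updates, errors, tolerance=0.6))
```

In this toy data the longest gap (10 to 40) also carries the largest error, which is the pattern the test is meant to surface.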

Figures

Figures reproduced from arXiv: 2605.20532 by Alan Subedi, Andre Merzky, Avhishek Biswas, Benjamin Carter, Chandra Krintz, Douglas Thainand Memet Can Vuran, Liubov Kurafeeva, Michael Fay, Rich Wolski, Ryan Hartung, Shantenu Jha.

**Figure 1.** RBF System Architecture. The Reverse Backfill architecture has three tiers. (Left) At the remote facility, sensor data is collected and conveyed over a private wireless network to a fault-resilient distributed log using a pub/sub protocol. When available, published models are downloaded and used for rapid inference in place of simulation. (Middle) A dedicated, but resource-limited, cluster pulls the publish…

**Figure 2.** Timeline of the RBF instantiation illustrating asynchronous, simulation-driven model updates. Passive data collection (pdc), simulation (sim), and training (train) stages overlap across multiple pipeline instances, while model updates are published opportunistically upon completion. This design enables continuous inference despite irregular and delayed HPC execution.

**Figure 3.** Model accuracy decay over time using different history windows for all three models (PINN, FNO, PCR). The x-axis shows elapsed time ranging from …

**Figure 4.** Timeline of model publish events for all three model types (PINN, FNO, PCR) during a simultaneous live experiment on two resources: the …

**Figure 5.** 95th-percentile (P95) model transfer time over 100 runs. Transfer time is in seconds.

Original abstract

Emerging cyber-physical systems increasingly require low-latency inference from streaming sensor data while maintaining models that reflect complex and evolving physical processes. In many domains, however, model updates depend on high-fidelity simulations and training executed on remote high-performance computing (HPC) systems under batch scheduling. This creates a fundamental mismatch between the responsiveness required at the edge and the cost, throughput, and availability of simulation-driven model updates. We present RBF (Reverse Backfill), a hybrid edge-HPC learning and inference architecture that integrates low-latency edge inference with asynchronous, simulation-driven model improvement. RBF targets simulation-bounded settings in which model updates are constrained by simulation throughput and HPC scheduling delays, and reinterprets HPC backfilling by using opportunistic computation to improve model accuracy rather than system utilization. RBF decouples inference from simulation and training by deploying lightweight surrogate models at the edge while incorporating improved models asynchronously as they become available. The architecture supports pluggable surrogate models and orchestrates computation across heterogeneous infrastructure spanning edge devices, private 5G, cloud, and HPC resources. We instantiate RBF using a real-world digital agriculture deployment that couples edge sensing with computational fluid dynamics (CFD) simulations to infer airflow patterns in a large agricultural screenhouse. Our evaluation characterizes end-to-end system behavior under realistic constraints, quantifying simulation latency, training cost, inference throughput, and the impact of delayed model updates on prediction accuracy. Results demonstrate that RBF enables continuous, low-latency inference while improving model fidelity over time despite delayed and irregular model updates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RBF gives a workable pattern for low-latency edge inference with async HPC model updates in sim-heavy settings like digital ag CFD, but the evaluation stops short of showing how long surrogates stay usable under irregular delays.

The main thing to know is that this paper describes RBF, a hybrid architecture that runs fast surrogate inference at the edge while pulling in improved models from HPC simulations on an irregular schedule. They apply it to a screenhouse airflow inference task that couples sensors with CFD simulations, and the deployment gives the claims some concrete grounding rather than staying purely architectural.

Referee Report

1 major / 2 minor

Summary. The paper introduces RBF (Reverse Backfill), a hybrid edge-HPC architecture that decouples low-latency inference from simulation-driven model updates by deploying lightweight surrogate models at the edge while asynchronously incorporating improved models from HPC resources. It targets simulation-bounded cyber-physical systems and is evaluated in a real-world digital agriculture deployment that uses edge sensing and CFD simulations to infer airflow patterns in a screenhouse, with measurements of simulation latency, training cost, inference throughput, and accuracy effects from delayed updates.

Significance. If the evaluation demonstrates that surrogate accuracy remains within application-specific limits under irregular HPC update delays, the result would be significant for practical hybrid systems in domains requiring continuous inference with evolving physical models. The real-world deployment and quantification of end-to-end metrics (latency, throughput, accuracy impact) are strengths that ground the architecture beyond abstract claims.

major comments (1)

The central claim that RBF enables continuous low-latency inference while improving model fidelity over time despite delayed and irregular updates is load-bearing on the assumption that lightweight edge surrogates maintain usable accuracy for the CFD airflow task. The abstract states that the evaluation quantifies the impact of delayed model updates on prediction accuracy, yet the manuscript provides no explicit tolerance threshold (e.g., maximum allowable RMSE or percentage error) for the screenhouse application and no analysis or plot of prediction error versus wall-clock delay. Without this, it is not possible to verify that degradation stays within acceptable limits rather than merely improving over time.

minor comments (2)

The architecture section would benefit from a clear diagram illustrating the data and model flow across edge, private 5G, cloud, and HPC components to aid reader understanding of the orchestration.
Clarify how surrogate model pluggability is implemented and whether any specific constraints apply when swapping models in the deployed CFD inference pipeline.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for identifying an important gap in how we present the accuracy results. We address the major comment below and will revise the manuscript accordingly.

Referee: The central claim that RBF enables continuous low-latency inference while improving model fidelity over time despite delayed and irregular updates is load-bearing on the assumption that lightweight edge surrogates maintain usable accuracy for the CFD airflow task. The abstract states that the evaluation quantifies the impact of delayed model updates on prediction accuracy, yet the manuscript provides no explicit tolerance threshold (e.g., maximum allowable RMSE or percentage error) for the screenhouse application and no analysis or plot of prediction error versus wall-clock delay. Without this, it is not possible to verify that degradation stays within acceptable limits rather than merely improving over time.

Authors: We agree that the manuscript would be strengthened by an explicit, application-derived tolerance threshold and by a direct plot of prediction error versus wall-clock delay. While the evaluation already quantifies accuracy changes under delayed updates and shows progressive improvement, it does not define a screenhouse-specific bound (e.g., maximum RMSE tolerable for practical airflow-based climate control) nor isolate error as a function of update latency. In the revised manuscript we will add (1) a short subsection stating the tolerance threshold justified by the digital-agriculture use case and (2) a new figure plotting RMSE against wall-clock time since the last model update. These additions will make the claim that accuracy remains usable under irregular HPC delays directly verifiable. revision: yes
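The analysis promised in this response, RMSE as a function of wall-clock time since the last model update, could be computed along the following lines. This is a hedged sketch only: the function name, the record layout, the bin width, and the sample values are assumptions, not anything taken from the manuscript.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def rmse_by_staleness(
    update_times: Sequence[float],
    records: Iterable[Tuple[float, float, float]],
    bin_width: int,
) -> Dict[int, float]:
    """RMSE grouped by time elapsed since the most recent model update.

    update_times: sorted times at which a model was installed.
    records: (time, predicted, observed) triples.
    bin_width: width of each staleness bin, in the same unit as the times.
    Returns {bin_start: rmse}, ordered by increasing staleness.
    """
    squared = defaultdict(list)
    for t, predicted, observed in records:
        i = bisect_right(update_times, t) - 1
        if i < 0:
            continue  # no model installed yet
        staleness = t - update_times[i]
        bin_start = int(staleness // bin_width) * bin_width
        squared[bin_start].append((predicted - observed) ** 2)
    return {b: math.sqrt(sum(v) / len(v)) for b, v in sorted(squared.items())}


# Made-up values to exercise the function (not measurements from the paper).
updates = [0.0, 100.0]
records = [
    (5.0, 1.0, 1.0),
    (8.0, 1.0, 2.0),
    (60.0, 1.0, 3.0),
    (105.0, 2.0, 2.0),
    (170.0, 2.0, 4.0),
]

curve = rmse_by_staleness(updates, records, bin_width=50)
print(curve)
```

Plotting the returned mapping against an application-derived tolerance line would make the referee's question directly answerable.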

Circularity Check

0 steps flagged

No circularity: architecture and evaluation grounded in described deployment

full rationale

The manuscript describes the RBF hybrid edge-HPC architecture, its decoupling of edge inference from asynchronous HPC updates, and an instantiation on a real digital-agriculture CFD screenhouse deployment. Evaluation reports measured quantities (simulation latency, training cost, inference throughput, impact of delayed updates) obtained from that deployment. No equations, fitted parameters, or self-citations are used to derive the central claims; the results are empirical characterizations of the implemented system rather than reductions to inputs by construction. The paper is therefore self-contained against external benchmarks and receives score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that model updates are constrained by simulation throughput and HPC scheduling delays, and introduces the RBF system design without explicit free parameters or new physical entities in the abstract.

axioms (1)

domain assumption: Model updates depend on high-fidelity simulations and training executed on remote HPC systems under batch scheduling, creating a mismatch with edge responsiveness requirements.
Stated directly in the abstract as the fundamental problem RBF addresses.

invented entities (1)

RBF (Reverse Backfill) architecture · no independent evidence
purpose: Integrates low-latency edge inference with asynchronous simulation-driven model improvement using pluggable surrogate models.
New system design presented in the paper to solve the stated mismatch.

pith-pipeline@v0.9.0 · 5852 in / 1396 out tokens · 50336 ms · 2026-05-21T06:11:27.263203+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "RBF decouples inference from simulation and training by deploying lightweight surrogate models at the edge while incorporating improved models asynchronously as they become available."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: "Figure 3 displays the decay of model accuracy over time... mean absolute error (MAE) in meters per second"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1] “OpenFOAM — openfoam.com,” https://www.openfoam.com/, 2024, [Accessed 26-08-2024]
[2] O., “Openfoam: User guide: snappyhexmesh,” www.openfoam.com, 01
[3] [Online]. Available: https://www.openfoam.com/documentation/guides/latest/doc/guide-meshing-snappyhexmesh.html
[4] Openfoam.com, “Openfoam: Api guide: poroussimplefoam directory reference,” 2026. [Online]. Available: https://www.openfoam.com/documentation/guides/v2012/api/dir_03decea705bc0c0c22f85f91452a296f.html
[5] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019
[6] S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics-informed neural networks (PINNs) for fluid mechanics: A review,” Acta Mechanica Sinica, vol. 37, no. 12, pp. 1727–1738, 2021
[7] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,” arXiv preprint arXiv:2010.08895, 2020
[8] I. T. Jolliffe, “A note on the use of principal components in regression,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 31, no. 3, pp. 300–303, 1982. [Online]. Available: http://www.jstor.org/stable/2348005
[9] “Raspberry Pi,” https://www.raspberrypi.org
[10] R. Wolski, C. Krintz, F. Bakir, G. George, and W.-T. Lin, “CSPOT: Portable, Multi-scale Functions-as-a-Service for IoT,” in ACM Symposium on Edge Computing, 2019, pp. 1–14
[11] F. Bakir, C. Krintz, and R. Wolski, “CAPLets: Resource Aware, Capability-Based Access Control for IoT,” in ACM/IEEE Symposium on Edge Computing, 2021
[12] srsRAN Project, https://www.srsran.com
[13] Open5GS, https://open5gs.org/
[14] NERSC, “Perlmutter HPC Queue Wait Times,” 2026. [Online]. Available: https://rest.nersc.gov/rest/mynersc/frames/queuewaittimes.html
[15] L. Kurafeeva, A. Subedi, R. Hartung, M. Fay, A. Biswas, S. Jha, O. Kilic, C. Krintz, A. Merzky, D. Thain, M. Vuran, and R. Wolski, “xgfabric: Coupling sensor networks and hpc facilities with private 5g wireless networks for real-time digital agriculture,” in SC25 Workshop on High Performance Computing, Networking, Storage and Analysis, 2025
[16] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, and I. Stoica, “Ray: a distributed framework for emerging ai applications,” in USENIX Conference on Operating Systems Design and Implementation, 2018
[17] D. Crankshaw, G.-E. Sela, X. Mo, C. Zumar, I. Stoica, J. Gonzalez, and A. Tumanov, “Inferline: latency-aware provisioning and scaling for prediction serving pipelines,” in Symposium on Cloud Computing, 2020
[18] V. Nigade, P. Bauszat, H. Bal, and L. Wang, “Inference serving with end-to-end latency slos over dynamic edge networks,” Real-Time Systems, vol. 60, 2024
[19] J. Tan, F. Liu, B. Wang, Q. Wu, and C. P. Chen, “Ec5: Edge–cloud collaborative computing framework with compressive communication,” Future Generation Computer Systems, vol. 166, 2025
[20] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2017
[21] E. Li, Z. Zhou, and X. Chen, “Edge intelligence: On-demand deep learning model co-inference with device-edge synergy,” in Workshop on Mobile Edge Communications, 2018
[22] A. E. Eshratifar, M. S. Abrishami, and M. Pedram, “Jointdnn: An efficient training and inference engine for intelligent mobile cloud computing services,” IEEE Transactions on Mobile Computing, vol. 20, no. 2, 2021
[23] D. Loconte, F. Ieva, L. Pinto, G. Loseto, F. Scioscia, and M. Ruta, “Expanding the cloud-to-edge continuum to the IoT in serverless federated learning,” Future Generation Computer Systems, vol. 155, 2024
[24] L. Wang, X. Zhang, H. Su, and J. Zhu, “A comprehensive survey of continual learning: Theory, method and application,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5362–5383, 2024. [Online]. Available: https://arxiv.org/abs/2302.00487
[25] H. M. Gomes, J. Read, A. Bifet, J. P. Barddal, and J. Gama, “Machine learning for streaming data: state of the art, challenges, and opportunities,” ACM SIGKDD Explorations Newsletter, vol. 21, no. 2, pp. 6–22, 2019. [Online]. Available: https://dl.acm.org/doi/10.1145/3373464.3373470
[26] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators,” Nature Machine Intelligence, vol. 3, 2021
[27] A. Marcato, G. Boccardo, and D. Marchisio, “A computational workflow to study particle transport and filtration in porous media: Coupling cfd and deep learning,” Chemical Engineering Journal, vol. 417, 2021
[28] T. Botarelli, M. Fanfani, P. Nesi, and L. Pinelli, “Using physics-informed neural networks for solving navier-stokes equations in fluid dynamic complex scenarios,” Engineering Applications of Artificial Intelligence, vol. 148, 2025
[29] B. Van Essen, H. Kim, R. Pearce, K. Boakye, and B. Chen, “Lbann: livermore big artificial neural network hpc toolkit,” in Workshop on Machine Learning in High-Performance Computing Environments, 2015
[30] “ExaLearn: US Department of Energy (DOE) Exascale Computing Project (ECP) center,” 2020. [Online]. Available: https://acdc.alcf.anl.gov/exalearn/
[31] T. Ekaireb, L. Brand, N. Avaraddy, M. Mock, C. Krintz, and R. Wolski, “Distributed dataflow across the edge-cloud continuum,” in 2024 IEEE 17th International Conference on Cloud Computing (CLOUD). IEEE, 2024, pp. 316–327
[32] “Pegasus for computational workflows,” https://pegasus.isi.edu, [Online; accessed 11-July-2024]
[33] A. Merzky, M. Turilli, M. Titov, A. Al-Saadi, and S. Jha, “Design and performance characterization of radical-pilot on leadership-class platforms,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, 2022
[34] “Swift/T High Performance Dataflow Computing,” http://swift-lang.org/Swift-T/, [Online; accessed 15-Nov-2016]
[35] KubeFlow.org, “KubeFlow: The foundation of tools for AI Platforms on Kubernetes,” 2020. [Online]. Available: https://www.kubeflow.org/
[36] “TensorFlow Extended,” https://www.tensorflow.org/tfx, [Online; accessed 15-Apr-2026]
[37] R. Balin, F. Simini, J. T. Simpson, A. Shao, A. Rigazzi, B. Ellis, S. Becker, A. Doostan, J. A. Evans, and K. E. Jansen, “In situ framework for coupling simulation and machine learning with application to CFD,” in Workshop on Machine Learning in HPC Environments (MLHPC),
[38] [Online]. Available: https://arxiv.org/abs/2306.12900
[39] M. Flatken, A. Podobas, R. Fellegara, A. Basermann, J. Holke, L. Knapp, M. Kontak, N. Krullikowski, B. Nolde, N. Brown et al., “VESTEC: Visual exploration and sampling toolkit for extreme computing—urgent decision making meets HPC,” IEEE Access, vol. 11, 2023
[40] A. Alsaadi, T. Wang, A. Park, P. Bajracharya, L. Wang, F. Sun, S. Seal, V. Jadhao, G. Fox, and S. Jha, “Rose: Radical orchestrator for surrogate exploration,” in Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC Workshops ’25. New York, NY, USA: Association for Co... [doi:10.1145/3731599.3767347]
[41] M. Polese, M. Dohler, A. Dressler, M. Erol-Kantarci, R. Jana, R. Knopp, and T. Melodia, “Empowering the 6G cellular architecture with open RAN,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 245–262, 2024
[42] Z. Sarah, G. Nencioni, and M. A. Khan, “Resource allocation in multi-access edge computing for 5G-and-beyond networks,” Computer Networks, vol. 227, 2023
[43] J. Geng, M. K. Hany, and R. Candell, “An industrial private 5G testbed for networked automation systems,” in IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2024. [Online]. Available: https://www.nist.gov/publications/industrial-private-5g-testbed-networked-automation-systems
[44] F. Zhang, J. Wang, J. Xue, R. Wang, M. Nixon, and Y. Han, “Time-sensitive networking (TSN) for industrial automation: Current advances and future directions,” ACM Computing Surveys, vol. 57, no. 2, 2024
[45] Z. Satka, M. Ashjaei, H. Fotouhi, M. Daneshtalab, M. Sjödin, and S. Mubeen, “A comprehensive systematic review of integration of time sensitive networking and 5G communication,” Journal of Systems Architecture, vol. 138, 2023

[1] [1]

OpenFOAM — openfoam.com,

“OpenFOAM — openfoam.com,” https://www.openfoam.com/, 2024, [Accessed 26-08-2024]

work page 2024

[2] [2]

Openfoam: User guide: snappyhexmesh,

O. , “Openfoam: User guide: snappyhexmesh,” www.openfoam.com, 01

work page

[3] [3]

Available: https://www.openfoam.com/documentation/ guides/latest/doc/guide-meshing-snappyhexmesh.html

[Online]. Available: https://www.openfoam.com/documentation/ guides/latest/doc/guide-meshing-snappyhexmesh.html

work page

[4] [4]

Openfoam: Api guide: poroussimple- foam directory reference,

Openfoam.com, “Openfoam: Api guide: poroussimple- foam directory reference,” 2026. [Online]. Avail- able: https://www.openfoam.com/documentation/guides/v2012/api/dir 03decea705bc0c0c22f85f91452a296f.html

work page 2026

[5] [5]

Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,

M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,”Journal of Computational physics, vol. 378, pp. 686–707, 2019

work page 2019

[6] [6]

Physics- informed neural networks (pinns) for fluid mechanics: A review,

S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics- informed neural networks (pinns) for fluid mechanics: A review,”Acta Mechanica Sinica, vol. 37, no. 12, pp. 1727–1738, 2021

work page 2021

[7] [7]

Fourier Neural Operator for Parametric Partial Differential Equations

Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,”arXiv preprint arXiv:2010.08895, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[8] [8]

A note on the use of principal components in regression,

I. T. Jolliffe, “A note on the use of principal components in regression,”Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 31, no. 3, pp. 300–303, 1982. [Online]. Available: http://www.jstor.org/stable/2348005

work page arXiv 1982

[9] [9]

Raspberry Pi,

“Raspberry Pi,” https://www.raspberrypi.org

work page

[10] [10]

CSPOT: Portable, Multi-scale Functions-as-a-Service for IoT,

R. Wolski, C. Krintz, F. Bakir, G. George, and W.-T. Lin, “CSPOT: Portable, Multi-scale Functions-as-a-Service for IoT,” inACM Symposium on Edge Computing, 2019, pp. 1–14

work page 2019

[11] [11]

CAPLets: Resource Aware, Capability-Based Access Control for IoT,

F. Bakir, C. Krintz, and R. Wolski, “CAPLets: Resource Aware, Capability-Based Access Control for IoT,” inACM/IEEE Symposium on Edge Computing, 2021

work page 2021

[12] [12]

srsRAN Project, https://www.srsran.com

work page

[13] [13]

Open5GS, https://open5gs.org/

work page

[14] [14]

Perlmutter HPC Queue Wait Times,

NERSC, “Perlmutter HPC Queue Wait Times,” 2026. [Online]. Available: https://rest.nersc.gov/rest/mynersc/frames/queuewaittimes.html

work page 2026

[15] L. Kurafeeva, A. Subedi, R. Hartung, M. Fay, A. Biswas, S. Jha, O. Kilic, C. Krintz, A. Merzky, D. Thain, M. Vuran, and R. Wolski, “xgfabric: Coupling sensor networks and hpc facilities with private 5g wireless networks for real-time digital agriculture,” in SC25 Workshop on High Performance Computing, Networking, Storage and Analysis, 2025

[16] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, and I. Stoica, “Ray: a distributed framework for emerging ai applications,” in USENIX Conference on Operating Systems Design and Implementation, 2018

[17] D. Crankshaw, G.-E. Sela, X. Mo, C. Zumar, I. Stoica, J. Gonzalez, and A. Tumanov, “Inferline: latency-aware provisioning and scaling for prediction serving pipelines,” in Symposium on Cloud Computing, 2020

[18] V. Nigade, P. Bauszat, H. Bal, and L. Wang, “Inference serving with end-to-end latency slos over dynamic edge networks,” Real-Time Systems, vol. 60, 2024

[19] J. Tan, F. Liu, B. Wang, Q. Wu, and C. P. Chen, “Ec5: Edge–cloud collaborative computing framework with compressive communication,” Future Generation Computer Systems, vol. 166, 2025

[20] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2017

[21] E. Li, Z. Zhou, and X. Chen, “Edge intelligence: On-demand deep learning model co-inference with device-edge synergy,” in Workshop on Mobile Edge Communications, 2018

[22] A. E. Eshratifar, M. S. Abrishami, and M. Pedram, “Jointdnn: An efficient training and inference engine for intelligent mobile cloud computing services,” IEEE Transactions on Mobile Computing, vol. 20, no. 2, 2021

[23] D. Loconte, F. Ieva, L. Pinto, G. Loseto, F. Scioscia, and M. Ruta, “Expanding the cloud-to-edge continuum to the IoT in serverless federated learning,” Future Generation Computer Systems, vol. 155, 2024

[24] L. Wang, X. Zhang, H. Su, and J. Zhu, “A comprehensive survey of continual learning: Theory, method and application,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5362–5383, 2024. [Online]. Available: https://arxiv.org/abs/2302.00487

[25] H. M. Gomes, J. Read, A. Bifet, J. P. Barddal, and J. Gama, “Machine learning for streaming data: state of the art, challenges, and opportunities,” ACM SIGKDD Explorations Newsletter, vol. 21, no. 2, pp. 6–22, 2019. [Online]. Available: https://dl.acm.org/doi/10.1145/3373464.3373470

[26] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators,” Nature Machine Intelligence, vol. 3, 2021

[27] A. Marcato, G. Boccardo, and D. Marchisio, “A computational workflow to study particle transport and filtration in porous media: Coupling cfd and deep learning,” Chemical Engineering Journal, vol. 417, 2021

[28] T. Botarelli, M. Fanfani, P. Nesi, and L. Pinelli, “Using physics-informed neural networks for solving navier-stokes equations in fluid dynamic complex scenarios,” Engineering Applications of Artificial Intelligence, vol. 148, 2025

[29] B. Van Essen, H. Kim, R. Pearce, K. Boakye, and B. Chen, “Lbann: livermore big artificial neural network hpc toolkit,” in Workshop on Machine Learning in High-Performance Computing Environments, 2015

[30] “ExaLearn: US Department of Energy (DOE) Exascale Computing Project (ECP) center,” 2020. [Online]. Available: https://acdc.alcf.anl.gov/exalearn/

[31] T. Ekaireb, L. Brand, N. Avaraddy, M. Mock, C. Krintz, and R. Wolski, “Distributed dataflow across the edge-cloud continuum,” in 2024 IEEE 17th International Conference on Cloud Computing (CLOUD). IEEE, 2024, pp. 316–327

[32] “Pegasus for computational workflows,” https://pegasus.isi.edu, [Online; accessed 11-July-2024]

[33] A. Merzky, M. Turilli, M. Titov, A. Al-Saadi, and S. Jha, “Design and performance characterization of radical-pilot on leadership-class platforms,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, 2022

[34] “Swift/T High Performance Dataflow Computing,” http://swift-lang.org/Swift-T/, [Online; accessed 15-Nov-2016]

[35] KubeFlow.org, “KubeFlow: The foundation of tools for AI Platforms on Kubernetes,” 2020. [Online]. Available: https://www.kubeflow.org/

[36] “TensorFlow Extended,” https://www.tensorflow.org/tfx, [Online; accessed 15-Apr-2026]

[37] R. Balin, F. Simini, J. T. Simpson, A. Shao, A. Rigazzi, B. Ellis, S. Becker, A. Doostan, J. A. Evans, and K. E. Jansen, “In situ framework for coupling simulation and machine learning with application to CFD,” in Workshop on Machine Learning in HPC Environments (MLHPC),

[38] [Online]. Available: https://arxiv.org/abs/2306.12900

[39] M. Flatken, A. Podobas, R. Fellegara, A. Basermann, J. Holke, L. Knapp, M. Kontak, N. Krullikowski, B. Nolde, N. Brown et al., “VESTEC: Visual exploration and sampling toolkit for extreme computing—urgent decision making meets HPC,” IEEE Access, vol. 11, 2023

[40] A. Alsaadi, T. Wang, A. Park, P. Bajracharya, L. Wang, F. Sun, S. Seal, V. Jadhao, G. Fox, and S. Jha, “Rose: Radical orchestrator for surrogate exploration,” in Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC Workshops ’25. New York, NY, USA: Association for Co... doi:10.1145/3731599.3767347

[41] M. Polese, M. Dohler, A. Dressler, M. Erol-Kantarci, R. Jana, R. Knopp, and T. Melodia, “Empowering the 6G cellular architecture with open RAN,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 245–262, 2024

[42] Z. Sarah, G. Nencioni, and M. A. Khan, “Resource allocation in multi-access edge computing for 5G-and-beyond networks,” Computer Networks, vol. 227, 2023

[43] J. Geng, M. K. Hany, and R. Candell, “An industrial private 5G testbed for networked automation systems,” in IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2024. [Online]. Available: https://www.nist.gov/publications/industrial-private-5g-testbed-networked-automation-systems

[44] F. Zhang, J. Wang, J. Xue, R. Wang, M. Nixon, and Y. Han, “Time-sensitive networking (TSN) for industrial automation: Current advances and future directions,” ACM Computing Surveys, vol. 57, no. 2, 2024

[45] Z. Satka, M. Ashjaei, H. Fotouhi, M. Daneshtalab, M. Sjödin, and S. Mubeen, “A comprehensive systematic review of integration of time sensitive networking and 5G communication,” Journal of Systems Architecture, vol. 138, 2023