Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

Hang Qiu; Mani Srivastava; Pragya Sharma

arXiv: 2605.00005 · v1 · submitted 2026-02-17 · cs.LG · cs.AI · cs.DC · cs.NI

Pragya Sharma, Hang Qiu, Mani Srivastava

Pith reviewed 2026-05-15 21:16 UTC · model grok-4.3

keywords: cloud inference, real-time systems, cyber-physical systems, autonomous driving, latency tradeoffs, emergency braking, on-device vs cloud, distributed inference

The pith

When given high-throughput compute, cloud platforms can match or surpass on-device inference for real-time emergency braking in autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the long-standing preference for on-device inference in cyber-physical systems, which avoids network delays at the cost of local energy and compute burdens. It builds a formal analytical model linking inference latency to sensing frequency, platform throughput, network delay, and safety margins. Simulations of emergency braking show that sufficient cloud resources let remote inference amortize those delays and stick to safety deadlines more consistently than local hardware. The work concludes that cloud is often the better choice for such tasks rather than a fallback option.

Core claim

We develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart.

What carries the argument

A formal analytical model of distributed inference latency expressed in terms of sensing frequency, platform throughput, network delay, and task-specific safety constraints.
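As a rough illustration of how these four quantities could interact, the sketch below treats each platform as an M/M/1 queue fed at the sensing frequency and adds a fixed network round trip for the cloud. This is an assumed stand-in, not the paper's model, whose equations are not reproduced on this page. The function names and every number are hypothetical.

```python
def mean_latency_s(sensing_hz, throughput_hz, network_rtt_s=0.0):
    """Mean perception-to-command latency under an ASSUMED M/M/1 queue.

    sensing_hz    : frame arrival rate (frames/s)
    throughput_hz : platform service rate (frames/s)
    network_rtt_s : round-trip network delay in seconds; 0 for on-device
    """
    if throughput_hz <= sensing_hz:
        return float("inf")  # unstable queue: the backlog grows without bound
    # M/M/1 mean sojourn time (waiting + service) is 1 / (mu - lambda).
    return network_rtt_s + 1.0 / (throughput_hz - sensing_hz)


def stopping_distance_m(speed_mps, latency_s, decel_mps2):
    """Distance covered during the latency plus constant-deceleration braking."""
    return speed_mps * latency_s + speed_mps ** 2 / (2.0 * decel_mps2)


def is_safe(speed_mps, latency_s, decel_mps2, obstacle_m, margin_m=0.0):
    """True if the vehicle stops before the obstacle with the given margin."""
    return stopping_distance_m(speed_mps, latency_s, decel_mps2) + margin_m <= obstacle_m


# Hypothetical numbers, chosen only to show a crossover.
device = mean_latency_s(sensing_hz=30, throughput_hz=35)                      # about 0.2 s
cloud = mean_latency_s(sensing_hz=30, throughput_hz=230, network_rtt_s=0.05)  # about 0.055 s

print(is_safe(20, device, 8, obstacle_m=28))  # False: 4 m travelled + 25 m braking
print(is_safe(20, cloud, 8, obstacle_m=28))   # True: 1.1 m travelled + 25 m braking
```

With these made-up inputs the slow local platform sits close to saturation, so its queueing delay outweighs the 50 ms round trip paid by the faster cloud platform. Different inputs can reverse the outcome.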

Load-bearing premise

The analytical model accurately captures real-world network variability, queueing delays, and safety constraints when instantiated only through simulations of vehicular dynamics.

What would settle it

A physical testbed deployment in which measured network and queueing delays cause cloud inference to violate braking safety margins more often than on-device inference.
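One way such a testbed run could be scored is sketched below: convert the braking geometry into a latency budget and count the fraction of measured latencies that exceed it. The helper name `violation_rate` and both traces are invented for illustration; a real evaluation would load measured end-to-end latencies.

```python
def violation_rate(latencies_s, speed_mps, decel_mps2, obstacle_m):
    """Fraction of latency samples for which the vehicle cannot stop in time."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    braking_m = speed_mps ** 2 / (2.0 * decel_mps2)
    # Largest latency that still leaves room to brake before the obstacle.
    budget_s = (obstacle_m - braking_m) / speed_mps
    misses = sum(1 for t in latencies_s if t > budget_s)
    return misses / len(latencies_s)


# Made-up samples in seconds; NOT data from the paper.
cloud_trace = [0.04, 0.05, 0.06, 0.05, 0.30]   # one heavy-tail outlier
device_trace = [0.12, 0.13, 0.12, 0.14, 0.13]

cloud_rate = violation_rate(cloud_trace, speed_mps=20, decel_mps2=8, obstacle_m=28)
device_rate = violation_rate(device_trace, speed_mps=20, decel_mps2=8, obstacle_m=28)
print(cloud_rate, device_rate)
```

In this invented case the cloud trace has the lower typical latency but a single tail sample past the 0.15 s budget, which is the kind of outcome that would count against the paper's claim.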

Figures

Figures reproduced from arXiv: 2605.00005 by Hang Qiu, Mani Srivastava, Pragya Sharma.

**Figure 1.** Temporal dynamics of the emergency braking scenario. The ego vehicle traveling at v_0 m/s detects an obstacle at time t_det. A braking command is issued at t_brake following inference execution either on-device (m, d) or on the cloud (m, c) platform. The vehicle stops at t_stop.

… to offload a task to a remote server [12], and how to offload data efficiently [13]. The service placement problem itself has been addressed …

**Figure 2.** (Left) Latency distributions for WiFi and 5G networks; (Right) Inference latency as a function of GPU utilization for cloud and on-device deployments.

To evaluate the perception-to-action latency in safety-critical settings, we introduce a static obstacle approximately 300 m ahead of the vehicle’s trajectory. The perception module uses the YOLO11 [41] family of object detectors, which are pretrained on th…

**Figure 3.** Platform-wise braking performance under (a) baseline, (b) tail latency, (c) concurrent workload, and (d) varying obstacle scenarios. Each horizontal series corresponds to a specific vehicle–platform–speed configuration. Marker positions indicate distances at perception, brake reception, and vehicle stop. Line styles represent vehicle types, and unsafe outcomes are flagged in red.

Original abstract

The increasing deployment of deep neural networks (DNNs) in cyber-physical systems (CPS) enhances perception fidelity, but imposes substantial computational demands on execution platforms, posing challenges to real-time control deadlines. Traditional distributed CPS architectures typically favor on-device inference to avoid network variability and contention-induced delays on remote platforms. However, this design choice places significant energy and computational demands on the local hardware. In this work, we revisit the assumption that cloud-based inference is intrinsically unsuitable for latency-sensitive control tasks. We demonstrate that, when provisioned with high-throughput compute resources, cloud platforms can effectively amortize network and queueing delays, enabling them to match or surpass on-device performance for real-time decision-making. Specifically, we develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart. These findings challenge prevailing design strategies and suggest that the cloud is not merely a feasible option, but often the preferred inference location for distributed CPS architectures. In this light, the cloud is not as distant as traditionally perceived; in fact, it is closer than it appears.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a parameterized latency model for cloud vs on-device inference and finds conditions where cloud meets braking safety margins better in simulation.

The main thing to know is that this paper builds an analytical model for distributed inference latency and uses it to show that, with high-throughput cloud resources, cloud can sometimes match or beat on-device performance on safety deadlines for emergency braking. They tie latency directly to sensing frequency, platform throughput, network delay, and task safety margins, then run simulations with vehicular dynamics to identify concrete parameter ranges where cloud wins on reliability. That directly questions the default assumption that network variability rules out cloud for real-time CPS control.

The model is straightforward and input-driven rather than fitted, which makes it easy to inspect and reuse for other scenarios. The braking case study adds a practical anchor that generic comparisons often lack. The simulations at least ground the control dynamics in something realistic.

The clear limitation is that validation stays inside simulation. There are no real cloud latency traces, production queueing logs, or hardware-in-the-loop runs, so the model’s delay distributions are never checked against actual tail behavior or contention. If real networks or schedulers produce heavier tails than assumed, the reported conditions favoring cloud could move. The paper also does not appear to include sensitivity checks on the delay parameters themselves.

This work is aimed at people designing real-time inference pipelines for autonomous systems and distributed CPS. Anyone already thinking about edge-cloud tradeoffs for latency-critical control would find the model and the specific numbers useful for their own calculations.

I would send it to peer review. The model is explicit enough and the results are concrete enough that referees can give targeted feedback on both the math and the validation gaps.

Referee Report

2 major / 1 minor

Summary. The paper claims that cloud-based DNN inference for cyber-physical systems can match or surpass on-device performance for real-time control tasks such as emergency braking when provisioned with high-throughput resources. It develops a formal analytical model expressing distributed inference latency in terms of sensing frequency, platform throughput, network delay, and task-specific safety margins; the model is instantiated for autonomous driving and validated exclusively through simulations driven by real-time vehicular dynamics, yielding concrete conditions under which cloud inference adheres more reliably to safety margins.

Significance. If the analytical model and simulation results hold under realistic conditions, the work would challenge the prevailing preference for on-device inference in latency-sensitive CPS, potentially enabling lower on-device energy use and simpler hardware while maintaining safety. The identification of explicit provisioning thresholds for cloud superiority would be a useful design guideline.

major comments (2)

[Analytical Model] The central claim rests on the formal analytical model showing that high-throughput cloud amortizes network and queueing delays to meet or beat on-device safety margins. However, the manuscript provides no explicit equations, parameter definitions, or derivation for the latency expression (described only at the level of the abstract), preventing verification that the model is free of circularity or post-hoc fitting and that the reported conditions are not artifacts of the chosen distributions.
[Simulation Validation] Validation occurs exclusively via simulations using real-time vehicular dynamics for the emergency-braking scenario. No real cloud latency traces, production queueing measurements, or hardware-in-the-loop experiments are reported; if actual tail latencies or contention exceed the modeled distributions, the concrete conditions favoring cloud inference may not hold.

minor comments (1)

[Abstract] The abstract states that the model is 'instantiated' and 'validated through extensive simulations' but supplies no numerical parameter values, error bars, or exclusion criteria, making it difficult to reproduce or assess robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve the presentation of the analytical model and to clarify the scope and limitations of the simulation-based validation.

Referee: [Analytical Model] The central claim rests on the formal analytical model showing that high-throughput cloud amortizes network and queueing delays to meet or beat on-device safety margins. However, the manuscript provides no explicit equations, parameter definitions, or derivation for the latency expression (described only at the level of the abstract), preventing verification that the model is free of circularity or post-hoc fitting and that the reported conditions are not artifacts of the chosen distributions.

Authors: The referee correctly observes that the current version of the manuscript does not present the explicit equations, parameter definitions, or step-by-step derivation of the latency model. We will revise the paper to include a dedicated subsection that states the full analytical expression for distributed inference latency, defines every parameter (sensing frequency, platform throughput, network delay, safety margins), and provides the complete derivation. This addition will make it possible to verify that the model contains no circularity and that the reported conditions follow directly from the stated assumptions and distributions.

revision: yes

Referee: [Simulation Validation] Validation occurs exclusively via simulations using real-time vehicular dynamics for the emergency-braking scenario. No real cloud latency traces, production queueing measurements, or hardware-in-the-loop experiments are reported; if actual tail latencies or contention exceed the modeled distributions, the concrete conditions favoring cloud inference may not hold.

Authors: Validation is performed through simulations that incorporate real-time vehicular dynamics for the emergency-braking task. We agree that real cloud latency traces and hardware-in-the-loop experiments would provide stronger evidence. In the revised manuscript we will add an expanded limitations section that includes sensitivity analysis with respect to tail latencies and contention, explicitly states the distributional assumptions, and notes that the framework can be re-instantiated with production traces when they become available. We do not claim the current results substitute for such measurements.

revision: partial
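A sensitivity analysis over tail latency could take roughly the following shape. This is a minimal sketch under stated assumptions: network delay is drawn from a lognormal distribution (an assumed choice, not the paper's stated distribution), and the median, budget, and sigma values are hypothetical. The helper name `miss_fraction` is also invented here.

```python
import math
import random


def miss_fraction(median_s, sigma, budget_s, n=10_000, seed=0):
    """Monte Carlo estimate of P(delay > budget) for lognormal delays.

    median_s : median delay in seconds (lognormal median is exp(mu))
    sigma    : log-scale spread; larger values mean a heavier tail
    budget_s : largest delay that still meets the safety deadline
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    mu = math.log(median_s)
    misses = sum(1 for _ in range(n) if rng.lognormvariate(mu, sigma) > budget_s)
    return misses / n


# Sweep the tail weight while holding the median delay fixed at 50 ms.
sweep = {s: miss_fraction(median_s=0.05, sigma=s, budget_s=0.15)
         for s in (0.0, 0.25, 0.5, 1.0)}
for s, frac in sweep.items():
    print(f"sigma={s:<4} miss fraction={frac:.4f}")
```

Holding the median fixed isolates the effect of the tail: the typical delay never changes, yet the deadline-miss fraction rises as sigma grows, which is the concern the referee raises.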

Circularity Check

0 steps flagged

Analytical model is self-contained with no circular reduction

The paper defines a formal analytical model expressing distributed inference latency directly as a function of sensing frequency, platform throughput, network delay, and task-specific safety constraints. This is a definitional mapping, not a fitted parameter or self-referential loop. The model is then instantiated for the emergency-braking scenario and validated via simulations driven by real-time vehicular dynamics. No equations or steps reduce any 'prediction' back to the inputs by construction, no self-citations are load-bearing, and no ansatz or uniqueness result is smuggled in. The derivation chain is independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on an unshown analytical latency model whose inputs are treated as given; no free parameters are explicitly fitted in the abstract, and no new physical entities are postulated.

axioms (1)

domain assumption: Network and queueing delays can be characterized as deterministic functions of sensing frequency, platform throughput, and task safety constraints.
Invoked when the paper states that the model 'characterizes distributed inference latency as a function of' those quantities.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1] J. H. Kim, “A review of cyber-physical system research relevant to the emerging IT trends: Industry 4.0, IoT, big data, and cloud computing,” Journal of Industrial Integration and Management, vol. 2, no. 03, p. 1750011, 2017

[2] D. Lahat, T. Adali, and C. Jutten, “Multimodal data fusion: an overview of methods, challenges, and prospects,” Proceedings of the IEEE, vol. 103, no. 9, pp. 1449–1477, 2015

[3] S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1697–1716, 2019

[4] “Waymo.” https://www.waymo.com

[5] E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, et al., “Compute solution for Tesla’s full self-driving computer,” IEEE Micro, vol. 40, no. 2, pp. 25–35, 2020

[6] J. H. Gawron, G. A. Keoleian, R. D. De Kleine, T. J. Wallington, and H. C. Kim, “Life cycle assessment of connected and automated vehicles: sensing and computing subsystem and vehicle level effects,” Environmental Science & Technology, vol. 52, no. 5, 2018

[7] S. Sudhakar, V. Sze, and S. Karaman, “Data centers on wheels: Emissions from computing onboard autonomous vehicles,” IEEE Micro, vol. 43, no. 1, pp. 29–39, 2022

[8] “Amazon Rekognition.” https://aws.amazon.com/rekognition/

[9] “AWS local cloud.” https://aws.amazon.com/about-aws/global-infrastructure/localzones/

[10] B. Charyyev, E. Arslan, and M. H. Gunes, “Latency comparison of cloud datacenters and edge servers,” in GLOBECOM 2020-2020 IEEE Global Communications Conference, pp. 1–6, IEEE, 2020

[11] D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithm for mobile computing,” IEEE Transactions on Wireless Communications, vol. 11, no. 6, pp. 1991–1995, 2012

[12] M. V. Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not to offload? The bandwidth and energy costs of mobile cloud computing,” in 2013 Proceedings IEEE INFOCOM, pp. 1285–1293, IEEE, 2013

[13] J. Jiang, Z. Luo, C. Hu, Z. He, Z. Wang, S. Xia, and C. Wu, “Joint model and data adaptation for cloud inference serving,” in 2021 IEEE Real-Time Systems Symposium (RTSS), pp. 279–289, IEEE, 2021

[14] J. Ren, G. Yu, Y. He, and G. Y. Li, “Collaborative cloud and edge computing for latency minimization,” IEEE Transactions on Vehicular Technology, vol. 68, no. 5, pp. 5031–5044, 2019

[15] E. Ahvar, A.-C. Orgerie, and A. Lebre, “Estimating energy consumption of cloud, fog, and edge computing infrastructures,” IEEE Transactions on Sustainable Computing, vol. 7, no. 2, pp. 277–288, 2019

[16] J. He, Y. Wen, J. Huang, and D. Wu, “On the cost–QoE tradeoff for cloud-based video streaming under Amazon EC2’s pricing models,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 669–680, 2013

[17] M. Li, Y. Li, Y. Tian, L. Jiang, and Q. Xu, “Appealnet: An efficient and highly-accurate edge/cloud collaborative architecture for DNN inference,” in 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 409–414, IEEE, 2021

[18] H. Li, C. Hu, J. Jiang, Z. Wang, Y. Wen, and W. Zhu, “Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,” in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), pp. 671–678, IEEE, 2018

[19] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Computer Architecture News, vol. 45, no. 1, pp. 615–629, 2017

[20] S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, and H. Li, “Cnnpc: End-edge-cloud collaborative CNN inference with joint model partition and compression,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4039–4056, 2022

[21] D. Warneke and O. Kao, “Nephele: efficient parallel data processing in the cloud,” in Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, pp. 1–10, 2009

[22] M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya, “A survey on vehicular cloud computing,” Journal of Network and Computer Applications, vol. 40, pp. 325–344, 2014

[23] Y. Ma et al., “Exploring edge computing for multitier industrial control,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3506–3518, 2020

[24] P. Sharma and M. B. Srivastava, “Impact of delays and computation placement on sense-act application performance in IoT,” in MILCOM 2023-2023 IEEE Military Communications Conference (MILCOM), pp. 133–138, IEEE, 2023

[25] P. Sharma, B. Wang, X. Ouyang, R. Nanayakkara, B. Balaji, P. Tabuada, and M. B. Srivastava, “Towards a performance-driven device-edge-cloud relationship,” in Proceedings of the 26th International Workshop on Mobile Computing Systems and Applications, pp. 125–125, 2025

[26] D. Bruneo, “A stochastic model to investigate data center performance and QoS in IaaS cloud computing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 560–569, 2013

[27] A. Gandhi, V. Gupta, M. Harchol-Balter, and M. A. Kozuch, “Optimality analysis of energy-performance trade-off for server farm management,” Performance Evaluation, vol. 67, no. 11, pp. 1155–1171, 2010

[28] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf, “Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward,” in Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, pp. 153–166, 2013

[29] H. Khazaei, J. Misic, and V. B. Misic, “Performance analysis of cloud computing centers using M/G/m/m+r queuing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 5, pp. 936–943, 2011

[30] T. S. Salem, G. Castellano, G. Neglia, F. Pianese, and A. Araldo, “Toward inference delivery networks: distributing machine learning with optimality guarantees,” IEEE/ACM Transactions on Networking, vol. 32, no. 1, pp. 859–873, 2023

[31] P. S. Varma, A. Satyanarayana, and M. R. Sundari, “Performance analysis of cloud computing using queuing models,” in 2012 International Conference on Cloud Computing Technologies, Applications and Management (ICCCTAM), pp. 12–15, IEEE, 2012

[32] A. Ali-Eldin, B. Wang, and P. Shenoy, “The hidden cost of the edge: a performance comparison of edge and cloud latencies,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2021

[33] “AWS computing resources.” https://docs.aws.amazon.com/

[34] “Google distributed cloud.” https://cloud.google.com/distributed-cloud-connected?hl=en

[35] “AWS load balancing.” https://aws.amazon.com/what-is/load-balancing/

[36] “AWS Batch.” https://aws.amazon.com/batch/

[37] T. Jeong and H. Kim, “Ood-maml: Meta-learning for few-shot out-of-distribution detection and classification,” Advances in Neural Information Processing Systems, vol. 33, pp. 3907–3916, 2020

[38] J. Du, C. Jiang, J. Wang, Y. Ren, and M. Debbah, “Machine learning for 6G wireless networks: Carrying forward enhanced bandwidth, massive access, and ultrareliable/low-latency service,” IEEE Vehicular Technology Magazine, vol. 15, no. 4, pp. 122–134, 2020

[39] J. Vilaplana, F. Solsona, I. Teixidó, J. Mateo, F. Abella, and J. Rius, “A queuing theory model for cloud computing,” The Journal of Supercomputing, vol. 69, pp. 492–507, 2014

[40] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on Robot Learning, pp. 1–16, PMLR, 2017

[41] G. Jocher and J. Qiu, “Ultralytics YOLO11,” 2024

[42] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755, Springer, 2014

[43] C. Lu, K. Ye, G. Xu, C.-Z. Xu, and T. Bai, “Imbalance in the cloud: An analysis on Alibaba cluster trace,” in 2017 IEEE International Conference on Big Data (Big Data), pp. 2884–2892, IEEE, 2017

[44] G. Johansson and K. Rumar, “Drivers’ brake reaction times,” Human Factors, vol. 13, no. 1, pp. 23–27, 1971

[45] “Nvidia Nsight Systems.” https://developer.nvidia.com/nsight-systems

[46] “Nvidia tegrastats.” https://docs.nvidia.com/util-tegrastats.html

[1] [1]

A review of cyber-physical system research relevant to the emerging it trends: industry 4.0, iot, big data, and cloud computing,

J. H. Kim, “A review of cyber-physical system research relevant to the emerging it trends: industry 4.0, iot, big data, and cloud computing,” Journal of industrial integration and management, vol. 2, no. 03, p. 1750011, 2017

work page 2017

[2] [2]

Multimodal data fusion: an overview of methods, challenges, and prospects,

D. Lahat, T. Adali, and C. Jutten, “Multimodal data fusion: an overview of methods, challenges, and prospects,”Proceedings of the IEEE, vol. 103, no. 9, pp. 1449–1477, 2015

work page 2015

[3] [3]

Edge computing for autonomous driving: Opportunities and challenges,

S. Liu, L. Liu, J. Tang, B. Yu, Y . Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,”Proceedings of the IEEE, vol. 107, no. 8, pp. 1697–1716, 2019

work page 2019

[4] [4]

“Waymo.” https://www.waymo.com

work page

[5] [5]

Compute solution for tesla’s full self-driving computer,

E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti,et al., “Compute solution for tesla’s full self-driving computer,”IEEE Micro, vol. 40, no. 2, pp. 25–35, 2020

work page 2020

[6] [6]

Life cycle assessment of connected and automated vehicles: sensing and computing subsystem and vehicle level effects,

J. H. Gawron, G. A. Keoleian, R. D. De Kleine, T. J. Wallington, and H. C. Kim, “Life cycle assessment of connected and automated vehicles: sensing and computing subsystem and vehicle level effects,” Environmental science & technology, vol. 52, no. 5, 2018

work page 2018

[7] [7]

Data centers on wheels: Emissions from computing onboard autonomous vehicles,

S. Sudhakar, V . Sze, and S. Karaman, “Data centers on wheels: Emissions from computing onboard autonomous vehicles,”IEEE Micro, vol. 43, no. 1, pp. 29–39, 2022

work page 2022

[8] [8]

Amazon rekognition

“Amazon rekognition.” https://aws.amazon.com/rekognition/

work page

[9] [9]

Aws local cloud

“Aws local cloud.” https://aws.amazon.com/about-aws/ global-infrastructure/localzones/

work page

[10] [10]

Latency comparison of cloud datacenters and edge servers,

B. Charyyev, E. Arslan, and M. H. Gunes, “Latency comparison of cloud datacenters and edge servers,” inGLOBECOM 2020-2020 IEEE Global Communications Conference, pp. 1–6, IEEE, 2020

work page 2020

[11] [11]

A dynamic offloading algorithm for mobile computing,

D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithm for mobile computing,”IEEE Transactions on Wireless Communications, vol. 11, no. 6, pp. 1991–1995, 2012

work page 1991

[12] [12]

To offload or not to offload? the bandwidth and energy costs of mobile cloud computing,

M. V . Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not to offload? the bandwidth and energy costs of mobile cloud computing,” in2013 Proceedings Ieee Infocom, pp. 1285–1293, IEEE, 2013

work page 2013

[13] [13]

Joint model and data adaptation for cloud inference serving,

J. Jiang, Z. Luo, C. Hu, Z. He, Z. Wang, S. Xia, and C. Wu, “Joint model and data adaptation for cloud inference serving,” in2021 IEEE Real-Time Systems Symposium (RTSS), pp. 279–289, IEEE, 2021

work page 2021

[14] [14]

Collaborative cloud and edge computing for latency minimization,

J. Ren, G. Yu, Y . He, and G. Y . Li, “Collaborative cloud and edge computing for latency minimization,”IEEE Transactions on V ehicular Technology, vol. 68, no. 5, pp. 5031–5044, 2019

work page 2019

[15] [15]

Estimating energy consumption of cloud, fog, and edge computing infrastructures,

E. Ahvar, A.-C. Orgerie, and A. Lebre, “Estimating energy consumption of cloud, fog, and edge computing infrastructures,”IEEE Transactions on Sustainable Computing, vol. 7, no. 2, pp. 277–288, 2019

work page 2019

[16] [16]

On the cost–qoe tradeoff for cloud-based video streaming under amazon ec2’s pricing models,

J. He, Y . Wen, J. Huang, and D. Wu, “On the cost–qoe tradeoff for cloud-based video streaming under amazon ec2’s pricing models,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 669–680, 2013

work page 2013

[17] [17]

Appealnet: An efficient and highly-accurate edge/cloud collaborative architecture for dnn infer- ence,

M. Li, Y . Li, Y . Tian, L. Jiang, and Q. Xu, “Appealnet: An efficient and highly-accurate edge/cloud collaborative architecture for dnn infer- ence,” in2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 409–414, IEEE, 2021

work page 2021

[18] [18]

Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,

H. Li, C. Hu, J. Jiang, Z. Wang, Y . Wen, and W. Zhu, “Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,” in2018 IEEE 24th international conference on parallel and distributed systems (ICPADS), pp. 671–678, IEEE, 2018

work page 2018

[19] [19]

Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,

Y . Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,”ACM SIGARCH Computer Architecture News, vol. 45, no. 1, pp. 615–629, 2017

work page 2017

[20] S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, and H. Li, “Cnnpc: End-edge-cloud collaborative cnn inference with joint model partition and compression,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4039–4056, 2022

[21] D. Warneke and O. Kao, “Nephele: efficient parallel data processing in the cloud,” in Proceedings of the 2nd workshop on many-task computing on grids and supercomputers, pp. 1–10, 2009

[22] M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya, “A survey on vehicular cloud computing,” Journal of Network and Computer applications, vol. 40, pp. 325–344, 2014

[23] Y. Ma et al., “Exploring edge computing for multitier industrial control,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3506–3518, 2020

[24] P. Sharma and M. B. Srivastava, “Impact of delays and computation placement on sense-act application performance in iot,” in MILCOM 2023-2023 IEEE Military Communications Conference (MILCOM), pp. 133–138, IEEE, 2023

[25] P. Sharma, B. Wang, X. Ouyang, R. Nanayakkara, B. Balaji, P. Tabuada, and M. B. Srivastava, “Towards a performance-driven device-edge-cloud relationship,” in Proceedings of the 26th International Workshop on Mobile Computing Systems and Applications, pp. 125–125, 2025

[26] D. Bruneo, “A stochastic model to investigate data center performance and qos in iaas cloud computing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 560–569, 2013

[27] A. Gandhi, V. Gupta, M. Harchol-Balter, and M. A. Kozuch, “Optimality analysis of energy-performance trade-off for server farm management,” Performance Evaluation, vol. 67, no. 11, pp. 1155–1171, 2010

[28] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf, “Exact analysis of the m/m/k/setup class of markov chains via recursive renewal reward,” in Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems, pp. 153–166, 2013

[29] H. Khazaei, J. Misic, and V. B. Misic, “Performance analysis of cloud computing centers using m/g/m/m+r queuing systems,” IEEE Transactions on parallel and distributed systems, vol. 23, no. 5, pp. 936–943, 2011

[30] T. S. Salem, G. Castellano, G. Neglia, F. Pianese, and A. Araldo, “Toward inference delivery networks: distributing machine learning with optimality guarantees,” IEEE/ACM Transactions on Networking, vol. 32, no. 1, pp. 859–873, 2023

[31] P. S. Varma, A. Satyanarayana, and M. R. Sundari, “Performance analysis of cloud computing using queuing models,” in 2012 International Conference on Cloud Computing Technologies, Applications and Management (ICCCTAM), pp. 12–15, IEEE, 2012

[32] A. Ali-Eldin, B. Wang, and P. Shenoy, “The hidden cost of the edge: a performance comparison of edge and cloud latencies,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2021

[33] “Aws computing resources.” https://docs.aws.amazon.com/

[34] “Google distributed cloud.” https://cloud.google.com/distributed-cloud-connected?hl=en

[35] “Aws load balancing.” https://aws.amazon.com/what-is/load-balancing/

[36] “Aws batch.” https://aws.amazon.com/batch/

[37] T. Jeong and H. Kim, “Ood-maml: Meta-learning for few-shot out-of-distribution detection and classification,” Advances in Neural Information Processing Systems, vol. 33, pp. 3907–3916, 2020

[38] J. Du, C. Jiang, J. Wang, Y. Ren, and M. Debbah, “Machine learning for 6g wireless networks: Carrying forward enhanced bandwidth, massive access, and ultrareliable/low-latency service,” IEEE Vehicular Technology Magazine, vol. 15, no. 4, pp. 122–134, 2020

[39] J. Vilaplana, F. Solsona, I. Teixidó, J. Mateo, F. Abella, and J. Rius, “A queuing theory model for cloud computing,” The Journal of Supercomputing, vol. 69, pp. 492–507, 2014

[40] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on robot learning, pp. 1–16, PMLR, 2017

[41] G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024

[42] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13, pp. 740–755, Springer, 2014

[43] C. Lu, K. Ye, G. Xu, C.-Z. Xu, and T. Bai, “Imbalance in the cloud: An analysis on alibaba cluster trace,” in 2017 IEEE International Conference on Big Data (Big Data), pp. 2884–2892, IEEE, 2017

[44] G. Johansson and K. Rumar, “Drivers’ brake reaction times,” Human factors, vol. 13, no. 1, pp. 23–27, 1971

[45] “Nvidia nsight systems.” https://developer.nvidia.com/nsight-systems

[46] “Nvidia tegrastats.” https://docs.nvidia.com/util-tegrastats.html