
arxiv: 2605.00005 · v1 · submitted 2026-02-17 · 💻 cs.LG · cs.AI · cs.DC · cs.NI

Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

Pith reviewed 2026-05-15 21:16 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.DC cs.NI
keywords cloud inference, real-time systems, cyber-physical systems, autonomous driving, latency tradeoffs, emergency braking, on-device vs cloud, distributed inference

The pith

When given high-throughput compute, cloud platforms can match or surpass on-device inference for real-time emergency braking in autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the long-standing preference for on-device inference in cyber-physical systems, which avoids network delays at the cost of local energy and compute burdens. It builds a formal analytical model linking inference latency to sensing frequency, platform throughput, network delay, and safety margins. Simulations of emergency braking show that, with sufficient cloud resources, remote inference amortizes network and queueing delays and meets safety deadlines more consistently than local hardware. The work concludes that the cloud is often the better choice for such tasks, not merely a fallback option.

Core claim

We develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart.

What carries the argument

A formal analytical model of distributed inference latency expressed in terms of sensing frequency, platform throughput, network delay, and task-specific safety constraints.
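
One way to picture such a model is an additive latency budget tied to stopping kinematics. The block below is a sketch under stated assumptions, not the paper's equations, which this page does not reproduce. The additive decomposition and the symbols T_net, T_queue, μ (platform throughput), a (deceleration), d_obs, and d_safe are hypothetical; v_0, t_det, and t_brake follow the notation of the Figure 1 caption.

```latex
% Assumed decomposition for platform p (p = d on-device, p = c cloud).
% T_net^{(d)} = 0 because the on-device path has no network hop.
% The second line is the standard constant-deceleration stopping condition.
\begin{align}
  t_{\mathrm{brake}} - t_{\mathrm{det}}
    &= T_{\mathrm{net}}^{(p)} + T_{\mathrm{queue}}^{(p)} + \frac{1}{\mu^{(p)}}, \\
  v_0\,(t_{\mathrm{brake}} - t_{\mathrm{det}}) + \frac{v_0^{2}}{2a}
    &\le d_{\mathrm{obs}} - d_{\mathrm{safe}}.
\end{align}
```

Under this reading, the cloud path is preferable whenever its network and queueing terms plus 1/μ^(c) sum to less than the on-device queueing term plus 1/μ^(d). Sensing frequency would enter through the frame interval, which bounds how long an obstacle can go unobserved before t_det.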

Load-bearing premise

The analytical model accurately captures real-world network variability, queueing delays, and safety constraints when instantiated only through simulations of vehicular dynamics.

What would settle it

A physical testbed deployment in which measured network and queueing delays cause cloud inference to violate braking safety margins more often than on-device inference.
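
The comparison such a testbed would make can be sketched as a violation-rate calculation over latency samples. This is a minimal sketch, not the paper's evaluation code: the function names, the log-normal network model, and every numeric value are assumptions chosen for illustration, and measured traces would replace the sampled latencies.

```python
import math
import random


def stopping_distance_m(speed_mps, latency_s, decel_mps2):
    """Distance travelled during the latency window plus the braking distance."""
    return speed_mps * latency_s + speed_mps ** 2 / (2.0 * decel_mps2)


def violation_rate(latencies_s, speed_mps, decel_mps2, obstacle_m):
    """Fraction of trials in which the vehicle stops beyond the obstacle."""
    misses = sum(
        stopping_distance_m(speed_mps, t, decel_mps2) > obstacle_m
        for t in latencies_s
    )
    return misses / len(latencies_s)


def sample_latencies(rng, n, compute_s, net_median_s, net_sigma):
    """Fixed compute time plus a log-normal network delay (an assumption).

    A median of 0.0 means no network hop (the on-device case).
    """
    if net_median_s == 0.0:
        return [compute_s] * n
    mu = math.log(net_median_s)  # the median of a log-normal is exp(mu)
    return [compute_s + rng.lognormvariate(mu, net_sigma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    # Placeholder values standing in for testbed measurements.
    device = sample_latencies(rng, 10_000, 0.15, 0.0, 0.0)
    cloud = sample_latencies(rng, 10_000, 0.02, 0.04, 0.8)
    for name, lat in (("on-device", device), ("cloud", cloud)):
        print(name, violation_rate(lat, 30.0, 7.0, 70.0))
```

With these placeholder numbers the latency budget is tight enough that the heavy network tail produces cloud violations while the slower but deterministic on-device path stays safe, which is the kind of outcome that would count against the paper's claim.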

Figures

Figures reproduced from arXiv: 2605.00005 by Hang Qiu, Mani Srivastava, Pragya Sharma.

Figure 1
Figure 1: Temporal dynamics of emergency braking scenario. The ego vehicle traveling at v0 m/s detects an obstacle at time tdet. A braking command is issued at tbrake following inference execution either on-device (m, d) or on cloud (m, c) platform. The vehicle stops at tstop.
… to offload a task to a remote server [12], and how to offload data efficiently [13]. The service placement problem itself has been addressed …
Figure 2
Figure 2: (Left) Latency distributions for WiFi and 5G networks; (Right) Inference latency as a function of GPU utilization for cloud and on-device deployments.
To evaluate the perception-to-action latency in safety-critical settings, we introduce a static obstacle approximately 300 m ahead of the vehicle’s trajectory. The perception module uses the YOLO11 [41] family of object detectors, which are pre-trained on th…
Figure 3
Figure 3: Platform-wise braking performance under (a) baseline, (b) tail latency, (c) concurrent workload, and (d) varying obstacle scenarios. Each horizontal series corresponds to a specific vehicle–platform–speed configuration. Marker positions indicate distances at perception, brake reception, and vehicle stop. Line styles represent vehicle types, and unsafe outcomes are flagged in red.
… both cloud and on-device d…
Original abstract

The increasing deployment of deep neural networks (DNNs) in cyber-physical systems (CPS) enhances perception fidelity, but imposes substantial computational demands on execution platforms, posing challenges to real-time control deadlines. Traditional distributed CPS architectures typically favor on-device inference to avoid network variability and contention-induced delays on remote platforms. However, this design choice places significant energy and computational demands on the local hardware. In this work, we revisit the assumption that cloud-based inference is intrinsically unsuitable for latency-sensitive control tasks. We demonstrate that, when provisioned with high-throughput compute resources, cloud platforms can effectively amortize network and queueing delays, enabling them to match or surpass on-device performance for real-time decision-making. Specifically, we develop a formal analytical model that characterizes distributed inference latency as a function of the sensing frequency, platform throughput, network delay, and task-specific safety constraints. We instantiate this model in the context of emergency braking for autonomous driving and validate it through extensive simulations using real-time vehicular dynamics. Our empirical results identify concrete conditions under which cloud-based inference adheres to safety margins more reliably than its on-device counterpart. These findings challenge prevailing design strategies and suggest that the cloud is not merely a feasible option, but often the preferred inference location for distributed CPS architectures. In this light, the cloud is not as distant as traditionally perceived; in fact, it is closer than it appears.
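
The amortization argument can be made concrete with a toy queueing calculation. The sketch below swaps in an M/M/1 queue for each platform, which is an assumption of this example and not something the abstract specifies; the function names and all rates are hypothetical, and the cloud server is assumed to serve only this vehicle's frames.

```python
"""Sketch: when does a faster remote server offset the network round trip?

Assumption (not from the paper): each platform is an M/M/1 queue, so the
mean time a request spends queued plus in service is 1 / (mu - lam).
"""


def mean_sojourn_s(service_rate_hz: float, arrival_rate_hz: float) -> float:
    """Mean queueing-plus-service time of a stable M/M/1 queue, in seconds."""
    if service_rate_hz <= arrival_rate_hz:
        raise ValueError("unstable queue: service rate must exceed arrival rate")
    return 1.0 / (service_rate_hz - arrival_rate_hz)


def cloud_is_faster(frame_rate_hz, device_rate_hz, cloud_rate_hz, rtt_s):
    """True if mean cloud latency (RTT + sojourn) beats the on-device sojourn."""
    device_latency = mean_sojourn_s(device_rate_hz, frame_rate_hz)
    cloud_latency = rtt_s + mean_sojourn_s(cloud_rate_hz, frame_rate_hz)
    return cloud_latency < device_latency


if __name__ == "__main__":
    # Hypothetical numbers: 20 Hz camera, 25 inferences/s on device,
    # 200 inferences/s in the cloud, and three round-trip times.
    for rtt_ms in (20, 100, 250):
        print(rtt_ms, cloud_is_faster(20.0, 25.0, 200.0, rtt_ms / 1000.0))
```

In this toy setting the on-device queue runs close to saturation, so its mean latency is 0.2 s, and the cloud stays ahead until the round trip itself approaches that figure. The comparison is on means only and says nothing about tail behavior.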

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that cloud-based DNN inference for cyber-physical systems can match or surpass on-device performance for real-time control tasks such as emergency braking when provisioned with high-throughput resources. It develops a formal analytical model expressing distributed inference latency in terms of sensing frequency, platform throughput, network delay, and task-specific safety margins; the model is instantiated for autonomous driving and validated exclusively through simulations driven by real-time vehicular dynamics, yielding concrete conditions under which cloud inference adheres more reliably to safety margins.

Significance. If the analytical model and simulation results hold under realistic conditions, the work would challenge the prevailing preference for on-device inference in latency-sensitive CPS, potentially enabling lower on-device energy use and simpler hardware while maintaining safety. The identification of explicit provisioning thresholds for cloud superiority would be a useful design guideline.

major comments (2)
  1. [Analytical Model] The central claim rests on the formal analytical model showing that high-throughput cloud amortizes network and queueing delays to meet or beat on-device safety margins. However, the manuscript provides no explicit equations, parameter definitions, or derivation for the latency expression (described only at the level of the abstract), preventing verification that the model is free of circularity or post-hoc fitting and that the reported conditions are not artifacts of the chosen distributions.
  2. [Simulation Validation] Validation occurs exclusively via simulations using real-time vehicular dynamics for the emergency-braking scenario. No real cloud latency traces, production queueing measurements, or hardware-in-the-loop experiments are reported; if actual tail latencies or contention exceed the modeled distributions, the concrete conditions favoring cloud inference may not hold.
minor comments (1)
  1. [Abstract] The abstract states that the model is 'instantiated' and 'validated through extensive simulations' but supplies no numerical parameter values, error bars, or exclusion criteria, making it difficult to reproduce or assess robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve the presentation of the analytical model and to clarify the scope and limitations of the simulation-based validation.

Point-by-point responses
  1. Referee: [Analytical Model] The central claim rests on the formal analytical model showing that high-throughput cloud amortizes network and queueing delays to meet or beat on-device safety margins. However, the manuscript provides no explicit equations, parameter definitions, or derivation for the latency expression (described only at the level of the abstract), preventing verification that the model is free of circularity or post-hoc fitting and that the reported conditions are not artifacts of the chosen distributions.

    Authors: The referee correctly observes that the current version of the manuscript does not present the explicit equations, parameter definitions, or step-by-step derivation of the latency model. We will revise the paper to include a dedicated subsection that states the full analytical expression for distributed inference latency, defines every parameter (sensing frequency, platform throughput, network delay, safety margins), and provides the complete derivation. This addition will make it possible to verify that the model contains no circularity and that the reported conditions follow directly from the stated assumptions and distributions. revision: yes

  2. Referee: [Simulation Validation] Validation occurs exclusively via simulations using real-time vehicular dynamics for the emergency-braking scenario. No real cloud latency traces, production queueing measurements, or hardware-in-the-loop experiments are reported; if actual tail latencies or contention exceed the modeled distributions, the concrete conditions favoring cloud inference may not hold.

    Authors: Validation is performed through simulations that incorporate real-time vehicular dynamics for the emergency-braking task. We agree that real cloud latency traces and hardware-in-the-loop experiments would provide stronger evidence. In the revised manuscript we will add an expanded limitations section that includes sensitivity analysis with respect to tail latencies and contention, explicitly states the distributional assumptions, and notes that the framework can be re-instantiated with production traces when they become available. We do not claim the current results substitute for such measurements. revision: partial
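
A sensitivity analysis of the kind promised here could take the following shape. This is a minimal sketch under assumptions: the closed-form latency budget uses standard constant-deceleration kinematics, while the function names, the idea of scaling a baseline 99th-percentile network delay, and all numbers are hypothetical and not drawn from the manuscript.

```python
def latency_budget_s(speed_mps, decel_mps2, obstacle_m, margin_m=0.0):
    """Largest perception-to-brake latency that still stops short of the obstacle."""
    braking_m = speed_mps ** 2 / (2.0 * decel_mps2)
    return (obstacle_m - margin_m - braking_m) / speed_mps


def tail_sweep(budget_s, compute_s, base_p99_s, scales):
    """Map each tail scale factor to whether compute + scaled p99 fits the budget."""
    return {k: compute_s + k * base_p99_s <= budget_s for k in scales}


if __name__ == "__main__":
    # Hypothetical scenario: 20 m/s, 8 m/s^2 braking, obstacle 35 m ahead.
    budget = latency_budget_s(20.0, 8.0, 35.0)
    # Inflate an assumed 100 ms p99 network delay by 1x, 2x, 4x, and 8x.
    for scale, safe in tail_sweep(budget, 0.05, 0.1, (1, 2, 4, 8)).items():
        print(f"p99 x{scale}: {'safe' if safe else 'UNSAFE'}")
```

Sweeping the scale factor locates the point at which the tail alone exhausts the budget, which is the threshold a production trace would need to stay under for the cloud configuration to remain safe.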

Circularity Check

0 steps flagged

Analytical model is self-contained with no circular reduction

full rationale

The paper defines a formal analytical model expressing distributed inference latency directly as a function of sensing frequency, platform throughput, network delay, and task-specific safety constraints. This is a definitional mapping, not a fitted parameter or self-referential loop. The model is then instantiated for the emergency-braking scenario and validated via simulations driven by real-time vehicular dynamics. No equations or steps reduce any 'prediction' back to the inputs by construction, no self-citations are load-bearing, and no ansatz or uniqueness result is smuggled in. The derivation chain is independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on an unshown analytical latency model whose inputs are treated as given; no free parameters are explicitly fitted in the abstract, and no new physical entities are postulated.

axioms (1)
  • domain assumption: Distributed inference latency can be characterized as a deterministic function of sensing frequency, platform throughput, network delay, and task-specific safety constraints
    Invoked when the paper states the model 'characterizes distributed inference latency as a function of' those quantities

pith-pipeline@v0.9.0 · 5556 in / 1274 out tokens · 30164 ms · 2026-05-15T21:16:33.662872+00:00 · methodology


Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    A review of cyber-physical system research relevant to the emerging it trends: industry 4.0, iot, big data, and cloud computing,

    J. H. Kim, “A review of cyber-physical system research relevant to the emerging it trends: industry 4.0, iot, big data, and cloud computing,” Journal of industrial integration and management, vol. 2, no. 03, p. 1750011, 2017

  2. [2]

    Multimodal data fusion: an overview of methods, challenges, and prospects,

    D. Lahat, T. Adali, and C. Jutten, “Multimodal data fusion: an overview of methods, challenges, and prospects,” Proceedings of the IEEE, vol. 103, no. 9, pp. 1449–1477, 2015

  3. [3]

    Edge computing for autonomous driving: Opportunities and challenges,

    S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1697–1716, 2019

  4. [4]

    “Waymo.” https://www.waymo.com

  5. [5]

    Compute solution for tesla’s full self-driving computer,

    E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, et al., “Compute solution for tesla’s full self-driving computer,” IEEE Micro, vol. 40, no. 2, pp. 25–35, 2020

  6. [6]

    Life cycle assessment of connected and automated vehicles: sensing and computing subsystem and vehicle level effects,

    J. H. Gawron, G. A. Keoleian, R. D. De Kleine, T. J. Wallington, and H. C. Kim, “Life cycle assessment of connected and automated vehicles: sensing and computing subsystem and vehicle level effects,” Environmental science & technology, vol. 52, no. 5, 2018

  7. [7]

    Data centers on wheels: Emissions from computing onboard autonomous vehicles,

    S. Sudhakar, V. Sze, and S. Karaman, “Data centers on wheels: Emissions from computing onboard autonomous vehicles,” IEEE Micro, vol. 43, no. 1, pp. 29–39, 2022

  8. [8]

    Amazon rekognition

    “Amazon rekognition.” https://aws.amazon.com/rekognition/

  9. [9]

    Aws local cloud

    “Aws local cloud.” https://aws.amazon.com/about-aws/global-infrastructure/localzones/

  10. [10]

    Latency comparison of cloud datacenters and edge servers,

    B. Charyyev, E. Arslan, and M. H. Gunes, “Latency comparison of cloud datacenters and edge servers,” in GLOBECOM 2020-2020 IEEE Global Communications Conference, pp. 1–6, IEEE, 2020

  11. [11]

    A dynamic offloading algorithm for mobile computing,

    D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithm for mobile computing,” IEEE Transactions on Wireless Communications, vol. 11, no. 6, pp. 1991–1995, 2012

  12. [12]

    To offload or not to offload? the bandwidth and energy costs of mobile cloud computing,

    M. V. Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not to offload? the bandwidth and energy costs of mobile cloud computing,” in 2013 Proceedings IEEE INFOCOM, pp. 1285–1293, IEEE, 2013

  13. [13]

    Joint model and data adaptation for cloud inference serving,

    J. Jiang, Z. Luo, C. Hu, Z. He, Z. Wang, S. Xia, and C. Wu, “Joint model and data adaptation for cloud inference serving,” in 2021 IEEE Real-Time Systems Symposium (RTSS), pp. 279–289, IEEE, 2021

  14. [14]

    Collaborative cloud and edge computing for latency minimization,

    J. Ren, G. Yu, Y. He, and G. Y. Li, “Collaborative cloud and edge computing for latency minimization,” IEEE Transactions on Vehicular Technology, vol. 68, no. 5, pp. 5031–5044, 2019

  15. [15]

    Estimating energy consumption of cloud, fog, and edge computing infrastructures,

    E. Ahvar, A.-C. Orgerie, and A. Lebre, “Estimating energy consumption of cloud, fog, and edge computing infrastructures,” IEEE Transactions on Sustainable Computing, vol. 7, no. 2, pp. 277–288, 2019

  16. [16]

    On the cost–qoe tradeoff for cloud-based video streaming under amazon ec2’s pricing models,

    J. He, Y. Wen, J. Huang, and D. Wu, “On the cost–qoe tradeoff for cloud-based video streaming under amazon ec2’s pricing models,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 669–680, 2013

  17. [17]

    Appealnet: An efficient and highly-accurate edge/cloud collaborative architecture for dnn inference,

    M. Li, Y. Li, Y. Tian, L. Jiang, and Q. Xu, “Appealnet: An efficient and highly-accurate edge/cloud collaborative architecture for dnn inference,” in 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 409–414, IEEE, 2021

  18. [18]

    Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,

    H. Li, C. Hu, J. Jiang, Z. Wang, Y. Wen, and W. Zhu, “Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,” in 2018 IEEE 24th international conference on parallel and distributed systems (ICPADS), pp. 671–678, IEEE, 2018

  19. [19]

    Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,

    Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Computer Architecture News, vol. 45, no. 1, pp. 615–629, 2017

  20. [20]

    Cnnpc: End-edge-cloud collaborative cnn inference with joint model partition and compression,

    S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, and H. Li, “Cnnpc: End-edge-cloud collaborative cnn inference with joint model partition and compression,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4039–4056, 2022

  21. [21]

    Nephele: efficient parallel data processing in the cloud,

    D. Warneke and O. Kao, “Nephele: efficient parallel data processing in the cloud,” in Proceedings of the 2nd workshop on many-task computing on grids and supercomputers, pp. 1–10, 2009

  22. [22]

    A survey on vehicular cloud computing,

    M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya, “A survey on vehicular cloud computing,” Journal of Network and Computer applications, vol. 40, pp. 325–344, 2014

  23. [23]

    Exploring edge computing for multitier industrial control,

    Y. Ma et al., “Exploring edge computing for multitier industrial control,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3506–3518, 2020

  24. [24]

    Impact of delays and computation placement on sense-act application performance in iot,

    P. Sharma and M. B. Srivastava, “Impact of delays and computation placement on sense-act application performance in iot,” in MILCOM 2023-2023 IEEE Military Communications Conference (MILCOM), pp. 133–138, IEEE, 2023

  25. [25]

    Towards a performance-driven device-edge-cloud relationship,

    P. Sharma, B. Wang, X. Ouyang, R. Nanayakkara, B. Balaji, P. Tabuada, and M. B. Srivastava, “Towards a performance-driven device-edge-cloud relationship,” in Proceedings of the 26th International Workshop on Mobile Computing Systems and Applications, pp. 125–125, 2025

  26. [26]

    A stochastic model to investigate data center performance and qos in iaas cloud computing systems,

    D. Bruneo, “A stochastic model to investigate data center performance and qos in iaas cloud computing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 560–569, 2013

  27. [27]

    Optimality analysis of energy-performance trade-off for server farm management,

    A. Gandhi, V. Gupta, M. Harchol-Balter, and M. A. Kozuch, “Optimality analysis of energy-performance trade-off for server farm management,” Performance Evaluation, vol. 67, no. 11, pp. 1155–1171, 2010

  28. [28]

    Exact analysis of the m/m/k/setup class of markov chains via recursive renewal reward,

    A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf, “Exact analysis of the m/m/k/setup class of markov chains via recursive renewal reward,” in Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems, pp. 153–166, 2013

  29. [29]

    Performance analysis of cloud computing centers using m/g/m/m+ r queuing systems,

    H. Khazaei, J. Misic, and V. B. Misic, “Performance analysis of cloud computing centers using m/g/m/m+ r queuing systems,” IEEE Transactions on parallel and distributed systems, vol. 23, no. 5, pp. 936–943, 2011

  30. [30]

    Toward inference delivery networks: distributing machine learning with optimality guarantees,

    T. S. Salem, G. Castellano, G. Neglia, F. Pianese, and A. Araldo, “Toward inference delivery networks: distributing machine learning with optimality guarantees,” IEEE/ACM Transactions on Networking, vol. 32, no. 1, pp. 859–873, 2023

  31. [31]

    Performance analysis of cloud computing using queuing models,

    P. S. Varma, A. Satyanarayana, and M. R. Sundari, “Performance analysis of cloud computing using queuing models,” in 2012 International Conference on Cloud Computing Technologies, Applications and Management (ICCCTAM), pp. 12–15, IEEE, 2012

  32. [32]

    The hidden cost of the edge: a performance comparison of edge and cloud latencies,

    A. Ali-Eldin, B. Wang, and P. Shenoy, “The hidden cost of the edge: a performance comparison of edge and cloud latencies,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2021

  33. [33]

    Aws computing resources

    “Aws computing resources.” https://docs.aws.amazon.com/

  34. [34]

    Google distributed cloud

    “Google distributed cloud.” https://cloud.google.com/distributed-cloud-connected?hl=en

  35. [35]

    Aws load balancing

    “Aws load balancing.” https://aws.amazon.com/what-is/load-balancing/

  36. [36]

    Aws batch

    “Aws batch.” https://aws.amazon.com/batch/

  37. [37]

    Ood-maml: Meta-learning for few-shot out-of-distribution detection and classification,

    T. Jeong and H. Kim, “Ood-maml: Meta-learning for few-shot out-of-distribution detection and classification,” Advances in Neural Information Processing Systems, vol. 33, pp. 3907–3916, 2020

  38. [38]

    Machine learning for 6g wireless networks: Carrying forward enhanced bandwidth, massive access, and ultrareliable/low-latency service,

    J. Du, C. Jiang, J. Wang, Y. Ren, and M. Debbah, “Machine learning for 6g wireless networks: Carrying forward enhanced bandwidth, massive access, and ultrareliable/low-latency service,” IEEE Vehicular Technology Magazine, vol. 15, no. 4, pp. 122–134, 2020

  39. [39]

    A queuing theory model for cloud computing,

    J. Vilaplana, F. Solsona, I. Teixidó, J. Mateo, F. Abella, and J. Rius, “A queuing theory model for cloud computing,” The Journal of Supercomputing, vol. 69, pp. 492–507, 2014

  40. [40]

    Carla: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on robot learning, pp. 1–16, PMLR, 2017

  41. [41]

    Ultralytics yolo11,

    G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024

  42. [42]

    Microsoft coco: Common objects in context,

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13, pp. 740–755, Springer, 2014

  43. [43]

    Imbalance in the cloud: An analysis on alibaba cluster trace,

    C. Lu, K. Ye, G. Xu, C.-Z. Xu, and T. Bai, “Imbalance in the cloud: An analysis on alibaba cluster trace,” in 2017 IEEE International Conference on Big Data (Big Data), pp. 2884–2892, IEEE, 2017

  44. [44]

    Drivers’ brake reaction times,

    G. Johansson and K. Rumar, “Drivers’ brake reaction times,” Human factors, vol. 13, no. 1, pp. 23–27, 1971

  45. [45]

    Nvidia nsight systems

    “Nvidia nsight systems.” https://developer.nvidia.com/nsight-systems

  46. [46]

    Nvidia tegrastats

    “Nvidia tegrastats.” https://docs.nvidia.com/util-tegrastats.html