arxiv: 2512.20946 · v4 · submitted 2025-12-24 · 💻 cs.NI

SLIDE: Simultaneous Model Downloading and Inference at the Wireless Network Edge

Pith reviewed 2026-05-16 20:04 UTC · model grok-4.3

classification 💻 cs.NI
keywords simultaneous downloading and inference · wireless edge networks · AI model serving · resource allocation optimization · task throughput maximization · latency reduction · multi-user systems · layered model splitting

The pith

The SLIDE framework lets mobile devices start inference on a model's early layers while the remaining layers are still downloading, cutting end-to-end latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces SLIDE, a framework for simultaneous model downloading and inference in wireless networks to support on-device AI. Large models cause long waits if downloaded fully before inference starts, but SLIDE lets computation begin on received layers during ongoing download. The authors set up an optimization to maximize the number of tasks completed within latency limits by deciding layer allocations, bandwidth shares, and compute assignments for multiple users at once. They account for how inference time for a layer depends on when prior layers finished downloading and computing. An efficient algorithm solves this in polynomial time, and simulations confirm higher throughput than standard download-first methods.
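
The overlap described above can be illustrated with a minimal single-user sketch. It assumes a constant download rate and a constant compute speed, and the function names and the example numbers are hypothetical; none of them come from the paper.

```python
from typing import Sequence


def dai_latency(sizes: Sequence[float], work: Sequence[float],
                rate: float, speed: float) -> float:
    """Download-then-infer baseline: fetch every layer, then run every layer."""
    return sum(sizes) / rate + sum(work) / speed


def slide_latency(sizes: Sequence[float], work: Sequence[float],
                  rate: float, speed: float) -> float:
    """Overlapped schedule: a layer runs once it has arrived and the
    previous layer has finished computing."""
    download_done = 0.0  # time at which the current layer has fully arrived
    infer_done = 0.0     # time at which the current layer has finished computing
    for size, ops in zip(sizes, work):
        download_done += size / rate
        infer_done = max(download_done, infer_done) + ops / speed
    return infer_done


# Hypothetical three-layer model (assumed units: Mbit, GFLOP, Mbit/s, GFLOP/s).
sizes = [4.0, 4.0, 4.0]
work = [2.0, 2.0, 2.0]
baseline = dai_latency(sizes, work, rate=2.0, speed=1.0)
overlapped = slide_latency(sizes, work, rate=2.0, speed=1.0)
print(baseline, overlapped)
```

In this toy setting the overlapped schedule can never be slower than the baseline, because compute for early layers hides behind the download of later ones.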

Core claim

The SLIDE framework enables users to perform inference with downloaded layers while simultaneously receiving the remaining layers of the model. By jointly optimizing model provisioning, spectrum bandwidth allocation, and computing resource allocation for multi-user downlink systems, and accounting for recursive dependencies in inference latency across layers, an efficient polynomial-time algorithm yields solutions that significantly improve task throughput under latency and communication resource constraints compared to conventional model downloading schemes.

What carries the argument

The recursive latency dependencies across model layers in the SLIDE framework, where the inference time for each layer depends on the downloading bandwidth and computing resources allocated to all preceding layers.

If this is right

  • Task throughput is maximized by solving a joint optimization over model splits, bandwidth, and compute resources.
  • The approach achieves better performance than sequential download and inference under the same constraints.
  • Real-time AI inference services become more viable in next-generation mobile networks despite large model sizes.
  • An efficient algorithm computes the optimal allocation in polynomial time for practical deployment.
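
To make the throughput objective concrete, here is a toy sketch that counts how many users meet their deadline under a given bandwidth split and searches a small grid of splits exhaustively. It is not the paper's polynomial-time algorithm: the exhaustive search, the fixed per-user model, the data layout, and all numbers are assumptions for illustration only.

```python
from itertools import product


def slide_latency(sizes, work, rate, speed):
    """Finish time when each layer runs after it arrives and its predecessor ends."""
    download_done = infer_done = 0.0
    for size, ops in zip(sizes, work):
        download_done += size / rate
        infer_done = max(download_done, infer_done) + ops / speed
    return infer_done


def served_users(users, shares, total_rate):
    """Count users whose overlapped latency meets their deadline."""
    count = 0
    for user, share in zip(users, shares):
        if share == 0:
            continue  # no bandwidth, so the model never arrives
        latency = slide_latency(user["sizes"], user["work"],
                                share * total_rate, user["speed"])
        if latency <= user["deadline"]:
            count += 1
    return count


def best_allocation(users, total_rate, levels):
    """Exhaustive search over bandwidth shares in steps of 1/levels (toy only)."""
    best_count, best_shares = -1, None
    for combo in product(range(levels + 1), repeat=len(users)):
        if sum(combo) != levels:
            continue  # use exactly the full bandwidth budget
        shares = [c / levels for c in combo]
        count = served_users(users, shares, total_rate)
        if count > best_count:
            best_count, best_shares = count, shares
    return best_count, best_shares


# Two hypothetical users with the same two-layer model but different deadlines.
users = [
    {"sizes": [4.0, 4.0], "work": [2.0, 2.0], "speed": 1.0, "deadline": 10.0},
    {"sizes": [4.0, 4.0], "work": [2.0, 2.0], "speed": 1.0, "deadline": 5.5},
]
equal_split = served_users(users, [0.5, 0.5], total_rate=4.0)
best = best_allocation(users, total_rate=4.0, levels=4)
print(equal_split, best)
```

With these assumed numbers an equal split serves only one user, while shifting bandwidth toward the user with the tighter deadline serves both, which is the kind of gain a joint allocation is meant to capture.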

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • SLIDE could be combined with model compression techniques to further reduce download times for even larger models.
  • Dynamic adjustments to layer splits based on real-time channel conditions might enhance robustness in varying wireless environments.
  • Similar simultaneous processing ideas could apply to other data-intensive tasks like video analytics or sensor data processing at the edge.

Load-bearing premise

That AI models can be divided into independent layers for sequential inference without any drop in accuracy or extra overhead from the splitting process.
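
For a strictly sequential model, the premise can be sketched as follows: running each layer as it arrives and carrying the activation forward gives the same output as running the complete model. The toy elementwise layers and the `StreamingRunner` class are hypothetical; real architectures with skip connections or per-layer loading overhead are outside this sketch.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def make_layer(weight: float, bias: float) -> Layer:
    """Toy elementwise layer computing ReLU(weight * x + bias)."""
    def layer(x: List[float]) -> List[float]:
        return [max(0.0, weight * v + bias) for v in x]
    return layer


def full_forward(layers: List[Layer], x: List[float]) -> List[float]:
    """Reference path: the whole model is available before inference starts."""
    for layer in layers:
        x = layer(x)
    return x


class StreamingRunner:
    """Holds the latest activation and advances it whenever a layer arrives."""

    def __init__(self, x: List[float]) -> None:
        self.activation = list(x)
        self.layers_run = 0

    def on_layer_received(self, layer: Layer) -> None:
        self.activation = layer(self.activation)
        self.layers_run += 1


layers = [make_layer(2.0, 1.0), make_layer(-1.0, 3.0), make_layer(0.5, 0.0)]
x = [1.0, -2.0]

runner = StreamingRunner(x)
for layer in layers:  # layers are assumed to arrive in model order
    runner.on_layer_received(layer)

print(runner.activation, full_forward(layers, x))
```

The sketch only shows functional equivalence; whether the splitting adds overhead on real hardware is exactly what the premise leaves open.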

What would settle it

An experiment on real hardware showing whether splitting a neural network model for layer-by-layer inference maintains the same accuracy as full-model inference while measuring actual latency savings in a wireless setup.

Figures

Figures reproduced from arXiv: 2512.20946 by Guanqiao Qu, Qian Chen, Sheng Zhou, Tao Li, Xianhao Chen.

Figure 1: The E2E latency in wireless networks, where an end user downloads an …
Figure 2: The proposed SLIDE framework, where users start inference with …
Figure 3: The procedures of conventional DAI and the proposed SLIDE …
Figure 4: Experimental hardware system with an edge server (functioning as a …
Figure 5: Served user ratio of SLIDE, evaluated on the Jetson Orin Nano and Jetson Orin NX running at GPU frequencies of 624.75 MHz and 918 MHz, …
Figure 6: Performance of SLIDE under different model libraries, where the …
Figure 7: Performance of SLIDE in mobile scenarios, where the default values …
Figure 8: Performance comparison of SLIDE and conventional DAI on Jetson Orin NX ( …
Figure 9: Ablation study on spectrum bandwidth allocation, model provisioning, and computing resource allocation. The default values of GPU frequencies, …
Figure 10: Running time comparison between the proposed algorithm and the …
Original abstract

To support on-device inference, the next-generation mobile networks are expected to support real-time model downloading services to mobile users. However, powerful AI models typically have large model sizes, resulting in excessive end-to-end (E2E) downloading-and-inference (DAI) latency. To address this issue, we propose a simultaneous model downloading and inference (SLIDE) framework, which allows users to perform inference with downloaded layers while simultaneously receiving the remaining layers of the model. To this end, we formulate a task throughput maximization problem by jointly optimizing model provisioning, spectrum bandwidth allocation, and computing resource allocation for multi-user downlink systems. Unlike traditional DAI frameworks, SLIDE introduces recursive dependencies across layers, where inference latency depends recursively on the downloading bandwidth and computing resource allocation for each of the preceding layers. To solve this challenging problem, we design an efficient algorithm that acquires the optimal solution with polynomial-time complexity. Simulation results demonstrate that the proposed SLIDE framework significantly improves task throughput under latency and communication resource constraints compared with the conventional model downloading schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the SLIDE framework to reduce end-to-end latency for on-device AI inference by enabling simultaneous model layer downloading and inference on already-received layers in multi-user wireless downlink systems. It formulates a task throughput maximization problem that jointly optimizes model provisioning (layer allocation), spectrum bandwidth, and computing resource allocation, explicitly incorporating recursive per-layer latency dependencies. The resulting non-convex optimization is solved by a polynomial-time algorithm whose optimality is asserted for the multi-user setting, with simulations claiming substantial throughput gains over conventional download-and-inference baselines under latency and resource constraints.

Significance. If the recursive latency model and optimality claims hold, the work offers a practical mechanism for overlapping communication and computation phases in edge AI, which could improve task throughput in bandwidth- and latency-constrained 5G/6G scenarios. The polynomial-time solvability is a concrete strength for real-time deployment, provided the formulation avoids hidden overheads from partial-layer execution.

major comments (2)
  1. §3 (System Model): The recursive latency dependency (inference time for layer k depending on prior-layer bandwidth and compute allocations) is load-bearing for the claimed novelty over DAI; the manuscript must supply the exact recursive equations and verify that they introduce no circularity or unmodeled accuracy loss when layers are executed sequentially.
  2. §4 (Algorithm): The polynomial-time complexity and optimality guarantee for the joint allocation problem must be supported by a formal proof or reduction (e.g., to a known solvable structure such as water-filling or dynamic programming); without it, the simulation gains cannot be attributed to the algorithm rather than heuristic tuning.
minor comments (2)
  1. Abstract and §5 (Simulations): Quantitative throughput gains (e.g., percentage improvement or absolute values) should be stated explicitly rather than described only qualitatively as 'significant'.
  2. §3 (Notation consistency): Ensure that variables for per-layer bandwidth B_k and compute C_k are defined before first use in the optimization formulation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of the recursive latency model and the algorithm's theoretical guarantees. We address each major comment below and will incorporate the requested details into the revised manuscript.

Point-by-point responses
  1. Referee: §3 (System Model): The recursive latency dependency (inference time for layer k depending on prior-layer bandwidth and compute allocations) is load-bearing for the claimed novelty over DAI; the manuscript must supply the exact recursive equations and verify that they introduce no circularity or unmodeled accuracy loss when layers are executed sequentially.

    Authors: We agree that the recursive formulation is central to SLIDE's novelty. In the revised Section 3, we will explicitly state the recursive latency equations. Let D_k denote the time at which layer k finishes downloading and T_k the time at which it finishes inference; then D_k = D_{k-1} + d_k / b_k and T_k = max(D_k, T_{k-1}) + c_k / f_k, where d_k is the layer size, b_k the allocated bandwidth, c_k the compute demand, and f_k the allocated compute rate, with D_0 = T_0 = 0. This structure is strictly forward-recursive with no circularity, as each layer's inference begins only after its download completes and the preceding layers finish. We will add a paragraph confirming that sequential on-device execution introduces no accuracy loss beyond standard model partitioning, as partial-layer inference is not performed. revision: yes
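
    One consistent way to write the recursion sketched in this response is to track download completion separately from inference completion. In the block below, D_k (download-completion time of layer k) is notation assumed here for illustration, and b_k is treated as a data rate so that d_k / b_k is a time; the unrolled form follows from the recursion by induction and is not taken from the paper.

```latex
% D_k: download-completion time of layer k (assumed notation)
% T_k: inference-completion time of layer k
\begin{align}
D_k &= D_{k-1} + \frac{d_k}{b_k}, & D_0 &= 0, \\
T_k &= \max\{D_k,\; T_{k-1}\} + \frac{c_k}{f_k}, & T_0 &= 0.
\end{align}
% Unrolling the recursion over k = 1, \dots, K:
\begin{equation}
T_K = \max_{1 \le j \le K}
      \left( \sum_{i=1}^{j} \frac{d_i}{b_i} + \sum_{i=j}^{K} \frac{c_i}{f_i} \right).
\end{equation}
```

    The j-th term is the finish time when layer j is the last one whose inference has to wait for its download, so each term couples the bandwidth given to layers up to j with the compute given to layers from j onward. That coupling is the cross-layer dependency the referee asks the authors to spell out.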

  2. Referee: §4 (Algorithm): The polynomial-time complexity and optimality guarantee for the joint allocation problem must be supported by a formal proof or reduction (e.g., to a known solvable structure such as water-filling or dynamic programming); without it, the simulation gains cannot be attributed to the algorithm rather than heuristic tuning.

    Authors: We acknowledge that the current manuscript asserts polynomial-time optimality without a self-contained proof. In the revised Section 4, we will add a formal proof by reduction to dynamic programming. The problem is solved by a DP table over layers and users that exploits the recursive latency structure, with state size O(K U R), where K is the number of layers, U the number of users, and R the number of discrete resource levels, yielding O(K U R^2) time. Optimality follows by induction: the subproblem optimum for the first k layers is preserved when extending to k+1 under the max-completion-time objective. This establishes that the reported simulation gains are due to the exact algorithm rather than tuning. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper formulates a joint optimization for throughput maximization under recursive per-layer latency dependencies, then presents a polynomial-time algorithm asserted to solve it optimally. No equation reduces to a prior fitted parameter or self-defined quantity by construction, no load-bearing self-citation chain is invoked, and the simulation results are presented as external validation rather than tautological confirmation. The recursive dependency structure is explicitly introduced as modeling novelty rather than smuggled in via prior work. The central claim therefore rests on independent formulation and algorithmic design rather than circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about wireless channels, model layer independence for sequential inference, and standard convex or efficient optimization techniques; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Wireless downlink channel models and additive latency calculations for layer transmission and partial inference.
    Invoked implicitly when formulating the E2E DAI latency and recursive dependencies.



