pith. sign in

arxiv: 2511.03159 · v2 · submitted 2025-11-05 · 💻 cs.NI

Joint Optimization of DNN Model Caching and Request Routing in Mobile Edge Computing

Pith reviewed 2026-05-18 02:00 UTC · model grok-4.3

classification 💻 cs.NI
keywords mobile edge computingDNN model cachingrequest routingdynamic DNNsinference precisionquality of experiencelinear programmingonline optimization
0
0 comments X p. Extension

The pith

Disassembling DNNs into submodels for joint caching and routing in edge networks raises average inference precision by 46%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the difficulty of caching full deep neural networks at capacity-limited edge servers while keeping inference accurate and fast. By splitting each model into smaller interrelated submodels, decisions about what to store locally and which server should handle a given request become more granular. This setup lets the system trade some precision for shorter model loading times without violating latency or resource limits. The authors formulate the problem as maximizing inference precision and solve it with an offline algorithm that uses linear programming followed by random rounding, plus an online version that adapts as requests arrive. Simulations indicate these choices produce substantially better precision and user quality of experience than earlier methods that treat models as indivisible units.

Core claim

In mobile edge computing networks, disassembling complete DNN models into interrelated submodels enables fine-grained caching and request routing that maximizes user request inference precision subject to server resource, latency, and model loading time constraints. The CoCaR algorithm, based on linear programming and random rounding, achieves near-optimal performance offline, while its online variant CoCaR-OL adapts to unpredictable request patterns and delivers at least 32.3% higher user QoE than competitive baselines.

What carries the argument

Joint optimization of submodel caching and request routing for dynamic DNNs, solved by linear programming with random rounding in the CoCaR algorithm.

If this is right

  • Average inference precision across user requests rises by 46% relative to prior approaches.
  • Online scenarios achieve at least 32.3% improvement in user quality of experience.
  • Caching becomes more efficient because only needed submodels occupy limited server storage.
  • Routing can steer requests toward servers holding the submodels that deliver the highest feasible precision for each query.
  • Solutions stay close to the theoretical optimum while respecting capacity, delay, and loading-time limits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modular decomposition idea could apply to other model families such as transformers if they admit clean submodel boundaries.
  • Real hardware testbeds would likely expose mobility and interference effects absent from the current simulations.
  • Energy use at edge servers may drop when fewer parameters are loaded per request.
  • Combining the optimizer with request forecasting could reduce loading events still further.

Load-bearing premise

Breaking full DNN models into interrelated submodels produces useful trade-offs between inference precision and loading latency without introducing unmodeled accuracy penalties or extra overheads.

What would settle it

A controlled measurement on real edge hardware showing that submodel-based inference yields no net precision gain or produces higher total latency than full-model baselines under identical request loads.

Figures

Figures reproduced from arXiv: 2511.03159 by Dian Shen, Fang Dong, Patrick P. C. Lee, Qilin Fan, Ruiting Zhou, Shuting Qiu, Siyu Tan.

Figure 1
Figure 1. Figure 1: Submodels division and switching of ViT. When switching from [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 4
Figure 4. Figure 4: Communication latency: Routing user u’s request involves wireless transmission from u to home BS n1, wired transmission from n1 to target BS n3, and a total of 6 hops from initiating the request to receive the inference result. {h m 0 , hm 1 , · · · , hm H(m) } denote the set of submodels {h m j } H(m) j=1 of the dynamic DNN associated with m, along with an empty submodel h m 0 1 . Then, denote by H = S m∈… view at source ↗
Figure 5
Figure 5. Figure 5: Illustration of model caching in an online scenario. There are two types [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 8
Figure 8. Figure 8: Impact of different Zipf skewness coefficients:(a) Average inference [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 7
Figure 7. Figure 7: Impact of popularity change frequency: (a) Average inference [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: Impact of different observation window duration: (a) Average inference [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Average BS memory resource utilization. edge collaboration mechanism and quickly adjusting caching schemes via submodel switching to serve more users. Impact of Observation Window Duration [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 12
Figure 12. Figure 12: Effect of BS Memory Capacity on the average user’s QoE. [PITH_FULL_IMAGE:figures/full_fig_p013_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Effect of BS Memory Capacity on the cache hit rate. [PITH_FULL_IMAGE:figures/full_fig_p013_13.png] view at source ↗
Figure 16
Figure 16. Figure 16: Effect of the Zipf skewness coefficient on the average user’s QoE. [PITH_FULL_IMAGE:figures/full_fig_p014_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Effect of the Zipf skewness coefficient on the cache hit rate. [PITH_FULL_IMAGE:figures/full_fig_p014_17.png] view at source ↗
read the original abstract

Mobile edge computing (MEC) can pre-cache deep neural networks (DNNs) near end-users, providing low-latency services and improving users' quality of experience (QoE). However, caching all DNN models at edge servers with limited capacity is difficult, and the impact of model loading time on QoE remains underexplored. Hence, we introduce dynamic DNNs in edge scenarios, disassembling a complete DNN model into interrelated submodels for more fine-grained and flexible model caching and request routing solutions. This raises the pressing issue of jointly deciding request routing and submodel caching for dynamic DNNs to balance model inference precision and loading latency for QoE optimization. In this paper, we study the joint dynamic model caching and request routing problem in MEC networks, aiming to maximize user request inference precision under constraints of server resources, latency, and model loading time. To tackle this problem, we propose CoCaR, an offline algorithm based on linear programming and random rounding that leverages dynamic DNNs to optimize caching and routing schemes, achieving near-optimal performance. Furthermore, we develop an online variant of CoCaR, named CoCaR-OL, enabling effective adaptation to dynamic and unpredictable online request patterns. The simulation results demonstrate that the proposed CoCaR improves the average inference precision of user requests by 46% compared to state-of-the-art baselines. In addition, in online scenarios, CoCaR-OL achieves an improvement of no less than 32.3% in user QoE over competitive baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes CoCaR, an offline algorithm based on linear programming and random rounding, together with its online variant CoCaR-OL, for jointly optimizing submodel caching and request routing of dynamic DNNs in MEC networks. The goal is to maximize average inference precision subject to server capacity, latency, and model-loading-time constraints; the abstract reports that CoCaR yields a 46% gain in inference precision and CoCaR-OL yields at least 32.3% QoE improvement over baselines in simulations.

Significance. If the submodel precision model is accurate and the reported gains prove robust under proper statistical validation, the work would usefully extend MEC caching literature by exposing a controllable precision-latency trade-off that standard whole-model caching cannot exploit. The explicit treatment of loading latency as a first-class QoE factor is a constructive contribution.

major comments (2)
  1. [Abstract] Abstract: the headline claims of 46% precision improvement and 32.3% QoE improvement rest on simulation comparisons whose experimental setup, number of runs, statistical tests, baseline implementations, and sensitivity to the precision-latency trade-off weight are not described. Without these details the central performance assertions cannot be evaluated and may be sensitive to unstated parameter choices.
  2. [Problem Formulation] Problem Formulation (and the modeling of dynamic DNNs): the premise that disassembling a DNN into interrelated submodels yields a monotonic, controllable precision-versus-latency mapping free of unmodeled inter-submodel accuracy penalties or early-exit overheads is introduced without separate validation or sensitivity analysis. Because this mapping is directly encoded in the LP objective and constraints, any deviation in real submodel accuracy would render the reported gains artifacts of the simulation rather than evidence of effective joint decisions.
minor comments (1)
  1. [System Model] Clarify how the precision function for each submodel is obtained or fitted and whether it is treated as an input parameter or derived from the model architecture.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for improving the clarity and robustness of our claims. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claims of 46% precision improvement and 32.3% QoE improvement rest on simulation comparisons whose experimental setup, number of runs, statistical tests, baseline implementations, and sensitivity to the precision-latency trade-off weight are not described. Without these details the central performance assertions cannot be evaluated and may be sensitive to unstated parameter choices.

    Authors: We agree that the abstract and experimental evaluation section require expanded details to support the reported gains. In the revised manuscript we will add a dedicated subsection on the simulation methodology. This will specify the number of independent runs (20 runs using distinct random seeds), the use of 95% confidence intervals and paired t-tests for statistical significance, the exact parameter settings and code-level implementations of all baselines, and a sensitivity study sweeping the precision-latency trade-off weight over [0.1, 10]. These additions will make the 46% precision and 32.3% QoE improvements reproducible and allow readers to assess robustness directly. revision: yes

  2. Referee: [Problem Formulation] Problem Formulation (and the modeling of dynamic DNNs): the premise that disassembling a DNN into interrelated submodels yields a monotonic, controllable precision-versus-latency mapping free of unmodeled inter-submodel accuracy penalties or early-exit overheads is introduced without separate validation or sensitivity analysis. Because this mapping is directly encoded in the LP objective and constraints, any deviation in real submodel accuracy would render the reported gains artifacts of the simulation rather than evidence of effective joint decisions.

    Authors: The monotonic precision-latency mapping follows from the modular structure of dynamic DNNs documented in the literature (e.g., early-exit and layer-wise decomposition papers). Nevertheless, we accept that explicit validation is needed. The revised manuscript will contain a new sensitivity-analysis subsection that perturbs the precision values with additive noise and multiplicative penalties simulating inter-submodel overheads. We will re-run CoCaR and CoCaR-OL under these perturbed models and report the resulting degradation (or retention) of the performance gains. This will quantify how sensitive the joint optimization remains when the ideal mapping is relaxed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external simulation benchmarks

full rationale

The paper formulates a joint optimization problem for submodel caching and request routing in MEC using linear programming with random rounding for the offline CoCaR algorithm and an online variant CoCaR-OL. Central performance claims (46% precision gain, 32.3% QoE gain) are obtained via direct simulation comparisons against independent baselines rather than any quantity that reduces by the paper's own equations to a fitted parameter or self-citation. The submodel precision-latency tradeoff is introduced as a modeling premise when dynamic DNNs are defined, but it is not shown to be self-definitional or derived from the optimization output itself. No load-bearing self-citation chains, ansatz smuggling, or renaming of known results appear in the provided derivation steps; the work remains self-contained against the stated simulation benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 1 invented entities

The central claim rests on the premise that dynamic DNN decomposition is feasible and that the resulting optimization problem accurately captures real resource and latency trade-offs; no explicit free parameters or invented physical entities are named in the abstract, but the linear program necessarily includes weighting coefficients between precision and latency that function as tunable parameters.

free parameters (1)
  • precision-latency trade-off weight
    The objective maximizes inference precision subject to latency and loading-time constraints; any scalar that balances these terms in the linear program is a free parameter chosen to produce the reported results.
invented entities (1)
  • dynamic DNN submodels no independent evidence
    purpose: To enable fine-grained caching by disassembling complete models into interrelated pieces
    Introduced in the abstract as a modeling device for more flexible caching; no independent empirical validation or falsifiable prediction outside the optimization is provided.

pith-pipeline@v0.9.0 · 5826 in / 1492 out tokens · 46016 ms · 2026-05-18T02:00:55.677093+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

  1. [1]

    CoCaR: Enabling efficient dynamic DNN-based model caching and request routing in MEC,

    S. Qiu, F. Dong, S. Tan, D. Shen, R. Zhou, and Q. Fan, “CoCaR: Enabling efficient dynamic DNN-based model caching and request routing in MEC,” inproc. IEEE INFOCOM, London, United Kingdom, May, 2025, pp. 1–10

  2. [2]

    DeepSniffer: A DNN model extraction framework based on learning architectural hints,

    X. Hu, L. Liang, S. Li, L. Deng, P. Zuo, Y . Ji, X. Xie, Y . Ding, C. Liu, T. Sherwoodet al., “DeepSniffer: A DNN model extraction framework based on learning architectural hints,” inproc. ACM 25th ASPLOS, New York, USA, Mar, 2020, pp. 385–399

  3. [3]

    Joint admission control and resource allocation of virtual network embedding via hierarchical deep reinforcement learning,

    T. Wang, L. Shen, Q. Fan, T. Xu, T. Liu, and H. Xiong, “Joint admission control and resource allocation of virtual network embedding via hierarchical deep reinforcement learning,”IEEE Transactions on Services Computing, vol. 17, no. 03, pp. 1001–1015, 2024

  4. [4]

    Edge- assisted adaptive configuration for serverless-based video analytics,

    Z. Wang, R. Zhang, S. Zhang, W. Cheng, W. Wang, and Y . Cui, “Edge- assisted adaptive configuration for serverless-based video analytics,” IEEE Transactions on Networking, vol. 33, no. 3, pp. 1144–1159, 2025

  5. [5]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017

  6. [6]

    Advancing RNN transducer technology for speech recognition,

    G. Saon, Z. Tüske, D. Bolanos, and B. Kingsbury, “Advancing RNN transducer technology for speech recognition,” inproc. IEEE ICASSP, Toronto, Canada, Jun, 2021, pp. 5654–5658

  7. [7]

    Confidence-aware reinforcement learning for self-driving cars,

    Z. Cao, S. Xu, H. Peng, D. Yang, and R. Zidek, “Confidence-aware reinforcement learning for self-driving cars,”IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 7419–7430, 2021

  8. [8]

    Mobile edge computing and machine learning in the internet of unmanned aerial vehicles: a survey,

    Z. Ning, H. Hu, X. Wang, L. Guo, S. Guo, G. Wang, and X. Gao, “Mobile edge computing and machine learning in the internet of unmanned aerial vehicles: a survey,”ACM Computing Surveys, vol. 56, no. 1, pp. 1–31, 2023

  9. [9]

    Rein- forcement learning for cost-effective IoT service caching at the edge,

    B. Huang, X. Liu, Y . Xiang, D. Yu, S. Deng, and S. Wang, “Rein- forcement learning for cost-effective IoT service caching at the edge,” Journal of Parallel and Distributed Computing, vol. 168, pp. 120–136, 2022

  10. [10]

    Incentive mechanisms for online task offloading with privacy-preserving in UA V-assisted mo- bile edge computing,

    R. Zhang, R. Zhou, Y . Wang, H. Tan, and K. He, “Incentive mechanisms for online task offloading with privacy-preserving in UA V-assisted mo- bile edge computing,”IEEE/ACM Transactions on Networking, vol. 32, no. 3, pp. 2646–2661, 2024

  11. [11]

    Mean field graph based D2D collab- oration and offloading pricing in mobile edge computing,

    X. Wang, J. Ye, and J. C. Lui, “Mean field graph based D2D collab- oration and offloading pricing in mobile edge computing,”IEEE/ACM Transactions on Networking, vol. 32, no. 1, pp. 491–505, 2023

  12. [12]

    FEAT: Towards fast environment-adaptive task offloading and power allocation in MEC,

    T. Ren, Z. Hu, H. He, J. Niu, and X. Liu, “FEAT: Towards fast environment-adaptive task offloading and power allocation in MEC,” inproc. IEEE INFOCOM, New York City, USA, May, 2023, pp. 1–10

  13. [13]

    Near-optimal and collaborative service caching in mobile edge clouds,

    Z. Xu, L. Zhou, S. C.-K. Chau, W. Liang, H. Dai, L. Chen, W. Xu, Q. Xia, and P. Zhou, “Near-optimal and collaborative service caching in mobile edge clouds,”IEEE Transactions on Mobile Computing, vol. 22, no. 7, pp. 4070–4085, 2023

  14. [14]

    Asymptotically tight approximation for online file caching with delayed hits and bypassing,

    H. Tan, Y . Wang, C. Zhang, G. Li, H. Du, Z. Han, S. H.-C. Jiang, and X.-Y . Li, “Asymptotically tight approximation for online file caching with delayed hits and bypassing,”IEEE Transactions on Networking, vol. 33, no. 4, pp. 1886–1899, 2025

  15. [15]

    Two time-scale joint service caching and task offloading for UA V-assisted mobile edge computing,

    R. Zhou, X. Wu, H. Tan, and R. Zhang, “Two time-scale joint service caching and task offloading for UA V-assisted mobile edge computing,” inproc. IEEE INFOCOM, London, United Kingdom, May, 2022, pp. 1189–1198

  16. [16]

    Dynamic resource allocation for deep learning clusters with separated compute and storage,

    M. Li, Z. Han, C. Zhang, R. Zhou, Y . Liu, and H. Tan, “Dynamic resource allocation for deep learning clusters with separated compute and storage,” inproc. IEEE INFOCOM, New York City, USA, May, 2023, pp. 1–10

  17. [17]

    Offloading tasks with dependency and service caching in mobile edge computing,

    G. Zhao, H. Xu, Y . Zhao, C. Qiao, and L. Huang, “Offloading tasks with dependency and service caching in mobile edge computing,”IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 11, pp. 2777–2792, 2021

  18. [18]

    Joint optimization of base station clustering and service caching in user-centric MEC,

    L. Qin, H. Lu, Y . Lu, C. Zhang, and F. Wu, “Joint optimization of base station clustering and service caching in user-centric MEC,”IEEE Transactions on Mobile Computing, vol. 23, no. 5, pp. 6455–6469, 2023

  19. [19]

    Joint service caching, re- source allocation and task offloading for MEC-based networks: A multi- layer optimization approach,

    W. Chu, X. Jia, Z. Yu, J. C. Lui, and Y . Lin, “Joint service caching, re- source allocation and task offloading for MEC-based networks: A multi- layer optimization approach,”IEEE Transactions on Mobile Computing, vol. 23, no. 4, pp. 2958–2975, 2024

  20. [20]

    Toward deterministic wide-area networks via deadline-aware routing and scheduling,

    J. Ren, W. Zhang, H. Wang, D. Yang, S. Wang, H. Zhang, and S. Cui, “Toward deterministic wide-area networks via deadline-aware routing and scheduling,”IEEE Transactions on Networking, vol. 33, no. 4, pp. 1762–1778, 2025

  21. [21]

    FlagVNE: A flexible and generalizable RL framework for network resource allocation,

    T. Wang, Q. Fan, C. Wang, L. Ding, N. J. Yuan, and H. Xiong, “FlagVNE: A flexible and generalizable RL framework for network resource allocation,” inproc. 33rd IJCAI, 2024

  22. [22]

    Service placement and request routing in MEC networks with storage, computation, and communication constraints,

    K. Poularakis, J. Llorca, A. M. Tulino, I. Taylor, and L. Tassiulas, “Service placement and request routing in MEC networks with storage, computation, and communication constraints,”IEEE/ACM Transactions on Networking, vol. 28, no. 3, pp. 1047–1060, 2020

  23. [23]

    Joint service caching, computation offloading and resource allocation in mobile edge computing systems,

    G. Zhang, S. Zhang, W. Zhang, Z. Shen, and L. Wang, “Joint service caching, computation offloading and resource allocation in mobile edge computing systems,”IEEE Transactions on Wireless Communications, vol. 20, no. 8, pp. 5288–5300, 2021

  24. [24]

    Deep reinforcement learning for task offloading in mobile edge computing systems,

    M. Tang and V . W. Wong, “Deep reinforcement learning for task offloading in mobile edge computing systems,”IEEE Transactions on Mobile Computing, vol. 21, no. 6, pp. 1985–1997, 2020

  25. [26]

    Joint optimization of service deployment and request routing for microservices in mobile edge computing,

    K. Peng, L. Wang, J. He, C. Cai, and M. Hu, “Joint optimization of service deployment and request routing for microservices in mobile edge computing,”IEEE Transactions on Services Computing, vol. 17, no. 3, pp. 1016–1028, 2024

  26. [27]

    Edge intelligence in motion: Mobility-aware dynamic DNN inference service migration with downtime in mobile edge computing,

    P. Wang, T. Ouyang, G. Liao, J. Gong, S. Yu, and X. Chen, “Edge intelligence in motion: Mobility-aware dynamic DNN inference service migration with downtime in mobile edge computing,”Journal of Systems Architecture, vol. 130, p. 102664, 2022

  27. [28]

    Dynamic- OFA: Runtime DNN architecture switching for performance scaling on heterogeneous embedded platforms,

    W. Lou, L. Xun, A. Sabet, J. Bi, J. Hare, and G. V . Merrett, “Dynamic- OFA: Runtime DNN architecture switching for performance scaling on heterogeneous embedded platforms,” inproc. IEEE/CVF CVPR, Nashville, USA, Jun, 2021, pp. 3110–3118

  28. [29]

    Distributing inference tasks over interconnected systems through dy- 16 namic DNNs,

    C. Singhal, Y . Wu, F. Malandrino, M. Levorato, and C. F. Chiasserini, “Distributing inference tasks over interconnected systems through dy- 16 namic DNNs,”IEEE Transactions on Networking, vol. 33, no. 4, pp. 1717–1730, 2025

  29. [30]

    Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based rein- forcement learning approach,

    Z. Liu, H. Du, J. Lin, Z. Gao, L. Huang, S. Hosseinalipour, and D. Niyato, “Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based rein- forcement learning approach,”IEEE Transactions on Mobile Computing, vol. 24, no. 3, pp. 1945–1962, 2025

  30. [31]

    Distributed dnn inference with fine-grained model partitioning in mobile edge computing networks,

    H. Li, X. Li, Q. Fan, Q. He, X. Wang, and V . C. Leung, “Distributed dnn inference with fine-grained model partitioning in mobile edge computing networks,”IEEE Transactions on Mobile Computing, vol. 23, no. 10, pp. 9060–9074, 2024

  31. [32]

    OA-Cache: Oracle approximation-based cache replacement at the network edge,

    S. Qiu, Q. Fan, X. Li, X. Zhang, G. Min, and Y . Lyu, “OA-Cache: Oracle approximation-based cache replacement at the network edge,” IEEE Transactions on Network and Service Management, vol. 20, no. 3, pp. 3177–3189, 2023

  32. [33]

    Loading cost-aware model caching and request routing in edge-enabled wireless sensor networks,

    M. Yao, L. Chen, Y . Wu, and J. Wu, “Loading cost-aware model caching and request routing in edge-enabled wireless sensor networks,”The Computer Journal, vol. 66, no. 10, pp. 2409–2425, 2023

  33. [34]

    Loading cost-aware model caching and request routing for cooperative edge inference,

    M. Yao, L. Chen, J. Zhang, J. Huang, and J. Wu, “Loading cost-aware model caching and request routing for cooperative edge inference,” in proc. IEEE ICC, Seoul, Korea, Republic of, May, 2022, pp. 2327–2332

  34. [35]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” inproc. 9th ICLR. Virtual Event: OpenReview.net, May, 2021

  35. [36]

    Dy- namic edge-centric resource provisioning for online and offline services co-location via reactive and predictive approaches,

    T. Ouyang, K. Zhao, G. Hong, X. Zhang, Z. Zhou, and X. Chen, “Dy- namic edge-centric resource provisioning for online and offline services co-location via reactive and predictive approaches,”IEEE Transactions on Networking, vol. 33, no. 3, pp. 1388–1403, 2025

  36. [37]

    Collaborative service placement for edge computing in dense small cell networks,

    L. Chen, C. Shen, P. Zhou, and J. Xu, “Collaborative service placement for edge computing in dense small cell networks,”IEEE Transactions on Mobile Computing, vol. 20, no. 2, pp. 377–390, 2019

  37. [38]

    Efficient inverse maintenance and faster algorithms for linear programming,

    Y . T. Lee and A. Sidford, “Efficient inverse maintenance and faster algorithms for linear programming,” inproc. IEEE 56th FOCS, Berkeley, USA, Oct, 2015, pp. 230–249

  38. [39]

    Mitzenmacher and E

    M. Mitzenmacher and E. Upfal,Probability and computing: Random- ization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017

  39. [40]

    Fast-response edge caching scheme for graph data,

    P. Wang, S. Li, Y . Han, F. Ye, and Q. Zhang, “Fast-response edge caching scheme for graph data,”IEEE Transactions on Networking, vol. 33, no. 4, pp. 1962–1975, 2025

  40. [41]

    Multi-user layer- aware online container migration in edge-assisted vehicular networks,

    Z. Tang, F. Mou, J. Lou, W. Jia, Y . Wu, and W. Zhao, “Multi-user layer- aware online container migration in edge-assisted vehicular networks,” IEEE/ACM Transactions on Networking, vol. 32, no. 2, pp. 1807–1822, 2023

  41. [42]

    On the evolution of random graphs,

    P. Erd ˝os, A. Rényiet al., “On the evolution of random graphs,”Publ. math. inst. hung. acad. sci, vol. 5, no. 1, pp. 17–60, 1960

  42. [43]

    Swin Transformer: Hierarchical vision transformer using shifted win- dows,

    Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical vision transformer using shifted win- dows,” inproc. IEEE/CVF ICCV, Virtual Event, Oct, 2021, pp. 10 012– 10 022

  43. [44]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,”Master’s thesis, Department of Computer Science, University of Toronto, 2009

  44. [45]

    FemtoCaching: Wireless content delivery through distributed caching helpers,

    K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “FemtoCaching: Wireless content delivery through distributed caching helpers,”IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013

  45. [46]

    Cooperative task offloading and service caching for digital twin edge networks: A graph attention multi- agent reinforcement learning approach,

    Z. Yao, S. Xia, Y . Li, and G. Wu, “Cooperative task offloading and service caching for digital twin edge networks: A graph attention multi- agent reinforcement learning approach,”IEEE Journal on Selected Areas in Communications, vol. 41, no. 11, pp. 3401–3413, 2023

  46. [47]

    Improving proxy cache performance: Analysis of three replacement policies,

    J. Dilley and M. Arlitt, “Improving proxy cache performance: Analysis of three replacement policies,”IEEE Internet Computing, vol. 3, no. 6, pp. 44–50, 2002

  47. [48]

    Caching with delayed hits,

    N. Atre, J. Sherry, W. Wang, and D. S. Berger, “Caching with delayed hits,” inproc. ACM SIGCOMM, Virtual Event USA, Jul, 2020, pp. 495– 513

  48. [49]

    Joint optimization of service caching placement and computation offloading in mobile edge comput- ing systems,

    S. Bi, L. Huang, and Y . J. A. Zhang, “Joint optimization of service caching placement and computation offloading in mobile edge comput- ing systems,”IEEE Transactions on Wireless Communications, vol. 19, no. 7, pp. 4947–4963, 2020

  49. [50]

    Mutu- alNet: Adaptive convnet via mutual learning from network width and resolution,

    T. Yang, S. Zhu, C. Chen, S. Yan, M. Zhang, and A. Willis, “Mutu- alNet: Adaptive convnet via mutual learning from network width and resolution,” inproc. ECCV. Glasgow, UK: Springer, Aug, 2020, pp. 299–315

  50. [51]

    Multi-scale dense networks for resource efficient image classification,

    G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Weinberger, “Multi-scale dense networks for resource efficient image classification,” inproc. ICLR. Vancouver, Canada: OpenReview.net, Apr, 2018

  51. [52]

    Subflow: A dynamic induced-subgraph strategy toward real-time DNN inference and training,

    S. Lee and S. Nirjon, “Subflow: A dynamic induced-subgraph strategy toward real-time DNN inference and training,” inproc. IEEE RTAS, Sydney, Australia, Apr, 2020, pp. 15–29

  52. [53]

    Stitchable neural networks,

    Z. Pan, J. Cai, and B. Zhuang, “Stitchable neural networks,” inproc. IEEE/CVF CVPR, Vancouver, Canada, Jun, 2023, pp. 16 102–16 112

  53. [54]

    Efficient stitchable task adaptation,

    H. He, Z. Pan, J. Liu, J. Cai, and B. Zhuang, “Efficient stitchable task adaptation,” inproc. IEEE/CVF CVPR, Seattle, USA, Jun, 2024, pp. 28 555–28 565