Design Insights into Partition Placement and Routing for DNN Inference in Multi-Hop Edge Networks

Jinkun Zhang; Poonam Yadav

arXiv: 2604.25571 · v1 · submitted 2026-04-28 · cs.NI


Pith reviewed 2026-05-07 14:40 UTC · model grok-4.3

classification: cs.NI

keywords: DNN partitioning, edge networks, multi-hop routing, placement optimization, congestion awareness, IoT-edge-cloud, inference latency


The pith

Joint optimization of DNN partition placement and routing in multi-hop edge networks improves performance when splits are flexible in IoT-edge-cloud setups and routing accounts for congestion under growing load.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how to distribute segments of a deep neural network across end devices, edge servers, and the cloud in networks where traffic travels over multiple hops. Different segments produce feature maps of very different sizes, and nodes vary in compute speed, so placement and routing decisions are coupled. The authors set up a congestion-aware optimization problem that treats placement as discrete choices and routing as continuous flow variables, then solve it with an alternating procedure that refines placement and then updates forwarding to avoid congested links. Numerical tests on hierarchical, regular, irregular, and backbone-like topologies show that allowing flexible splits yields the largest gains in mixed IoT-edge-cloud environments, while congestion-aware routing refinements matter more as offered load rises. The preferred balance between placement and routing also shifts with the relative cost of communication versus computation.

Core claim

For fixed-partition DNN inference over heterogeneous multi-hop edge networks, the authors formulate a congestion-aware mixed discrete-continuous optimization problem that jointly decides where each partition runs and how inference traffic is routed. They solve the problem with an alternating framework that repeatedly updates partition placement and then recomputes congestion-aware forwarding. Across hierarchical, regular, synthetic irregular, and real backbone-inspired topologies, numerical evaluation shows that split flexibility is particularly important in IoT-edge-cloud settings, while congestion-aware refinement becomes increasingly beneficial as the offered load grows; the preferred operating point depends on the communication-computation tradeoff.
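One way to write down a problem of this type is sketched below. The notation is assumed here for illustration and is not taken from the paper: $\mathcal{V}$ and $\mathcal{E}$ are the nodes and links, $K$ is the number of partitions, $r$ is the request rate, $L_k$ is the data size of stage $k$ (raw input for $k=0$, final output for $k=K$), $w_k$ is the workload of partition $k$, and $D_{ij}$, $C_i$ are increasing convex congestion costs. The binary $x_{k,i}$ says that partition $k$ runs at node $i$, with $x_{0,i}$ and $x_{K+1,i}$ fixed to mark the data source and the result sink; $f^{k}_{ij}$ is the stage-$k$ flow on link $(i,j)$.

```latex
\begin{aligned}
\min_{x,\,f}\quad
  & \sum_{(i,j)\in\mathcal{E}} D_{ij}\Big(\sum_{k=0}^{K} f^{k}_{ij}\Big)
    + \sum_{i\in\mathcal{V}} C_{i}\Big(\sum_{k=1}^{K} r\, w_{k}\, x_{k,i}\Big) \\
\text{s.t.}\quad
  & \sum_{i\in\mathcal{V}} x_{k,i} = 1, \qquad k = 1,\dots,K, \\
  & \sum_{j} f^{k}_{ij} - \sum_{j} f^{k}_{ji}
    = r\, L_{k}\,\big(x_{k,i} - x_{k+1,i}\big),
    \qquad i\in\mathcal{V},\ k = 0,\dots,K, \\
  & x_{k,i}\in\{0,1\}, \qquad f^{k}_{ij}\ge 0 .
\end{aligned}
```

The first constraint encodes single-node placement without replication. The second is flow conservation: stage-$k$ traffic originates where partition $k$ runs and terminates where partition $k+1$ runs, and it vanishes when both share a node.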

What carries the argument

A congestion-aware mixed discrete-continuous optimization problem solved by an alternating framework that couples discrete partition placement updates with continuous congestion-aware forwarding updates.
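The coupling of discrete placement moves with continuous forwarding updates can be sketched on a toy network. The code below is a minimal illustration under stated assumptions, not the authors' algorithm: the four-node topology, all capacities, the quadratic congestion cost, and the helper names (`route`, `alternate`, and so on) are hypothetical. Forwarding is updated by successive averaging of shortest-path flows under marginal link costs, and placement moves one partition at a time, keeping a move only if the re-routed total cost improves.

```python
import heapq

# Hypothetical toy network; every name and number here is an illustrative assumption.
LINKS = {("iot", "edge1"): 4.0, ("iot", "edge2"): 4.0,
         ("edge1", "cloud"): 8.0, ("edge2", "cloud"): 8.0}       # link capacities
SPEED = {"iot": 1.0, "edge1": 4.0, "edge2": 4.0, "cloud": 16.0}  # compute capacities
SIZES = [3.0, 1.0, 0.1]  # data per request: raw input, feature map, final output
WORK = [2.0, 2.0]        # compute per request for partitions 0 and 1
RATE, SRC, DST = 1.0, "iot", "iot"  # request rate, data source, result sink

# Treat every link as usable in both directions with the same capacity.
CAP = {}
for (u, v), c in LINKS.items():
    CAP[(u, v)] = CAP[(v, u)] = c


def cost(x, cap):
    """Assumed convex congestion cost: penalizes heavier load superlinearly."""
    return x / cap + (x / cap) ** 2


def marginal(x, cap):
    """Derivative of cost() in x, used as a congestion-aware link weight."""
    return 1.0 / cap + 2.0 * x / cap ** 2


def shortest_path(src, dst, weight):
    """Dijkstra over directed links; returns the path as a list of links."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for (a, b), w in weight.items():
            if a == u and d + w < dist.get(b, float("inf")):
                dist[b], prev[b] = d + w, a
                heapq.heappush(heap, (d + w, b))
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


def route(hosts, iters=50):
    """Forwarding update: average shortest-path flows with step size 1/k."""
    chain = [SRC] + list(hosts) + [DST]
    flow = {l: 0.0 for l in CAP}
    for k in range(1, iters + 1):
        weight = {l: marginal(flow[l], CAP[l]) for l in CAP}
        target = {l: 0.0 for l in CAP}
        for stage, size in enumerate(SIZES):
            for l in shortest_path(chain[stage], chain[stage + 1], weight):
                target[l] += RATE * size
        for l in CAP:
            flow[l] += (target[l] - flow[l]) / k
    return flow


def total_cost(hosts, flow):
    """Communication cost over links plus computation cost over nodes."""
    comm = sum(cost(flow[l], CAP[l]) for l in CAP)
    load = {n: 0.0 for n in SPEED}
    for h, w in zip(hosts, WORK):
        load[h] += RATE * w
    return comm + sum(cost(load[n], SPEED[n]) for n in SPEED)


def alternate(hosts, rounds=10):
    """Move one partition at a time, re-route, and keep only improving moves."""
    hosts = list(hosts)
    flow = route(hosts)
    best = total_cost(hosts, flow)
    for _ in range(rounds):
        improved = False
        for i in range(len(hosts)):
            for n in SPEED:
                trial = hosts[:i] + [n] + hosts[i + 1:]
                trial_flow = route(trial)
                c = total_cost(trial, trial_flow)
                if c < best - 1e-12:
                    hosts, flow, best, improved = trial, trial_flow, c, True
        if not improved:
            break
    return hosts, flow, best
```

Calling `alternate(["iot", "iot"])` starts from running both partitions on the device. Because only improving moves are accepted, the returned cost never exceeds the starting cost, but the result is a local optimum of this toy model and carries no guarantee of global optimality.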

Load-bearing premise

Only a small number of DNN partitions are considered, each placed at exactly one node without replication.

What would settle it

Running the same optimization on a real multi-hop edge testbed that permits partition replication or a much larger number of partitions and measuring whether the reported gains in latency or throughput disappear.

Figures

Figures reproduced from arXiv: 2604.25571 by Jinkun Zhang, Poonam Yadav.

**Figure 1.** Inference with DNN partitions in a multi-hop network.

**Figure 2.** Normalized objective J in all scenarios (legend: Cloud, Edge, IoT).

**Figure 5.** Communication–computation tradeoff in IoT. We solve the problem under a weighted objective J_η = η·J_comm + (1 − η)·J_comp, where η ∈ [0, 1] controls the emphasis on communication cost. The figure reports the weighted total cost together with its communication and computation components. As expected, increasing η shifts the optimized solution toward communication-efficient operation, while decreasing η…
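The effect of the weight η in the Figure 5 objective can be seen with a few lines of code. This is a sketch under stated assumptions: the three candidate operating points and their (J_comm, J_comp) values are hypothetical placeholders chosen for illustration, not results from the paper.

```python
def weighted_cost(eta, j_comm, j_comp):
    """J_eta = eta * J_comm + (1 - eta) * J_comp, with eta in [0, 1]."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return eta * j_comm + (1.0 - eta) * j_comp


# Hypothetical operating points as (J_comm, J_comp); illustrative numbers only.
CANDIDATES = {
    "all_on_device": (0.0, 9.0),  # no traffic, slow local compute
    "split_at_edge": (2.0, 3.0),  # moderate traffic, moderate compute
    "all_in_cloud": (6.0, 1.0),   # heavy traffic, fast remote compute
}


def best_operating_point(eta):
    """Return the candidate name that minimizes the weighted objective."""
    return min(CANDIDATES, key=lambda name: weighted_cost(eta, *CANDIDATES[name]))
```

With these placeholder values, η = 0 selects `all_in_cloud`, η = 0.5 selects `split_at_edge`, and η = 1 selects `all_on_device`, which mirrors the qualitative shift toward communication-efficient operation as η grows.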

Original abstract

Partitioned DNN inference is a promising approach for latency-sensitive intelligent services in edge networks, since it allows different parts of a model to be executed across end devices, edge servers, and the cloud. However, in a multi-hop edge network, partition placement and inference traffic routing are inherently coupled: raw inputs, intermediate features, and final outputs may have very different sizes, while candidate nodes also differ in computation capability. In addition, both communication and computation delays can become congestion-dependent under load. In this paper, we study joint partition placement and routing for fixed-partition DNN inference over heterogeneous multi-hop edge networks. We consider a small number of DNN partitions, each placed at exactly one node without replication, and formulate a congestion-aware mixed discrete–continuous optimization problem that captures both routing and execution costs. To solve it, we develop a practical alternating framework that couples partition placement with congestion-aware forwarding updates. Through numerical evaluation on hierarchical, regular, synthetic irregular, and real backbone-inspired topologies, we show that split flexibility is particularly important in IoT–edge–cloud settings, while congestion-aware refinement becomes increasingly beneficial as the offered load grows. We further illustrate how the preferred operating point depends on the communication–computation tradeoff.
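The abstract's remark that delays become congestion-dependent, and that stage data sizes matter, can be illustrated with a textbook M/M/1 queue. This is an assumed stand-in model, not the delay function used in the paper; `stage_delay` and its parameters are hypothetical names. A link of a given capacity serves requests at rate capacity / data_size, so a larger payload both slows service and saturates the link at a lower request rate.

```python
import math


def stage_delay(request_rate, data_size, capacity):
    """Mean M/M/1 sojourn time for one stage on one link (assumed model).

    request_rate: requests per second arriving at the link
    data_size:    mean data units per request for this stage
    capacity:     data units per second the link can carry
    """
    if request_rate < 0 or data_size <= 0 or capacity <= 0:
        raise ValueError("rate must be >= 0; size and capacity must be > 0")
    service_rate = capacity / data_size  # requests per second
    if request_rate >= service_rate:
        return math.inf                  # the queue is unstable at this load
    return 1.0 / (service_rate - request_rate)
```

For example, on a link of capacity 10, a stage with data size 3 has delay 0.75 at 2 requests per second and about 3.0 at 3 requests per second, while a stage with data size 1 has delay 0.125 at 2 requests per second. Sending the smaller intermediate feature over a loaded link is therefore far cheaper than sending the raw input, which is the coupling between split choice and routing that the abstract describes.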

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a scoped but concrete alternating optimization for jointly placing fixed DNN partitions and routing their traffic to account for congestion in multi-hop edge networks.


The main thing to know is that this work formulates placement and routing as a single congestion-aware mixed discrete-continuous problem and solves it with an alternating loop that updates locations then forwarding paths. The numerical runs on hierarchical, regular, irregular, and real-inspired topologies show split flexibility mattering most in IoT-edge-cloud chains and congestion awareness helping more as load increases, with the preferred point shifting by the comm-comp tradeoff.

Referee Report

0 major / 3 minor

Summary. The paper formulates a congestion-aware mixed discrete-continuous optimization problem for joint DNN partition placement and routing in multi-hop heterogeneous edge networks. It develops an alternating optimization framework that couples discrete partition placement decisions with continuous congestion-aware routing updates, then evaluates the approach numerically across hierarchical, regular, synthetic irregular, and real backbone-inspired topologies to derive design insights on split flexibility versus congestion awareness.

Significance. If the numerical trends hold, the work supplies actionable guidance for edge DNN deployment by showing that partition flexibility matters most in IoT-edge-cloud hierarchies while congestion-aware routing gains importance with rising load; the multi-topology evaluation and explicit scoping of assumptions (fixed small partition count, no replication) strengthen the practical relevance of the reported communication-computation trade-offs.

minor comments (3)

[Abstract and Evaluation] The abstract states that results illustrate dependence on the communication-computation tradeoff, but the evaluation section does not identify the exact parameters (e.g., specific load levels or partition counts) varied to produce that illustration.
[Evaluation] Numerical results across the four topology classes are presented without mention of error bars, multiple random seeds, or sensitivity checks on the offered-load and partition-count parameters; this reduces confidence in the reported trends even though the central claims remain plausible.
[Problem Formulation] The mixed discrete-continuous formulation would benefit from an explicit symbol table listing all variables, especially the continuous routing flows and congestion-dependent delay functions, to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The evaluation across multiple topologies and the scoping of assumptions (fixed partitions, no replication) are indeed intended to provide actionable guidance on when split flexibility versus congestion awareness matters most.

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper states a mixed discrete-continuous optimization problem directly from network routing, execution, and congestion cost models, then solves it via an alternating placement-and-forwarding framework whose outputs are evaluated numerically on multiple topology classes. The reported insights on split flexibility versus congestion awareness follow from those evaluations under explicitly scoped assumptions (fixed small partition count, single-node placement, no replication). No step reduces by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation; the central claims remain independent of the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard network-flow and optimization assumptions plus the modeling choice of fixed small partitions without replication; no new physical entities are postulated.

free parameters (2)

number of DNN partitions
Assumed small and fixed; placement decisions are discrete variables in the optimization.
offered load levels
Varied parametrically in evaluation to demonstrate trends with congestion.

axioms (2)

domain assumption: Communication and computation delays become congestion-dependent under load.
Invoked to justify the mixed discrete-continuous formulation.
domain assumption: Candidate nodes differ in computation capability and links have finite capacity.
Stated as part of the heterogeneous multi-hop network model.


