Locker-based Truck-Drone Routing with Integrated Considerations of Pickups, Deliveries, and No-Fly Zones
Pith reviewed 2026-07-01 06:19 UTC · model grok-4.3
The pith
A two-stage deep reinforcement learning method constructs coordinated truck-drone routes that minimize costs while respecting pickups, deliveries, battery limits, loads, and no-fly zones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper defines the LTDRP-PDNF and casts route construction as a Markov Decision Process, which it solves with a two-stage deep reinforcement learning (DRL) neural heuristic. Stage one applies an attention-based encoder and a Bidirectional Gated Recurrent Unit (BiGRU) decoder to the truck-only capacitated vehicle routing problem (CVRP). Stage two combines policy transfer with a hybrid dispatch assignment heuristic to produce fully coordinated truck-drone routes that respect battery, load, pickup, delivery, and no-fly-zone constraints while minimizing total cost.
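The objective is stated here only as "total cost". As a minimal sketch, assuming a hypothetical cost made of a fixed charge per dispatched truck plus per-kilometre truck and drone travel charges (the names `Solution`, `total_cost` and all coefficients are illustrative, not the paper's), a candidate solution could be scored like this:

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Solution:
    # Each truck route is a list of (x, y) stops that starts and ends at the depot.
    truck_routes: list
    # Each drone sortie is a list of (x, y) points: launch locker, customers, landing locker.
    drone_sorties: list


def path_length(points):
    """Euclidean length of a polyline."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def total_cost(sol, truck_fixed=50.0, truck_per_km=1.0, drone_per_km=0.2):
    """Hypothetical cost: fixed charge per used truck plus distance-based truck and drone costs."""
    used = [r for r in sol.truck_routes if len(r) > 2]  # [depot, depot] means the truck stays home
    truck_km = sum(path_length(r) for r in used)
    drone_km = sum(path_length(s) for s in sol.drone_sorties)
    return truck_fixed * len(used) + truck_per_km * truck_km + drone_per_km * drone_km
```

For example, one used truck driving a 12 km loop with a single 6 km drone sortie scores 50 + 12 + 0.2 * 6 = 63.2 under the default coefficients; reweighting the coefficients is one way to express a different objective.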
What carries the argument
Two-stage DRL architecture consisting of an attention-based encoder plus BiGRU decoder for truck routing, followed by policy-transfer and hybrid dispatch heuristic for drone coordination.
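The summary does not give the network equations. A common pattern in attention-based routing decoders is to score every candidate node against a context vector with a scaled dot product, mask nodes that would violate the MDP's constraints, and select from a softmax. The sketch below runs one greedy CVRP rollout this way; the fixed node embeddings and the use of the current node's embedding as context are stand-ins for the learned attention encoder and BiGRU decoder state, and all function names are hypothetical.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores; masked-out (infeasible) entries get probability 0."""
    m = max(s for s, ok in zip(scores, mask) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    z = sum(exps)
    return [e / z for e in exps]


def feasibility_mask(current, visited, demand, remaining):
    """Node 0 is the depot; a customer is feasible if unvisited and its demand fits."""
    mask = [j not in visited and demand[j] <= remaining for j in range(len(demand))]
    mask[0] = current != 0  # returning to the depot is allowed unless already there
    return mask


def greedy_rollout(emb, demand, capacity):
    """Decode a capacity-feasible CVRP tour greedily (assumes every demand <= capacity)."""
    d = len(emb[0])
    current, remaining, visited, tour = 0, capacity, set(), [0]
    while len(visited) < len(demand) - 1:
        ctx = emb[current]  # stand-in for the decoder's context vector
        scores = [sum(a * b for a, b in zip(ctx, e)) / math.sqrt(d) for e in emb]
        probs = masked_softmax(scores, feasibility_mask(current, visited, demand, remaining))
        nxt = max(range(len(probs)), key=probs.__getitem__)
        if nxt == 0:
            remaining = capacity  # reload at the depot
        else:
            visited.add(nxt)
            remaining -= demand[nxt]
        tour.append(nxt)
        current = nxt
    if current != 0:
        tour.append(0)
    return tour
```

Masking is what keeps every decoded tour capacity-feasible regardless of how well the scores have been learned; training only changes which feasible node is preferred.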
If this is right
- The method produces feasible coordinated routes on instances of varying scales while incorporating all listed operational constraints.
- Computation times remain exceptionally short relative to metaheuristic and neural baselines.
- Solution quality exceeds that of the compared baselines in the majority of tested cases.
- The framework supplies a practical, scalable planning tool for locker-based operations under real airspace and vehicle limits.
Where Pith is reading between the lines
- The same staged architecture could be retrained on different objective weights to prioritize emissions or service time instead of cost.
- Embedding live weather or dynamic airspace data into the MDP state would allow the heuristic to react to temporary no-fly restrictions.
- Extending the locker nodes to include charging stations or parcel sorting would test whether the dispatch heuristic still scales without redesign.
- Deployment on city-scale graphs with thousands of lockers would reveal whether the policy-transfer step continues to avoid compounding errors across stages.
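How no-fly zones enter the MDP is not described in this summary. A minimal way to detect airspace conflicts, assuming circular zones with hypothetical activation windows (a stand-in for the dynamic restrictions suggested above), is a point-to-segment distance test used to mask direct drone flights. Note that this only flags conflicts; it does not construct the detour paths the abstract says the method uses.

```python
import math


def segment_point_distance(a, b, p):
    """Shortest Euclidean distance from point p to the segment ab."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(a, p)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist((ax + t * dx, ay + t * dy), p)


def crosses_zone(a, b, zone):
    """zone = (center, radius); True if the straight flight a -> b enters the disc."""
    center, radius = zone
    return segment_point_distance(a, b, center) < radius


def drone_mask(current, candidates, zones, t):
    """Mask out drone moves whose straight path crosses a zone active at time t.

    zones: list of (center, radius, t_start, t_end), a hypothetical
    time-windowed restriction illustrating dynamic airspace in the state.
    """
    active = [(c, r) for c, r, t0, t1 in zones if t0 <= t < t1]
    return [not any(crosses_zone(current, q, z) for z in active) for q in candidates]
```

For instance, with a 1 km zone centred at (5, 0) and active during [0, 10), a flight from (0, 0) to (10, 0) is masked at t = 5 but allowed again at t = 20.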
Load-bearing premise
The two-stage DRL architecture with policy transfer and hybrid dispatch heuristic can be trained and deployed to produce near-optimal coordinated truck-drone routes that correctly incorporate battery, load, no-fly zone, pickup, and delivery constraints without suffering generalization failure on unseen instances.
What would settle it
Generate a new collection of large-scale instances that include dense, irregular no-fly zones and compare the method's routes and runtimes against known lower bounds or exact solvers on the same instances; consistent large gaps or timeouts would falsify the claim of reliable near-optimality and scalability.
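The natural statistic for this test is the relative gap to a lower bound. Because a valid lower bound never exceeds the optimum, the gap below over-estimates the true optimality gap: small values are conclusive, while large ones may also reflect a weak bound. The 5% threshold for a "large" gap is an illustrative assumption.

```python
def gap_percent(cost, lower_bound):
    """Relative gap (%) of a heuristic cost above a valid, positive lower bound."""
    if lower_bound <= 0:
        raise ValueError("lower bound must be positive")
    return 100.0 * (cost - lower_bound) / lower_bound


def summarize(costs, bounds, threshold=5.0):
    """Mean gap and the share of instances whose gap exceeds `threshold` percent."""
    gaps = [gap_percent(c, lb) for c, lb in zip(costs, bounds)]
    mean_gap = sum(gaps) / len(gaps)
    share_large = sum(g > threshold for g in gaps) / len(gaps)
    return mean_gap, share_large
```

A consistently high `share_large` on the dense no-fly-zone instances would be the kind of evidence that counts against the near-optimality claim.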
Original abstract
Truck-drone delivery is an emerging last-mile logistics mode combining the long-haul capacity of trucks with the flexible service capability of drones. In locker-based operations, smart lockers serve not only as temporary parcel storage facilities but also as automated drone docking and service nodes. These automated nodes support drone takeoff, landing, parcel handover, and battery replacement, thereby significantly extending the service range and operational flexibility of drone-assisted delivery networks. However, practical locker-based delivery systems face complex real-world challenges, requiring the integrated coordination of not only parcel delivery, return pickup, battery-constrained and load-dependent drone flights, but also necessary detours around restricted airspace. To address this practical and multifaceted challenge, this paper introduces a locker-based truck-drone routing problem with integrated considerations of pickups, deliveries, and no-fly zones (LTDRP-PDNF), with the objective of minimizing the total operational cost of a fleet of drone-equipped trucks. We formulate the route construction process as a Markov Decision Process and develop a two-stage deep reinforcement learning-based neural heuristic. The first stage utilizes an attention-based encoder and a Bidirectional Gated Recurrent Unit decoder to solve the truck-only routing problem, formulated as a capacitated vehicle routing problem. The second stage combines a policy-transfer strategy with a hybrid dispatch assignment heuristic to construct fully coordinated truck and drone routes for LTDRP-PDNF. Experiments on instances of different scales demonstrate that the proposed method outperforms metaheuristic and neural heuristic baselines in most cases while maintaining exceptionally short computation times, offering an effective, scalable solution framework under practical operational constraints.
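The abstract mentions battery-constrained, load-dependent drone flights that combine deliveries with return pickups. One simple way to check such a sortie, assuming a hypothetical linear energy model in which energy per kilometre grows with the carried payload (the paper's own energy model is not given in this summary), is to track the payload leg by leg. Since lockers support battery replacement, the sketch also assumes each sortie starts on a full battery.

```python
from math import dist


def sortie_check(launch, customers, landing, battery_kwh, max_payload_kg,
                 base_kwh_per_km=0.05, kwh_per_km_per_kg=0.01):
    """Return (feasible, energy_kwh) for one drone sortie.

    customers: list of (xy, deliver_kg, pickup_kg) visited in order.
    Energy per km is assumed linear in payload: base + k * payload (hypothetical).
    """
    payload = sum(c[1] for c in customers)  # all parcels to deliver are loaded at launch
    if payload > max_payload_kg:
        return False, 0.0
    energy, here = 0.0, launch
    for xy, deliver_kg, pickup_kg in customers:
        energy += dist(here, xy) * (base_kwh_per_km + kwh_per_km_per_kg * payload)
        payload += pickup_kg - deliver_kg  # drop the parcel, then load the return item
        if payload > max_payload_kg:
            return False, energy
        here = xy
    energy += dist(here, landing) * (base_kwh_per_km + kwh_per_km_per_kg * payload)
    return energy <= battery_kwh, energy
```

For a 5 km outbound leg carrying 2 kg and a 5 km return leg carrying a 1 kg pickup, the model uses 0.35 + 0.30 = 0.65 kWh, so the sortie is feasible on a 1.0 kWh battery but not on a 0.5 kWh one.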
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript defines the locker-based truck-drone routing problem with pickups, deliveries, and no-fly zones (LTDRP-PDNF) and minimizes total operational cost for a fleet of drone-equipped trucks. Route construction is cast as an MDP; a two-stage DRL heuristic is proposed in which stage 1 solves the truck-only CVRP via an attention encoder plus BiGRU decoder, and stage 2 applies policy transfer together with a hybrid dispatch heuristic to produce coordinated truck-drone routes that respect battery, load, and no-fly-zone constraints. Experiments on instances of varying scales are reported to show outperformance versus metaheuristic and neural-heuristic baselines while retaining short run times.
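The hybrid dispatch assignment heuristic of stage two is not specified in this summary. To illustrate the kind of decision it makes, here is a deliberately simple greedy nearest-locker rule, a swapped-in stand-in rather than the authors' heuristic: a customer is served by drone from the closest locker on the truck route if the out-and-back flight fits a hypothetical range and the parcel fits the payload limit; everything else stays on the truck.

```python
from math import dist


def assign_customers(lockers, customers, drone_range_km, drone_max_kg):
    """Greedy dispatch rule (hypothetical, not the paper's heuristic).

    lockers: list of (x, y) truck stops on the stage-one route.
    customers: dict name -> ((x, y), weight_kg).
    Returns dict name -> ("drone", locker_index) or ("truck", None).
    """
    plan = {}
    for name, (xy, weight) in customers.items():
        # Nearest locker on the truck route (first one wins on ties).
        k = min(range(len(lockers)), key=lambda i: dist(lockers[i], xy))
        round_trip = 2 * dist(lockers[k], xy)
        if weight <= drone_max_kg and round_trip <= drone_range_km:
            plan[name] = ("drone", k)
        else:
            plan[name] = ("truck", None)
    return plan
```

A real dispatch heuristic would also have to re-time the truck route, respect no-fly zones and battery energy rather than a flat range, and share drones among customers; the rule above only shows the eligibility test at its core.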
Significance. If the performance claims are substantiated by reproducible experiments, the work supplies a scalable neural-heuristic framework for a practically relevant constrained routing problem that simultaneously handles pickups, deliveries, battery limits, and airspace restrictions. The two-stage architecture with policy transfer is a concrete contribution to the literature on hybrid truck-drone logistics.
major comments (1)
- [§5] Computational Experiments: the abstract and available description state that the method “outperforms metaheuristic and neural heuristic baselines in most cases,” yet no information is supplied on instance-generation procedure, how the battery, load, and no-fly-zone constraints are encoded inside the MDP, training hyperparameters, ablation studies, or statistical significance tests. Without these details the central empirical claim cannot be verified or reproduced.
minor comments (1)
- [Abstract] The abstract refers to “exceptionally short computation times” without quantitative comparison (e.g., wall-clock seconds versus baseline run times on the same hardware).
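A like-for-like timing comparison only needs every solver run on the same instances and the same machine. A minimal standard-library harness, with hypothetical solver callables, could report the mean of per-instance median wall-clock times:

```python
import statistics
import time


def median_runtime(solver, instance, repeats=5):
    """Median wall-clock seconds of solver(instance) over several repeats."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        solver(instance)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def compare(solvers, instances, repeats=5):
    """Per-solver mean of per-instance median runtimes, measured on one machine."""
    return {
        name: statistics.mean(median_runtime(fn, inst, repeats) for inst in instances)
        for name, fn in solvers.items()
    }
```

Taking the median over repeats damps one-off noise such as caching or garbage collection; a GPU-based neural solver would additionally need device synchronization before the clock is read, which this sketch does not show.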
Simulated Author's Rebuttal
We thank the referee for the detailed review and the opportunity to improve the manuscript. The concern about insufficient experimental details is valid, and we will revise Section 5 to provide full reproducibility information while preserving the core contributions of the two-stage DRL framework.
Point-by-point responses
- Referee: §5 (Computational Experiments): the abstract and available description state that the method “outperforms metaheuristic and neural heuristic baselines in most cases,” yet no information is supplied on instance-generation procedure, how the battery, load, and no-fly-zone constraints are encoded inside the MDP, training hyperparameters, ablation studies, or statistical significance tests. Without these details the central empirical claim cannot be verified or reproduced.
Authors: We agree that the current version of Section 5 omits critical details required for reproducibility. In the revised manuscript we will add: (i) the full instance-generation procedure, specifying how pickup/delivery demands, locker locations, battery capacities, load-dependent flight times, and no-fly zones are sampled and encoded; (ii) the precise state, action, and reward definitions that embed battery, load, and airspace constraints inside the MDP; (iii) complete training hyperparameters (learning rates, batch sizes, network dimensions, episode counts, and random seeds); (iv) ablation results isolating the attention encoder, BiGRU decoder, policy-transfer stage, and hybrid dispatch heuristic; and (v) statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) across 10 independent runs per instance size. These additions will directly substantiate the performance claims.
Revision: yes
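Both promised tests operate on paired per-instance costs (method versus baseline). As a standard-library sketch with hypothetical function names, the paired t statistic and an exact two-sided Wilcoxon signed-rank p-value can be computed as below; with 10 runs the exact test enumerates only 2**10 = 1024 sign patterns. The t statistic still has to be compared with a t distribution on n - 1 degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` are the usual library routes.

```python
import math
import statistics
from itertools import product


def paired_t(a, b):
    """Paired t statistic for the differences a - b (df = n - 1)."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test by enumerating all sign flips.

    Zero differences are dropped and tied |differences| get average ranks.
    Returns (W_plus, p_value); assumes at least one non-zero difference.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    center = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= observed - 1e-12
    )
    return w_plus, extreme / 2 ** n
```

For example, if the method beats a baseline on all 5 of 5 instances, the exact two-sided p-value is 2/32 = 0.0625, so at least 6 paired runs are needed before a uniform win can reach p < 0.05.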
Circularity Check
No significant circularity
Full rationale
The paper describes a two-stage DRL heuristic (attention encoder + BiGRU for truck CVRP, followed by policy transfer and hybrid dispatch) for the LTDRP-PDNF routing problem. No derivation chain, equations, or first-principles results are presented that reduce claimed performance or solutions to quantities defined by fitted parameters or self-citations within the paper. The approach is positioned as an empirical heuristic evaluated on test instances against baselines, with no self-definitional mappings, fitted-input predictions, or load-bearing self-citation chains detectable from the provided material. The central claims rest on external empirical comparisons rather than internal reduction to inputs.