A Deep Reinforcement Learning Approach for Global Routing
Pith reviewed 2026-05-25 19:28 UTC · model grok-4.3
The pith
A deep reinforcement learning agent can outperform sequential A* in finding global routing solutions for circuit designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The deep reinforcement learning approach enables an agent to produce an optimal routing policy based on the variety of problems presented, and the fine-tuned model outperforms the sequential A* method in routing solutions and rewards within the simulated global routing environment.
What carries the argument
The deep reinforcement learning agent that uses a conjoint optimization mechanism to learn a policy for connecting circuit components with wires in a simulated environment.
Load-bearing premise
The simulated environment and reward structure must accurately capture the constraints and objectives of real global routing problems so that the policy generalizes beyond the training instances.
What would settle it
Evaluating the trained routing agent on actual global routing instances from commercial electronic design automation software and comparing total wire length, via count, and design rule violations against A* results.
Figures
read the original abstract
Global routing has been a historically challenging problem in electronic circuit design, where the challenge is to connect a large and arbitrary number of circuit components with wires without violating the design rules for the printed circuit boards or integrated circuits. Similar routing problems also exist in the design of complex hydraulic systems, pipe systems and logistic networks. Existing solutions typically consist of greedy algorithms and hard-coded heuristics. As such, existing approaches suffer from a lack of model flexibility and non-optimum solutions. As an alternative approach, this work presents a deep reinforcement learning method for solving the global routing problem in a simulated environment. At the heart of the proposed method is deep reinforcement learning that enables an agent to produce an optimal policy for routing based on the variety of problems it is presented with leveraging the conjoint optimization mechanism of deep reinforcement learning. Conjoint optimization mechanism is explained and demonstrated in details; the best network structure and the parameters of the learned model are explored. Based on the fine-tuned model, routing solutions and rewards are presented and analyzed. The results indicate that the approach can outperform the benchmark method of a sequential A* method, suggesting a promising potential for deep reinforcement learning for global routing and other routing or path planning problems in general. Another major contribution of this work is the development of a global routing problem sets generator with the ability to generate parameterized global routing problem sets with different size and constraints, enabling evaluation of different routing algorithms and the generation of training datasets for future data-driven routing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a deep reinforcement learning (DRL) method for global routing in circuit design. It introduces a custom parameterized problem-set generator and reports that a fine-tuned DRL policy produces better routing solutions and rewards than a sequential A* baseline when evaluated on instances drawn from the generator.
Significance. If the empirical claims hold under more rigorous evaluation, the work would demonstrate a data-driven alternative to heuristic routing methods and supply a generator useful for training and benchmarking future routing algorithms. The conjoint optimization aspect of DRL is presented as a conceptual contribution.
major comments (2)
- [Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.
- [Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.
minor comments (2)
- [Method] The description of the state representation and action space in the DRL formulation could be made more precise with explicit equations or pseudocode.
- [Results] Figure captions and axis labels in the results plots lack sufficient detail on the exact metrics plotted and the number of instances averaged.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.
read point-by-point responses
-
Referee: [Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.
Authors: We agree that all quantitative comparisons are performed on instances drawn from the custom generator. This choice was deliberate to enable controlled variation of problem parameters (grid size, net count, layer constraints) and to isolate the effect of conjoint optimization in DRL. The generator is explicitly presented as a contribution for creating training and evaluation data. We acknowledge that the absence of results on ISPD 2007/2008 benchmarks limits direct claims about performance on industrial netlists with full design-rule sets. In the revised manuscript we will add a limitations paragraph that (a) states this scope explicitly and (b) describes the additional engineering steps (multi-layer encoding, via-cost modeling, overflow handling) needed to apply the same policy network to those benchmarks. We will not claim general superiority beyond the generated distribution. revision: partial
-
Referee: [Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.
Authors: We will revise the abstract and the results section to report concrete metrics: average reward, total wirelength, via count, and overflow where measured; the range of grid sizes and net counts used (e.g., 8×8 to 32×32 grids with 10–100 nets); and the number of evaluation instances. We will also add error bars across multiple random seeds and note any statistical tests performed. In addition, we will include ablation results comparing network architectures (MLP vs. CNN variants) and reward-component weightings. These quantitative details and ablations will be inserted into the revised manuscript. revision: yes
Circularity Check
No circularity; empirical DRL evaluation against external A* baseline on generated instances
full rationale
The manuscript describes training a DRL policy on instances from a custom parameterized generator and reports empirical outperformance versus sequential A* on the same generator distribution. No derivation chain, fitted-parameter-as-prediction, or self-citation load-bearing step is present. The central claim is a direct experimental comparison whose validity rests on the (external) question of generator realism rather than any internal reduction of result to input by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Machine learning applications in physical design: Recent results and directions
Kahng, A. B., 2018. “Machine learning applications in physical design: Recent results and directions”. In Proceedings of the 2018 International Symposium on Physical Design, ACM, pp. 68–73
work page 2018
-
[2]
Moore’s law: past, present and future
Schaller, R. R., 1997. “Moore’s law: past, present and future”. IEEE spectrum, 34(6), pp. 52–59
work page 1997
-
[3]
A survey on multi- net global routing for integrated circuits
Hu, J., and Sapatnekar, S. S., 2001. “A survey on multi- net global routing for integrated circuits”. Integration, 31(1), pp. 1–49
work page 2001
-
[4]
The complexity of wirerout- ing and finding minimum area layouts for arbitrary vlsi circuits
Kramer, M. R., 1984. “The complexity of wirerout- ing and finding minimum area layouts for arbitrary vlsi circuits”. Advances in computing research, 2, pp. 129– 146
work page 1984
-
[5]
Auto- mated al-based mechanical design of hydraulic mani- fold blocks
Chambon, R., and Tollenaere, M., 1991. “Auto- mated al-based mechanical design of hydraulic mani- fold blocks”. Computer-Aided Design, 23(3), pp. 213– 222
work page 1991
-
[6]
A design expert system for auto-routing of ship pipes
Kang, S.-S., Myung, S., and Han, S., 1999. “A design expert system for auto-routing of ship pipes”. Journal of Ship Production, 15(1), pp. 1–9
work page 1999
-
[7]
Pipe rout- ing through ant colony optimization
Christodoulou, S. E., and Ellinas, G., 2010. “Pipe rout- ing through ant colony optimization”. Journal of In- frastructure systems, 16(2), pp. 149–159
work page 2010
-
[8]
Modeling distribution-system water quality; dynamic approach
Grayman, W. M., Clark, R. M., and Males, R. M., 1988. “Modeling distribution-system water quality; dynamic approach”. Journal of Water Resources Planning and Management, 114(3), pp. 295–312
work page 1988
-
[9]
Advanced routing for city logistics service providers based on time-dependent travel times
Ehmke, J. F., Steinert, A., and Mattfeld, D. C., 2012. “Advanced routing for city logistics service providers based on time-dependent travel times”.Journal of com- putational science, 3(4), pp. 193–205
work page 2012
-
[10]
Vehicle routing and scheduling models, simulation and city logistics
Barcel ´o, J., Grzybowska, H., and Pardo, S., 2007. “Vehicle routing and scheduling models, simulation and city logistics”. In Dynamic Fleet Management . Springer, pp. 163–195
work page 2007
-
[11]
Use of steiner’s problem in suboptimal routing in rectilinear metric
Lee, J., Bose, N., and Hwang, F., 1976. “Use of steiner’s problem in suboptimal routing in rectilinear metric”. IEEE Transactions on Circuits and Systems, 23(7), pp. 470–476
work page 1976
-
[12]
Maizerouter: Engineering an effective global router
Moffitt, M. D., 2008. “Maizerouter: Engineering an effective global router”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Sys- tems, 27(11), pp. 2017–2026
work page 2008
-
[13]
Kastner, R., Bozorgzadeh, E., Sarrafzadeh, M., and Sarrafzadeh, M., 2000. “Predictable routing”. In Pro- ceedings of the 2000 IEEE/ACM international confer- ence on Computer-aided design, IEEE Press, pp. 110– 114
work page 2000
-
[14]
Boxrouter: a new global router based on box expansion and progressive ilp
Cho, M., and Pan, D. Z., 2007. “Boxrouter: a new global router based on box expansion and progressive ilp”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(12), pp. 2130– 2143
work page 2007
-
[15]
A force-directed global router
Hasan, N., 1987. “A force-directed global router”. PhD thesis, University of Illinois at Urbana-Champaign
work page 1987
-
[16]
Playing Atari with Deep Reinforcement Learning
Mnih, V ., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013. “Playing atari with deep reinforcement learning”. arXiv preprint arXiv:1312.5602
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[17]
Human- level control through deep reinforcement learning
Mnih, V ., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve- ness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al., 2015. “Human- level control through deep reinforcement learning”. Nature, 518(7540), p. 529
work page 2015
-
[18]
Deep reinforcement learning with double q-learning
Van Hasselt, H., Guez, A., and Silver, D., 2016. “Deep reinforcement learning with double q-learning.”. In AAAI, V ol. 2, Phoenix, AZ, p. 5
work page 2016
-
[19]
Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation
Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J., 2016. “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation”. In Advances in neural information pro- cessing systems, pp. 3675–3683
work page 2016
-
[20]
Distributed Prioritized Experience Replay
Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Van Hasselt, H., and Silver, D., 2018. “Dis- tributed prioritized experience replay”. arXiv preprint arXiv:1803.00933
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[21]
A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al., 2018. “A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play”. Science, 362(6419), pp. 1140– 1144
work page 2018
-
[22]
Bellman, R., 1957. “A markovian decision process”. Indiana Univ. Math. J., 6, pp. 679–684
work page 1957
-
[23]
VLSI placement and global routing using simulated annealing , V ol
Sechen, C., 2012. VLSI placement and global routing using simulated annealing , V ol. 54. Springer Science & Business Media
work page 2012
-
[24]
Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing
de Vincente, J., Lanchares, J., and Hermida, R., 1998. “Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing”. In Eu- romicro Conference, 1998. Proceedings. 24th, V ol. 1, IEEE, pp. 192–195
work page 1998
-
[25]
New algorithms for the rectilinear steiner tree problem
Ho, J.-M., Vijayan, G., and Wong, C.-K., 1990. “New algorithms for the rectilinear steiner tree problem”. IEEE Transactions on Computer-Aided Design of In- tegrated Circuits and Systems, 9(2), pp. 185–193
work page 1990
-
[26]
Global routing based on steiner min-max trees
Wong, C. C. M. S. C., 1990. “Global routing based on steiner min-max trees”. IEEE Trans. Comput. Aided Des, 9, pp. 1315–1325
work page 1990
-
[27]
Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router
Cho, M., Lu, K., Yuan, K., and Pan, D. Z., 2007. “Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router”. In Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design, IEEE Press, pp. 503–508
work page 2007
-
[28]
Accurate pre- diction of detailed routing congestion using supervised data learning
Qi, Z., Cai, Y ., and Zhou, Q., 2014. “Accurate pre- diction of detailed routing congestion using supervised data learning”. In 2014 IEEE 32nd International Con- ference on Computer Design (ICCD), IEEE, pp. 97– 103
work page 2014
-
[29]
An accurate detailed routing routability prediction model in placement
Zhou, Q., Wang, X., Qi, Z., Chen, Z., Zhou, Q., and Cai, Y ., 2015. “An accurate detailed routing routability prediction model in placement”. In 2015 6th Asia Sym- posium on Quality Electronic Design (ASQED), IEEE, pp. 119–122
work page 2015
-
[30]
Sutton, R. S., Barto, A. G., et al., 1998. Introduction 13 Copyright c⃝ by ASME to reinforcement learning, V ol. 135. MIT press Cam- bridge
work page 1998
-
[31]
Rectified linear units improve restricted boltzmann machines
Nair, V ., and Hinton, G. E., 2010. “Rectified linear units improve restricted boltzmann machines”. In Proceed- ings of the 27th international conference on machine learning (ICML-10), pp. 807–814
work page 2010
-
[32]
Nguyen, T. T., Nguyen, N. D., and Nahavandi, S., 2018. “Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications”. CoRR, abs/1812.11794
-
[33]
Multiagent re- inforcement learning: theoretical framework and an al- gorithm
Hu, J., Wellman, M. P., et al., 1998. “Multiagent re- inforcement learning: theoretical framework and an al- gorithm.”. In ICML, V ol. 98, Citeseer, pp. 242–250
work page 1998
-
[34]
A com- prehensive survey of multiagent reinforcement learn- ing
Bu, L., Babu, R., De Schutter, B., et al., 2008. “A com- prehensive survey of multiagent reinforcement learn- ing”. IEEE Transactions on Systems, Man, and Cy- bernetics, Part C (Applications and Reviews), 38(2), pp. 156–172
work page 2008
-
[35]
Multi-agent reinforcement learning: Independent vs. cooperative agents
Tan, M., 1993. “Multi-agent reinforcement learning: Independent vs. cooperative agents”. In Proceedings of the tenth international conference on machine learning, pp. 330–337
work page 1993
-
[36]
Schaul, T., Quan, J., Antonoglou, I., and Silver, D.,
-
[37]
“Prioritized experience replay”. arXiv preprint arXiv:1511.05952. A Appendix A: Heat maps showing edge utilization of different problem sets based on A* solution B Appendix B: DQN Global Routing Algorithm
work page internal anchor Pith review Pith/arXiv arXiv
-
[38]
Burn in replay memory D=e1,e2, ...,et to capacity N et = (st ,at ,rt ,st+1,IsTerminalt+1)
-
[39]
Initialize action-value function Q with random weights
-
[40]
Decompose multi-pin problems with Minimum Spanning Tree
-
[41]
Network training: for episode = 1, maximum episodes Initialize sequence s1 starting at one pin With probability ε select a random action at otherwise select at = maxaQ∗(φ(st ,a; θ)) Execute action at in environment and observe reward rt Update the environment (capacity) for t = 1, maximum steps Set st+1 = st ,at Store transition (st ,at ,rt ,st+1,IsTermin...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.