pith. sign in

arxiv: 1906.08809 · v1 · pith:E3TZ27H4new · submitted 2019-06-20 · 💻 cs.LG · cs.AI· stat.ML

A Deep Reinforcement Learning Approach for Global Routing

Pith reviewed 2026-05-25 19:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML
keywords global routingdeep reinforcement learningcircuit designpath planningA* algorithmproblem generationelectronic design automation
0
0 comments X

The pith

A deep reinforcement learning agent can outperform sequential A* in finding global routing solutions for circuit designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies deep reinforcement learning to the global routing problem, where wires must connect circuit components without violating design rules. The method trains an agent in a simulated environment to learn a routing policy that works across many different problem instances through conjoint optimization. It also introduces a generator that creates routing problems with adjustable sizes and constraints for training and testing. If the learned policy generalizes, it offers a flexible alternative to traditional greedy algorithms and heuristics that often yield suboptimal results. Experiments show the fine-tuned model achieves higher rewards than a sequential A* benchmark on the generated problems.

Core claim

The deep reinforcement learning approach enables an agent to produce an optimal routing policy based on the variety of problems presented, and the fine-tuned model outperforms the sequential A* method in routing solutions and rewards within the simulated global routing environment.

What carries the argument

The deep reinforcement learning agent that uses a conjoint optimization mechanism to learn a policy for connecting circuit components with wires in a simulated environment.

Load-bearing premise

The simulated environment and reward structure must accurately capture the constraints and objectives of real global routing problems so that the policy generalizes beyond the training instances.

What would settle it

Evaluating the trained routing agent on actual global routing instances from commercial electronic design automation software and comparing total wire length, via count, and design rule violations against A* results.

Figures

Figures reproduced from arXiv: 1906.08809 by Barnabas Poczos, Haiguang Liao, Kenji Shimada, Levent Burak Kara, Wentai Zhang, Xuliang Dong.

Figure 1
Figure 1. Figure 1: Sample of a real circuit and corresponding grid graph for [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Pipeline for solving global routing with DQN. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example configurations of the simulated global routing envi [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem set generator. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Workflow of DQN-based router. of the best action (as deemed by the network) for the given state of the problem. Training Parameters: These parameters are listed in Tab. 1, which only includes the optimized set of parameters. The model and parameter tuning process are shown and dis￾cussed next. Training Specifications: The implementation of the proposed algorithm is completed with Python and Tensor￾flow. Th… view at source ↗
Figure 6
Figure 6. Figure 6: Analysis on DQN model performance on two types of problems. Type I problem: no-edge-depletion type: (a) edge utilization plot; [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Case study showing the conjoint optimization of DQN router: [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Reward plots based on different combinations of model pa [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Influence of discount factor γ on performance of DQN router and reward curves. WL of DQN routing solutions (red) with γ value of (a) 1, (b) 0.95, (c) 0.9, (d) 0.8 and their comparison to A* routing solutions (blue); Decrease in WL of DQN-based routing solutions compared to A* routing solutions with γ value of (e) 1, (f) 0.95, (g) 0.9, (h) 0.8; log-plot of reward curves for 10 randomly selected problems so… view at source ↗
Figure 11
Figure 11. Figure 11: Burn-in memory study showing statistical results of DQN [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Visualized results showing routing solutions for [PITH_FULL_IMAGE:figures/full_fig_p012_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem sets generator. [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem sets generator. [PITH_FULL_IMAGE:figures/full_fig_p016_14.png] view at source ↗
read the original abstract

Global routing has been a historically challenging problem in electronic circuit design, where the challenge is to connect a large and arbitrary number of circuit components with wires without violating the design rules for the printed circuit boards or integrated circuits. Similar routing problems also exist in the design of complex hydraulic systems, pipe systems and logistic networks. Existing solutions typically consist of greedy algorithms and hard-coded heuristics. As such, existing approaches suffer from a lack of model flexibility and non-optimum solutions. As an alternative approach, this work presents a deep reinforcement learning method for solving the global routing problem in a simulated environment. At the heart of the proposed method is deep reinforcement learning that enables an agent to produce an optimal policy for routing based on the variety of problems it is presented with leveraging the conjoint optimization mechanism of deep reinforcement learning. Conjoint optimization mechanism is explained and demonstrated in details; the best network structure and the parameters of the learned model are explored. Based on the fine-tuned model, routing solutions and rewards are presented and analyzed. The results indicate that the approach can outperform the benchmark method of a sequential A* method, suggesting a promising potential for deep reinforcement learning for global routing and other routing or path planning problems in general. Another major contribution of this work is the development of a global routing problem sets generator with the ability to generate parameterized global routing problem sets with different size and constraints, enabling evaluation of different routing algorithms and the generation of training datasets for future data-driven routing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a deep reinforcement learning (DRL) method for global routing in circuit design. It introduces a custom parameterized problem-set generator and reports that a fine-tuned DRL policy produces better routing solutions and rewards than a sequential A* baseline when evaluated on instances drawn from the generator.

Significance. If the empirical claims hold under more rigorous evaluation, the work would demonstrate a data-driven alternative to heuristic routing methods and supply a generator useful for training and benchmarking future routing algorithms. The conjoint optimization aspect of DRL is presented as a conceptual contribution.

major comments (2)
  1. [Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.
  2. [Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.
minor comments (2)
  1. [Method] The description of the state representation and action space in the DRL formulation could be made more precise with explicit equations or pseudocode.
  2. [Results] Figure captions and axis labels in the results plots lack sufficient detail on the exact metrics plotted and the number of instances averaged.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.

    Authors: We agree that all quantitative comparisons are performed on instances drawn from the custom generator. This choice was deliberate to enable controlled variation of problem parameters (grid size, net count, layer constraints) and to isolate the effect of conjoint optimization in DRL. The generator is explicitly presented as a contribution for creating training and evaluation data. We acknowledge that the absence of results on ISPD 2007/2008 benchmarks limits direct claims about performance on industrial netlists with full design-rule sets. In the revised manuscript we will add a limitations paragraph that (a) states this scope explicitly and (b) describes the additional engineering steps (multi-layer encoding, via-cost modeling, overflow handling) needed to apply the same policy network to those benchmarks. We will not claim general superiority beyond the generated distribution. revision: partial

  2. Referee: [Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.

    Authors: We will revise the abstract and the results section to report concrete metrics: average reward, total wirelength, via count, and overflow where measured; the range of grid sizes and net counts used (e.g., 8×8 to 32×32 grids with 10–100 nets); and the number of evaluation instances. We will also add error bars across multiple random seeds and note any statistical tests performed. In addition, we will include ablation results comparing network architectures (MLP vs. CNN variants) and reward-component weightings. These quantitative details and ablations will be inserted into the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical DRL evaluation against external A* baseline on generated instances

full rationale

The manuscript describes training a DRL policy on instances from a custom parameterized generator and reports empirical outperformance versus sequential A* on the same generator distribution. No derivation chain, fitted-parameter-as-prediction, or self-citation load-bearing step is present. The central claim is a direct experimental comparison whose validity rests on the (external) question of generator realism rather than any internal reduction of result to input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method implicitly relies on standard RL assumptions (Markov decision process formulation, reward design) that are not detailed.

pith-pipeline@v0.9.0 · 5810 in / 946 out tokens · 22085 ms · 2026-05-25T19:28:34.445860+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 3 internal anchors

  1. [1]

    Machine learning applications in physical design: Recent results and directions

    Kahng, A. B., 2018. “Machine learning applications in physical design: Recent results and directions”. In Proceedings of the 2018 International Symposium on Physical Design, ACM, pp. 68–73

  2. [2]

    Moore’s law: past, present and future

    Schaller, R. R., 1997. “Moore’s law: past, present and future”. IEEE spectrum, 34(6), pp. 52–59

  3. [3]

    A survey on multi- net global routing for integrated circuits

    Hu, J., and Sapatnekar, S. S., 2001. “A survey on multi- net global routing for integrated circuits”. Integration, 31(1), pp. 1–49

  4. [4]

    The complexity of wirerout- ing and finding minimum area layouts for arbitrary vlsi circuits

    Kramer, M. R., 1984. “The complexity of wirerout- ing and finding minimum area layouts for arbitrary vlsi circuits”. Advances in computing research, 2, pp. 129– 146

  5. [5]

    Auto- mated al-based mechanical design of hydraulic mani- fold blocks

    Chambon, R., and Tollenaere, M., 1991. “Auto- mated al-based mechanical design of hydraulic mani- fold blocks”. Computer-Aided Design, 23(3), pp. 213– 222

  6. [6]

    A design expert system for auto-routing of ship pipes

    Kang, S.-S., Myung, S., and Han, S., 1999. “A design expert system for auto-routing of ship pipes”. Journal of Ship Production, 15(1), pp. 1–9

  7. [7]

    Pipe rout- ing through ant colony optimization

    Christodoulou, S. E., and Ellinas, G., 2010. “Pipe rout- ing through ant colony optimization”. Journal of In- frastructure systems, 16(2), pp. 149–159

  8. [8]

    Modeling distribution-system water quality; dynamic approach

    Grayman, W. M., Clark, R. M., and Males, R. M., 1988. “Modeling distribution-system water quality; dynamic approach”. Journal of Water Resources Planning and Management, 114(3), pp. 295–312

  9. [9]

    Advanced routing for city logistics service providers based on time-dependent travel times

    Ehmke, J. F., Steinert, A., and Mattfeld, D. C., 2012. “Advanced routing for city logistics service providers based on time-dependent travel times”.Journal of com- putational science, 3(4), pp. 193–205

  10. [10]

    Vehicle routing and scheduling models, simulation and city logistics

    Barcel ´o, J., Grzybowska, H., and Pardo, S., 2007. “Vehicle routing and scheduling models, simulation and city logistics”. In Dynamic Fleet Management . Springer, pp. 163–195

  11. [11]

    Use of steiner’s problem in suboptimal routing in rectilinear metric

    Lee, J., Bose, N., and Hwang, F., 1976. “Use of steiner’s problem in suboptimal routing in rectilinear metric”. IEEE Transactions on Circuits and Systems, 23(7), pp. 470–476

  12. [12]

    Maizerouter: Engineering an effective global router

    Moffitt, M. D., 2008. “Maizerouter: Engineering an effective global router”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Sys- tems, 27(11), pp. 2017–2026

  13. [13]

    Predictable routing

    Kastner, R., Bozorgzadeh, E., Sarrafzadeh, M., and Sarrafzadeh, M., 2000. “Predictable routing”. In Pro- ceedings of the 2000 IEEE/ACM international confer- ence on Computer-aided design, IEEE Press, pp. 110– 114

  14. [14]

    Boxrouter: a new global router based on box expansion and progressive ilp

    Cho, M., and Pan, D. Z., 2007. “Boxrouter: a new global router based on box expansion and progressive ilp”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(12), pp. 2130– 2143

  15. [15]

    A force-directed global router

    Hasan, N., 1987. “A force-directed global router”. PhD thesis, University of Illinois at Urbana-Champaign

  16. [16]

    Playing Atari with Deep Reinforcement Learning

    Mnih, V ., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013. “Playing atari with deep reinforcement learning”. arXiv preprint arXiv:1312.5602

  17. [17]

    Human- level control through deep reinforcement learning

    Mnih, V ., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve- ness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al., 2015. “Human- level control through deep reinforcement learning”. Nature, 518(7540), p. 529

  18. [18]

    Deep reinforcement learning with double q-learning

    Van Hasselt, H., Guez, A., and Silver, D., 2016. “Deep reinforcement learning with double q-learning.”. In AAAI, V ol. 2, Phoenix, AZ, p. 5

  19. [19]

    Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation

    Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J., 2016. “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation”. In Advances in neural information pro- cessing systems, pp. 3675–3683

  20. [20]

    Distributed Prioritized Experience Replay

    Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Van Hasselt, H., and Silver, D., 2018. “Dis- tributed prioritized experience replay”. arXiv preprint arXiv:1803.00933

  21. [21]

    A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al., 2018. “A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play”. Science, 362(6419), pp. 1140– 1144

  22. [22]

    A markovian decision process

    Bellman, R., 1957. “A markovian decision process”. Indiana Univ. Math. J., 6, pp. 679–684

  23. [23]

    VLSI placement and global routing using simulated annealing , V ol

    Sechen, C., 2012. VLSI placement and global routing using simulated annealing , V ol. 54. Springer Science & Business Media

  24. [24]

    Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing

    de Vincente, J., Lanchares, J., and Hermida, R., 1998. “Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing”. In Eu- romicro Conference, 1998. Proceedings. 24th, V ol. 1, IEEE, pp. 192–195

  25. [25]

    New algorithms for the rectilinear steiner tree problem

    Ho, J.-M., Vijayan, G., and Wong, C.-K., 1990. “New algorithms for the rectilinear steiner tree problem”. IEEE Transactions on Computer-Aided Design of In- tegrated Circuits and Systems, 9(2), pp. 185–193

  26. [26]

    Global routing based on steiner min-max trees

    Wong, C. C. M. S. C., 1990. “Global routing based on steiner min-max trees”. IEEE Trans. Comput. Aided Des, 9, pp. 1315–1325

  27. [27]

    Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router

    Cho, M., Lu, K., Yuan, K., and Pan, D. Z., 2007. “Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router”. In Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design, IEEE Press, pp. 503–508

  28. [28]

    Accurate pre- diction of detailed routing congestion using supervised data learning

    Qi, Z., Cai, Y ., and Zhou, Q., 2014. “Accurate pre- diction of detailed routing congestion using supervised data learning”. In 2014 IEEE 32nd International Con- ference on Computer Design (ICCD), IEEE, pp. 97– 103

  29. [29]

    An accurate detailed routing routability prediction model in placement

    Zhou, Q., Wang, X., Qi, Z., Chen, Z., Zhou, Q., and Cai, Y ., 2015. “An accurate detailed routing routability prediction model in placement”. In 2015 6th Asia Sym- posium on Quality Electronic Design (ASQED), IEEE, pp. 119–122

  30. [30]

    S., Barto, A

    Sutton, R. S., Barto, A. G., et al., 1998. Introduction 13 Copyright c⃝ by ASME to reinforcement learning, V ol. 135. MIT press Cam- bridge

  31. [31]

    Rectified linear units improve restricted boltzmann machines

    Nair, V ., and Hinton, G. E., 2010. “Rectified linear units improve restricted boltzmann machines”. In Proceed- ings of the 27th international conference on machine learning (ICML-10), pp. 807–814

  32. [32]

    Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications

    Nguyen, T. T., Nguyen, N. D., and Nahavandi, S., 2018. “Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications”. CoRR, abs/1812.11794

  33. [33]

    Multiagent re- inforcement learning: theoretical framework and an al- gorithm

    Hu, J., Wellman, M. P., et al., 1998. “Multiagent re- inforcement learning: theoretical framework and an al- gorithm.”. In ICML, V ol. 98, Citeseer, pp. 242–250

  34. [34]

    A com- prehensive survey of multiagent reinforcement learn- ing

    Bu, L., Babu, R., De Schutter, B., et al., 2008. “A com- prehensive survey of multiagent reinforcement learn- ing”. IEEE Transactions on Systems, Man, and Cy- bernetics, Part C (Applications and Reviews), 38(2), pp. 156–172

  35. [35]

    Multi-agent reinforcement learning: Independent vs. cooperative agents

    Tan, M., 1993. “Multi-agent reinforcement learning: Independent vs. cooperative agents”. In Proceedings of the tenth international conference on machine learning, pp. 330–337

  36. [36]

    Schaul, T., Quan, J., Antonoglou, I., and Silver, D.,

  37. [37]

    Prioritized Experience Replay

    “Prioritized experience replay”. arXiv preprint arXiv:1511.05952. A Appendix A: Heat maps showing edge utilization of different problem sets based on A* solution B Appendix B: DQN Global Routing Algorithm

  38. [38]

    Burn in replay memory D=e1,e2, ...,et to capacity N et = (st ,at ,rt ,st+1,IsTerminalt+1)

  39. [39]

    Initialize action-value function Q with random weights

  40. [40]

    Decompose multi-pin problems with Minimum Spanning Tree

  41. [41]

    Network training: for episode = 1, maximum episodes Initialize sequence s1 starting at one pin With probability ε select a random action at otherwise select at = maxaQ∗(φ(st ,a; θ)) Execute action at in environment and observe reward rt Update the environment (capacity) for t = 1, maximum steps Set st+1 = st ,at Store transition (st ,at ,rt ,st+1,IsTermin...