A Deep Reinforcement Learning Approach for Global Routing

Barnabas Poczos; Haiguang Liao; Kenji Shimada; Levent Burak Kara; Wentai Zhang; Xuliang Dong

arxiv: 1906.08809 · v1 · pith:E3TZ27H4new · submitted 2019-06-20 · 💻 cs.LG · cs.AI· stat.ML

A Deep Reinforcement Learning Approach for Global Routing

Haiguang Liao , Wentai Zhang , Xuliang Dong , Barnabas Poczos , Kenji Shimada , Levent Burak Kara This is my paper

Pith reviewed 2026-05-25 19:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML

keywords global routingdeep reinforcement learningcircuit designpath planningA* algorithmproblem generationelectronic design automation

0 comments

The pith

A deep reinforcement learning agent can outperform sequential A* in finding global routing solutions for circuit designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies deep reinforcement learning to the global routing problem, where wires must connect circuit components without violating design rules. The method trains an agent in a simulated environment to learn a routing policy that works across many different problem instances through conjoint optimization. It also introduces a generator that creates routing problems with adjustable sizes and constraints for training and testing. If the learned policy generalizes, it offers a flexible alternative to traditional greedy algorithms and heuristics that often yield suboptimal results. Experiments show the fine-tuned model achieves higher rewards than a sequential A* benchmark on the generated problems.

Core claim

The deep reinforcement learning approach enables an agent to produce an optimal routing policy based on the variety of problems presented, and the fine-tuned model outperforms the sequential A* method in routing solutions and rewards within the simulated global routing environment.

What carries the argument

The deep reinforcement learning agent that uses a conjoint optimization mechanism to learn a policy for connecting circuit components with wires in a simulated environment.

Load-bearing premise

The simulated environment and reward structure must accurately capture the constraints and objectives of real global routing problems so that the policy generalizes beyond the training instances.

What would settle it

Evaluating the trained routing agent on actual global routing instances from commercial electronic design automation software and comparing total wire length, via count, and design rule violations against A* results.

Figures

Figures reproduced from arXiv: 1906.08809 by Barnabas Poczos, Haiguang Liao, Kenji Shimada, Levent Burak Kara, Wentai Zhang, Xuliang Dong.

**Figure 2.** Figure 2: Pipeline for solving global routing with DQN. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Example configurations of the simulated global routing envi [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem set generator. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Workflow of DQN-based router. of the best action (as deemed by the network) for the given state of the problem. Training Parameters: These parameters are listed in Tab. 1, which only includes the optimized set of parameters. The model and parameter tuning process are shown and discussed next. Training Specifications: The implementation of the proposed algorithm is completed with Python and Tensorflow. Th… view at source ↗

**Figure 6.** Figure 6: Analysis on DQN model performance on two types of problems. Type I problem: no-edge-depletion type: (a) edge utilization plot; [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: Case study showing the conjoint optimization of DQN router: [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: Reward plots based on different combinations of model pa [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 10.** Figure 10: Influence of discount factor γ on performance of DQN router and reward curves. WL of DQN routing solutions (red) with γ value of (a) 1, (b) 0.95, (c) 0.9, (d) 0.8 and their comparison to A* routing solutions (blue); Decrease in WL of DQN-based routing solutions compared to A* routing solutions with γ value of (e) 1, (f) 0.95, (g) 0.9, (h) 0.8; log-plot of reward curves for 10 randomly selected problems so… view at source ↗

**Figure 11.** Figure 11: Burn-in memory study showing statistical results of DQN [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗

**Figure 12.** Figure 12: Visualized results showing routing solutions for [PITH_FULL_IMAGE:figures/full_fig_p012_12.png] view at source ↗

**Figure 13.** Figure 13: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem sets generator. [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗

**Figure 14.** Figure 14: Heat maps and edge utilization histogram of two global routing problems generated with global routing problem sets generator. [PITH_FULL_IMAGE:figures/full_fig_p016_14.png] view at source ↗

read the original abstract

Global routing has been a historically challenging problem in electronic circuit design, where the challenge is to connect a large and arbitrary number of circuit components with wires without violating the design rules for the printed circuit boards or integrated circuits. Similar routing problems also exist in the design of complex hydraulic systems, pipe systems and logistic networks. Existing solutions typically consist of greedy algorithms and hard-coded heuristics. As such, existing approaches suffer from a lack of model flexibility and non-optimum solutions. As an alternative approach, this work presents a deep reinforcement learning method for solving the global routing problem in a simulated environment. At the heart of the proposed method is deep reinforcement learning that enables an agent to produce an optimal policy for routing based on the variety of problems it is presented with leveraging the conjoint optimization mechanism of deep reinforcement learning. Conjoint optimization mechanism is explained and demonstrated in details; the best network structure and the parameters of the learned model are explored. Based on the fine-tuned model, routing solutions and rewards are presented and analyzed. The results indicate that the approach can outperform the benchmark method of a sequential A* method, suggesting a promising potential for deep reinforcement learning for global routing and other routing or path planning problems in general. Another major contribution of this work is the development of a global routing problem sets generator with the ability to generate parameterized global routing problem sets with different size and constraints, enabling evaluation of different routing algorithms and the generation of training datasets for future data-driven routing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They built a parameterized generator for global routing problems and trained DRL that beats sequential A* on those synthetic cases, but the evaluation never leaves the generator.

read the letter

The punchline is that this paper introduces a custom problem generator for global routing and shows a DRL policy outperforming sequential A* on instances drawn from it. The generator is the clearest addition: it can produce problems with varying sizes and constraints, which is useful for creating training data and testing different algorithms. They also walk through the RL setup, try different network structures, and analyze the learned policies and rewards on their instances. That part is straightforward and gives a concrete example of applying existing deep RL techniques to a routing task. The main soft spot is the evaluation scope. All reported results stay inside the synthetic generator; there are no experiments on the standard ISPD 2007 or 2008 global routing benchmarks that reflect real netlists and design rules. This leaves open whether the observed advantage is tied to the generator's distribution rather than a general improvement. The comparison is only against sequential A*, with no ablations or statistical details visible in the abstract, though the full paper presumably supplies more. The work is honest about its simulation setting and does not overclaim transfer. Readers working on RL for path planning or EDA tools could pick up the generator idea or the training approach. It is not a foundational advance but supplies a usable data-generation tool and an empirical data point. I would send it to peer review so referees can check the full experimental protocol and push for ISPD results if the authors want broader claims.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a deep reinforcement learning (DRL) method for global routing in circuit design. It introduces a custom parameterized problem-set generator and reports that a fine-tuned DRL policy produces better routing solutions and rewards than a sequential A* baseline when evaluated on instances drawn from the generator.

Significance. If the empirical claims hold under more rigorous evaluation, the work would demonstrate a data-driven alternative to heuristic routing methods and supply a generator useful for training and benchmarking future routing algorithms. The conjoint optimization aspect of DRL is presented as a conceptual contribution.

major comments (2)

[Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.
[Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.

minor comments (2)

[Method] The description of the state representation and action space in the DRL formulation could be made more precise with explicit equations or pseudocode.
[Results] Figure captions and axis labels in the results plots lack sufficient detail on the exact metrics plotted and the number of instances averaged.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Results] Results section: All reported comparisons are performed exclusively on synthetic instances from the authors' custom generator; no experiments appear on standard ISPD 2007/2008 global routing benchmarks containing realistic netlists, layer counts, and design-rule constraints. This directly undermines the load-bearing claim that the DRL approach outperforms A* for global routing.

Authors: We agree that all quantitative comparisons are performed on instances drawn from the custom generator. This choice was deliberate to enable controlled variation of problem parameters (grid size, net count, layer constraints) and to isolate the effect of conjoint optimization in DRL. The generator is explicitly presented as a contribution for creating training and evaluation data. We acknowledge that the absence of results on ISPD 2007/2008 benchmarks limits direct claims about performance on industrial netlists with full design-rule sets. In the revised manuscript we will add a limitations paragraph that (a) states this scope explicitly and (b) describes the additional engineering steps (multi-layer encoding, via-cost modeling, overflow handling) needed to apply the same policy network to those benchmarks. We will not claim general superiority beyond the generated distribution. revision: partial
Referee: [Abstract and Results] Abstract and Results section: The outperformance claim is stated without quantitative metrics (e.g., wirelength, via counts, overflow), statistical significance tests, problem-size ranges, or ablation studies on network architecture or reward components.

Authors: We will revise the abstract and the results section to report concrete metrics: average reward, total wirelength, via count, and overflow where measured; the range of grid sizes and net counts used (e.g., 8×8 to 32×32 grids with 10–100 nets); and the number of evaluation instances. We will also add error bars across multiple random seeds and note any statistical tests performed. In addition, we will include ablation results comparing network architectures (MLP vs. CNN variants) and reward-component weightings. These quantitative details and ablations will be inserted into the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical DRL evaluation against external A* baseline on generated instances

full rationale

The manuscript describes training a DRL policy on instances from a custom parameterized generator and reports empirical outperformance versus sequential A* on the same generator distribution. No derivation chain, fitted-parameter-as-prediction, or self-citation load-bearing step is present. The central claim is a direct experimental comparison whose validity rests on the (external) question of generator realism rather than any internal reduction of result to input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method implicitly relies on standard RL assumptions (Markov decision process formulation, reward design) that are not detailed.

pith-pipeline@v0.9.0 · 5810 in / 946 out tokens · 22085 ms · 2026-05-25T19:28:34.445860+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 3 internal anchors

[1]

Machine learning applications in physical design: Recent results and directions

Kahng, A. B., 2018. “Machine learning applications in physical design: Recent results and directions”. In Proceedings of the 2018 International Symposium on Physical Design, ACM, pp. 68–73

work page 2018
[2]

Moore’s law: past, present and future

Schaller, R. R., 1997. “Moore’s law: past, present and future”. IEEE spectrum, 34(6), pp. 52–59

work page 1997
[3]

A survey on multi- net global routing for integrated circuits

Hu, J., and Sapatnekar, S. S., 2001. “A survey on multi- net global routing for integrated circuits”. Integration, 31(1), pp. 1–49

work page 2001
[4]

The complexity of wirerout- ing and ﬁnding minimum area layouts for arbitrary vlsi circuits

Kramer, M. R., 1984. “The complexity of wirerout- ing and ﬁnding minimum area layouts for arbitrary vlsi circuits”. Advances in computing research, 2, pp. 129– 146

work page 1984
[5]

Auto- mated al-based mechanical design of hydraulic mani- fold blocks

Chambon, R., and Tollenaere, M., 1991. “Auto- mated al-based mechanical design of hydraulic mani- fold blocks”. Computer-Aided Design, 23(3), pp. 213– 222

work page 1991
[6]

A design expert system for auto-routing of ship pipes

Kang, S.-S., Myung, S., and Han, S., 1999. “A design expert system for auto-routing of ship pipes”. Journal of Ship Production, 15(1), pp. 1–9

work page 1999
[7]

Pipe rout- ing through ant colony optimization

Christodoulou, S. E., and Ellinas, G., 2010. “Pipe rout- ing through ant colony optimization”. Journal of In- frastructure systems, 16(2), pp. 149–159

work page 2010
[8]

Modeling distribution-system water quality; dynamic approach

Grayman, W. M., Clark, R. M., and Males, R. M., 1988. “Modeling distribution-system water quality; dynamic approach”. Journal of Water Resources Planning and Management, 114(3), pp. 295–312

work page 1988
[9]

Advanced routing for city logistics service providers based on time-dependent travel times

Ehmke, J. F., Steinert, A., and Mattfeld, D. C., 2012. “Advanced routing for city logistics service providers based on time-dependent travel times”.Journal of com- putational science, 3(4), pp. 193–205

work page 2012
[10]

Vehicle routing and scheduling models, simulation and city logistics

Barcel ´o, J., Grzybowska, H., and Pardo, S., 2007. “Vehicle routing and scheduling models, simulation and city logistics”. In Dynamic Fleet Management . Springer, pp. 163–195

work page 2007
[11]

Use of steiner’s problem in suboptimal routing in rectilinear metric

Lee, J., Bose, N., and Hwang, F., 1976. “Use of steiner’s problem in suboptimal routing in rectilinear metric”. IEEE Transactions on Circuits and Systems, 23(7), pp. 470–476

work page 1976
[12]

Maizerouter: Engineering an effective global router

Mofﬁtt, M. D., 2008. “Maizerouter: Engineering an effective global router”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Sys- tems, 27(11), pp. 2017–2026

work page 2008
[13]

Predictable routing

Kastner, R., Bozorgzadeh, E., Sarrafzadeh, M., and Sarrafzadeh, M., 2000. “Predictable routing”. In Pro- ceedings of the 2000 IEEE/ACM international confer- ence on Computer-aided design, IEEE Press, pp. 110– 114

work page 2000
[14]

Boxrouter: a new global router based on box expansion and progressive ilp

Cho, M., and Pan, D. Z., 2007. “Boxrouter: a new global router based on box expansion and progressive ilp”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(12), pp. 2130– 2143

work page 2007
[15]

A force-directed global router

Hasan, N., 1987. “A force-directed global router”. PhD thesis, University of Illinois at Urbana-Champaign

work page 1987
[16]

Playing Atari with Deep Reinforcement Learning

Mnih, V ., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013. “Playing atari with deep reinforcement learning”. arXiv preprint arXiv:1312.5602

work page internal anchor Pith review Pith/arXiv arXiv 2013
[17]

Human- level control through deep reinforcement learning

Mnih, V ., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve- ness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al., 2015. “Human- level control through deep reinforcement learning”. Nature, 518(7540), p. 529

work page 2015
[18]

Deep reinforcement learning with double q-learning

Van Hasselt, H., Guez, A., and Silver, D., 2016. “Deep reinforcement learning with double q-learning.”. In AAAI, V ol. 2, Phoenix, AZ, p. 5

work page 2016
[19]

Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation

Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J., 2016. “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation”. In Advances in neural information pro- cessing systems, pp. 3675–3683

work page 2016
[20]

Distributed Prioritized Experience Replay

Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Van Hasselt, H., and Silver, D., 2018. “Dis- tributed prioritized experience replay”. arXiv preprint arXiv:1803.00933

work page internal anchor Pith review Pith/arXiv arXiv 2018
[21]

A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al., 2018. “A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play”. Science, 362(6419), pp. 1140– 1144

work page 2018
[22]

A markovian decision process

Bellman, R., 1957. “A markovian decision process”. Indiana Univ. Math. J., 6, pp. 679–684

work page 1957
[23]

VLSI placement and global routing using simulated annealing , V ol

Sechen, C., 2012. VLSI placement and global routing using simulated annealing , V ol. 54. Springer Science & Business Media

work page 2012
[24]

Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing

de Vincente, J., Lanchares, J., and Hermida, R., 1998. “Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing”. In Eu- romicro Conference, 1998. Proceedings. 24th, V ol. 1, IEEE, pp. 192–195

work page 1998
[25]

New algorithms for the rectilinear steiner tree problem

Ho, J.-M., Vijayan, G., and Wong, C.-K., 1990. “New algorithms for the rectilinear steiner tree problem”. IEEE Transactions on Computer-Aided Design of In- tegrated Circuits and Systems, 9(2), pp. 185–193

work page 1990
[26]

Global routing based on steiner min-max trees

Wong, C. C. M. S. C., 1990. “Global routing based on steiner min-max trees”. IEEE Trans. Comput. Aided Des, 9, pp. 1315–1325

work page 1990
[27]

Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router

Cho, M., Lu, K., Yuan, K., and Pan, D. Z., 2007. “Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router”. In Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design, IEEE Press, pp. 503–508

work page 2007
[28]

Accurate pre- diction of detailed routing congestion using supervised data learning

Qi, Z., Cai, Y ., and Zhou, Q., 2014. “Accurate pre- diction of detailed routing congestion using supervised data learning”. In 2014 IEEE 32nd International Con- ference on Computer Design (ICCD), IEEE, pp. 97– 103

work page 2014
[29]

An accurate detailed routing routability prediction model in placement

Zhou, Q., Wang, X., Qi, Z., Chen, Z., Zhou, Q., and Cai, Y ., 2015. “An accurate detailed routing routability prediction model in placement”. In 2015 6th Asia Sym- posium on Quality Electronic Design (ASQED), IEEE, pp. 119–122

work page 2015
[30]

S., Barto, A

Sutton, R. S., Barto, A. G., et al., 1998. Introduction 13 Copyright c⃝ by ASME to reinforcement learning, V ol. 135. MIT press Cam- bridge

work page 1998
[31]

Rectiﬁed linear units improve restricted boltzmann machines

Nair, V ., and Hinton, G. E., 2010. “Rectiﬁed linear units improve restricted boltzmann machines”. In Proceed- ings of the 27th international conference on machine learning (ICML-10), pp. 807–814

work page 2010
[32]

Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications

Nguyen, T. T., Nguyen, N. D., and Nahavandi, S., 2018. “Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications”. CoRR, abs/1812.11794

work page arXiv 2018
[33]

Multiagent re- inforcement learning: theoretical framework and an al- gorithm

Hu, J., Wellman, M. P., et al., 1998. “Multiagent re- inforcement learning: theoretical framework and an al- gorithm.”. In ICML, V ol. 98, Citeseer, pp. 242–250

work page 1998
[34]

A com- prehensive survey of multiagent reinforcement learn- ing

Bu, L., Babu, R., De Schutter, B., et al., 2008. “A com- prehensive survey of multiagent reinforcement learn- ing”. IEEE Transactions on Systems, Man, and Cy- bernetics, Part C (Applications and Reviews), 38(2), pp. 156–172

work page 2008
[35]

Multi-agent reinforcement learning: Independent vs. cooperative agents

Tan, M., 1993. “Multi-agent reinforcement learning: Independent vs. cooperative agents”. In Proceedings of the tenth international conference on machine learning, pp. 330–337

work page 1993
[36]

Schaul, T., Quan, J., Antonoglou, I., and Silver, D.,

work page
[37]

Prioritized Experience Replay

“Prioritized experience replay”. arXiv preprint arXiv:1511.05952. A Appendix A: Heat maps showing edge utilization of different problem sets based on A* solution B Appendix B: DQN Global Routing Algorithm

work page internal anchor Pith review Pith/arXiv arXiv
[38]

Burn in replay memory D=e1,e2, ...,et to capacity N et = (st ,at ,rt ,st+1,IsTerminalt+1)

work page
[39]

Initialize action-value function Q with random weights

work page
[40]

Decompose multi-pin problems with Minimum Spanning Tree

work page
[41]

Network training: for episode = 1, maximum episodes Initialize sequence s1 starting at one pin With probability ε select a random action at otherwise select at = maxaQ∗(φ(st ,a; θ)) Execute action at in environment and observe reward rt Update the environment (capacity) for t = 1, maximum steps Set st+1 = st ,at Store transition (st ,at ,rt ,st+1,IsTermin...

work page

[1] [1]

Machine learning applications in physical design: Recent results and directions

Kahng, A. B., 2018. “Machine learning applications in physical design: Recent results and directions”. In Proceedings of the 2018 International Symposium on Physical Design, ACM, pp. 68–73

work page 2018

[2] [2]

Moore’s law: past, present and future

Schaller, R. R., 1997. “Moore’s law: past, present and future”. IEEE spectrum, 34(6), pp. 52–59

work page 1997

[3] [3]

A survey on multi- net global routing for integrated circuits

Hu, J., and Sapatnekar, S. S., 2001. “A survey on multi- net global routing for integrated circuits”. Integration, 31(1), pp. 1–49

work page 2001

[4] [4]

The complexity of wirerout- ing and ﬁnding minimum area layouts for arbitrary vlsi circuits

Kramer, M. R., 1984. “The complexity of wirerout- ing and ﬁnding minimum area layouts for arbitrary vlsi circuits”. Advances in computing research, 2, pp. 129– 146

work page 1984

[5] [5]

Auto- mated al-based mechanical design of hydraulic mani- fold blocks

Chambon, R., and Tollenaere, M., 1991. “Auto- mated al-based mechanical design of hydraulic mani- fold blocks”. Computer-Aided Design, 23(3), pp. 213– 222

work page 1991

[6] [6]

A design expert system for auto-routing of ship pipes

Kang, S.-S., Myung, S., and Han, S., 1999. “A design expert system for auto-routing of ship pipes”. Journal of Ship Production, 15(1), pp. 1–9

work page 1999

[7] [7]

Pipe rout- ing through ant colony optimization

Christodoulou, S. E., and Ellinas, G., 2010. “Pipe rout- ing through ant colony optimization”. Journal of In- frastructure systems, 16(2), pp. 149–159

work page 2010

[8] [8]

Modeling distribution-system water quality; dynamic approach

Grayman, W. M., Clark, R. M., and Males, R. M., 1988. “Modeling distribution-system water quality; dynamic approach”. Journal of Water Resources Planning and Management, 114(3), pp. 295–312

work page 1988

[9] [9]

Advanced routing for city logistics service providers based on time-dependent travel times

Ehmke, J. F., Steinert, A., and Mattfeld, D. C., 2012. “Advanced routing for city logistics service providers based on time-dependent travel times”.Journal of com- putational science, 3(4), pp. 193–205

work page 2012

[10] [10]

Vehicle routing and scheduling models, simulation and city logistics

Barcel ´o, J., Grzybowska, H., and Pardo, S., 2007. “Vehicle routing and scheduling models, simulation and city logistics”. In Dynamic Fleet Management . Springer, pp. 163–195

work page 2007

[11] [11]

Use of steiner’s problem in suboptimal routing in rectilinear metric

Lee, J., Bose, N., and Hwang, F., 1976. “Use of steiner’s problem in suboptimal routing in rectilinear metric”. IEEE Transactions on Circuits and Systems, 23(7), pp. 470–476

work page 1976

[12] [12]

Maizerouter: Engineering an effective global router

Mofﬁtt, M. D., 2008. “Maizerouter: Engineering an effective global router”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Sys- tems, 27(11), pp. 2017–2026

work page 2008

[13] [13]

Predictable routing

Kastner, R., Bozorgzadeh, E., Sarrafzadeh, M., and Sarrafzadeh, M., 2000. “Predictable routing”. In Pro- ceedings of the 2000 IEEE/ACM international confer- ence on Computer-aided design, IEEE Press, pp. 110– 114

work page 2000

[14] [14]

Boxrouter: a new global router based on box expansion and progressive ilp

Cho, M., and Pan, D. Z., 2007. “Boxrouter: a new global router based on box expansion and progressive ilp”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(12), pp. 2130– 2143

work page 2007

[15] [15]

A force-directed global router

Hasan, N., 1987. “A force-directed global router”. PhD thesis, University of Illinois at Urbana-Champaign

work page 1987

[16] [16]

Playing Atari with Deep Reinforcement Learning

Mnih, V ., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013. “Playing atari with deep reinforcement learning”. arXiv preprint arXiv:1312.5602

work page internal anchor Pith review Pith/arXiv arXiv 2013

[17] [17]

Human- level control through deep reinforcement learning

Mnih, V ., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve- ness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al., 2015. “Human- level control through deep reinforcement learning”. Nature, 518(7540), p. 529

work page 2015

[18] [18]

Deep reinforcement learning with double q-learning

Van Hasselt, H., Guez, A., and Silver, D., 2016. “Deep reinforcement learning with double q-learning.”. In AAAI, V ol. 2, Phoenix, AZ, p. 5

work page 2016

[19] [19]

Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation

Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J., 2016. “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation”. In Advances in neural information pro- cessing systems, pp. 3675–3683

work page 2016

[20] [20]

Distributed Prioritized Experience Replay

Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Van Hasselt, H., and Silver, D., 2018. “Dis- tributed prioritized experience replay”. arXiv preprint arXiv:1803.00933

work page internal anchor Pith review Pith/arXiv arXiv 2018

[21] [21]

A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al., 2018. “A general reinforce- ment learning algorithm that masters chess, shogi, and go through self-play”. Science, 362(6419), pp. 1140– 1144

work page 2018

[22] [22]

A markovian decision process

Bellman, R., 1957. “A markovian decision process”. Indiana Univ. Math. J., 6, pp. 679–684

work page 1957

[23] [23]

VLSI placement and global routing using simulated annealing , V ol

Sechen, C., 2012. VLSI placement and global routing using simulated annealing , V ol. 54. Springer Science & Business Media

work page 2012

[24] [24]

Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing

de Vincente, J., Lanchares, J., and Hermida, R., 1998. “Rsr: A new rectilinear steiner minimum tree approxi- mation for fpga placement and global routing”. In Eu- romicro Conference, 1998. Proceedings. 24th, V ol. 1, IEEE, pp. 192–195

work page 1998

[25] [25]

New algorithms for the rectilinear steiner tree problem

Ho, J.-M., Vijayan, G., and Wong, C.-K., 1990. “New algorithms for the rectilinear steiner tree problem”. IEEE Transactions on Computer-Aided Design of In- tegrated Circuits and Systems, 9(2), pp. 185–193

work page 1990

[26] [26]

Global routing based on steiner min-max trees

Wong, C. C. M. S. C., 1990. “Global routing based on steiner min-max trees”. IEEE Trans. Comput. Aided Des, 9, pp. 1315–1325

work page 1990

[27] [27]

Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router

Cho, M., Lu, K., Yuan, K., and Pan, D. Z., 2007. “Boxrouter 2.0: Architecture and implementation of a hybrid and robust global router”. In Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design, IEEE Press, pp. 503–508

work page 2007

[28] [28]

Accurate pre- diction of detailed routing congestion using supervised data learning

Qi, Z., Cai, Y ., and Zhou, Q., 2014. “Accurate pre- diction of detailed routing congestion using supervised data learning”. In 2014 IEEE 32nd International Con- ference on Computer Design (ICCD), IEEE, pp. 97– 103

work page 2014

[29] [29]

An accurate detailed routing routability prediction model in placement

Zhou, Q., Wang, X., Qi, Z., Chen, Z., Zhou, Q., and Cai, Y ., 2015. “An accurate detailed routing routability prediction model in placement”. In 2015 6th Asia Sym- posium on Quality Electronic Design (ASQED), IEEE, pp. 119–122

work page 2015

[30] [30]

S., Barto, A

Sutton, R. S., Barto, A. G., et al., 1998. Introduction 13 Copyright c⃝ by ASME to reinforcement learning, V ol. 135. MIT press Cam- bridge

work page 1998

[31] [31]

Rectiﬁed linear units improve restricted boltzmann machines

Nair, V ., and Hinton, G. E., 2010. “Rectiﬁed linear units improve restricted boltzmann machines”. In Proceed- ings of the 27th international conference on machine learning (ICML-10), pp. 807–814

work page 2010

[32] [32]

Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications

Nguyen, T. T., Nguyen, N. D., and Nahavandi, S., 2018. “Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications”. CoRR, abs/1812.11794

work page arXiv 2018

[33] [33]

Multiagent re- inforcement learning: theoretical framework and an al- gorithm

Hu, J., Wellman, M. P., et al., 1998. “Multiagent re- inforcement learning: theoretical framework and an al- gorithm.”. In ICML, V ol. 98, Citeseer, pp. 242–250

work page 1998

[34] [34]

A com- prehensive survey of multiagent reinforcement learn- ing

Bu, L., Babu, R., De Schutter, B., et al., 2008. “A com- prehensive survey of multiagent reinforcement learn- ing”. IEEE Transactions on Systems, Man, and Cy- bernetics, Part C (Applications and Reviews), 38(2), pp. 156–172

work page 2008

[35] [35]

Multi-agent reinforcement learning: Independent vs. cooperative agents

Tan, M., 1993. “Multi-agent reinforcement learning: Independent vs. cooperative agents”. In Proceedings of the tenth international conference on machine learning, pp. 330–337

work page 1993

[36] [36]

Schaul, T., Quan, J., Antonoglou, I., and Silver, D.,

work page

[37] [37]

Prioritized Experience Replay

“Prioritized experience replay”. arXiv preprint arXiv:1511.05952. A Appendix A: Heat maps showing edge utilization of different problem sets based on A* solution B Appendix B: DQN Global Routing Algorithm

work page internal anchor Pith review Pith/arXiv arXiv

[38] [38]

Burn in replay memory D=e1,e2, ...,et to capacity N et = (st ,at ,rt ,st+1,IsTerminalt+1)

work page

[39] [39]

Initialize action-value function Q with random weights

work page

[40] [40]

Decompose multi-pin problems with Minimum Spanning Tree

work page

[41] [41]

Network training: for episode = 1, maximum episodes Initialize sequence s1 starting at one pin With probability ε select a random action at otherwise select at = maxaQ∗(φ(st ,a; θ)) Execute action at in environment and observe reward rt Update the environment (capacity) for t = 1, maximum steps Set st+1 = st ,at Store transition (st ,at ,rt ,st+1,IsTermin...

work page