D\'ej\`a Vu Packing: Optimizing FPGA Logic Clustering Runtime via Pattern Memoization

Amin Mohaghegh; Andrew Boutros; Milo Liebster

arxiv: 2604.24649 · v1 · submitted 2026-04-27 · 💻 cs.AR

D\'ej\`a Vu Packing: Optimizing FPGA Logic Clustering Runtime via Pattern Memoization

Milo Liebster , Amin Mohaghegh , Andrew Boutros This is my paper

Pith reviewed 2026-05-07 17:26 UTC · model grok-4.3

classification 💻 cs.AR

keywords FPGApackingclusteringmemoizationsignature treeCAD runtimeVPRlogic blocks

0 comments

The pith

A signature tree memoizes repeated cluster legality checks to accelerate FPGA packing several times over.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The packing stage in FPGA CAD flows clusters logic primitives into larger blocks and must solve expensive routing legality checks for each candidate. For modern complex logic blocks, this stage dominates total runtime, often exceeding half or nearly all of the place-and-route time. The authors observe that many candidate clusters repeat a small set of patterns, so repeated checks are wasteful. Their Deja Vu approach builds a packing signature tree to detect these repeats quickly and reuse prior legality results. On standard benchmarks this yields average speedups of 3.7x for 7-series-like architectures and 6.9x for Stratix-10-like ones, translating to 1.6x and 5.3x faster end-to-end flows with no loss in quality.

Core claim

The central discovery is that a packing signature tree data structure can identify recurring patterns among attempted clusters and memoize the outcomes of their multi-source multi-sink routing legality checks. Because a large fraction of attempted clusters match one of a modest number of prior patterns, the tree avoids redundant expensive checks while preserving the exact same clustering decisions.

What carries the argument

The packing signature tree, a trie-like structure that hashes cluster signatures to store and retrieve legality results for repeated packing patterns.

Load-bearing premise

That enough attempted clusters match previous patterns for the signature tree to skip most checks without using too much memory or missing any pattern variations that would change a legality decision.

What would settle it

Running the packer on a new set of large benchmarks where every candidate cluster has a unique pattern; if speedups fall below 1.2x while memory grows linearly with attempts, the core premise does not hold.

Figures

Figures reproduced from arXiv: 2604.24649 by Amin Mohaghegh, Andrew Boutros, Milo Liebster.

**Figure 1.** Figure 1: Complex commercial Stratix 10 LE with 8 distinct inputs (A-H), one view at source ↗

**Figure 2.** Figure 2: VPR runtime breakdown for the VTR and Koios benchmarks on the 7-series architecture (W=300) (left), as well as the Titan23 and Titanium25 view at source ↗

**Figure 3.** Figure 3: Breakdown of the percentage of clusters that succeeded with specu view at source ↗

**Figure 5.** Figure 5: Example PST representation with two packing signatures (right) view at source ↗

**Figure 6.** Figure 6: Breakdown of final packed clusters that are the first occurrence vs. have a repeated packing signature. On average, 59% and 65% of the clusters have view at source ↗

**Figure 7.** Figure 7: Comparison of peak memory utilization of the VPR packing stage and full VPR flow with and without déjà vu packing. All values are normalized view at source ↗

read the original abstract

Implementing a digital circuit on an FPGA fabric requires clustering technology-mapped netlist primitives into coarser-granularity blocks that can be directly mapped to the physical resources available on the FPGA. As the architecture of FPGA logic blocks (LBs) has grown in complexity, with sophisticated logic elements (LEs) and highly irregular local interconnect, this packing problem has become more challenging. To ensure the feasibility of intracluster routing, the computer-aided design (CAD) tools must solve a costly multi-source multi-sink routing problem for each candidate cluster. In this paper, we first show that such packing legality checks consume a significant portion of the CAD flow runtime for LB architectures with complex LEs and local routing structures resembling modern commercial FPGAs. We demonstrate that the packing stage constitutes 58% and 94% of the entire Versatile Place and Route (VPR) flow runtime on average when mapping a wide variety of benchmarks to the AMD 7-series-like and Altera Stratix-10-like VTR architecture captures, respectively. By analyzing the packing algorithm behavior, we observe that a significant fraction of the attempted packed clusters are repetitions of a much smaller number of packing patterns, and therefore many of the packing legality checks are redundant and could be skipped. To this end, we introduce our D\'ej\`a Vu packing approach, which leverages a novel packing signature tree data structure that enables efficient identification of recurring packing patterns and memoization of their legality check outcomes. Our approach speeds up the packing by up to 13.4x and 29.3x, with an average of 3.7x and 6.9x, across the evaluated benchmarks on the 7-series and Stratix 10 architectures. These packing runtime gains result in a significant 1.6x and 5.3x average reduction in end-to-end VPR runtime, while maintaining quality of results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The signature tree memoizes repeated packing legality checks for 3.7x-6.9x average speedups in VPR, but the paper must show how it avoids false matches on non-identical clusters.

read the letter

The main thing to know is that this work uses a packing signature tree to memoize legality checks during FPGA logic clustering, which cuts packing time substantially on the tested architectures while keeping results the same. They measure that the packing stage already takes 58 percent of VPR time on a 7-series like architecture and 94 percent on a Stratix 10 like one. By looking at the algorithm, they see lots of repeated cluster patterns, so many routing checks inside packing are redundant. The Deja Vu approach builds a tree to spot those repeats quickly and skip the expensive checks by reusing prior outcomes. This leads to packing speedups of 3.7 times on average for the first architecture and 6.9 times for the second, with peaks at 13.4 and 29.3 times. End to end VPR gets 1.6 times and 5.3 times faster on average. Quality of results stays the same across their benchmarks. The paper does a good job laying out the runtime problem first and then showing the repetition pattern that makes memoization viable. The signature tree is a new data structure for this specific use in packing. The soft spots are around the memoization safety and the experiments. The stress test points out that signatures must not produce false matches on different clusters, or you could reuse incorrect legality decisions. The abstract says quality holds, but the full paper needs to detail how signatures are built and if there's a verification step to catch collisions. Memory overhead for the tree also goes unmentioned, which could matter if patterns accumulate. On the experimental side, the benchmarks are described as a wide variety but without a list, selection criteria, or any error bars or statistical tests, the speedup numbers are harder to trust as general. This is for researchers and developers in FPGA computer aided design, especially those working with VPR or similar open tools. A reader focused on optimizing CAD flows would pick up a concrete technique to adapt. The work shows clear thinking on a practical bottleneck and has measurable gains, so it deserves a serious referee. I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes Déjà Vu packing, an optimization for FPGA logic clustering that observes a high fraction of repeated packing patterns during legality checks (which consume 58% and 94% of VPR runtime on two architectures). It introduces a packing signature tree data structure to identify recurring patterns and memoize their legality outcomes, claiming average packing speedups of 3.7× (7-series) and 6.9× (Stratix 10), with corresponding end-to-end VPR reductions of 1.6× and 5.3× while preserving quality of results.

Significance. If the memoization is provably sound and the empirical speedups are robust across benchmarks, the work could meaningfully accelerate FPGA CAD flows for modern complex logic blocks by eliminating redundant routing checks inside clusters. The core observation of pattern repetition and the signature-tree mechanism represent a practical algorithmic contribution to a known bottleneck.

major comments (2)

[Section 3] Section 3 (Signature Tree Construction): The description of signature generation does not specify whether signatures are exact canonical forms, structural hashes, or include a verification step. Without this, it is impossible to rule out false-positive matches between non-identical clusters, which would silently reuse incorrect legality results and potentially degrade QoR or produce illegal packings.
[Section 5] Section 5 (Experimental Evaluation): The reported speedups (up to 13.4×/29.3×, averages 3.7×/6.9×) and end-to-end gains lack any description of benchmark selection criteria, number of independent runs, statistical significance testing, or error bars. This makes it difficult to assess whether the results are generalizable or sensitive to post-hoc benchmark choices.

minor comments (2)

[Abstract] The abstract states that quality of results is maintained but does not quantify the specific metrics (critical-path delay, area, or routing success rate) used to verify this claim; these should be reported with tables or figures in the evaluation section.
[Section 3] Notation for 'packing patterns' and 'signatures' is introduced informally; a short formal definition or pseudocode in Section 3 would improve clarity for readers unfamiliar with the VPR packing algorithm.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to clarify and strengthen our manuscript. We address each major comment in detail below, committing to revisions where appropriate to improve clarity and reproducibility.

read point-by-point responses

Referee: [Section 3] Section 3 (Signature Tree Construction): The description of signature generation does not specify whether signatures are exact canonical forms, structural hashes, or include a verification step. Without this, it is impossible to rule out false-positive matches between non-identical clusters, which would silently reuse incorrect legality results and potentially degrade QoR or produce illegal packings.

Authors: We appreciate this point on soundness. The signature in our packing signature tree is constructed as an exact canonical form: primitives within each candidate cluster are sorted by type and position, their local connectivity is encoded as a deterministic adjacency list string, and the resulting tuple serves as the key. This is not a probabilistic hash but a structural canonicalization, guaranteeing identical signatures only for identical patterns. The tree traversal further verifies structural equivalence before reusing any memoized legality outcome. We will revise Section 3 to explicitly describe this canonicalization process, add pseudocode for signature generation, and state that no false-positive reuse is possible by design. revision: yes
Referee: [Section 5] Section 5 (Experimental Evaluation): The reported speedups (up to 13.4×/29.3×, averages 3.7×/6.9×) and end-to-end gains lack any description of benchmark selection criteria, number of independent runs, statistical significance testing, or error bars. This makes it difficult to assess whether the results are generalizable or sensitive to post-hoc benchmark choices.

Authors: We agree that additional methodological details will strengthen the evaluation. The benchmarks comprise the full VTR 8.0 suite (all 20+ circuits), selected because they represent the standard, diverse set used in prior FPGA packing literature and cover a range of sizes and domains. All runs used deterministic algorithms on a fixed platform, so single executions per benchmark were performed. To improve statistical transparency, we will add a new subsection describing benchmark selection, include a table of all circuits with size and runtime statistics, and report mean speedups with standard deviations obtained from five independent runs (varying placement seeds where applicable). These changes will be incorporated in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical runtime optimization with direct measurements

full rationale

The paper describes an algorithmic optimization for FPGA packing that observes repeated patterns in clustering attempts and uses a signature tree for memoization. All claims rest on runtime benchmarks across architectures and benchmarks, with no equations, fitted parameters, predictions derived from inputs, or self-citations that bear the central result. Speedups (e.g., 3.7x packing) are measured outcomes, not quantities defined by the method itself. The work is self-contained against external VPR flows and does not reduce any result to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical optimization technique. No free parameters, domain axioms, or invented physical entities are described in the abstract; the signature tree is a standard algorithmic data structure rather than a postulated entity.

pith-pipeline@v0.9.0 · 5657 in / 1393 out tokens · 74803 ms · 2026-05-07T17:26:58.970191+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

A New Paradigm for FPGA Placement Without Explicit Packing,

W. Li and D. Z. Pan, “A New Paradigm for FPGA Placement Without Explicit Packing,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 38, no. 11, pp. 2113– 2126, 2019

work page 2019
[2]

Optimizing FPGA Logic Block Architectures for Arithmetic,

K. E. Murray, J. Luu, M. J. Walker, C. McCullough, S. Wang, S. Huda, B. Yan, C. Chiasson, K. B. Kent, J. Andersonet al., “Optimizing FPGA Logic Block Architectures for Arithmetic,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 6, pp. 1378–1391, 2020

work page 2020
[3]

VTR 9: Open-source CAD for Fabric and Beyond FPGA Architecture Exploration,

M. A. Elgammal, A. Mohaghegh, S. G. Shahrouz, F. Mahmoudi, F. Ko¸ sar, K. Talaei, J. Fife, D. Khadivi, K. Murray, A. Boutroset al., “VTR 9: Open-source CAD for Fabric and Beyond FPGA Architecture Exploration,”ACM Transactions on Reconfigurable Technology and Systems, vol. 18, no. 3, pp. 1–53, 2025

work page 2025
[4]

Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research,

A. Arora, A. Boutros, S. A. Damghani, K. Mathur, V . Mohanty, T. Anand, M. A. Elgammal, K. B. Kent, V . Betz, and L. K. John, “Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 11, pp. 3895–3909, 2023

work page 2023
[5]

Titan: Enabling Large and Complex Benchmarks in Academic CAD,

K. E. Murray, S. Whitty, S. Liu, J. Luu, and V . Betz, “Titan: Enabling Large and Complex Benchmarks in Academic CAD,” inIEEE Interna- tional Conference on Field programmable Logic and Applications (FPL), 2013, pp. 1–8

work page 2013
[6]

Cluster-based Logic Blocks for FPGAs: Area- Efficiency vs. Input Sharing and Size,

V . Betz and J. Rose, “Cluster-based Logic Blocks for FPGAs: Area- Efficiency vs. Input Sharing and Size,” inIEEE Custom Integrated Circuits Conference (CICC), 1997, pp. 551–554

work page 1997
[7]

Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density,

A. Marquardt, V . Betz, and J. Rose, “Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 1999, pp. 37–46

work page 1999
[8]

RPack: Routability-Driven Packing for Cluster-Based FPGAs,

E. Bozorgzadeh, S. Ogrenci-Memik, and M. Sarrafzadeh, “RPack: Routability-Driven Packing for Cluster-Based FPGAs,” inACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), 2001, pp. 629–634

work page 2001
[9]

Efficient Circuit Clustering for Area and Power Reduction in FPGAs,

A. Singh, G. Parthasarathy, and M. Marek-Sadowska, “Efficient Circuit Clustering for Area and Power Reduction in FPGAs,”ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 7, no. 4, pp. 643–663, 2002

work page 2002
[10]

Improving Timing-Driven FPGA Packing with Physical Information,

D. T. Chen, K. V orwerk, and A. Kennings, “Improving Timing-Driven FPGA Packing with Physical Information,” inIEEE International Con- ference on Field Programmable Logic and Applications (FPL), 2007, pp. 117–123

work page 2007
[11]

MO-Pack: Many-Objective Clustering for FPGA CAD,

S. T. Rajavel and A. Akoglu, “MO-Pack: Many-Objective Clustering for FPGA CAD,” inACM/IEEE Design Automation Conference (DAC), 2011, pp. 818–823

work page 2011
[12]

Un/DoPack: Re-clustering of Large System-on-Chip Designs with Interconnect Variation for Low- Cost FPGAs,

M. Tom, D. Leong, and G. Lemieux, “Un/DoPack: Re-clustering of Large System-on-Chip Designs with Interconnect Variation for Low- Cost FPGAs,” inIEEE/ACM International Conference on Computer- aided design (ICCAD), 2006, pp. 680–687

work page 2006
[13]

T-NDPack: Timing-Driven Non-Uniform Depopulation Based Clustering,

H. Liu and A. Akoglu, “T-NDPack: Timing-Driven Non-Uniform Depopulation Based Clustering,” inIEEE Southern Conference on Programmable Logic (SPL), 2009, pp. 9–14

work page 2009
[14]

Multilevel K-Way Hypergraph Partitioning,

G. Karypis and V . Kumar, “Multilevel K-Way Hypergraph Partitioning,” inACM/IEEE Design Automation Conference (DAC), 1999, pp. 343– 348

work page 1999
[15]

K-Way Partitioning Based Packing for FPGA Logic Blocks Without Input Bandwidth Constraint,

W. Feng, “K-Way Partitioning Based Packing for FPGA Logic Blocks Without Input Bandwidth Constraint,” inIEEE International Conference on Field-Programmable Technology (FPT), 2012, pp. 8–15

work page 2012
[16]

Rent’s Rule Based FPGA Packing for Routability Optimization,

W. Feng, J. Greene, K. V orwerk, V . Pevzner, and A. Kundu, “Rent’s Rule Based FPGA Packing for Routability Optimization,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2014, pp. 31–34

work page 2014
[17]

Runtime-Quality Tradeoff in Partitioning Based Multithreaded Packing,

D. Vercruyce, E. Vansteenkiste, and D. Stroobandt, “Runtime-Quality Tradeoff in Partitioning Based Multithreaded Packing,” inIEEE Inter- national Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1–9

work page 2016
[18]

How Preserving Circuit Design Hierarchy During FPGA Packing Leads to Better Performance,

——, “How Preserving Circuit Design Hierarchy During FPGA Packing Leads to Better Performance,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 3, pp. 629–642, 2017

work page 2017
[19]

LSC: A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs,

L. Singhal, M. A. Iyer, and S. Adya, “LSC: A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs,” inACM/IEEE Design Automation Conference (DAC), 2017, pp. 1–6

work page 2017
[20]

Simultaneous Timing Driven Clustering and Placement for FPGAs,

G. Chen and J. Cong, “Simultaneous Timing Driven Clustering and Placement for FPGAs,” inInternational Conference on Field Pro- grammable Logic and Applications (FPL), 2004, pp. 158–167

work page 2004
[21]

Breaking Boundaries: Optimizing FPGA CAD with Flexible and Multi-threaded Re-Clustering,

M. A. Elgammal and V . Betz, “Breaking Boundaries: Optimizing FPGA CAD with Flexible and Multi-threaded Re-Clustering,” inACM Interna- tional Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), 2023, pp. 11–18

work page 2023
[22]

Architecture description and packing for logic blocks with hierarchy, modes and complex interconnect,

J. Luu, J. Anderson, and J. Rose, “Architecture description and packing for logic blocks with hierarchy, modes and complex interconnect,” Proceedings of 2011 Acm/Sigda International Symposium on Field Programmable Gate Arrays, pp. 227–236, 2011

work page 2011
[23]

Towards Interconnect-Adaptive Packing for FPGAs,

J. Luu, J. Rose, and J. Anderson, “Towards Interconnect-Adaptive Packing for FPGAs,” inACM/SIGDA International Symposium on Field- Programmable Gate Arrays (FPGA), 2014, pp. 21–30

work page 2014
[24]

Using Sparse Crossbars within LUT,

G. Lemieux and D. Lewis, “Using Sparse Crossbars within LUT,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2001, pp. 59–68

work page 2001
[25]

Academic Packing for Commercial FPGA Architec- tures,

T. D. Haroldsen, “Academic Packing for Commercial FPGA Architec- tures,” Ph.D. dissertation, Brigham Young University, 2017

work page 2017
[26]

VTR 7.0: Next Generation Architecture and CAD System for FPGAs,

J. Luu, J. Goeders, M. Wainberg, A. Somerville, T. Yu, K. Nasartschuk, M. Nasr, S. Wang, T. Liu, N. Ahmed, K. B. Kent, J. Anderson, J. Rose, and V . Betz, “VTR 7.0: Next Generation Architecture and CAD System for FPGAs,”ACM Transactions on Reconfigurable Technology and Systems, vol. 7, no. 3, pp. 1–30, 2014

work page 2014
[27]

VTR 8: High-performance CAD and Customizable FPGA Architecture Modelling,

K. E. Murray, O. Petelin, S. Zhong, J. M. Wang, M. Eldafrawy, J.-P. Legault, E. Sha, A. G. Graham, J. Wu, M. J. P. Walker, H. Zeng, P. Pa- tros, J. Luu, K. B. Kent, and V . Betz, “VTR 8: High-performance CAD and Customizable FPGA Architecture Modelling,”ACM Transactions on Reconfigurable Technology and Systems, vol. 13, no. 2, pp. 1–60, 2020

work page 2020
[28]

Titan 2.0: Enabling Open-Source CAD Evaluation with a Modern Architecture Capture,

K. T. Khoozani, A. A. Dehkordi, and V . Betz, “Titan 2.0: Enabling Open-Source CAD Evaluation with a Modern Architecture Capture,” inIEEE International Conference on Field-Programmable Logic and Applications (FPL), 2023, pp. 57–64

work page 2023
[29]

PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs,

L. McMurchie and C. Ebeling, “PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs,” inACM International Sym- posium on Field-Programmable Gate Arrays (FPGA), 1995, pp. 111— -117

work page 1995
[30]

Guaranteed Yet Hard to Find: Uncovering FPGA Routing Convergence Paradox,

S. Shrivastava, S. Nikoli ´c, S. Tanaka, C. Ravishankar, D. Gaitonde, and M. Stojilovi ´c, “Guaranteed Yet Hard to Find: Uncovering FPGA Routing Convergence Paradox,” inIEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2025, pp. 143–151

work page 2025
[31]

Valgrind,

“Valgrind,” https://valgrind.org, accessed: 2026-01-17

work page 2026

[1] [1]

A New Paradigm for FPGA Placement Without Explicit Packing,

W. Li and D. Z. Pan, “A New Paradigm for FPGA Placement Without Explicit Packing,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 38, no. 11, pp. 2113– 2126, 2019

work page 2019

[2] [2]

Optimizing FPGA Logic Block Architectures for Arithmetic,

K. E. Murray, J. Luu, M. J. Walker, C. McCullough, S. Wang, S. Huda, B. Yan, C. Chiasson, K. B. Kent, J. Andersonet al., “Optimizing FPGA Logic Block Architectures for Arithmetic,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 6, pp. 1378–1391, 2020

work page 2020

[3] [3]

VTR 9: Open-source CAD for Fabric and Beyond FPGA Architecture Exploration,

M. A. Elgammal, A. Mohaghegh, S. G. Shahrouz, F. Mahmoudi, F. Ko¸ sar, K. Talaei, J. Fife, D. Khadivi, K. Murray, A. Boutroset al., “VTR 9: Open-source CAD for Fabric and Beyond FPGA Architecture Exploration,”ACM Transactions on Reconfigurable Technology and Systems, vol. 18, no. 3, pp. 1–53, 2025

work page 2025

[4] [4]

Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research,

A. Arora, A. Boutros, S. A. Damghani, K. Mathur, V . Mohanty, T. Anand, M. A. Elgammal, K. B. Kent, V . Betz, and L. K. John, “Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 11, pp. 3895–3909, 2023

work page 2023

[5] [5]

Titan: Enabling Large and Complex Benchmarks in Academic CAD,

K. E. Murray, S. Whitty, S. Liu, J. Luu, and V . Betz, “Titan: Enabling Large and Complex Benchmarks in Academic CAD,” inIEEE Interna- tional Conference on Field programmable Logic and Applications (FPL), 2013, pp. 1–8

work page 2013

[6] [6]

Cluster-based Logic Blocks for FPGAs: Area- Efficiency vs. Input Sharing and Size,

V . Betz and J. Rose, “Cluster-based Logic Blocks for FPGAs: Area- Efficiency vs. Input Sharing and Size,” inIEEE Custom Integrated Circuits Conference (CICC), 1997, pp. 551–554

work page 1997

[7] [7]

Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density,

A. Marquardt, V . Betz, and J. Rose, “Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 1999, pp. 37–46

work page 1999

[8] [8]

RPack: Routability-Driven Packing for Cluster-Based FPGAs,

E. Bozorgzadeh, S. Ogrenci-Memik, and M. Sarrafzadeh, “RPack: Routability-Driven Packing for Cluster-Based FPGAs,” inACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), 2001, pp. 629–634

work page 2001

[9] [9]

Efficient Circuit Clustering for Area and Power Reduction in FPGAs,

A. Singh, G. Parthasarathy, and M. Marek-Sadowska, “Efficient Circuit Clustering for Area and Power Reduction in FPGAs,”ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 7, no. 4, pp. 643–663, 2002

work page 2002

[10] [10]

Improving Timing-Driven FPGA Packing with Physical Information,

D. T. Chen, K. V orwerk, and A. Kennings, “Improving Timing-Driven FPGA Packing with Physical Information,” inIEEE International Con- ference on Field Programmable Logic and Applications (FPL), 2007, pp. 117–123

work page 2007

[11] [11]

MO-Pack: Many-Objective Clustering for FPGA CAD,

S. T. Rajavel and A. Akoglu, “MO-Pack: Many-Objective Clustering for FPGA CAD,” inACM/IEEE Design Automation Conference (DAC), 2011, pp. 818–823

work page 2011

[12] [12]

Un/DoPack: Re-clustering of Large System-on-Chip Designs with Interconnect Variation for Low- Cost FPGAs,

M. Tom, D. Leong, and G. Lemieux, “Un/DoPack: Re-clustering of Large System-on-Chip Designs with Interconnect Variation for Low- Cost FPGAs,” inIEEE/ACM International Conference on Computer- aided design (ICCAD), 2006, pp. 680–687

work page 2006

[13] [13]

T-NDPack: Timing-Driven Non-Uniform Depopulation Based Clustering,

H. Liu and A. Akoglu, “T-NDPack: Timing-Driven Non-Uniform Depopulation Based Clustering,” inIEEE Southern Conference on Programmable Logic (SPL), 2009, pp. 9–14

work page 2009

[14] [14]

Multilevel K-Way Hypergraph Partitioning,

G. Karypis and V . Kumar, “Multilevel K-Way Hypergraph Partitioning,” inACM/IEEE Design Automation Conference (DAC), 1999, pp. 343– 348

work page 1999

[15] [15]

K-Way Partitioning Based Packing for FPGA Logic Blocks Without Input Bandwidth Constraint,

W. Feng, “K-Way Partitioning Based Packing for FPGA Logic Blocks Without Input Bandwidth Constraint,” inIEEE International Conference on Field-Programmable Technology (FPT), 2012, pp. 8–15

work page 2012

[16] [16]

Rent’s Rule Based FPGA Packing for Routability Optimization,

W. Feng, J. Greene, K. V orwerk, V . Pevzner, and A. Kundu, “Rent’s Rule Based FPGA Packing for Routability Optimization,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2014, pp. 31–34

work page 2014

[17] [17]

Runtime-Quality Tradeoff in Partitioning Based Multithreaded Packing,

D. Vercruyce, E. Vansteenkiste, and D. Stroobandt, “Runtime-Quality Tradeoff in Partitioning Based Multithreaded Packing,” inIEEE Inter- national Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1–9

work page 2016

[18] [18]

How Preserving Circuit Design Hierarchy During FPGA Packing Leads to Better Performance,

——, “How Preserving Circuit Design Hierarchy During FPGA Packing Leads to Better Performance,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 3, pp. 629–642, 2017

work page 2017

[19] [19]

LSC: A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs,

L. Singhal, M. A. Iyer, and S. Adya, “LSC: A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs,” inACM/IEEE Design Automation Conference (DAC), 2017, pp. 1–6

work page 2017

[20] [20]

Simultaneous Timing Driven Clustering and Placement for FPGAs,

G. Chen and J. Cong, “Simultaneous Timing Driven Clustering and Placement for FPGAs,” inInternational Conference on Field Pro- grammable Logic and Applications (FPL), 2004, pp. 158–167

work page 2004

[21] [21]

Breaking Boundaries: Optimizing FPGA CAD with Flexible and Multi-threaded Re-Clustering,

M. A. Elgammal and V . Betz, “Breaking Boundaries: Optimizing FPGA CAD with Flexible and Multi-threaded Re-Clustering,” inACM Interna- tional Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), 2023, pp. 11–18

work page 2023

[22] [22]

Architecture description and packing for logic blocks with hierarchy, modes and complex interconnect,

J. Luu, J. Anderson, and J. Rose, “Architecture description and packing for logic blocks with hierarchy, modes and complex interconnect,” Proceedings of 2011 Acm/Sigda International Symposium on Field Programmable Gate Arrays, pp. 227–236, 2011

work page 2011

[23] [23]

Towards Interconnect-Adaptive Packing for FPGAs,

J. Luu, J. Rose, and J. Anderson, “Towards Interconnect-Adaptive Packing for FPGAs,” inACM/SIGDA International Symposium on Field- Programmable Gate Arrays (FPGA), 2014, pp. 21–30

work page 2014

[24] [24]

Using Sparse Crossbars within LUT,

G. Lemieux and D. Lewis, “Using Sparse Crossbars within LUT,” inACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2001, pp. 59–68

work page 2001

[25] [25]

Academic Packing for Commercial FPGA Architec- tures,

T. D. Haroldsen, “Academic Packing for Commercial FPGA Architec- tures,” Ph.D. dissertation, Brigham Young University, 2017

work page 2017

[26] [26]

VTR 7.0: Next Generation Architecture and CAD System for FPGAs,

J. Luu, J. Goeders, M. Wainberg, A. Somerville, T. Yu, K. Nasartschuk, M. Nasr, S. Wang, T. Liu, N. Ahmed, K. B. Kent, J. Anderson, J. Rose, and V . Betz, “VTR 7.0: Next Generation Architecture and CAD System for FPGAs,”ACM Transactions on Reconfigurable Technology and Systems, vol. 7, no. 3, pp. 1–30, 2014

work page 2014

[27] [27]

VTR 8: High-performance CAD and Customizable FPGA Architecture Modelling,

K. E. Murray, O. Petelin, S. Zhong, J. M. Wang, M. Eldafrawy, J.-P. Legault, E. Sha, A. G. Graham, J. Wu, M. J. P. Walker, H. Zeng, P. Pa- tros, J. Luu, K. B. Kent, and V . Betz, “VTR 8: High-performance CAD and Customizable FPGA Architecture Modelling,”ACM Transactions on Reconfigurable Technology and Systems, vol. 13, no. 2, pp. 1–60, 2020

work page 2020

[28] [28]

Titan 2.0: Enabling Open-Source CAD Evaluation with a Modern Architecture Capture,

K. T. Khoozani, A. A. Dehkordi, and V . Betz, “Titan 2.0: Enabling Open-Source CAD Evaluation with a Modern Architecture Capture,” inIEEE International Conference on Field-Programmable Logic and Applications (FPL), 2023, pp. 57–64

work page 2023

[29] [29]

PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs,

L. McMurchie and C. Ebeling, “PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs,” inACM International Sym- posium on Field-Programmable Gate Arrays (FPGA), 1995, pp. 111— -117

work page 1995

[30] [30]

Guaranteed Yet Hard to Find: Uncovering FPGA Routing Convergence Paradox,

S. Shrivastava, S. Nikoli ´c, S. Tanaka, C. Ravishankar, D. Gaitonde, and M. Stojilovi ´c, “Guaranteed Yet Hard to Find: Uncovering FPGA Routing Convergence Paradox,” inIEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2025, pp. 143–151

work page 2025

[31] [31]

Valgrind,

“Valgrind,” https://valgrind.org, accessed: 2026-01-17

work page 2026