Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem
Pith reviewed 2026-05-22 23:47 UTC · model grok-4.3
The pith
Co-designing local search with hardware allows single-shot evaluation of full feasible neighborhoods for the quadratic assignment problem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By co-designing the local search heuristic with the underlying hardware, full neighbourhoods can be computed simultaneously to make update decisions. Binary solutions remain feasible by selecting local moves that lead to neighbouring feasible solutions. The approach is compatible with both digital computers and analog hardware and demonstrates effectiveness in CPU implementations by comparison with state-of-the-art heuristics.
What carries the argument
Single-shot feasible-space heuristics that use hardware parallelism to evaluate full feasible neighborhoods in one step and select only moves preserving feasibility.
If this is right
- The method runs on both CPU and IMC hardware without first mapping the problem to QUBO form.
- It produces solutions competitive with existing state-of-the-art heuristics when run on ordinary computers.
- Each iteration can exploit massive parallelism to evaluate every feasible neighbor at once.
- Feasibility is maintained through choice of moves rather than through extra penalty terms or repair steps.
Where Pith is reading between the lines
- The same single-step neighborhood evaluation could reduce runtime on other permutation-based problems whose feasible moves have similar structure.
- Actual hardware runs would be needed to confirm whether the claimed single parallel step incurs truly negligible overhead.
- If the parallelism scales, the approach could handle larger QAP instances than conventional local search allows on CPUs.
Load-bearing premise
The hardware can compute full feasible neighborhoods in a single parallel step with negligible overhead and without introducing new constraint violations.
What would settle it
Executing the heuristic on physical in-memory computing hardware and checking whether iteration time stays constant as neighborhood size grows on standard QAP benchmark instances.
Figures
read the original abstract
Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. It has been demonstrated in the literature that such special-purpose solvers can outperform traditional complementary metal--oxide--semiconductor architectures by orders of magnitude with respect to timing metrics on synthetic problems. However, they face challenges with constrained problems such as the quadratic assignment problem (QAP), where mapping to binary formulations such as QUBO introduces overhead and limits parallelism. In-memory computing (IMC) devices, such as memristor-based analog Ising machines, offer significant speed-ups and efficiency gains over traditional CPU-based solvers, particularly for solving combinatorial optimization problems. In this work, we present a novel hardware-aware QAP optimization framework designed for IMC hardware. By co-designing the local search heuristic with the underlying hardware, we exploit the intrinsic massive parallelism that allows for computing of full neighbourhoods simultaneously to make update decisions. We ensure binary solutions remain feasible by selecting local moves that lead to neighbouring feasible solutions, leveraging feasible-space search heuristics and the underlying structure of a given problem. Our approach is compatible with both digital computers and analog hardware. We demonstrate its effectiveness in CPU implementations by comparing it with state-of-the-art heuristics for solving the QAP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hardware-aware local-search framework for the Quadratic Assignment Problem that co-designs a feasible-space heuristic with in-memory computing (IMC) architectures. By evaluating full neighborhoods in one parallel step and restricting updates to feasible neighboring solutions, the method aims to avoid QUBO-mapping overhead while remaining compatible with both digital CPUs and analog memristor-based Ising machines; effectiveness is shown via CPU implementations benchmarked against existing QAP heuristics.
Significance. A validated single-shot feasible-space heuristic on IMC hardware would remove a key barrier to applying analog solvers to constrained combinatorial problems and could deliver substantial speed and energy gains over CPU baselines. The CPU results supply a necessary first benchmark, but the absence of any hardware mapping or timing measurements means the central hardware-compatibility claim remains untested.
major comments (2)
- [Abstract] Abstract: the assertion that the heuristic 'allows for computing of full neighbourhoods simultaneously' on IMC hardware is unsupported; the manuscript reports only CPU implementations and provides neither a hardware mapping, crossbar simulation, nor any analog timing/energy data that would confirm single-step parallel evaluation of feasible neighborhoods.
- [Abstract] Abstract: the claim that binary solutions 'remain feasible by selecting local moves that lead to neighbouring feasible solutions' is presented as hardware-compatible, yet no mechanism is described for realizing the feasible-move filter in analog IMC circuitry without extra digital correction logic or risk of introducing uncorrectable constraint violations.
minor comments (1)
- The abstract refers to comparisons with 'state-of-the-art heuristics' but does not name the specific algorithms or report quantitative metrics (runtime, solution quality) in the provided text.
Simulated Author's Rebuttal
We thank the referee for the detailed review and for identifying points where the abstract could be read as overstating hardware results. The manuscript presents an algorithmic co-design whose CPU implementation is fully benchmarked; the hardware-compatibility claims rest on the design choices that map directly onto IMC parallelism and feasible-space moves. We will revise the abstract and add clarifying text to ensure no unsupported performance assertions appear. No hardware experiments or crossbar simulations were performed, so we limit claims to algorithmic compatibility.
read point-by-point responses
-
Referee: [Abstract] Abstract: the assertion that the heuristic 'allows for computing of full neighbourhoods simultaneously' on IMC hardware is unsupported; the manuscript reports only CPU implementations and provides neither a hardware mapping, crossbar simulation, nor any analog timing/energy data that would confirm single-step parallel evaluation of feasible neighborhoods.
Authors: We agree the phrasing in the abstract can be misread as claiming executed single-shot evaluation on IMC hardware. The heuristic is constructed so that a single parallel read-out of the entire neighborhood is sufficient to select the next feasible move; this property is a direct consequence of the feasible-space formulation and matches the native parallelism of IMC crossbars. The CPU implementation evaluates the same neighborhood structure sequentially to obtain the reported benchmarks. We will revise the abstract to read that the heuristic is designed to exploit single-shot neighborhood evaluation on IMC hardware, while making explicit that all reported results are from CPU runs and that hardware validation remains future work. revision: yes
-
Referee: [Abstract] Abstract: the claim that binary solutions 'remain feasible by selecting local moves that lead to neighbouring feasible solutions' is presented as hardware-compatible, yet no mechanism is described for realizing the feasible-move filter in analog IMC circuitry without extra digital correction logic or risk of introducing uncorrectable constraint violations.
Authors: The feasible-space restriction is enforced at the algorithmic level by enumerating only the moves that preserve the permutation structure of QAP solutions; this enumeration can be performed either in digital pre-processing or by a lightweight digital controller that gates the analog update. The manuscript does not claim an all-analog realization of the filter. We will add a short paragraph in the methods section describing this hybrid interface and will tone down the abstract to state that the approach remains compatible with IMC hardware when a feasible-move selector (digital or hybrid) is present. revision: yes
Circularity Check
No circularity: heuristic description contains no derivations, fits, or self-referential reductions
full rationale
The manuscript describes an algorithmic framework for feasible-space local search co-designed for IMC hardware. No equations, parameters, or first-principles derivations appear in the provided text. The central claims rest on stated design choices and CPU-based empirical comparisons rather than any quantity derived from itself by construction. No self-citations, ansatzes, or uniqueness theorems are invoked in a load-bearing way. The derivation chain is therefore self-contained and does not reduce to its inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
We provide an efficient construction of the full neighbourhood at a given feasible binary solution
-
[2]
We design a parallel evaluation framework that computes objective differences across all neighbouring solutions in one iteration
-
[3]
We introduce a framework tailored to the architectural requirements of application-driven IMC hard- ware. 1.3. Outline Section 2 provides a background on the QAP and a discussion on the motivation for using the feasible- space method. In Section 3, we show formulas that leverage problems’ underlying structure to optimize performance. Section 4 introduces ...
-
[4]
BACKGROUND In this section, we review existing approaches for solving the QAP and explain the rationale behind pursuing the feasible-space method. Being an NP-hard problem, the QAP is commonly tackled using local- search heuristic algorithms, such as those presented in Refs. [3, 9, 12]. They are CPU-based solvers that offer fast runtimes, and have been sh...
-
[5]
AN IMPROVED LOCAL SEARCH HEURISTIC FOR HARDWARE The QAP problem QAP( F, D) can be formulated as a series of matrix multiplications by representing solutions as permutation matrices. Recall that Pn is the set of n × n permutation matrices. Then, the 6 103 105 0 10 3 10 2 10 1 100 Optimality Gap chr12a 103 105 rou15 103 105 nug18 103 105 Iteration 0 10 3 10...
-
[6]
FULL-NEIGHBOURHOOD SEARCH In the native QAP space, given a solution represented as a permutation matrix X ∈ Pn, we call Y a “neighbour” of X if Y is obtained by swapping two distinct columns of X. A “full neighbourhood” of X is the set of all such neighbours of X. Hence, the full neighbourhood of X consists of n 2 = n(n−1) 2 members. In this section, we p...
-
[7]
∈ N (x). Note that by the definition (4.1), the elements in this neighbour are related by the equalities z∗ 1 = j ¯z1 n k · n + mod(¯z2, n) and z∗ 2 = j ¯z2 n k · n + mod(¯z1, n). (4.4) 12 Now, consider the neighbour that results from the quadruple bit-flip via (¯z1, ¯z2, z∗ 2 , z∗ 1), y = x − e¯z1 − e¯z2 + ez∗ 1 + ez∗ 2 , and its full neighbourhood N (y)...
-
[8]
For all i ∈ R1, let S12(i, ¯z1) = 0 and S12(i, z∗
-
[9]
For all i ∈ R2, let S12(i, ¯z2) = 0 and S12(i, z∗
-
[10]
For i ∈ R3, let S12(i, [¯z1, ¯z2]) = 0 and S12(i, [z∗ 1 , z∗ 2]) = 1. Note that this update involves changing 2n − 2 elements of S12 from 0 to 1, and another 2 n − 2 changes from 1 to 0. See Figure 4.2 (a), (b), and (c) for an illustration of the transition of this update for an instance of size n = 12. The update procedure for S12 is outlined in Algorith...
-
[11]
= 1, i ∈ R1 3: S12(i, z2) = 0, S12(i, z∗
-
[12]
= 1, i ∈ R2 4: S12(i, [z1, z2]) = 0, S12(i, [z∗ 1 , z∗ 2]) = 1, i ∈ R3 5: return S12 6: end function Recall that a neighbour has an analogous meaning in the space of permutation matrices. Once a neighbour is chosen from N (vec(X)), it analogously means that two columns of X ∈ Pn are swapped while the other n − 2 columns remain invariant. The indices of th...
-
[13]
NUMERICAL EXPERIMENTS We used instances from the QAPLIB [27] of various sizes ranging from n = 12 to n = 100 to evaluate the performance of Algorithm 4.5. All instances from the QAPLIB follow a size-based naming convention, “identifier n version”, where n indicates the problem size. The relative optimality gap was chosen as the performance metric. Let fbe...
work page 2000
-
[14]
CONCLUSIONS To sustain the computational performance growth rate in the post-Moore’s law era, new technologies such as IMC have emerged. Mapping the native formulation of binary constrained optimization problems onto the unconstrained binary formulation incurs a massive overhead and remains the primary obstacle limiting the performance of special-purpose ...
- [15]
-
[16]
M. Bagherbeik, W. Xu, S. F. Mousavi, K. Kanda, H. Tamura, and A. Sheikholeslami , Maqo: A scalable many-core annealer for quadratic optimization, in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 76–77
work page 2022
-
[17]
M. Bashiri and H. Karimi, Effective heuristics and meta-heuristics for the quadratic assignment problem with tuned parameters and analytical comparisons, Journal of industrial engineering international, 8 (2012), pp. 1–9
work page 2012
-
[18]
T. Bhattacharya, G. H. Hutchinson, G. Pedretti, X. Sheng, J. Ignowski, T. Van Vaerenbergh, R. Beausoleil, J. P. Strachan, and D. B. Strukov , Computing high-degree polynomial gradients in memory, Nature Communications, 15 (2024), p. 8211
work page 2024
-
[19]
R. E. Burkard, E. Cela, P. M. Pardalos, and L. S. Pitsoulis, The quadratic assignment problem, Springer, 1998
work page 1998
-
[20]
F. Cai, S. Kumar, T. Van Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, et al., Power-efficient combinatorial optimization using intrinsic noise in memristor hopfield neural networks, Nature Electronics, 3 (2020), pp. 409–418
work page 2020
-
[21]
S. A. de Carvalho Jr. and S. Rahmann , Microarray layout as quadratic assignment problem, in German Conference on Bioinformatics, Gesellschaft f¨ ur Informatik eV, 2006
work page 2006
-
[22]
D. Dobrynin, A. Renaudineau, M. Hizzani, D. Strukov, M. Mohseni, and J. P. Strachan , Energy landscapes of combinatorial optimization in Ising machines, Phys. Rev. E, 110 (2024), p. 045308
work page 2024
-
[23]
Z. Drezner, A new genetic algorithm for the quadratic assignment problem, INFORMS journal on computing, 15 (2003), pp. 320–330
work page 2003
- [24]
- [25]
-
[26]
L. M. Gambardella, ´E. D. Taillard, and M. Dorigo , Ant colonies for the quadratic assignment problem, The Journal of the Operational Research Society, 50 (1999), pp. 167–176
work page 1999
- [27]
-
[28]
H. Goto, K. Tatsumura, and A. R. Dixon, Combinatorial optimization by simulating adiabatic bifurcations 29 in nonlinear Hamiltonian systems, Science Advances, 5 (2019)
work page 2019
-
[29]
R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, T. Onodera, E. Ng, C. Lan- grock, K. Inaba, T. Honjo, K. Enbutsu, T. Umeki, R. Kasahara, S. Utsunomiya, S. Kako, K. I. Kawarabayashi, R. L. Byer, M. M. Fejer, H. Mabuchi, D. Englund, E. Rieffel, H. Takesue, and Y. Yamamoto, Experimental investigation of performance differences between coher...
work page 2019
-
[30]
M. Johnson, M. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. Berkley, J. Jo- hansson, P. Bunyk, E. Chapple, C. Enderud, J. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, and G. Rose , Quantum annealing with manufactured spins, Nature, 473 (2011), pp. 194–8
work page 2011
-
[31]
T. C. Koopmans and M. Beckmann , Assignment problems and the location of economic activities, Econo- metrica, 25 (1957), pp. 53–76
work page 1957
-
[32]
J. Li, M. Moghaddam, and S. Y. Nof , Dynamic storage assignment with product affinity and abc classification—a case study, The International Journal of Advanced Manufacturing Technology, 84 (2016), pp. 2179–2194
work page 2016
-
[33]
A. Lu, J. Hur, Y.-C. Luo, H. Li, D. E. Nikonov, I. A. Young, Y.-K. Choi, and S. Yu, Scalable in-memory clustered annealer with temporal noise of charge trap transistor for large scale travelling salesman problems, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13 (2023), pp. 422–435
work page 2023
-
[34]
Lucas, Ising formulations of many NP problems, Frontiers in Physics, 2 (2014)
A. Lucas, Ising formulations of many NP problems, Frontiers in Physics, 2 (2014)
work page 2014
-
[35]
S. Matsubara, H. Tamura, M. Takatsu, D. Yoo, B. Vatankhahghadim, H. Yamasaki, T. Miyazawa, S. Tsukamoto, Y. Watanabe, K. Takemoto, and A. Sheikholeslami , Ising-model optimizer with parallel-trial bit-sieve engine, in Complex, Intelligent, and Software Intensive Systems – Proceedings of the 11th International Conference on Complex, Intelligent, and Softwa...
work page 2017
-
[36]
M. Mohseni, D. Eppens, J. Str ¨umpfer, R. Marino, V. S. Denchev, A. K. Ho, S. V. Isakov, S. Boixo, F. Ricci-Tersenghi, and H. Neven, Nonequilibrium monte carlo for unfreezing variables in hard combinatorial optimization, arXiv.2111.13628, (2021)
-
[37]
D. Munera, D. Diaz, and S. Abreu , Hybridization as cooperative parallelism for the quadratic assignment problem, in Hybrid Metaheuristics, M. J. Blesa, C. Blum, A. Cangelosi, V. Cutello, A. DI NUOVO, M. Pavone, and E.-G. Talbi, eds., Cham, 2016, Springer International Publishing, pp. 47–61
work page 2016
-
[38]
T. Okuyama, T. Sonobe, K. Kawarabayashi, and M. Yamaoka , Binary optimization by momentum annealing, Physics Review E, 100 (2019), p. 012111
work page 2019
-
[39]
T. Okuyama, C. Yoshimura, M. Hayashi, and M. Yamaoka , Computing architecture to perform approximated simulated annealing for Ising models, in IEEE International Conference on Rebooting Computing (ICRC), 2016, pp. 1–8
work page 2016
-
[40]
G. Pedretti, F. B ¨ohm, T. Bhattacharya, A. Heittman, X. Zhang, M. Hizzani, G. Hutchinson, D. Kwon, J. Moon, E. Valiante, I. Rozada, C. E. Graves, J. Ignowski, M. Mohseni, J. P. Stra- chan, D. Strukov, R. Beausoleil, and T. V. Vaerenbergh , Solving Boolean satisfiability problems with resistive content addressable memories, arXiv:2501.07733, (2025)
-
[41]
https://coral.ise.lehigh.edu/data- sets/qaplib/
QAPLIB, QAPLIB: A library of quadratic assignment problem instances. https://coral.ise.lehigh.edu/data- sets/qaplib/
- [42]
-
[43]
W. Song, M. Rao, Y. Li, C. Li, Y. Zhuo, F. Cai, M. Wu, W. Yin, Z. Li, Q. Wei, S. Lee, H. Zhu, L. Gong, M. Barnell, Q. Wu, P. A. Beerel, M. S.-W. Chen, N. Ge, M. Hu, Q. Xia, and J. J. Yang, Programming memristor arrays with arbitrarily high precision for analog computing, Science, 383 (2024), pp. 903–910
work page 2024
-
[44]
J. Su, T. Tu, and L. He , A quantum annealing approach for Boolean satisfiability problem, in 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC), 2016, pp. 1–6
work page 2016
-
[45]
Taillard, Robust tabu search for the quadratic assignment problem, Parallel Computing, 17 (1991), pp
E. Taillard, Robust tabu search for the quadratic assignment problem, Parallel Computing, 17 (1991), pp. 443– 455
work page 1991
-
[46]
R. H. Warren, Solving the traveling salesman problem on a quantum annealer, SN applied sciences, 2 (2020), p. 75
work page 2020
-
[47]
C. Yoshimura, M. Hayashi, T. Okuyama, and M. Yamaoka, Implementation and evaluation of FPGA-based annealing processor for Ising model by use of resource sharing, International Journal of Networking and Com- puting, 7 (2017), pp. 154–172
work page 2017
- [48]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.