A Hardware Accelerator for the Goemans-Williamson Algorithm

D. A. Herrera-Mart\'i; E. Guthmuller; J. Fereyre

arxiv: 2510.02863 · v1 · submitted 2025-10-03 · 💻 cs.AR · cs.DS· cs.NA· math.NA· quant-ph

A Hardware Accelerator for the Goemans-Williamson Algorithm

D. A. Herrera-Mart\'i , E. Guthmuller , J. Fereyre This is my paper

Pith reviewed 2026-05-18 10:29 UTC · model grok-4.3

classification 💻 cs.AR cs.DScs.NAmath.NAquant-ph

keywords Max-CutGoemans-Williamson algorithmsemidefinite programmingconjugate gradient methodhardware acceleratorextended precisioninterior point methods

0 comments

The pith

Higher internal precision in conjugate gradient reduces time to solution for large Max-Cut SDP relaxations by a factor that grows with system size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how extended floating point precision can be used in the conjugate gradient method for solving large instances of the semidefinite program that relaxes Max-Cut in the Goemans-Williamson algorithm. The approach is intended for hardware accelerators that support such precision natively. A sympathetic reader cares because it offers a way to obtain solutions with provable approximation guarantees more quickly when problems are too large for direct matrix methods. The estimate indicates that the speedup from reduced iterations grows as the system size increases.

Core claim

When using indirect matrix inversion methods like Conjugate Gradient, which have lower complexity than direct methods and are therefore used in very large problems, increasing the internal working precision reduces the time to solution by a factor that increases with the system size on a hardware architecture that runs natively on extended precision.

What carries the argument

Conjugate gradient solver using extended floating point precision inside interior point methods for the Goemans-Williamson semidefinite relaxation of Max-Cut.

If this is right

Time to solution decreases more significantly for larger systems.
The method enables practical use of worst-case guaranteed Max-Cut approximations in performance-critical scenarios.
Hardware designs can leverage native extended precision to translate iteration reductions into direct speedups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique may apply to other large-scale convex optimization tasks that rely on iterative linear solvers.
Future hardware could integrate this precision strategy to improve classical solvers used alongside quantum optimization methods.

Load-bearing premise

The hardware can execute extended-precision arithmetic natively with negligible overhead outside the linear solver itself.

What would settle it

Build a prototype hardware accelerator and compare the actual runtime for solving Max-Cut SDP instances of increasing size using standard versus extended precision in the conjugate gradient step.

Figures

Figures reproduced from arXiv: 2510.02863 by D. A. Herrera-Mart\'i, E. Guthmuller, J. Fereyre.

**Figure 3.** Figure 3: FIG. 3. Ratio between the attained optimal value and the best [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. Time spent in CG algorithm at each Newton step for [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5. Total time spent in CG for all problems and preci [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6. The Roofline model provides a visual representa [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

read the original abstract

The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson, provides worst-case guarantees and is therefore suited to both the construction of benchmarks and in applications to performance-critic scenarios. We show how extended floating point precision can be incorporated in algebraic subroutines in convex optimisation, namely in indirect matrix inversion methods like Conjugate Gradient, which are used in Interior Point Methods in the case of very large problem sizes. Also, an estimate is provided of the expected acceleration of the time to solution for a hardware architecture that runs natively on extended precision. Specifically, when using indirect matrix inversion methods like Conjugate Gradient, which have lower complexity than direct methods and are therefore used in very large problems, we see that increasing the internal working precision reduces the time to solution by a factor that increases with the system size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches using higher internal precision in conjugate gradient for the large SDP inside Goemans-Williamson to cut iterations and get growing wall-clock speedup on custom hardware, but supplies no measurements or architecture details to support the claim.

read the letter

The core idea is that indirect solvers like conjugate gradient, already used for very large Max-Cut SDPs, converge faster at extended precision, and a hardware unit that runs that precision natively could turn the iteration savings into real acceleration that scales with problem size. That observation is not new in numerical linear algebra, but applying it specifically to the Goemans-Williamson interior-point path and tying it to a hardware accelerator is a reasonable domain extension. The paper correctly notes that direct methods become impractical at scale, so the focus on iterative methods is the right one for the target regime. Credit for keeping the discussion grounded in existing complexity considerations rather than inventing new theory. The estimate of acceleration is presented as an expectation derived from those considerations, which is fair as far as it goes. The soft spot is exactly where the stress-test note points: the speedup only materializes if the extra-precision operations incur negligible overhead outside the solver kernel itself. The abstract gives the scaling claim but shows neither iteration counts, convergence plots, nor any micro-architectural numbers on area or cycle cost for the extended-precision units. Without those, the central performance argument stays at the level of a plausible projection rather than a demonstrated result. The citation pattern is light but appropriate for a short note that builds on standard facts about iterative solvers. This is for people who build or evaluate hardware for convex optimization benchmarks, not for readers looking for a new algorithmic result or formal proof. A serious referee could usefully ask for the missing convergence data and a concrete hardware sketch; the work is coherent enough on its own terms to deserve that review rather than an immediate desk reject.

Referee Report

2 major / 1 minor

Summary. The paper proposes a hardware accelerator for the Goemans-Williamson algorithm applied to Max-Cut. It focuses on incorporating extended floating-point precision into indirect solvers such as Conjugate Gradient within Interior Point Methods for the SDP relaxation, which are used at large problem sizes. The central claim is that raising internal working precision reduces iteration counts sufficiently that time-to-solution improves by a factor that grows with system size; an estimate of the resulting wall-clock acceleration is provided for a hardware architecture that executes extended-precision arithmetic natively.

Significance. If the unshown convergence model, iteration counts, and hardware-overhead analysis can be supplied and validated, the work would demonstrate a concrete precision-iteration tradeoff that improves scaling of iterative linear solvers in SDP-based combinatorial optimization. This could inform the design of specialized accelerators for large-scale convex relaxations, particularly where direct methods become prohibitive.

major comments (2)

[Abstract] Abstract: the estimate of expected acceleration is stated without any convergence model, measured iteration counts for CG at different precisions, or error analysis showing how higher precision reduces iterations enough to produce a speedup factor that increases with system size. The central scaling claim therefore rests on unshown calculations.
[Abstract] The manuscript supplies no cycle counts, area overhead figures, or micro-architectural description demonstrating that extended-precision operations can be realized with negligible overhead relative to the linear-algebra kernel. Without this, the translation from iteration reduction to wall-clock speedup remains an unverified assumption.

minor comments (1)

[Abstract] The abstract refers to 'an estimate' of acceleration but does not indicate whether this estimate is derived analytically or from simulation; clarifying the derivation method would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The points raised identify areas where additional supporting details would improve clarity and rigor. We address each major comment below and describe the changes we will incorporate in the revised version.

read point-by-point responses

Referee: [Abstract] Abstract: the estimate of expected acceleration is stated without any convergence model, measured iteration counts for CG at different precisions, or error analysis showing how higher precision reduces iterations enough to produce a speedup factor that increases with system size. The central scaling claim therefore rests on unshown calculations.

Authors: We agree that the abstract, being a high-level summary, does not present the supporting convergence model, iteration counts, or error analysis. The estimate in the manuscript is based on the established dependence of CG convergence on numerical stability and condition number, where extended precision reduces rounding-error accumulation that grows with problem size in SDP relaxations. In the revision we will add a concise convergence model and error analysis (drawing on standard CG theory) to the abstract or a new subsection, and we will include available iteration-count data from our simulations to substantiate that the speedup factor increases with system size. revision: yes
Referee: [Abstract] The manuscript supplies no cycle counts, area overhead figures, or micro-architectural description demonstrating that extended-precision operations can be realized with negligible overhead relative to the linear-algebra kernel. Without this, the translation from iteration reduction to wall-clock speedup remains an unverified assumption.

Authors: We acknowledge that the manuscript does not supply explicit cycle counts, area-overhead estimates, or a micro-architectural description. The current estimate assumes native extended-precision execution with low overhead, consistent with known VLSI trade-offs for floating-point units. In the revised manuscript we will add a dedicated paragraph or short appendix providing high-level cycle-count estimates, area-overhead figures relative to standard FP units, and a brief micro-architectural sketch to justify the negligible-overhead assumption and strengthen the link to wall-clock speedup. revision: yes

Circularity Check

0 steps flagged

No circularity: claim follows from standard CG complexity scaling without reduction to fitted inputs or self-citations

full rationale

The paper derives the claim that higher internal precision reduces CG iteration count (and thus time-to-solution by a factor growing with system size) from the known dependence of conjugate-gradient convergence on condition number and working precision in large-scale SDP relaxations inside Goemans-Williamson. No equation, prediction, or estimate is obtained by fitting a parameter to a subset of data and then renaming the fit as a prediction; no load-bearing step invokes a self-citation whose authors overlap with the present work; and the hardware-overhead assumption is stated separately rather than smuggled into the algebraic derivation. The result is therefore self-contained against external numerical-linear-algebra benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard numerical assumption that higher working precision improves conjugate-gradient convergence for the linear systems arising in interior-point SDP solvers, plus the engineering premise that extended-precision hardware can be built with low overhead.

axioms (1)

domain assumption Higher internal floating-point precision reduces the iteration count of conjugate gradient when solving the Newton systems inside interior-point methods for the Max-Cut SDP.
Invoked when the authors link increased precision to reduced time-to-solution that grows with system size.

pith-pipeline@v0.9.0 · 5725 in / 1218 out tokens · 49123 ms · 2026-05-18T10:29:11.611563+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1]

This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension

for extended precision computing. This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension. It supports up to 512-bit floating point precision with performance depending on input and out- put precision. We have validated and measured this hard- FIG. 3. Ratio between ...

work page
[2]

Combinatorial op- timization: theory and algorithms

KORTE, Bernhard; VYGEN, Jens. Combinatorial op- timization: theory and algorithms. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008

work page 2008
[3]

GENDREAU, Michel, et al. (ed.). Handbook of meta- heuristics. New York: Springer, 2010

work page 2010
[4]

Mathemati- cal foundation of quantum annealing

MORITA, Satoshi; NISHIMORI, Hidetoshi. Mathemati- cal foundation of quantum annealing. Journal of Mathe- matical Physics, 2008, vol. 49, no 12

work page 2008
[5]

Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing

CROSSON, Elizabeth; HARROW, Aram W. Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing. 2016 IEEE 57th Annual Sym- posium on Foundations of Computer Science (FOCS). IEEE, 2016. p. 714-723

work page 2016
[6]

Perspectives of quantum anneal- ing: Methods and implementations

HAUKE, Philipp, et al. Perspectives of quantum anneal- ing: Methods and implementations. Reports on Progress in Physics, 2020, vol. 83, no 5, p. 054401

work page 2020
[7]

A coherent Ising machine for 2000-node optimization problems

INAGAKI, Takahiro, et al. A coherent Ising machine for 2000-node optimization problems. Science, 2016, vol. 354, no 6312, p. 603-606

work page 2000
[8]

Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit

YAMAMOTO, Yoshihisa, et al. Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit. npj Quantum Information, 2017, vol. 3, no 1, p. 49

work page 2017
[9]

A time-division multiplex- ing Ising machine on FPGAs

YAMAMOTO, Kasho, et al. A time-division multiplex- ing Ising machine on FPGAs. Proceedings of the 8th In- ternational Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. 2017. p. 1-6

work page 2017
[10]

High-speed sparse Ising model on FPGA

MINAMISAWA, Akira; IIMURA, Ryoma; KAWA- HARA, Takayuki. High-speed sparse Ising model on FPGA. 2019 IEEE 62nd International Midwest Sympo- sium on Circuits and Systems (MWSCAS). IEEE, 2019. p. 670-673

work page 2019
[11]

GPU-based Ising computing for solving max-cut combinatorial optimization problems

COOK, Chase, et al. GPU-based Ising computing for solving max-cut combinatorial optimization problems. Integration, 2019, vol. 69, p. 335-344

work page 2019
[12]

FPGA-based simulated bifurcation machine

TATSUMURA, Kosuke; DIXON, Alexander R.; GOTO, Hayato. FPGA-based simulated bifurcation machine. 29th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2019. p. 59-66

work page 2019
[13]

Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation

TATSUMURA, Kosuke; YAMASAKI, Masaya; GOTO, Hayato. Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation. Nature Electronics, 2021, vol. 4, no 3, p. 208-217

work page 2021
[14]

Application of digital annealer for faster combinatorial optimization

SAO, Masataka, et al. Application of digital annealer for faster combinatorial optimization. Fujitsu Scientific and Technical Journal, 2019, vol. 55, no 2, p. 45-51

work page 2019
[15]

Ising-FPGA: A spintronics-based reconfigurable Ising model solver

MONDAL, Ankit; SRIVASTAVA, Ankur. Ising-FPGA: A spintronics-based reconfigurable Ising model solver. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2020, vol. 26, no 1, p. 1-27

work page 2020
[16]

100,000-spin coherent Ising machine

HONJO, Toshimori, et al. 100,000-spin coherent Ising machine. Science advances, 2021, vol. 7, no 40, p. eabh0952

work page 2021
[17]

Ising machines as hardware solvers of combinatorial 7 optimization problems

MOHSENI, Naeimeh; MCMAHON, Peter L.; BYRNES, Tim. Ising machines as hardware solvers of combinatorial 7 optimization problems. Nature Reviews Physics, 2022, vol. 4, no 6, p. 363-379

work page 2022
[18]

All-to-all reconfigurability with sparse and higher-order Ising machines

NIKHAR, Srijan, et al. All-to-all reconfigurability with sparse and higher-order Ising machines. Nature Commu- nications, 2024, vol. 15, no 1, p. 8977

work page 2024
[19]

879- approximation algorithms for max cut and max 2sat

GOEMANS, Michel X.; WILLIAMSON, David P. . 879- approximation algorithms for max cut and max 2sat. Proceedings of the Twenty-sixth Annual ACM Sympo- sium on Theory of Computing. 1994. p. 422-431

work page 1994
[20]

Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

GOEMANS, Michel X.; WILLIAMSON, David P. Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 1995, vol. 42, no 6, p. 1115-1145

work page 1995
[21]

Obstacles to variational quantum optimization from symmetry protection

BRAVYI, Sergey, et al. Obstacles to variational quantum optimization from symmetry protection. Physical review letters, 2020, vol. 125, no 26, p. 260505

work page 2020
[22]

On Khot’s unique games conjecture

TREVISAN, Luca. On Khot’s unique games conjecture. Bulletin of the American Mathematical Society, 2012, vol. 49, no 1, p. 91-111

work page 2012
[23]

Approximating the value of two power proof systems, with applications to max 2sat and max dicut

FEIGE, Uriel; GOEMANS, Michel. Approximating the value of two power proof systems, with applications to max 2sat and max dicut. Proceedings third Israel sym- posium on the theory of computing and systems. IEEE,

work page
[24]

Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing

GOEMANS, Michel X.; WILLIAMSON, David. Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing. 2001. p. 443-452

work page 2001
[25]

Interior point methods in semidefi- nite programming with applications to combinatorial op- timization

ALIZADEH, Farid. Interior point methods in semidefi- nite programming with applications to combinatorial op- timization. SIAM journal on Optimization, 1995, vol. 5, no 1, p. 13-51

work page 1995
[26]

Convex optimization

BOYD, Stephen P.; VANDENBERGHE, Lieven. Convex optimization. Cambridge university press, 2004

work page 2004
[27]

MPFR: A multiple-precision binary floating-point library with correct rounding

FOUSSE, Laurent, et al. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software (TOMS), 2007, vol. 33, no 2, p. 13-es

work page 2007
[28]

A new stopping criterion for Krylov solvers applied in interior point meth- ods

ZANETTI, Filippo; GONDZIO, Jacek. A new stopping criterion for Krylov solvers applied in interior point meth- ods. SIAM Journal on Scientific Computing, 2023, vol. 45, no 2, p. A703-A728

work page 2023
[29]

Interior methods for nonlinear optimization

FORSGREN, Anders; GILL, Philip E.; WRIGHT, Mar- garet H. Interior methods for nonlinear optimization. SIAM review, 2002, vol. 44, no 4, p. 525-597

work page 2002
[30]

Iterative methods for solving lin- ear systems

GREENBAUM, Anne. Iterative methods for solving lin- ear systems. Society for Industrial and Applied Mathe- matics, 1997

work page 1997
[31]

Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol

E. Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol. 73, no. 7, pp. 1683-1697, July 2024

work page 2024
[32]

Fuguet, E

C. Fuguet, E. Guthmuller, A. Bocco, J. Fereyre, A. Evans and Y. Durand, ”A Variable and Extended Precision (VRP) Accelerator and its 22 nm SoC Implementation,” 2024 39th Conference on Design of Circuits and Inte- grated Systems (DCIS), Catania, Italy, 2024, pp. 1-6

work page 2024
[33]

Guthmuller, C

E. Guthmuller, C. Fuguet, A. Bocco, J. Fereyre, A. Evans, Y. Durand, ”Variable and Extended Precision (VRP) Accelerator Implemented in a 22nm SoC,” 2025, Electronics Letters, 61 (1), art. no. e70255s

work page 2025
[34]

Roofline: An insightful visual performance model for multicore ar- chitectures,

S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore ar- chitectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009

work page 2009
[35]

β- approximate central path

K. Kim and M. -j. Park, ”Present and Future, Challenges of High Bandwith Memory (HBM),” 2024 IEEE Interna- tional Memory Workshop (IMW), Seoul, Korea, Republic of, 2024, pp. 1-4. 8 APPENDIX A. Primal-Dual Barrier Method for Semidefinite Programming By introducing the barrierµ, the original SDP cost function is modified with a log-determinant barrier func-...

work page 2024
[36]

Spatial locality: successive memory accesses are generally done to close locations, most often con- secutive memory addresses

work page
[37]

To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able

Temporal locality: a memory location has a high chance to be accessed multiple times during the execution of the program. To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able. The VXP accelerator first level caches implements 64B cache lines, which is typical for general purpose pro- ces...

work page

[1] [1]

This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension

for extended precision computing. This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension. It supports up to 512-bit floating point precision with performance depending on input and out- put precision. We have validated and measured this hard- FIG. 3. Ratio between ...

work page

[2] [2]

Combinatorial op- timization: theory and algorithms

KORTE, Bernhard; VYGEN, Jens. Combinatorial op- timization: theory and algorithms. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008

work page 2008

[3] [3]

GENDREAU, Michel, et al. (ed.). Handbook of meta- heuristics. New York: Springer, 2010

work page 2010

[4] [4]

Mathemati- cal foundation of quantum annealing

MORITA, Satoshi; NISHIMORI, Hidetoshi. Mathemati- cal foundation of quantum annealing. Journal of Mathe- matical Physics, 2008, vol. 49, no 12

work page 2008

[5] [5]

Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing

CROSSON, Elizabeth; HARROW, Aram W. Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing. 2016 IEEE 57th Annual Sym- posium on Foundations of Computer Science (FOCS). IEEE, 2016. p. 714-723

work page 2016

[6] [6]

Perspectives of quantum anneal- ing: Methods and implementations

HAUKE, Philipp, et al. Perspectives of quantum anneal- ing: Methods and implementations. Reports on Progress in Physics, 2020, vol. 83, no 5, p. 054401

work page 2020

[7] [7]

A coherent Ising machine for 2000-node optimization problems

INAGAKI, Takahiro, et al. A coherent Ising machine for 2000-node optimization problems. Science, 2016, vol. 354, no 6312, p. 603-606

work page 2000

[8] [8]

Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit

YAMAMOTO, Yoshihisa, et al. Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit. npj Quantum Information, 2017, vol. 3, no 1, p. 49

work page 2017

[9] [9]

A time-division multiplex- ing Ising machine on FPGAs

YAMAMOTO, Kasho, et al. A time-division multiplex- ing Ising machine on FPGAs. Proceedings of the 8th In- ternational Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. 2017. p. 1-6

work page 2017

[10] [10]

High-speed sparse Ising model on FPGA

MINAMISAWA, Akira; IIMURA, Ryoma; KAWA- HARA, Takayuki. High-speed sparse Ising model on FPGA. 2019 IEEE 62nd International Midwest Sympo- sium on Circuits and Systems (MWSCAS). IEEE, 2019. p. 670-673

work page 2019

[11] [11]

GPU-based Ising computing for solving max-cut combinatorial optimization problems

COOK, Chase, et al. GPU-based Ising computing for solving max-cut combinatorial optimization problems. Integration, 2019, vol. 69, p. 335-344

work page 2019

[12] [12]

FPGA-based simulated bifurcation machine

TATSUMURA, Kosuke; DIXON, Alexander R.; GOTO, Hayato. FPGA-based simulated bifurcation machine. 29th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2019. p. 59-66

work page 2019

[13] [13]

Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation

TATSUMURA, Kosuke; YAMASAKI, Masaya; GOTO, Hayato. Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation. Nature Electronics, 2021, vol. 4, no 3, p. 208-217

work page 2021

[14] [14]

Application of digital annealer for faster combinatorial optimization

SAO, Masataka, et al. Application of digital annealer for faster combinatorial optimization. Fujitsu Scientific and Technical Journal, 2019, vol. 55, no 2, p. 45-51

work page 2019

[15] [15]

Ising-FPGA: A spintronics-based reconfigurable Ising model solver

MONDAL, Ankit; SRIVASTAVA, Ankur. Ising-FPGA: A spintronics-based reconfigurable Ising model solver. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2020, vol. 26, no 1, p. 1-27

work page 2020

[16] [16]

100,000-spin coherent Ising machine

HONJO, Toshimori, et al. 100,000-spin coherent Ising machine. Science advances, 2021, vol. 7, no 40, p. eabh0952

work page 2021

[17] [17]

Ising machines as hardware solvers of combinatorial 7 optimization problems

MOHSENI, Naeimeh; MCMAHON, Peter L.; BYRNES, Tim. Ising machines as hardware solvers of combinatorial 7 optimization problems. Nature Reviews Physics, 2022, vol. 4, no 6, p. 363-379

work page 2022

[18] [18]

All-to-all reconfigurability with sparse and higher-order Ising machines

NIKHAR, Srijan, et al. All-to-all reconfigurability with sparse and higher-order Ising machines. Nature Commu- nications, 2024, vol. 15, no 1, p. 8977

work page 2024

[19] [19]

879- approximation algorithms for max cut and max 2sat

GOEMANS, Michel X.; WILLIAMSON, David P. . 879- approximation algorithms for max cut and max 2sat. Proceedings of the Twenty-sixth Annual ACM Sympo- sium on Theory of Computing. 1994. p. 422-431

work page 1994

[20] [20]

Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

GOEMANS, Michel X.; WILLIAMSON, David P. Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 1995, vol. 42, no 6, p. 1115-1145

work page 1995

[21] [21]

Obstacles to variational quantum optimization from symmetry protection

BRAVYI, Sergey, et al. Obstacles to variational quantum optimization from symmetry protection. Physical review letters, 2020, vol. 125, no 26, p. 260505

work page 2020

[22] [22]

On Khot’s unique games conjecture

TREVISAN, Luca. On Khot’s unique games conjecture. Bulletin of the American Mathematical Society, 2012, vol. 49, no 1, p. 91-111

work page 2012

[23] [23]

Approximating the value of two power proof systems, with applications to max 2sat and max dicut

FEIGE, Uriel; GOEMANS, Michel. Approximating the value of two power proof systems, with applications to max 2sat and max dicut. Proceedings third Israel sym- posium on the theory of computing and systems. IEEE,

work page

[24] [24]

Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing

GOEMANS, Michel X.; WILLIAMSON, David. Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing. 2001. p. 443-452

work page 2001

[25] [25]

Interior point methods in semidefi- nite programming with applications to combinatorial op- timization

ALIZADEH, Farid. Interior point methods in semidefi- nite programming with applications to combinatorial op- timization. SIAM journal on Optimization, 1995, vol. 5, no 1, p. 13-51

work page 1995

[26] [26]

Convex optimization

BOYD, Stephen P.; VANDENBERGHE, Lieven. Convex optimization. Cambridge university press, 2004

work page 2004

[27] [27]

MPFR: A multiple-precision binary floating-point library with correct rounding

FOUSSE, Laurent, et al. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software (TOMS), 2007, vol. 33, no 2, p. 13-es

work page 2007

[28] [28]

A new stopping criterion for Krylov solvers applied in interior point meth- ods

ZANETTI, Filippo; GONDZIO, Jacek. A new stopping criterion for Krylov solvers applied in interior point meth- ods. SIAM Journal on Scientific Computing, 2023, vol. 45, no 2, p. A703-A728

work page 2023

[29] [29]

Interior methods for nonlinear optimization

FORSGREN, Anders; GILL, Philip E.; WRIGHT, Mar- garet H. Interior methods for nonlinear optimization. SIAM review, 2002, vol. 44, no 4, p. 525-597

work page 2002

[30] [30]

Iterative methods for solving lin- ear systems

GREENBAUM, Anne. Iterative methods for solving lin- ear systems. Society for Industrial and Applied Mathe- matics, 1997

work page 1997

[31] [31]

Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol

E. Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol. 73, no. 7, pp. 1683-1697, July 2024

work page 2024

[32] [32]

Fuguet, E

C. Fuguet, E. Guthmuller, A. Bocco, J. Fereyre, A. Evans and Y. Durand, ”A Variable and Extended Precision (VRP) Accelerator and its 22 nm SoC Implementation,” 2024 39th Conference on Design of Circuits and Inte- grated Systems (DCIS), Catania, Italy, 2024, pp. 1-6

work page 2024

[33] [33]

Guthmuller, C

E. Guthmuller, C. Fuguet, A. Bocco, J. Fereyre, A. Evans, Y. Durand, ”Variable and Extended Precision (VRP) Accelerator Implemented in a 22nm SoC,” 2025, Electronics Letters, 61 (1), art. no. e70255s

work page 2025

[34] [34]

Roofline: An insightful visual performance model for multicore ar- chitectures,

S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore ar- chitectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009

work page 2009

[35] [35]

β- approximate central path

K. Kim and M. -j. Park, ”Present and Future, Challenges of High Bandwith Memory (HBM),” 2024 IEEE Interna- tional Memory Workshop (IMW), Seoul, Korea, Republic of, 2024, pp. 1-4. 8 APPENDIX A. Primal-Dual Barrier Method for Semidefinite Programming By introducing the barrierµ, the original SDP cost function is modified with a log-determinant barrier func-...

work page 2024

[36] [36]

Spatial locality: successive memory accesses are generally done to close locations, most often con- secutive memory addresses

work page

[37] [37]

To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able

Temporal locality: a memory location has a high chance to be accessed multiple times during the execution of the program. To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able. The VXP accelerator first level caches implements 64B cache lines, which is typical for general purpose pro- ces...

work page