pith. sign in

arxiv: 2510.02863 · v1 · submitted 2025-10-03 · 💻 cs.AR · cs.DS· cs.NA· math.NA· quant-ph

A Hardware Accelerator for the Goemans-Williamson Algorithm

Pith reviewed 2026-05-18 10:29 UTC · model grok-4.3

classification 💻 cs.AR cs.DScs.NAmath.NAquant-ph
keywords Max-CutGoemans-Williamson algorithmsemidefinite programmingconjugate gradient methodhardware acceleratorextended precisioninterior point methods
0
0 comments X

The pith

Higher internal precision in conjugate gradient reduces time to solution for large Max-Cut SDP relaxations by a factor that grows with system size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how extended floating point precision can be used in the conjugate gradient method for solving large instances of the semidefinite program that relaxes Max-Cut in the Goemans-Williamson algorithm. The approach is intended for hardware accelerators that support such precision natively. A sympathetic reader cares because it offers a way to obtain solutions with provable approximation guarantees more quickly when problems are too large for direct matrix methods. The estimate indicates that the speedup from reduced iterations grows as the system size increases.

Core claim

When using indirect matrix inversion methods like Conjugate Gradient, which have lower complexity than direct methods and are therefore used in very large problems, increasing the internal working precision reduces the time to solution by a factor that increases with the system size on a hardware architecture that runs natively on extended precision.

What carries the argument

Conjugate gradient solver using extended floating point precision inside interior point methods for the Goemans-Williamson semidefinite relaxation of Max-Cut.

If this is right

  • Time to solution decreases more significantly for larger systems.
  • The method enables practical use of worst-case guaranteed Max-Cut approximations in performance-critical scenarios.
  • Hardware designs can leverage native extended precision to translate iteration reductions into direct speedups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may apply to other large-scale convex optimization tasks that rely on iterative linear solvers.
  • Future hardware could integrate this precision strategy to improve classical solvers used alongside quantum optimization methods.

Load-bearing premise

The hardware can execute extended-precision arithmetic natively with negligible overhead outside the linear solver itself.

What would settle it

Build a prototype hardware accelerator and compare the actual runtime for solving Max-Cut SDP instances of increasing size using standard versus extended precision in the conjugate gradient step.

Figures

Figures reproduced from arXiv: 2510.02863 by D. A. Herrera-Mart\'i, E. Guthmuller, J. Fereyre.

Figure 2
Figure 2. Figure 2: FIG. 2. Relative improvement vs. problem size. We inte [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3. Ratio between the attained optimal value and the best [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4. Time spent in CG algorithm at each Newton step for [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: FIG. 5. Total time spent in CG for all problems and preci [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6. The Roofline model provides a visual representa [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
read the original abstract

The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson, provides worst-case guarantees and is therefore suited to both the construction of benchmarks and in applications to performance-critic scenarios. We show how extended floating point precision can be incorporated in algebraic subroutines in convex optimisation, namely in indirect matrix inversion methods like Conjugate Gradient, which are used in Interior Point Methods in the case of very large problem sizes. Also, an estimate is provided of the expected acceleration of the time to solution for a hardware architecture that runs natively on extended precision. Specifically, when using indirect matrix inversion methods like Conjugate Gradient, which have lower complexity than direct methods and are therefore used in very large problems, we see that increasing the internal working precision reduces the time to solution by a factor that increases with the system size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a hardware accelerator for the Goemans-Williamson algorithm applied to Max-Cut. It focuses on incorporating extended floating-point precision into indirect solvers such as Conjugate Gradient within Interior Point Methods for the SDP relaxation, which are used at large problem sizes. The central claim is that raising internal working precision reduces iteration counts sufficiently that time-to-solution improves by a factor that grows with system size; an estimate of the resulting wall-clock acceleration is provided for a hardware architecture that executes extended-precision arithmetic natively.

Significance. If the unshown convergence model, iteration counts, and hardware-overhead analysis can be supplied and validated, the work would demonstrate a concrete precision-iteration tradeoff that improves scaling of iterative linear solvers in SDP-based combinatorial optimization. This could inform the design of specialized accelerators for large-scale convex relaxations, particularly where direct methods become prohibitive.

major comments (2)
  1. [Abstract] Abstract: the estimate of expected acceleration is stated without any convergence model, measured iteration counts for CG at different precisions, or error analysis showing how higher precision reduces iterations enough to produce a speedup factor that increases with system size. The central scaling claim therefore rests on unshown calculations.
  2. [Abstract] The manuscript supplies no cycle counts, area overhead figures, or micro-architectural description demonstrating that extended-precision operations can be realized with negligible overhead relative to the linear-algebra kernel. Without this, the translation from iteration reduction to wall-clock speedup remains an unverified assumption.
minor comments (1)
  1. [Abstract] The abstract refers to 'an estimate' of acceleration but does not indicate whether this estimate is derived analytically or from simulation; clarifying the derivation method would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The points raised identify areas where additional supporting details would improve clarity and rigor. We address each major comment below and describe the changes we will incorporate in the revised version.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the estimate of expected acceleration is stated without any convergence model, measured iteration counts for CG at different precisions, or error analysis showing how higher precision reduces iterations enough to produce a speedup factor that increases with system size. The central scaling claim therefore rests on unshown calculations.

    Authors: We agree that the abstract, being a high-level summary, does not present the supporting convergence model, iteration counts, or error analysis. The estimate in the manuscript is based on the established dependence of CG convergence on numerical stability and condition number, where extended precision reduces rounding-error accumulation that grows with problem size in SDP relaxations. In the revision we will add a concise convergence model and error analysis (drawing on standard CG theory) to the abstract or a new subsection, and we will include available iteration-count data from our simulations to substantiate that the speedup factor increases with system size. revision: yes

  2. Referee: [Abstract] The manuscript supplies no cycle counts, area overhead figures, or micro-architectural description demonstrating that extended-precision operations can be realized with negligible overhead relative to the linear-algebra kernel. Without this, the translation from iteration reduction to wall-clock speedup remains an unverified assumption.

    Authors: We acknowledge that the manuscript does not supply explicit cycle counts, area-overhead estimates, or a micro-architectural description. The current estimate assumes native extended-precision execution with low overhead, consistent with known VLSI trade-offs for floating-point units. In the revised manuscript we will add a dedicated paragraph or short appendix providing high-level cycle-count estimates, area-overhead figures relative to standard FP units, and a brief micro-architectural sketch to justify the negligible-overhead assumption and strengthen the link to wall-clock speedup. revision: yes

Circularity Check

0 steps flagged

No circularity: claim follows from standard CG complexity scaling without reduction to fitted inputs or self-citations

full rationale

The paper derives the claim that higher internal precision reduces CG iteration count (and thus time-to-solution by a factor growing with system size) from the known dependence of conjugate-gradient convergence on condition number and working precision in large-scale SDP relaxations inside Goemans-Williamson. No equation, prediction, or estimate is obtained by fitting a parameter to a subset of data and then renaming the fit as a prediction; no load-bearing step invokes a self-citation whose authors overlap with the present work; and the hardware-overhead assumption is stated separately rather than smuggled into the algebraic derivation. The result is therefore self-contained against external numerical-linear-algebra benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard numerical assumption that higher working precision improves conjugate-gradient convergence for the linear systems arising in interior-point SDP solvers, plus the engineering premise that extended-precision hardware can be built with low overhead.

axioms (1)
  • domain assumption Higher internal floating-point precision reduces the iteration count of conjugate gradient when solving the Newton systems inside interior-point methods for the Max-Cut SDP.
    Invoked when the authors link increased precision to reduced time-to-solution that grows with system size.

pith-pipeline@v0.9.0 · 5725 in / 1218 out tokens · 49123 ms · 2026-05-18T10:29:11.611563+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension

    for extended precision computing. This RISC-V pro- cessor has been fitted with a variable and extended preci- sion floating point unit and a corresponding instruction set extension. It supports up to 512-bit floating point precision with performance depending on input and out- put precision. We have validated and measured this hard- FIG. 3. Ratio between ...

  2. [2]

    Combinatorial op- timization: theory and algorithms

    KORTE, Bernhard; VYGEN, Jens. Combinatorial op- timization: theory and algorithms. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008

  3. [3]

    GENDREAU, Michel, et al. (ed.). Handbook of meta- heuristics. New York: Springer, 2010

  4. [4]

    Mathemati- cal foundation of quantum annealing

    MORITA, Satoshi; NISHIMORI, Hidetoshi. Mathemati- cal foundation of quantum annealing. Journal of Mathe- matical Physics, 2008, vol. 49, no 12

  5. [5]

    Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing

    CROSSON, Elizabeth; HARROW, Aram W. Simulated quantum annealing can be exponentially faster than clas- sical simulated annealing. 2016 IEEE 57th Annual Sym- posium on Foundations of Computer Science (FOCS). IEEE, 2016. p. 714-723

  6. [6]

    Perspectives of quantum anneal- ing: Methods and implementations

    HAUKE, Philipp, et al. Perspectives of quantum anneal- ing: Methods and implementations. Reports on Progress in Physics, 2020, vol. 83, no 5, p. 054401

  7. [7]

    A coherent Ising machine for 2000-node optimization problems

    INAGAKI, Takahiro, et al. A coherent Ising machine for 2000-node optimization problems. Science, 2016, vol. 354, no 6312, p. 603-606

  8. [8]

    Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit

    YAMAMOTO, Yoshihisa, et al. Coherent Ising ma- chines—optical neural networks operating at the quan- tum limit. npj Quantum Information, 2017, vol. 3, no 1, p. 49

  9. [9]

    A time-division multiplex- ing Ising machine on FPGAs

    YAMAMOTO, Kasho, et al. A time-division multiplex- ing Ising machine on FPGAs. Proceedings of the 8th In- ternational Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. 2017. p. 1-6

  10. [10]

    High-speed sparse Ising model on FPGA

    MINAMISAWA, Akira; IIMURA, Ryoma; KAWA- HARA, Takayuki. High-speed sparse Ising model on FPGA. 2019 IEEE 62nd International Midwest Sympo- sium on Circuits and Systems (MWSCAS). IEEE, 2019. p. 670-673

  11. [11]

    GPU-based Ising computing for solving max-cut combinatorial optimization problems

    COOK, Chase, et al. GPU-based Ising computing for solving max-cut combinatorial optimization problems. Integration, 2019, vol. 69, p. 335-344

  12. [12]

    FPGA-based simulated bifurcation machine

    TATSUMURA, Kosuke; DIXON, Alexander R.; GOTO, Hayato. FPGA-based simulated bifurcation machine. 29th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2019. p. 59-66

  13. [13]

    Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation

    TATSUMURA, Kosuke; YAMASAKI, Masaya; GOTO, Hayato. Scaling out Ising machines using a multi-chip ar- chitecture for simulated bifurcation. Nature Electronics, 2021, vol. 4, no 3, p. 208-217

  14. [14]

    Application of digital annealer for faster combinatorial optimization

    SAO, Masataka, et al. Application of digital annealer for faster combinatorial optimization. Fujitsu Scientific and Technical Journal, 2019, vol. 55, no 2, p. 45-51

  15. [15]

    Ising-FPGA: A spintronics-based reconfigurable Ising model solver

    MONDAL, Ankit; SRIVASTAVA, Ankur. Ising-FPGA: A spintronics-based reconfigurable Ising model solver. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2020, vol. 26, no 1, p. 1-27

  16. [16]

    100,000-spin coherent Ising machine

    HONJO, Toshimori, et al. 100,000-spin coherent Ising machine. Science advances, 2021, vol. 7, no 40, p. eabh0952

  17. [17]

    Ising machines as hardware solvers of combinatorial 7 optimization problems

    MOHSENI, Naeimeh; MCMAHON, Peter L.; BYRNES, Tim. Ising machines as hardware solvers of combinatorial 7 optimization problems. Nature Reviews Physics, 2022, vol. 4, no 6, p. 363-379

  18. [18]

    All-to-all reconfigurability with sparse and higher-order Ising machines

    NIKHAR, Srijan, et al. All-to-all reconfigurability with sparse and higher-order Ising machines. Nature Commu- nications, 2024, vol. 15, no 1, p. 8977

  19. [19]

    879- approximation algorithms for max cut and max 2sat

    GOEMANS, Michel X.; WILLIAMSON, David P. . 879- approximation algorithms for max cut and max 2sat. Proceedings of the Twenty-sixth Annual ACM Sympo- sium on Theory of Computing. 1994. p. 422-431

  20. [20]

    Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

    GOEMANS, Michel X.; WILLIAMSON, David P. Im- proved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 1995, vol. 42, no 6, p. 1115-1145

  21. [21]

    Obstacles to variational quantum optimization from symmetry protection

    BRAVYI, Sergey, et al. Obstacles to variational quantum optimization from symmetry protection. Physical review letters, 2020, vol. 125, no 26, p. 260505

  22. [22]

    On Khot’s unique games conjecture

    TREVISAN, Luca. On Khot’s unique games conjecture. Bulletin of the American Mathematical Society, 2012, vol. 49, no 1, p. 91-111

  23. [23]

    Approximating the value of two power proof systems, with applications to max 2sat and max dicut

    FEIGE, Uriel; GOEMANS, Michel. Approximating the value of two power proof systems, with applications to max 2sat and max dicut. Proceedings third Israel sym- posium on the theory of computing and systems. IEEE,

  24. [24]

    Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing

    GOEMANS, Michel X.; WILLIAMSON, David. Approx- imation algorithms for MAX-3-CUT and other prob- lems via complex semidefinite programming.Proceedings of the thirty-third annual ACM symposium on Theory of computing. 2001. p. 443-452

  25. [25]

    Interior point methods in semidefi- nite programming with applications to combinatorial op- timization

    ALIZADEH, Farid. Interior point methods in semidefi- nite programming with applications to combinatorial op- timization. SIAM journal on Optimization, 1995, vol. 5, no 1, p. 13-51

  26. [26]

    Convex optimization

    BOYD, Stephen P.; VANDENBERGHE, Lieven. Convex optimization. Cambridge university press, 2004

  27. [27]

    MPFR: A multiple-precision binary floating-point library with correct rounding

    FOUSSE, Laurent, et al. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software (TOMS), 2007, vol. 33, no 2, p. 13-es

  28. [28]

    A new stopping criterion for Krylov solvers applied in interior point meth- ods

    ZANETTI, Filippo; GONDZIO, Jacek. A new stopping criterion for Krylov solvers applied in interior point meth- ods. SIAM Journal on Scientific Computing, 2023, vol. 45, no 2, p. A703-A728

  29. [29]

    Interior methods for nonlinear optimization

    FORSGREN, Anders; GILL, Philip E.; WRIGHT, Mar- garet H. Interior methods for nonlinear optimization. SIAM review, 2002, vol. 44, no 4, p. 525-597

  30. [30]

    Iterative methods for solving lin- ear systems

    GREENBAUM, Anne. Iterative methods for solving lin- ear systems. Society for Industrial and Applied Mathe- matics, 1997

  31. [31]

    Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol

    E. Guthmuller et al., ”Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Compu- tation,” in IEEE Transactions on Computers, vol. 73, no. 7, pp. 1683-1697, July 2024

  32. [32]

    Fuguet, E

    C. Fuguet, E. Guthmuller, A. Bocco, J. Fereyre, A. Evans and Y. Durand, ”A Variable and Extended Precision (VRP) Accelerator and its 22 nm SoC Implementation,” 2024 39th Conference on Design of Circuits and Inte- grated Systems (DCIS), Catania, Italy, 2024, pp. 1-6

  33. [33]

    Guthmuller, C

    E. Guthmuller, C. Fuguet, A. Bocco, J. Fereyre, A. Evans, Y. Durand, ”Variable and Extended Precision (VRP) Accelerator Implemented in a 22nm SoC,” 2025, Electronics Letters, 61 (1), art. no. e70255s

  34. [34]

    Roofline: An insightful visual performance model for multicore ar- chitectures,

    S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore ar- chitectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009

  35. [35]

    β- approximate central path

    K. Kim and M. -j. Park, ”Present and Future, Challenges of High Bandwith Memory (HBM),” 2024 IEEE Interna- tional Memory Workshop (IMW), Seoul, Korea, Republic of, 2024, pp. 1-4. 8 APPENDIX A. Primal-Dual Barrier Method for Semidefinite Programming By introducing the barrierµ, the original SDP cost function is modified with a log-determinant barrier func-...

  36. [36]

    Spatial locality: successive memory accesses are generally done to close locations, most often con- secutive memory addresses

  37. [37]

    To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able

    Temporal locality: a memory location has a high chance to be accessed multiple times during the execution of the program. To exploit spatial locality, caches store blocks of data, calledcache linesorcache blocks, instead of a single vari- able. The VXP accelerator first level caches implements 64B cache lines, which is typical for general purpose pro- ces...