Global optimization tailored for graphics processing units: Complete and rigorous search for large-scale nonlinear minimization
Pith reviewed 2026-05-19 06:37 UTC · model grok-4.3
The pith
A GPU-tailored interval method encloses the guaranteed global minimum of nonlinear functions up to 10,000 dimensions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a novel GPU-based single-program single-data parallel programming style together with variable cycling enables an interval-analysis procedure to completely search the feasible domain and enclose the guaranteed global minimum for eleven scalable benchmark functions, including Ackley, Griewank, Levy, Rastrigin, and Rosenbrock, at dimensions up to 10,000.
What carries the argument
The GPU-based single-program single-data parallel programming style combined with variable cycling, which drives the interval pruning loop without encountering major performance bottlenecks.
Load-bearing premise
The custom GPU parallel style and variable cycling must bypass standard performance limits enough for the interval pruning to finish in reasonable time at 10,000 dimensions.
What would settle it
Run the procedure on the 10,000-dimensional Rosenbrock function and check whether the final output intervals contain the known minimum value of zero while excluding all regions that do not contain it.
read the original abstract
This paper introduces a numerical method to enclose the global minimum of a nonlinear function subject to simple bounds on the variables. Using interval analysis, coupled with the computational power and architecture of graphics processing units (GPUs), the method iteratively rules out the regions in the search domain where the global minimum cannot exist and leaves a finite set of regions where the global minimum must exist. For effectiveness, because of the rigor of interval analysis, the method is guaranteed to enclose the global minimum even in the presence of rounding errors. For efficiency, the method employs a novel GPU-based single program, single data parallel programming style to circumvent major GPU performance bottlenecks, and a variable cycling technique is also integrated into the method to reduce computational cost when minimizing large-scale nonlinear functions. The method is validated by minimizing 11 benchmark test functions with scalable dimensions, including the well-known Ackley function, Griewank function, Levy function, Rastrigin function, and Rosenbrock function. These benchmark test functions represent grand challenges of global optimization, and enclosing the guaranteed global minimum of these benchmark test functions with more than 80 dimensions has not been reported in the literature. Our method completely searches the feasible domain and successfully encloses the guaranteed global minimum of these 11 benchmark test functions with up to 10,000 dimensions using only one GPU in a reasonable computation time, far exceeding the reported results in the literature due to the unique method design and implementation based on GPU architecture.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a GPU-accelerated interval-analysis algorithm for rigorous global minimization of nonlinear bound-constrained problems. It introduces a single-program single-data (SPSD) parallel style together with variable cycling to prune the domain and produce a finite set of boxes guaranteed to contain the global minimum. The method is demonstrated on 11 standard benchmark functions (Ackley, Griewank, Levy, Rastrigin, Rosenbrock, etc.) with dimensions scaled up to 10 000, claiming complete domain search and enclosure of the guaranteed minimum on a single GPU in reasonable time.
Significance. If the enclosure property is preserved under the novel GPU implementation, the result would be significant: interval methods have rarely been reported to enclose guaranteed global minima for these benchmarks beyond roughly 80 dimensions. Successful scaling to 10 000 dimensions while retaining rigor would constitute a substantial advance in practical, verified global optimization and could open new applications in high-dimensional non-convex problems.
major comments (2)
- [Section 3 (GPU implementation and interval arithmetic)] The central claim of guaranteed enclosure rests on correct outward rounding in interval arithmetic. The manuscript does not describe how directed rounding (to ±∞) is realized within the claimed SPSD programming model on GPU hardware, where standard operations default to round-to-nearest. Without explicit use of intrinsics, special libraries, or verified emulation, the computed intervals may fail to be valid enclosures, undermining the pruning step and the assertion that the true global minimum cannot be discarded.
- [Section 5 (Numerical results)] No tables or figures report enclosure widths, CPU/GPU timings, or success rates for the 10 000-dimensional instances. The abstract states “reasonable computation time,” but without quantitative data or comparison against existing interval or stochastic solvers at comparable dimensions, the scalability claim cannot be assessed.
minor comments (2)
- [Abstract and §1] The abstract and introduction repeatedly use the phrase “completely searches the feasible domain”; a precise statement of what is meant by “complete” (exhaustive enumeration of all surviving boxes versus a single enclosing box) would improve clarity.
- [Section 2] Notation for interval operations and the variable-cycling schedule is introduced without a compact summary table; adding such a table would aid readers unfamiliar with interval global optimization.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments highlight important points for strengthening the clarity of the implementation details and the quantitative presentation of results. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Section 3 (GPU implementation and interval arithmetic)] The central claim of guaranteed enclosure rests on correct outward rounding in interval arithmetic. The manuscript does not describe how directed rounding (to ±∞) is realized within the claimed SPSD programming model on GPU hardware, where standard operations default to round-to-nearest. Without explicit use of intrinsics, special libraries, or verified emulation, the computed intervals may fail to be valid enclosures, undermining the pruning step and the assertion that the true global minimum cannot be discarded.
Authors: We agree that an explicit description of the directed-rounding mechanism is necessary to fully substantiate the enclosure guarantees. Our GPU implementation relies on CUDA intrinsics such as __fadd_ru, __fadd_rd, __fmul_ru and __fmul_rd (together with the corresponding operations for division and other elementary functions) to enforce outward rounding within the SPSD kernel. These intrinsics are invoked directly inside the interval-arithmetic primitives that drive the pruning step. To make this transparent to readers, we will insert a new subsection in Section 3 that (i) lists the specific intrinsics employed, (ii) shows the corresponding inline device functions, and (iii) explains why the SPSD data layout does not interfere with the required rounding mode. We will also add a short verification paragraph confirming that all interval operations remain valid enclosures under these intrinsics. revision: yes
-
Referee: [Section 5 (Numerical results)] No tables or figures report enclosure widths, CPU/GPU timings, or success rates for the 10 000-dimensional instances. The abstract states “reasonable computation time,” but without quantitative data or comparison against existing interval or stochastic solvers at comparable dimensions, the scalability claim cannot be assessed.
Authors: We acknowledge that the current numerical-results section would be strengthened by additional quantitative detail at the highest dimensions. While the manuscript already demonstrates successful termination with a finite set of enclosing boxes for all 11 benchmarks up to dimension 10 000, we did not tabulate enclosure widths, wall-clock times, or per-instance success metrics at that scale. We will revise Section 5 to include (i) a table of final enclosure widths and GPU execution times for the 10 000-dimensional cases, (ii) a column reporting the number of boxes retained at termination, and (iii) a brief comparison paragraph that places our timings against the best published rigorous results at lower dimensions (noting that no other interval method has been reported to succeed at 10 000 dimensions). These additions will allow readers to assess both the computational cost and the practical scalability of the approach. revision: yes
Circularity Check
No circularity: algorithmic validation on independent external benchmarks
full rationale
The paper describes a GPU-accelerated interval analysis algorithm for enclosing global minima of nonlinear functions. Its central claim rests on applying established interval arithmetic properties (which guarantee enclosure in the presence of rounding errors) to standard, independently defined benchmark functions such as Ackley, Griewank, Levy, Rastrigin, and Rosenbrock. These test cases and the underlying interval analysis framework are external to the paper and not derived from or fitted to the method's own outputs or parameters. No equations, predictions, or uniqueness claims reduce by construction to self-definitions, self-citations, or renamed inputs; the implementation details (SPSD style and variable cycling) are presented as engineering choices whose correctness is assessed via external benchmarks rather than internal tautologies.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Interval arithmetic produces rigorous enclosures of function ranges even in the presence of rounding errors
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
method employs a novel GPU-based single program, single data parallel programming style ... variable cycling technique ... interval analysis ... guaranteed to enclose the global minimum even in the presence of rounding errors
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Neumaier, A.: Complete search in continuous global optimization and constraint satisfaction. Acta Numerica 13, pp. 271-369 (2004)
work page 2004
-
[2]
Amari, S.: Backpropagation and stochastic gradient descent method. Neurocomputing 5(4-5), pp. 185-196 (1993)
work page 1993
-
[3]
SIAM Journal on Scientific Computing 22(4), pp
Notay, Y.: Flexible conjugate gradients. SIAM Journal on Scientific Computing 22(4), pp. 1444-1460 (2000)
work page 2000
-
[4]
Potra, F. A., Wright, S. J.: Interior-point methods. Journal of Computational and Applied Mathematics 124(1-2), pp. 281-302 (2000)
work page 2000
-
[5]
Bulletin of the American Mathematical Society 42(1), pp
Wright, M.: The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bulletin of the American Mathematical Society 42(1), pp. 39-56 (2005)
work page 2005
-
[6]
Conn, A. R., Gould, N. I., Toint, P. L.: Trust Region Methods. SIAM, Philadelphia (2000)
work page 2000
-
[7]
Mathematical Programming 151(1), pp
Yuan, Y.: Recent advances in trust region algorithms. Mathematical Programming 151(1), pp. 249-281 (2015)
work page 2015
-
[8]
C., Nocedal, J.: On the limited memory BFGS method for large scale optimization
Liu, D. C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1), pp. 503-528 (1989)
work page 1989
-
[9]
Zhu, C., Byrd, R. H., Lu, P., Nocedal, J.: Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software 23(4), pp. 550-560 (1997)
work page 1997
-
[10]
Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1998)
work page 1998
-
[11]
Kirkpatrick, S., Gelatt, C. D., Vecchi, M. P.: Optimization by simulated annealing. Science 220(4598), pp. 671-680 (1983)
work page 1983
-
[12]
In: Proceedings of International Conference on Neural Networks, pp
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of International Conference on Neural Networks, pp. 1942-1948. IEEE (1995)
work page 1942
-
[13]
ORSA Journal on Computing 1(3), pp
Glover, F.: Tabu search - Part I. ORSA Journal on Computing 1(3), pp. 190-206 (1989)
work page 1989
-
[14]
ORSA Journal on Computing 2(1), pp
Glover, F.: Tabu search - Part II. ORSA Journal on Computing 2(1), pp. 4-32 (1990)
work page 1990
-
[15]
Nocedal, J., Wright, S. J.: Numerical Optimization. Springer, New York (2006)
work page 2006
-
[16]
Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P.: Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge (2007)
work page 2007
- [17]
-
[18]
Moore, R. E., Kearfott, R. B., Cloud, M. J.: Introduction to Interval Analysis. SIAM, Philadelphia (2009)
work page 2009
-
[19]
W.: Global Optimization Using Interval Analysis: Revised and Expanded
Hansen, E., Walster, G. W.: Global Optimization Using Interval Analysis: Revised and Expanded. Marcel Dekker, New York (2004)
work page 2004
-
[20]
B.: Rigorous Global Search: Continuous Problems
Kearfott, R. B.: Rigorous Global Search: Continuous Problems. Springer, New York (1996)
work page 1996
-
[21]
Van Hentenryck, P., Michel, L., Deville, Y.: Numerica: A Modeling Language for Global Optimization. Cambridge, MIT Press (1997)
work page 1997
-
[22]
Floudas, C. A., Gounaris, C. E.: 2009, A review of recent advances in global optimization. Journal of Global Optimization 45, pp. 3-38 (2009)
work page 2009
-
[23]
Nickolls, J., Dally, W. J.: The GPU computing era. IEEE Micro 30(2), pp. 56-69 (2010)
work page 2010
-
[24]
B., El Hajj, I.: Programming Massively Parallel Processors: A Hands-On Approach
Hwu W., Kirk, D. B., El Hajj, I.: Programming Massively Parallel Processors: A Hands-On Approach. Cambridge, Elsevier (2023)
work page 2023
- [25]
-
[26]
Kulisch, U. W., Miranker, W. L.: Computer Arithmetic in Theory and Practice. Academic Press, New York (1981) 35
work page 1981
-
[27]
Cambridge University Press, Cambridge (1990)
Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)
work page 1990
-
[28]
Halsted Press, New York (1984)
Ratschek, H., Rokne, J.: Computer Methods for the Range of Functions. Halsted Press, New York (1984)
work page 1984
-
[29]
Einarsson, B.: Accuracy and Reliability in Scientific Computing. SIAM, Philadelphia (2005)
work page 2005
-
[30]
Falk, J. E., Soland, R. M.: An algorithm for separable nonconvex programming problems. Management Science 15(9), pp. 550-569 (1969)
work page 1969
-
[31]
Application to concave minimization and D.C
Tuy, H., Horst, R.: Convergence and restart in branch-and-bound algorithms for global optimization. Application to concave minimization and D.C. optimization problems. Mathematical Programming 41, pp. 161-183 (1988)
work page 1988
-
[32]
Sotiropoulos, D. G., Grapsa, T. N.: A branch-and-prune method for global optimization: The univariate case. In: Krämer, W., von Gudenberg, J.W. (eds) Scientific Computing, Validated Numerics, Interval Methods, pp. 215-226. Springer, Boston (2001)
work page 2001
-
[33]
Ryoo, H. S., Sahinidis, N. V.: A branch-and-reduce approach to global optimization. Journal of Global Optimization 8, pp. 107-138 (1996)
work page 1996
-
[34]
V.: Global optimization and constraint satisfaction: The branch-and-reduce approach
Sahinidis, N. V.: Global optimization and constraint satisfaction: The branch-and-reduce approach. In: Proceedings of International Workshop on Global Optimization and Constraint Satisfaction, pp. 1-16. Springer (2002)
work page 2002
-
[35]
F.: Revising hull and box consistency
Benhamou, F., Goualard, F., Granvilliers, L., Puget, J. F.: Revising hull and box consistency. In: Proceedings of the 1999 International Conference on Logic Programming, pp. 230-244. IEEE (1999)
work page 1999
-
[36]
Dongarra, J., Luszczek, P.: TOP500. In: Padua, D. (eds.), Encyclopedia of Parallel Computing, pp. 2055-2057. Springer, Boston, (2011)
work page 2055
-
[37]
D., Houston, M., Luebke, D., Green, S., Stone, J
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., Phillips, J. C.: GPU computing. Proceedings of the IEEE 96(5), pp. 879-899 (2008)
work page 2008
-
[38]
Luebke, D.: CUDA: Scalable parallel programming for high-performance scientific computing. In: Proceedings of the 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 836-838. IEEE (2008)
work page 2008
-
[39]
Addison-Wesley Professional, Boston (2005)
Pharr, M., Fernando, R.: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley Professional, Boston (2005)
work page 2005
-
[40]
J., Blanton, M.: Algorithms and Theory of Computation Handbook: Special Topics and Techniques
Atallah, M. J., Blanton, M.: Algorithms and Theory of Computation Handbook: Special Topics and Techniques. CRC Press, Boca Raton (2009)
work page 2009
-
[41]
J.: Some computer organizations and their effectiveness
Flynn, M. J.: Some computer organizations and their effectiveness. IEEE Transactions on Computers 100(9), pp. 948-960 (1972)
work page 1972
-
[42]
Brodtkorb, A. R., Hagen, T. R., Sætra, M. L.: Graphics processing unit (GPU) programming strategies and trends in GPU computing. Journal of Parallel and Distributed Computing 73(1), pp. 4-13 (2013)
work page 2013
-
[43]
Morris, M. D., Mitchell, T. J.: Exploratory designs for computational experiments. Journal of Statistical Planning and Inference 43(3), pp. 381-402 (1995)
work page 1995
-
[44]
G.: Design of computer experiments: Space filling and beyond
Pronzato, L., Müller, W. G.: Design of computer experiments: Space filling and beyond. Statistics and Computing 22(3), pp. 681-701 (2012)
work page 2012
-
[45]
Ackley, D.: A Connectionist Machine for Genetic Hillclimbing. Springer, Boston (1987)
work page 1987
-
[46]
P.: An overview of evolutionary algorithms for parameter optimization
Bäck, T., Schwefel, H. P.: An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation 1(1), pp. 1-23 (1993) 36
work page 1993
-
[47]
International Journal of Mathematical Modelling and Numerical Optimisation 4(2), pp
Jamil, M., Yang, X.: A literature survey of benchmark functions for global optimisation problems. International Journal of Mathematical Modelling and Numerical Optimisation 4(2), pp. 150-194 (2013)
work page 2013
-
[48]
Belegundu, A. D., Chandrupatla, T. R.: Optimization Concepts and Applications in Engineering. Cambridge University Press, Cambridge (2019)
work page 2019
-
[49]
Mathematical Programming 58(1), pp
Breiman, L., Cutler, A.: A deterministic algorithm for global optimization. Mathematical Programming 58(1), pp. 179-199 (1993)
work page 1993
-
[50]
Fu, M. C., Hu, J., Marcus, S. I.: Model-based randomized methods for global optimization. In: Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, pp. 355-363. Elsevier (2006)
work page 2006
-
[51]
O.: Generalized descent for global optimization
Griewank, A. O.: Generalized descent for global optimization. Journal of Optimization Theory and Applications 34, pp. 11-39 (1981)
work page 1981
-
[52]
V., Montalvo, A.: The tunneling algorithm for the global minimization of functions
Levy, A. V., Montalvo, A.: The tunneling algorithm for the global minimization of functions. SIAM Journal on Scientific and Statistical Computing 6(1), pp. 15-29 (1985)
work page 1985
-
[53]
Rastrigin, L. A.: Extremal control systems. In: Theoretical Foundations of Engineering Cybernetics Series. Nauka, Moscow (1974)
work page 1974
-
[54]
In: Proceedings of International Conference on Parallel Problem Solving from Nature, pp
Hoffmeister, F., Bäck, T.: Genetic algorithms and evolution strategies: Similarities and differences. In: Proceedings of International Conference on Parallel Problem Solving from Nature, pp. 455-469. Springer, Boston (1990)
work page 1990
-
[55]
A survey of some theoretical and practical aspects of genetic algorithms
Salomon, R., Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions. A survey of some theoretical and practical aspects of genetic algorithms. BioSystems 39(3), pp. 263-278 (1996)
work page 1996
-
[56]
Styblinski, M., Tang, T.: Experiments in nonconvex optimization: Stochastic approximation with function smoothing and simulated annealing. Neural Networks 3(4), pp. 467-483 (1990)
work page 1990
-
[57]
Zabinsky, Z. B., Graesser, D. L., Tuttle, M. E., Kim, G.: Global optimization of composite laminates using improving hit and run. In: Floudas, C. A., Pardalos, P. M. (eds.) Recent Advances in Global Optimization, pp. 343-368. Princeton University Press, Princeton (1992)
work page 1992
-
[58]
M., Khompatraporn, C., Zabinsky, Z
Ali, M. M., Khompatraporn, C., Zabinsky, Z. B.: A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. Journal of Global Optimization 31, pp. 635-672 (2005)
work page 2005
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.