GPU accelerated variant of Schroeppel-Shamir's algorithm for solving the market split problem

Nils-Christian Kempke; Thorsten Koch

arxiv: 2507.05045 · v3 · submitted 2025-07-07 · 🧮 math.OC


Pith reviewed 2026-05-19 06:08 UTC · model grok-4.3

classification 🧮 math.OC

keywords market split problem · Schroeppel-Shamir algorithm · GPU acceleration · subset sum · binary integer programming · feasibility · combinatorial optimization


The pith

A hybrid CPU-GPU implementation of a Schroeppel-Shamir variant solves market split feasibility instances with up to 10 constraints and 90 variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new method for the feasibility version of the market split problem by adapting Schroeppel-Shamir's algorithm for subset sum. It exhaustively enumerates one-dimensional solutions on the CPU and uses the GPU to test those candidates against all constraints at once. This hybrid approach handles instances on which standard branch-and-cut solvers struggle. Readers interested in hard combinatorial problems would care because it solves benchmark instances of size (9,80) in under fifteen minutes and (10,90) within a day.
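
For orientation, the feasibility problem can be written out as follows. The symbols (a_{ij}, d_i, m, n) are notation assumed for this note, with sizes such as (9,80) read as (constraints m, variables n); the paper may use different symbols.

```latex
\text{find } x \in \{0,1\}^n
\quad \text{such that} \quad
\sum_{j=1}^{n} a_{ij}\, x_j = d_i ,
\qquad i = 1, \dots, m .
```

Each single row i on its own is a one-dimensional subset sum problem, which is where Schroeppel-Shamir's algorithm enters.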

Core claim

The paper derives a feasibility solver for the market split problem from Schroeppel-Shamir's meet-in-the-middle technique for one-dimensional subset sum and accelerates the multi-constraint verification step with GPUs. The resulting algorithm solves instances of size (9,80) in less than fifteen minutes and (10,90) in up to one day.
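
A minimal sketch of the one-dimensional building block, assuming the classic four-list form of Schroeppel-Shamir: the variables are cut into four quarters, two heaps stream the pairwise sums of the first two quarters in ascending order and of the last two in descending order, and a two-pointer sweep reports every combination that hits the target. The function names and the bitmask encoding are illustrative choices of this note, not taken from the paper, whose CPU implementation may differ.

```python
import heapq
from itertools import groupby


def subset_sums(items, offset):
    """All (sum, mask) pairs over subsets of items, sorted by sum.
    Bit (offset + k) of mask is set when items[k] is chosen."""
    sums = [(0, 0)]
    for k, value in enumerate(items):
        sums += [(s + value, m | 1 << (offset + k)) for s, m in sums]
    return sorted(sums)


def pair_sums(X, Y, descending=False):
    """Stream (sum, mask) over all pairs of X and Y in sorted order.
    The heap never holds more than len(X) entries."""
    sign = -1 if descending else 1
    start = len(Y) - 1 if descending else 0
    step = -1 if descending else 1
    heap = [(sign * (xs + Y[start][0]), i, start) for i, (xs, _) in enumerate(X)]
    heapq.heapify(heap)
    while heap:
        key, i, j = heapq.heappop(heap)
        yield sign * key, X[i][1] | Y[j][1]
        j += step
        if 0 <= j < len(Y):
            heapq.heappush(heap, (sign * (X[i][0] + Y[j][0]), i, j))


def solve_one_dimension(weights, target):
    """Yield every bitmask x with sum(weights[j] for set bits j) == target."""
    n = len(weights)
    cuts = [0, n // 4, n // 2, 3 * n // 4, n]
    A, B, C, D = (subset_sums(weights[lo:hi], lo) for lo, hi in zip(cuts, cuts[1:]))
    low = groupby(pair_sums(A, B), key=lambda t: t[0])
    high = groupby(pair_sums(C, D, descending=True), key=lambda t: t[0])

    def advance(groups):
        # Materialise one run of equal sums; (None, []) signals exhaustion.
        key, items = next(groups, (None, ()))
        return key, [m for _, m in items]

    lo_sum, lo_masks = advance(low)
    hi_sum, hi_masks = advance(high)
    while lo_sum is not None and hi_sum is not None:
        total = lo_sum + hi_sum
        if total == target:
            for a in lo_masks:
                for b in hi_masks:
                    yield a | b
            lo_sum, lo_masks = advance(low)
            hi_sum, hi_masks = advance(high)
        elif total < target:
            lo_sum, lo_masks = advance(low)
        else:
            hi_sum, hi_masks = advance(high)
```

Only the four quarter lists and the two heaps are held in memory at once, which is the point of the four-way split; the full lists of half sums are streamed rather than stored.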

What carries the argument

Exhaustive enumeration of one-dimensional solutions combined with GPU batch evaluation of candidate solutions across multiple constraints.
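
The second half of that sentence, batch evaluation, can be sketched as below. This is plain Python standing in for a GPU kernel: every candidate is checked independently of the others, which is what makes the step a natural fit for one GPU thread per candidate. The function names, the bitmask encoding, and the tiny 2-by-4 instance are hypothetical and not taken from the paper.

```python
def row_value(row, mask):
    """Left-hand side of one constraint for the 0/1 vector encoded in mask."""
    return sum(a for j, a in enumerate(row) if mask >> j & 1)


def check_batch(matrix, rhs, candidates, skip_row=0):
    """Keep the candidates that satisfy every row except skip_row,
    which is assumed to hold already by construction."""
    survivors = []
    for mask in candidates:  # on a GPU: one thread per candidate
        ok = all(
            row_value(row, mask) == d
            for i, (row, d) in enumerate(zip(matrix, rhs))
            if i != skip_row
        )
        if ok:
            survivors.append(mask)
    return survivors


# Hypothetical toy instance: two constraints, four variables.
matrix = [[3, 5, 2, 7], [4, 1, 6, 2]]
rhs = [5, 10]
# One-dimensional solutions of row 0, found here by brute force.
candidates = [m for m in range(16) if row_value(matrix[0], m) == rhs[0]]
print(check_batch(matrix, rhs, candidates))  # prints [5], i.e. x = (1, 0, 1, 0)
```

In the toy instance two vectors satisfy the first row, and only one of them survives the second.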

If this is right

Standard linear programming branch-and-cut solvers perform poorly on the market split problem compared to this approach.
Instances with nine constraints and eighty variables become solvable in under fifteen minutes.
Instances with ten constraints and ninety variables become solvable in up to one day.
The method handles larger feasibility instances than were previously practical.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This technique of splitting the problem into one-dimensional enumerations might apply to other multi-dimensional subset sum variants.
Further speedups could come from optimizing the GPU kernel or using multiple GPUs for even larger instances.
Extending the approach to the optimization version of the market split problem could be a natural next step.

Load-bearing premise

That the number of one-dimensional solutions stays small enough and the GPU can evaluate them in batches without running out of memory or time for sizes up to ten constraints and ninety variables.
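
One way to probe this premise is to count the one-dimensional solutions of a single row exactly, without enumerating them, using the standard subset-sum counting recurrence. The sketch below assumes nonnegative integer coefficients; the random row generator (coefficients uniform in 0..99, right-hand side equal to half the row sum) is an assumption of this note about what benchmark rows look like, not a detail taken from this page.

```python
import random


def count_row_solutions(row, target):
    """Number of x in {0,1}^n with sum(row[j] * x[j]) == target.
    Assumes nonnegative integer coefficients and target >= 0."""
    counts = [0] * (target + 1)
    counts[0] = 1
    for a in row:
        # Descending order so each coefficient is used at most once.
        for s in range(target, a - 1, -1):
            counts[s] += counts[s - a]
    return counts[target]


def random_row(n, rng, upper=100):
    """Hypothetical instance row: n coefficients uniform in 0..upper-1."""
    return [rng.randrange(upper) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (20, 40, 80, 90):
        row = random_row(n, rng)
        print(n, count_row_solutions(row, sum(row) // 2))
```

Under that coefficient assumption the 2^n vectors spread over at most 99n + 1 distinct sums, so some right-hand side is hit by at least 2^n / (99n + 1) of them. How the paper keeps the enumerated set manageable is exactly what this premise leaves open.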

What would settle it

An instance with ten constraints and ninety variables that either takes more than one day or causes the GPU to run out of memory when using this algorithm.

Original abstract

The market split problem (MSP), introduced by Cornuejols and Dawande (1998), is a challenging binary optimization problem that performs poorly on state-of-the-art linear programming-based branch-and-cut solvers. We present a novel algorithm for solving the feasibility version of this problem, derived from Schroeppel-Shamir's algorithm for the one-dimensional subset sum problem. Our approach is based on exhaustively enumerating one-dimensional solutions of MSP and utilizing GPUs to evaluate candidate solutions across the entire problem. The resulting hybrid CPU-GPU implementation efficiently solves instances with up to 10 constraints and 90 variables. We demonstrate the algorithm's performance on benchmark problems, solving instances of size (9, 80) in less than fifteen minutes and (10, 90) in up to one day.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts Schroeppel-Shamir enumeration to the multi-constraint market split problem and adds GPU batch checks, with reported practical runtimes up to (10,90) that standard solvers miss.


Hey, the core of this paper is a hybrid CPU-GPU solver for the feasibility version of the market split problem. They reuse Schroeppel-Shamir's meet-in-the-middle enumeration on one constraint to generate candidate solutions, then hand the remaining constraints (nine of them at the largest size) to the GPU for fast batch evaluation. The headline result is that this handles instances with 10 constraints and 90 variables, solving (9,80) in under fifteen minutes and (10,90) in up to a day on benchmark sets where branch-and-cut stalls.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a GPU-accelerated variant of Schroeppel-Shamir's meet-in-the-middle algorithm for the feasibility version of the market split problem. It claims that the hybrid CPU-GPU implementation solves benchmark instances with up to 10 constraints and 90 variables, specifically reporting that (9,80) instances are solved in less than fifteen minutes and (10,90) instances in up to one day.

Significance. If the scaling claims hold with verifiable candidate-set sizes and no hidden exponential costs in the multi-constraint checks, the work would offer a practical advance for a class of binary optimization problems on which standard LP-based solvers perform poorly, by combining classical enumeration with GPU batch evaluation.

major comments (2)

[Abstract] The performance numbers for (9,80) and (10,90) instances are stated without any reported bound, measurement, or analysis of the size of the one-dimensional candidate sets produced by the Schroeppel-Shamir enumeration; this quantity is load-bearing for assessing whether GPU batch evaluation remains feasible at n=90 without memory blow-up.
[§3 (Algorithm)] The description of extending the one-dimensional partial solutions to the remaining constraints via GPU evaluation provides no details on the data structures (hashing, sorting, or batching) used for consistency checks or any empirical count of surviving candidates, leaving open the possibility that exponential cost is reintroduced when the number of constraints reaches 10.
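
The memory concern in the first major comment can be made concrete with back-of-envelope arithmetic. The title of Schroeppel and Shamir's 1981 paper states the bounds T = O(2^{n/2}) and S = O(2^{n/4}); the sketch below compares a two-list split, which stores about 2^{n/2} partial sums, with a four-list split, which stores about 2^{n/4} per list. The 8 bytes per entry (one 64-bit sum, no bitmask) and the function name are assumptions of this note; the paper's actual memory layout is not described on this page.

```python
def list_sizes(n, bytes_per_entry=8):
    """Bytes for the largest list in a two-way and in a four-way split of n variables."""
    half_entries = 2 ** ((n + 1) // 2)     # ceil(n / 2) variables per list
    quarter_entries = 2 ** ((n + 3) // 4)  # ceil(n / 4) variables per list
    return half_entries * bytes_per_entry, quarter_entries * bytes_per_entry


for n in (80, 90):
    half_bytes, quarter_bytes = list_sizes(n)
    print(n, half_bytes // 2**30, "GiB vs", quarter_bytes // 2**20, "MiB")
# prints: 80 8192 GiB vs 8 MiB
# prints: 90 262144 GiB vs 64 MiB
```

Under these assumptions a stored list of half sums is far beyond any single GPU's memory, while the quarter lists fit easily, so what matters for the referee's question is how many candidates are in flight at once rather than the size of the base lists.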

minor comments (2)

[§2] The notation distinguishing the one-dimensional subset-sum subproblems from the full multi-constraint MSP could be introduced more explicitly in the background section to improve readability.
[§4] Runtime tables or figures should report the number of runs, hardware specifications (GPU model, memory), and any observed variance to support the claimed wall-clock times.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We agree that additional details on candidate-set sizes and GPU data structures will improve clarity and verifiability. We address the major comments below and will revise the manuscript accordingly.


Referee: [Abstract] The performance numbers for (9,80) and (10,90) instances are stated without any reported bound, measurement, or analysis of the size of the one-dimensional candidate sets produced by the Schroeppel-Shamir enumeration; this quantity is load-bearing for assessing whether GPU batch evaluation remains feasible at n=90 without memory blow-up.

Authors: We agree that explicit bounds and measurements of the one-dimensional candidate-set sizes are necessary to substantiate the reported runtimes and GPU feasibility. In the revised manuscript we will add a new paragraph (or subsection) reporting both theoretical bounds (O(2^{n/2}) per half) and empirical sizes observed for the (9,80) and (10,90) benchmark instances, together with peak GPU memory usage during batch evaluation. This will confirm that the candidate sets remain within practical GPU memory limits at n=90. revision: yes
Referee: [§3 (Algorithm)] The description of extending the one-dimensional partial solutions to the remaining constraints via GPU evaluation provides no details on the data structures (hashing, sorting, or batching) used for consistency checks or any empirical count of surviving candidates, leaving open the possibility that exponential cost is reintroduced when the number of constraints reaches 10.

Authors: We acknowledge that the current description of the multi-constraint filtering step is insufficiently detailed. In the revision we will expand §3 with a precise account of the data structures and algorithms used on the GPU: sorted arrays of partial sums for the first constraint, followed by batched parallel prefix-sum and binary-search checks for the remaining constraints. We will also report empirical counts of candidates that survive each successive constraint check for the tested instances, demonstrating that the filtering remains sub-exponential in practice. revision: yes
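
The "sorted arrays plus binary search" idea in the second response can be illustrated with a two-list meet-in-the-middle in the style of Horowitz and Sahni, a simpler relative of Schroeppel-Shamir. This is a CPU sketch with hypothetical names, not the GPU data structure the simulated authors promise, and it does not cover the prefix-sum step they mention.

```python
from bisect import bisect_left, bisect_right


def half_sums(items, offset):
    """All (sum, mask) pairs over subsets of items; bit (offset + k) marks items[k]."""
    sums = [(0, 0)]
    for k, value in enumerate(items):
        sums += [(s + value, m | 1 << (offset + k)) for s, m in sums]
    return sums


def match_halves(weights, target):
    """Bitmasks of all subsets of weights that sum to target."""
    mid = len(weights) // 2
    left = half_sums(weights[:mid], 0)
    right = sorted(half_sums(weights[mid:], mid))  # sorted array of partial sums
    keys = [s for s, _ in right]
    matches = []
    for s, m in left:
        # Binary search for the run of right-hand sums equal to target - s.
        lo = bisect_left(keys, target - s)
        hi = bisect_right(keys, target - s)
        matches.extend(m | right[k][1] for k in range(lo, hi))
    return matches


print(sorted(match_halves([3, 5, 2, 7], 5)))  # prints [2, 5]
```

Each lookup is independent of the others, which is why this kind of search batches well; the sorted right-hand array is the part that has to fit in memory.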

Circularity Check

0 steps flagged

No circularity: empirical runtime reporting on external algorithm variant


The paper presents a hybrid CPU-GPU implementation derived from the established Schroeppel-Shamir algorithm for subset sum, with performance claims based on measured runtimes for benchmark instances up to (10,90). No equations, fitted parameters, or self-referential derivations appear; the approach relies on exhaustive enumeration of one-dimensional solutions followed by GPU batch checks, reported as direct computational results rather than predictions that reduce to inputs by construction. The central claims are self-contained empirical observations on an external base algorithm with no load-bearing self-citations or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the correctness of the original Schroeppel-Shamir enumeration for one dimension and the assumption that GPU batch verification can be performed without false negatives or excessive overhead. No free parameters or invented entities are introduced in the abstract.

axioms (1)

standard math: Schroeppel-Shamir's algorithm correctly enumerates all one-dimensional subset sum solutions in the stated time bounds.
Invoked when the paper states the approach is derived from Schroeppel-Shamir's algorithm for the one-dimensional subset sum problem.



Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Aardal et al. (1999). Market Split and Basis Reduction: Towards a Solution of the Cornuéjols-Dawande Instances. In Lecture Notes in Computer Science (pp. 1–16). Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-48777-8_1

[2] Cornuéjols, G., & Dawande, M. (1998). A Class of Hard Small 0–1 Programs. In Lecture Notes in Computer Science (pp. 284–293). Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-69346-7_22

[3] Gurobi Optimization, LLC. (2023). Gurobi (Version 11). https://www.gurobi.com

[4] Horowitz, E., & Sahni, S. (1974). Computing Partitions with Applications to the Knapsack Problem. Journal of the ACM, 21(2), 277–292. https://doi.org/10.1145/321812.321823

[5] Karp, R. M. (1972). Reducibility among Combinatorial Problems. In Complexity of Computer Computations (pp. 85–103). Springer US. https://doi.org/10.1007/978-1-4684-2001-2_9

[6] Koch et al. (2025). Quantum Optimization Benchmark Library – The Intractable Decathlon (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2504.03832

[7] Schroeppel, R., & Shamir, A. (1981). A T = O(2^{n/2}), S = O(2^{n/4}) Algorithm for Certain NP-Complete Problems. SIAM Journal on Computing, 10(3), 456–464. https://doi.org/10.1137/0210033

[8] Vogel, H. (2012). Solving market split problems with heuristical lattice reduction. Annals of Operations Research, 196(1), 581–590. https://doi.org/10.1007/s10479-012-1143-0

[9] Wang et al. (2009). Solving the market split problem via branch-and-cut. International Journal of Mathematical Modelling and Numerical Optimisation, 1(1/2), 121. https://doi.org/10.1504/ijmmno.2009.030091

[10] Wassermann, A. (2002). Attacking the Market Split Problem with Lattice Point Enumeration. Journal of Combinatorial Optimization, 6(1), 5–16. https://doi.org/10.1023/a:1013355015853

[11] Williams, H. P. (1978). Model Building in Mathematical Programming. John Wiley & Sons Ltd.

[12] Wu et al. (2013). Solving the market split problem using a distributed computation approach. In 2013 IEEE International Conference on Information and Automation (pp. 1252–1257). https://doi.org/10.1109/icinfa.2013.6720486
