Para-B&B: Load-Balanced Deterministic Parallelization of Solving MIP

Di Huang; Jinyu Zhang; Shuo Wang; Yue Liu; Zhenyu Pu; Zhiyuan Liu

arxiv: 2604.09556 · v1 · submitted 2026-02-10 · 💻 cs.DC · cs.AI

Para-B&B: Load-Balanced Deterministic Parallelization of Solving MIP

Jinyu Zhang , Di Huang , Yue Liu , Shuo Wang , Zhenyu Pu , Zhiyuan Liu This is my paper

Pith reviewed 2026-05-16 03:16 UTC · model grok-4.3

classification 💻 cs.DC cs.AI

keywords deterministicnodebranch-and-boundcompletecomputationaldatadeterminisminstances

0 comments

The pith

Para-B&B delivers the first open-source deterministic parallel branch-and-bound for HiGHS, achieving 2.17x geometric mean speedup with 8 threads on 80 MIPLIB 2017 instances while guaranteeing identical solutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Mixed-integer programming solves planning problems that mix continuous numbers with whole-number decisions, such as factory schedules or delivery routes. The classic exact method is branch-and-bound, which grows a tree of possible choices and cuts off hopeless branches. Running this tree search on multiple processor cores at once is difficult because the order in which branches are explored can change which parts get pruned, producing different final answers. Para-B&B avoids this by giving every core an exact copy of the solver's current state so all cores start from identical information. To keep cores busy, it trains AI models on problem structure and past run times to guess how expensive each tree node will be, then assigns work accordingly and adjusts settings during execution. On 80 standard benchmark problems the method ran 2.17 times faster on average with eight cores; the hardest instances reached 5.12 times faster. Processor idle time averaged 34.7 percent.

Core claim

Comprehensive experimental evaluation on 80 MIPLIB 2017 benchmark instances demonstrates effectiveness, achieving a geometric mean speedup of 2.17 using eight threads while maintaining complete deterministic guarantees.

Load-bearing premise

Replicating complete solver state across worker threads and eliminating non-deterministic synchronization primitives will preserve strict determinism while the AI-driven load balancing based on structural characteristics and historical data will not alter the branch-and-bound search tree or final solution.

read the original abstract

Mixed-integer programming (MIP) extends linear programming by incorporating both continuous and integer decision variables, making it widely used in production planning, logistics scheduling, and resource allocation. However, MIP remains NP-hard and cannot generally be solved to optimality in polynomial time. Branch-and-bound, a fundamental exact method, faces significant parallelization challenges due to computational heterogeneity and strict determinism requirements in commercial applications. This paper presents the first fully open-source implementation of deterministic parallel branch-and-bound for HiGHS, a high-performance MIP solver. Our approach introduces a novel data-parallel architecture ensuring strict determinism by replicating complete solver state across worker threads and eliminating non-deterministic synchronization primitives. A key innovation is our AI-driven load balancing mechanism employing multi-stage workload prediction models that estimate node computational complexity based on structural characteristics and historical performance data, coupled with dynamic parameter adjustment strategies. The framework executes orchestrated parallel phases including concurrent dive operations, systematic data consolidation, and intelligent node selection. Comprehensive experimental evaluation on 80 MIPLIB 2017 benchmark instances demonstrates effectiveness, achieving a geometric mean speedup of 2.17 using eight threads while maintaining complete deterministic guarantees. Performance gains become increasingly pronounced for higher node counts, with speedup factors reaching 5.12 for computationally intensive instances and thread idle rates averaging 34.7%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper ships the first open-source deterministic parallel B&B for HiGHS with AI load balancing and reports concrete speedups on MIPLIB, but the determinism claim rests on unshown details about how the AI avoids changing node order.

read the letter

The main point is a practical open-source implementation of deterministic parallel branch-and-bound inside HiGHS. It replicates full solver state across threads to cut nondeterministic synchronization and adds multi-stage AI models that predict node work from structure and past runs, then balances load during orchestrated phases like concurrent dives and consolidation. Experiments on 80 MIPLIB 2017 instances give a geometric mean speedup of 2.17 on eight threads, rising to 5.12 on harder cases, with thread idle time averaging 34.7 percent. That is useful data for anyone who needs reproducible MIP solves on shared hardware. The code release itself is the clearest contribution; prior parallel B&B work often stayed closed or sacrificed determinism. The state-replication tactic is straightforward and removes one common source of nondeterminism. The load-balancing numbers look reasonable on the reported benchmarks. The soft spot is the determinism guarantee. Replicating state helps, but the AI predictor and dynamic parameter tweaks still decide which nodes to process and when. If a prediction ever shifts selection or pruning order relative to sequential execution, the search tree changes even if the final answer happens to match. The abstract and summary give no invariant, proof sketch, or verification step showing the tree stays identical. Experiments are high-level aggregates with no raw traces, error bars, or side-by-side tree comparisons. That leaves the central claim plausible but unverified from what is shown. This is for practitioners who run MIP solvers in planning or scheduling and want multi-core speed without losing reproducibility. A reader working on parallel optimization or HiGHS extensions would get value from the implementation and the benchmark numbers. It deserves peer review because the open-source part makes verification feasible and the empirical results are concrete enough to check.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that full state replication guarantees determinism and on the introduction of AI prediction models whose parameters are fitted to historical data.

free parameters (1)

multi-stage workload prediction model parameters
Parameters of the AI models that estimate node complexity from structural features and historical performance are fitted to data and used for dynamic assignment.

axioms (1)

domain assumption Replicating complete solver state across threads eliminates non-deterministic synchronization
Invoked to justify strict determinism without further proof in the abstract.

invented entities (1)

AI-driven multi-stage workload prediction models no independent evidence
purpose: Estimate computational complexity of branch-and-bound nodes to enable dynamic load balancing
New mechanism introduced in the paper; no independent evidence outside the reported experiments is supplied.

pith-pipeline@v0.9.0 · 5542 in / 1430 out tokens · 136108 ms · 2026-05-16T03:16:19.447273+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages

[1]

Initialize MIP instance, set global lower bound to -∞, global upper bound to +∞, create priority queue and search stack

work page
[3]

If problem solved during presolve then return solution

work page
[7]

Update lower bound and upper bound from root node results

work page
[8]

Add root node to priority queue

work page
[9]

While priority queue is not empty do

work page
[10]

Perform aging on conflict pool

work page
[11]

Set iteration limit for LP solves during dive

work page
[12]

Choose best node from queue based on lower bound or estimation and install to search

work page
[13]

While search has active node do

work page
[15]

If suboptimal (iteration limit reached) then add current node back to queue and break

work page
[16]

If node is pruned then increment leaf count and break

work page
[17]

If node not pruned and heuristics allowed then

work page
[18]

Apply primal heuristics (randomized rounding, RENS, RINS)

work page
[19]

If domain becomes infeasible then break

work page
[21]

Dive: perform branching and select child node

work page
[22]

If suboptimal then break

work page
[23]

Increment leaf count

work page
[24]

If limits reached then set limit_reached and break

work page
[25]

If dive depth >= 100 nodes then break

work page
[26]

If cannot backtrack in current plunge then break

work page
[27]

Perform conflict pool aging if needed

work page
[28]

Add open nodes from search to priority queue

work page
[29]

If limit reached then update bounds and break

work page
[30]

Propagate global domain

work page
[31]

Prune infeasible nodes in queue based on global domain

work page
[32]

If global domain infeasible then clear queue and break

work page
[33]

Update global lower bound from queue

work page
[34]

If domain changes found then update local domain and cleanup fixed variables

work page
[36]

While queue not empty do

work page
[37]

Select next node from queue (best bound or best estimate)

work page
[38]

Install node to search

work page
[39]

Evaluate node: solve LP relaxation

work page
[40]

If suboptimal then add node back to queue

work page
[41]

Backtrack, increment counters, propagate domain, prune infeasible nodes

work page
[42]

Update bounds and continue to next node

work page
[43]

If node not pruned then

work page
[44]

Separate this node: generate cutting plane

work page
[45]

Store basis and break to perform dive

work page
[46]

Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing

Clean up and return solution B. Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing

work page
[47]

Initialize MIP instance, set global bounds, create worker threads

work page
[48]

Presolve to reduce problem complexity

work page
[49]

If problem solved during presolve then terminate workers and return solution

work page
[50]

Apply trivial heuristics to find initial feasible solution

work page
[51]

Evaluate root node: solve LP relaxation and perform cut generation

work page
[52]

Perform aging to shrink cut pool after root separation

work page
[53]

Create worker data replicas and broadcast global information to all workers

work page
[54]

Initialize master search and install root node

work page
[55]

While active workers exist do

work page
[56]

Perform aging on conflict pools for master and active workers

work page
[57]

Set adaptive iteration limits for master and workers based on historical performance

work page
[58]

Start parallel dive phase with synchronized timing

work page
[59]

Master and active workers perform concurrent dive operations:

work page
[60]

Evaluate current node and apply primal heuristics if allowed

work page
[61]

If node pruned then increment leaf count and continue

work page
[62]

If node not pruned then dive by branching and selecting child nodes

work page
[63]

Continue diving until depth limits, iteration limits, or backtrack conditions met

work page
[64]

Wait for all workers to complete dive phase using barrier synchronization

work page
[65]

Merge worker data to global state:

work page
[66]

Transfer incumbent solutions and update bounds

work page
[67]

Merge domain information and node queues

work page
[68]

Consolidate cut pools, conflict pools, and heuristic statistics

work page
[69]

Update global counters and performance metrics

work page
[70]

Update global information:

work page
[71]

Add open nodes from all searches to global queue

work page
[72]

Propagate global domain and prune infeasible nodes

work page
[73]

Update global bounds and check termination conditions

work page
[74]

If restart conditions met then perform restart and go to step 4

work page
[75]

Broadcast updated global information to all workers:

work page
[76]

Synchronize domain restrictions and bounds

work page
[77]

Transfer incumbent solutions and optimization limits

work page
[78]

Update worker-specific parameters and statistics

work page
[79]

Start parallel node selection phase:

work page
[80]

Distribute available nodes to private queues using round-robin allocation

work page
[81]

For each worker requiring a node do

work page
[82]

Estimate node processing time based on historical data and node characteristics

work page
[83]

Apply load balancing algorithm to assign nodes to workers considering:

work page
[84]

Current worker utilization and capacity

work page
[85]

Estimated processing time for candidate nodes

work page
[86]

Worker-specific performance history

work page
[87]

Apply load control algorithm to adjust assignments:

work page

Showing first 80 references.

[1] [1]

Initialize MIP instance, set global lower bound to -∞, global upper bound to +∞, create priority queue and search stack

work page

[2] [3]

If problem solved during presolve then return solution

work page

[3] [7]

Update lower bound and upper bound from root node results

work page

[4] [8]

Add root node to priority queue

work page

[5] [9]

While priority queue is not empty do

work page

[6] [10]

Perform aging on conflict pool

work page

[7] [11]

Set iteration limit for LP solves during dive

work page

[8] [12]

Choose best node from queue based on lower bound or estimation and install to search

work page

[9] [13]

While search has active node do

work page

[10] [15]

If suboptimal (iteration limit reached) then add current node back to queue and break

work page

[11] [16]

If node is pruned then increment leaf count and break

work page

[12] [17]

If node not pruned and heuristics allowed then

work page

[13] [18]

Apply primal heuristics (randomized rounding, RENS, RINS)

work page

[14] [19]

If domain becomes infeasible then break

work page

[15] [21]

Dive: perform branching and select child node

work page

[16] [22]

If suboptimal then break

work page

[17] [23]

Increment leaf count

work page

[18] [24]

If limits reached then set limit_reached and break

work page

[19] [25]

If dive depth >= 100 nodes then break

work page

[20] [26]

If cannot backtrack in current plunge then break

work page

[21] [27]

Perform conflict pool aging if needed

work page

[22] [28]

Add open nodes from search to priority queue

work page

[23] [29]

If limit reached then update bounds and break

work page

[24] [30]

Propagate global domain

work page

[25] [31]

Prune infeasible nodes in queue based on global domain

work page

[26] [32]

If global domain infeasible then clear queue and break

work page

[27] [33]

Update global lower bound from queue

work page

[28] [34]

If domain changes found then update local domain and cleanup fixed variables

work page

[29] [36]

While queue not empty do

work page

[30] [37]

Select next node from queue (best bound or best estimate)

work page

[31] [38]

Install node to search

work page

[32] [39]

Evaluate node: solve LP relaxation

work page

[33] [40]

If suboptimal then add node back to queue

work page

[34] [41]

Backtrack, increment counters, propagate domain, prune infeasible nodes

work page

[35] [42]

Update bounds and continue to next node

work page

[36] [43]

If node not pruned then

work page

[37] [44]

Separate this node: generate cutting plane

work page

[38] [45]

Store basis and break to perform dive

work page

[39] [46]

Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing

Clean up and return solution B. Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing

work page

[40] [47]

Initialize MIP instance, set global bounds, create worker threads

work page

[41] [48]

Presolve to reduce problem complexity

work page

[42] [49]

If problem solved during presolve then terminate workers and return solution

work page

[43] [50]

Apply trivial heuristics to find initial feasible solution

work page

[44] [51]

Evaluate root node: solve LP relaxation and perform cut generation

work page

[45] [52]

Perform aging to shrink cut pool after root separation

work page

[46] [53]

Create worker data replicas and broadcast global information to all workers

work page

[47] [54]

Initialize master search and install root node

work page

[48] [55]

While active workers exist do

work page

[49] [56]

Perform aging on conflict pools for master and active workers

work page

[50] [57]

Set adaptive iteration limits for master and workers based on historical performance

work page

[51] [58]

Start parallel dive phase with synchronized timing

work page

[52] [59]

Master and active workers perform concurrent dive operations:

work page

[53] [60]

Evaluate current node and apply primal heuristics if allowed

work page

[54] [61]

If node pruned then increment leaf count and continue

work page

[55] [62]

If node not pruned then dive by branching and selecting child nodes

work page

[56] [63]

Continue diving until depth limits, iteration limits, or backtrack conditions met

work page

[57] [64]

Wait for all workers to complete dive phase using barrier synchronization

work page

[58] [65]

Merge worker data to global state:

work page

[59] [66]

Transfer incumbent solutions and update bounds

work page

[60] [67]

Merge domain information and node queues

work page

[61] [68]

Consolidate cut pools, conflict pools, and heuristic statistics

work page

[62] [69]

Update global counters and performance metrics

work page

[63] [70]

Update global information:

work page

[64] [71]

Add open nodes from all searches to global queue

work page

[65] [72]

Propagate global domain and prune infeasible nodes

work page

[66] [73]

Update global bounds and check termination conditions

work page

[67] [74]

If restart conditions met then perform restart and go to step 4

work page

[68] [75]

Broadcast updated global information to all workers:

work page

[69] [76]

Synchronize domain restrictions and bounds

work page

[70] [77]

Transfer incumbent solutions and optimization limits

work page

[71] [78]

Update worker-specific parameters and statistics

work page

[72] [79]

Start parallel node selection phase:

work page

[73] [80]

Distribute available nodes to private queues using round-robin allocation

work page

[74] [81]

For each worker requiring a node do

work page

[75] [82]

Estimate node processing time based on historical data and node characteristics

work page

[76] [83]

Apply load balancing algorithm to assign nodes to workers considering:

work page

[77] [84]

Current worker utilization and capacity

work page

[78] [85]

Estimated processing time for candidate nodes

work page

[79] [86]

Worker-specific performance history

work page

[80] [87]

Apply load control algorithm to adjust assignments:

work page