Para-B&B: Load-Balanced Deterministic Parallelization of Solving MIP
Pith reviewed 2026-05-16 03:16 UTC · model grok-4.3
The pith
Para-B&B delivers the first open-source deterministic parallel branch-and-bound for HiGHS, achieving 2.17x geometric mean speedup with 8 threads on 80 MIPLIB 2017 instances while guaranteeing identical solutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Comprehensive experimental evaluation on 80 MIPLIB 2017 benchmark instances demonstrates effectiveness, achieving a geometric mean speedup of 2.17 using eight threads while maintaining complete deterministic guarantees.
Load-bearing premise
Replicating complete solver state across worker threads and eliminating non-deterministic synchronization primitives will preserve strict determinism while the AI-driven load balancing based on structural characteristics and historical data will not alter the branch-and-bound search tree or final solution.
read the original abstract
Mixed-integer programming (MIP) extends linear programming by incorporating both continuous and integer decision variables, making it widely used in production planning, logistics scheduling, and resource allocation. However, MIP remains NP-hard and cannot generally be solved to optimality in polynomial time. Branch-and-bound, a fundamental exact method, faces significant parallelization challenges due to computational heterogeneity and strict determinism requirements in commercial applications. This paper presents the first fully open-source implementation of deterministic parallel branch-and-bound for HiGHS, a high-performance MIP solver. Our approach introduces a novel data-parallel architecture ensuring strict determinism by replicating complete solver state across worker threads and eliminating non-deterministic synchronization primitives. A key innovation is our AI-driven load balancing mechanism employing multi-stage workload prediction models that estimate node computational complexity based on structural characteristics and historical performance data, coupled with dynamic parameter adjustment strategies. The framework executes orchestrated parallel phases including concurrent dive operations, systematic data consolidation, and intelligent node selection. Comprehensive experimental evaluation on 80 MIPLIB 2017 benchmark instances demonstrates effectiveness, achieving a geometric mean speedup of 2.17 using eight threads while maintaining complete deterministic guarantees. Performance gains become increasingly pronounced for higher node counts, with speedup factors reaching 5.12 for computationally intensive instances and thread idle rates averaging 34.7%.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-stage workload prediction model parameters
axioms (1)
- domain assumption Replicating complete solver state across threads eliminates non-deterministic synchronization
invented entities (1)
-
AI-driven multi-stage workload prediction models
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Initialize MIP instance, set global lower bound to -∞, global upper bound to +∞, create priority queue and search stack
-
[3]
If problem solved during presolve then return solution
-
[7]
Update lower bound and upper bound from root node results
-
[8]
Add root node to priority queue
-
[9]
While priority queue is not empty do
-
[10]
Perform aging on conflict pool
-
[11]
Set iteration limit for LP solves during dive
-
[12]
Choose best node from queue based on lower bound or estimation and install to search
-
[13]
While search has active node do
-
[15]
If suboptimal (iteration limit reached) then add current node back to queue and break
-
[16]
If node is pruned then increment leaf count and break
-
[17]
If node not pruned and heuristics allowed then
-
[18]
Apply primal heuristics (randomized rounding, RENS, RINS)
-
[19]
If domain becomes infeasible then break
-
[21]
Dive: perform branching and select child node
-
[22]
If suboptimal then break
-
[23]
Increment leaf count
-
[24]
If limits reached then set limit_reached and break
-
[25]
If dive depth >= 100 nodes then break
-
[26]
If cannot backtrack in current plunge then break
-
[27]
Perform conflict pool aging if needed
-
[28]
Add open nodes from search to priority queue
-
[29]
If limit reached then update bounds and break
-
[30]
Propagate global domain
-
[31]
Prune infeasible nodes in queue based on global domain
-
[32]
If global domain infeasible then clear queue and break
-
[33]
Update global lower bound from queue
-
[34]
If domain changes found then update local domain and cleanup fixed variables
-
[36]
While queue not empty do
-
[37]
Select next node from queue (best bound or best estimate)
-
[38]
Install node to search
-
[39]
Evaluate node: solve LP relaxation
-
[40]
If suboptimal then add node back to queue
-
[41]
Backtrack, increment counters, propagate domain, prune infeasible nodes
-
[42]
Update bounds and continue to next node
-
[43]
If node not pruned then
-
[44]
Separate this node: generate cutting plane
-
[45]
Store basis and break to perform dive
-
[46]
Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing
Clean up and return solution B. Para-B&B Deterministic Parallel B&B Algorithm 2: HiGHS Parallel Solver with Load Balancing
-
[47]
Initialize MIP instance, set global bounds, create worker threads
-
[48]
Presolve to reduce problem complexity
-
[49]
If problem solved during presolve then terminate workers and return solution
-
[50]
Apply trivial heuristics to find initial feasible solution
-
[51]
Evaluate root node: solve LP relaxation and perform cut generation
-
[52]
Perform aging to shrink cut pool after root separation
-
[53]
Create worker data replicas and broadcast global information to all workers
-
[54]
Initialize master search and install root node
-
[55]
While active workers exist do
-
[56]
Perform aging on conflict pools for master and active workers
-
[57]
Set adaptive iteration limits for master and workers based on historical performance
-
[58]
Start parallel dive phase with synchronized timing
-
[59]
Master and active workers perform concurrent dive operations:
-
[60]
Evaluate current node and apply primal heuristics if allowed
-
[61]
If node pruned then increment leaf count and continue
-
[62]
If node not pruned then dive by branching and selecting child nodes
-
[63]
Continue diving until depth limits, iteration limits, or backtrack conditions met
-
[64]
Wait for all workers to complete dive phase using barrier synchronization
-
[65]
Merge worker data to global state:
-
[66]
Transfer incumbent solutions and update bounds
-
[67]
Merge domain information and node queues
-
[68]
Consolidate cut pools, conflict pools, and heuristic statistics
-
[69]
Update global counters and performance metrics
-
[70]
Update global information:
-
[71]
Add open nodes from all searches to global queue
-
[72]
Propagate global domain and prune infeasible nodes
-
[73]
Update global bounds and check termination conditions
-
[74]
If restart conditions met then perform restart and go to step 4
-
[75]
Broadcast updated global information to all workers:
-
[76]
Synchronize domain restrictions and bounds
-
[77]
Transfer incumbent solutions and optimization limits
-
[78]
Update worker-specific parameters and statistics
-
[79]
Start parallel node selection phase:
-
[80]
Distribute available nodes to private queues using round-robin allocation
-
[81]
For each worker requiring a node do
-
[82]
Estimate node processing time based on historical data and node characteristics
-
[83]
Apply load balancing algorithm to assign nodes to workers considering:
-
[84]
Current worker utilization and capacity
-
[85]
Estimated processing time for candidate nodes
-
[86]
Worker-specific performance history
-
[87]
Apply load control algorithm to adjust assignments:
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.