📈 #83 Where did the time go? Part IV: algorithms and parallelization
Making your solver faster without buying a faster computer
Three years ago, our optimization model was taking *some* time to solve the problem.
Yes, the italics on the word *some* give you a hint that it was slow-ish.
We started adding more cores to the computation. It was easy: just tweaking a parameter. But sooner than you'd expect, we realized that at some point the hardware needed to solve the problem exceeded our expectations, and we were not getting better solutions.
The bottleneck? Memory consumption.
We found that after 8 threads, the algorithms were so hungry that they consumed more than 32 GB of memory… without producing much better solutions. We were getting only marginal benefits from the extra threads.
But I remembered that some years ago, during my PhD, I got good results by parallelizing a Variable Neighborhood Search algorithm for a specific problem1. The objective was to minimize the maximum retrieval time in warehouses. These problems (min of max, or max of min) tend to have a flat landscape in the search space that makes them extra difficult.
The thing is that back in 2016 I was able to reach the state of the art for that problem: the algorithm reduced the percentage deviation from the best known solution by more than one percentage point, in the same running time as previous algorithms.
All of this leads me to today’s questions: why are some algorithms more suitable than others for parallelization? How can we identify them? And what can we do to make computations faster?
That’s what we’ll explore today in Feasible:
🚫 Why some algorithms won’t thread
🫶🏻 Algorithms that love extra cores
🛠 From theory to first steps
💡 Like where this is going? Click here to get Part IV and the rest of the series delivered straight to your inbox:
Remember, this is the fourth part of a 6-part series of posts focused on tractability. We started with benchmarking before fixing, then large-scale problems, and three more parts centered on symmetries (what they mean and classical techniques, more advanced techniques, and ML-guided techniques and practical ways of dealing with symmetries).
Are you ready? Let’s dive in… 🪂
🚫 Why some algorithms won’t thread
One of the things computation has given to us is cores.
We tend to think of them as ‘add more cores to your algorithm, and magically you’ll get a speedup’.
But that’s not necessarily true when solving optimization problems. We all wish it were! Just a simple parameter to tune the number of cores and voilà, you get better solutions in less running time.
Let’s see what happens with some of the well-known algorithms…
🧮 Simplex algorithm
By far one of the most used algorithms in practice. Not only because it solves Linear Programming (LP) problems efficiently, but also because it helps when solving Mixed Integer Programming problems (by solving relaxed versions of the original problem as LPs).
Why does it resist parallelization?
- Each pivot operation depends on the previous one
- The tableau updates create strict data dependencies
- You must know where you are before deciding where to go next
This is inherently serial. Like a conga line: adding more dancers won’t make the conga parallel, as they still need to follow one by one.
(now you have a mental conga line full of people dancing, like me 😅)
What you can do instead is launch several threads with different initializations, each running the simplex so that every thread solves the problem on its own.
(yes, like having multiple congas at once)
That increases your chances of finding a better solution, but with no guarantee that it will happen.
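The “multiple congas” idea is just a multi-start pattern: launch independent runs from different initializations and keep the best result. Here’s a minimal sketch in Python; `objective` and `solve_from_seed` are toy stand-ins (a random-restart improvement search on a 1-D function), not a real simplex run, and in practice each run would be a separate solver process rather than a thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Toy objective with minimum at x = 3; stand-in for your real model.
    return (x - 3.0) ** 2 + 1.0

def solve_from_seed(seed, steps=200):
    # One "conga line": an independent run from its own initialization.
    rng = random.Random(seed)
    best_x = rng.uniform(-10.0, 10.0)      # different starting point per run
    best_f = objective(best_x)
    for _ in range(steps):
        cand = best_x + rng.gauss(0.0, 0.5)
        f = objective(cand)
        if f < best_f:                     # accept only improving moves
            best_x, best_f = cand, f
    return best_x, best_f

def multi_start(n_runs=8):
    # Runs share nothing: no locks, no global state, no contention.
    with ThreadPoolExecutor(max_workers=n_runs) as pool:
        results = list(pool.map(solve_from_seed, range(n_runs)))
    return min(results, key=lambda r: r[1])  # best solution across all runs

x, f = multi_start()
print(f"best x = {x:.3f}, objective = {f:.3f}")
```

Because the runs are fully independent, this scales cleanly with cores; the trade-off is that no information is shared between runs.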
You can read more about this in this blog post2 and in the HiGHS documentation3, where you can see the difficulty of adding more threads.
This is a similar case to Dynamic Programming: each step builds on the previous one, making the algorithm hard to parallelize.
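To see that serial dependency concretely, here is a tiny DP example (a classic staircase-cost problem, chosen purely for illustration): each state reads the states computed just before it, so no core can jump ahead.

```python
def min_cost_climb(costs):
    # dp[i] = cheapest total cost to stand on step i.
    dp = [0.0] * len(costs)
    dp[0] = costs[0]
    if len(costs) > 1:
        dp[1] = costs[1]
    for i in range(2, len(costs)):
        # The serial dependency: dp[i] needs dp[i-1] and dp[i-2]
        # to be finalized first, so iterations cannot run in parallel.
        dp[i] = costs[i] + min(dp[i - 1], dp[i - 2])
    # You may finish from either of the last two steps.
    return min(dp[-1], dp[-2]) if len(costs) > 1 else dp[-1]

print(min_cost_climb([10, 15, 20]))  # → 15
```

This is the same shape as the simplex tableau updates: the data dependency chain, not the amount of work per step, is what blocks parallel speedup.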
🌳 Branch-and-bound
In the traditional B&B algorithm, you can parallelize the search at the node level, but the global incumbent bound and the node selection policy cause contention, so threads compete for shared state and hardware resources.
What this means in practice: there are diminishing returns when going parallel, and it’s often easier to run multiple instances of your single-threaded algorithm on multiple cores than to build a multi-threaded solver.
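To picture the contention, here’s a toy sketch (not a real branch-and-bound implementation) in which worker threads share a single incumbent behind a lock: the lock keeps the bound consistent, but every check and update serializes on it, which is exactly where the diminishing returns come from.

```python
import threading

# Shared B&B-style state: one global best objective value.
incumbent = float("inf")
incumbent_lock = threading.Lock()

def explore(node_values):
    """Pretend each value is the objective found at one search node."""
    global incumbent
    for value in node_values:
        with incumbent_lock:        # contention point: one thread at a time
            if value < incumbent:
                incumbent = value   # new best solution found

# Split the "search tree" across 4 worker threads.
chunks = [[9, 7, 8], [6, 12, 11], [10, 5, 14], [13, 15, 4]]
threads = [threading.Thread(target=explore, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(incumbent)  # → 4
```

The answer is correct regardless of scheduling, but the more often workers touch the shared bound, the more time they spend waiting on each other instead of searching.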
That competition for resources is not unique to B&B. You face a similar situation with column generation: the master problem must finish before pricing, pricing problems often share resources, and column pool updates create bottlenecks.
So if these algorithms won't thread well, what will?
🫶🏻 Algorithms that love extra cores
Some algorithms are better suited to parallelization, and one family in particular shines the most.
Let’s start with…