📈 #78 Where did the time go? Part II: scale
When your optimization problem outgrows your solver, and what to do about it
Two years ago, I was solving a planning problem for assigning trucks to routes.
More precisely, as we were doing lots of relays, it was for assigning trucks to segments of routes (what we call legs).
The problem was more or less manageable for the ~120 trucks we had at that moment.
But then they asked me:
(remember I work for a startup and things change fast… but maybe not that fast)
Borja, if we suddenly tomorrow had 1,000 trucks instead, could we still do the automatic planning?
Of course not!
That would radically change the size of the problem, and thus the things we needed to do to solve it.
Overnight, a problem we could solve with the desired quality would become impossible to solve within the time frames we were working with.
So we started to prepare for that potential future. Not because it was around the corner (we’re not there yet), but because the problem could grow a lot in no time: a new use case was adding plenty of complexity.
Understanding that complexity and knowing what to do with it is key to a successful project from the technical point of view. Let’s see today in Feasible:
🎃 When size breaks the solver
🧰 The toolbox: 4 ways to tackle large-scale problems
🪄 Making it real
Remember: this is the second part of the 6-part series Where did the time go?; in case you missed it, here is Part I: benchmark before you fix.
Are you ready? Let’s dive in… 🪂
🎃 When size breaks the solver
The name itself gives us an idea of what large-scale means: problems where the storage or computing time needed goes beyond what regular computers can handle.
That comes in different flavors and might be difficult to understand at first, so let’s break it down.
But first: this is going to be focused on Mixed Integer Linear Programs (or MILPs) as they’re usually the hardest problems to solve.
✅ What counts as large-scale optimization problems?
It might sound obvious, but one of the main issues with large-scale optimization problems is the number of variables and constraints.
Theoretically, there’s no limit on the number of variables or constraints a solver can handle, as Gurobi states here.
However, keep in mind that Linear Programs (LPs) are inherently easier to solve with the methods and solvers we have today: a 1-million-variable LP might be easy to solve, while a 100k-variable Mixed Integer Program (MIP) might be a real struggle.
This comes from the way traditional solvers tackle MIPs: branch and bound. We build a tree where we solve an LP relaxation of the problem at each node, then branch or prune, check the optimality criteria, and repeat until we find (and prove) the optimal solution.
In the worst-case scenario, we would need to solve an LP at every possible node of that tree. Each time we branch on a binary variable we open two sides, so we end up with up to 2^num_variables nodes. With 1,000 binary variables, that worst case means 2^1000 LP relaxations.
How much is that? If your solver manages 10,000 LP relaxations per second, do the math: that’s about 315 billion LPs per year. So, running since the Big Bang 13.7 billion years ago, it would’ve solved about 4.32 x 10^21 LP relaxations of the branch-and-bound tree.
And yes, in case you’re wondering, 2^1000 > 4.32 x 10^21. So we would need much more time than the age of the universe to solve it.
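If you want to double-check those numbers, here is a quick back-of-the-envelope script (the 10,000 LPs per second figure is just the illustrative rate from above, not a benchmark):

```python
# Back-of-the-envelope check of the numbers above.
lps_per_second = 10_000
lps_per_year = lps_per_second * 3600 * 24 * 365            # ~3.15e11, i.e. ~315 billion
age_of_universe_years = 13.7e9
lps_since_big_bang = lps_per_year * age_of_universe_years  # ~4.32e21

worst_case_nodes = 2 ** 1000  # worst-case branch-and-bound tree for 1,000 binary variables
print(f"LPs per year:       {lps_per_year:.2e}")
print(f"LPs since Big Bang: {lps_since_big_bang:.2e}")
print(f"2^1000:             {worst_case_nodes:.2e}")       # ~1.07e301, hopelessly larger
```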
It doesn’t usually take that long to solve these problems, right? 😅 That’s because we rarely need to solve every possible LP relaxation. But here’s the thing: we can’t forecast how many LP relaxations our solver will need, so that unpredictability can still leave us with -very- large running times.
Some formulations also add lots of constraints. For example, in the TSP you need subtour elimination constraints to prevent disconnected subtours, so that the solution is one single tour. There are two main approaches: the MTZ and DFJ formulations.
In the first one (MTZ), you add n^2 new constraints and n new decision variables, where n is the number of locations. You add them once and solve the model.
In the second one (DFJ), you have up to 2^n new constraints and no new decision variables. However, you add them iteratively, as subtours appear: solve the original problem, and if the solution contains subtours, add the corresponding constraints, solve again, check for subtours once more, and so on and so forth (see the sketch below).
There’s no clear winner between the two, but either way you end up adding lots of new constraints, and the problem becomes harder to solve.
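To make the DFJ loop concrete, here is a minimal sketch using PuLP with the bundled CBC solver on a tiny random instance. The data and helper names are made up for illustration; in practice you would plug in your own distances, model, and solver.

```python
import random
import pulp

# Tiny made-up instance: n random points, Euclidean distances.
random.seed(0)
n = 8
pts = [(random.random(), random.random()) for _ in range(n)]
dist = {(i, j): ((pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2) ** 0.5
        for i in range(n) for j in range(n) if i != j}

# Base assignment model: x[i, j] = 1 if the tour goes directly from i to j.
prob = pulp.LpProblem("tsp_dfj", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", dist.keys(), cat="Binary")
prob += pulp.lpSum(dist[a] * x[a] for a in dist)
for i in range(n):
    prob += pulp.lpSum(x[i, j] for j in range(n) if j != i) == 1  # leave each city once
    prob += pulp.lpSum(x[j, i] for j in range(n) if j != i) == 1  # enter each city once

def find_subtours(edges):
    """Return the node lists of the cycles in the current solution."""
    succ = dict(edges)
    seen, cycles = set(), []
    for start in range(n):
        if start in seen:
            continue
        cycle, node = [], start
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = succ[node]
        cycles.append(cycle)
    return cycles

# Solve, look for subtours, add the violated DFJ cuts, and repeat.
while True:
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    chosen = [(i, j) for (i, j) in dist if pulp.value(x[i, j]) > 0.5]
    cycles = find_subtours(chosen)
    if len(cycles) == 1:  # a single tour covering all cities: we're done
        break
    for s in cycles:      # forbid every subtour we just found
        prob += pulp.lpSum(x[i, j] for i in s for j in s if i != j) <= len(s) - 1

print("tour length:", pulp.value(prob.objective))
```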
On top of that, you also need to understand the density of the matrix generated by your decision variables and constraints. Density refers to the fraction of non-zero values in that matrix: the denser it is, the more memory you need to store it.
For instance, a fully dense matrix for a problem with 1M variables and 1M constraints, at 8 bytes per value, would need ~8 TB of memory. That’s not sustainable.
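Here is a quick sanity check of that figure, plus what sparsity buys you (the ~10 non-zeros per constraint is just an assumed value for illustration):

```python
# Dense vs. sparse storage for a 1M x 1M constraint matrix at 8 bytes per entry.
n_vars = n_cons = 1_000_000
dense_bytes = n_vars * n_cons * 8
print(f"fully dense: {dense_bytes / 1e12:.0f} TB")  # ~8 TB

# Sparse storage only pays for the non-zeros. Assuming ~10 non-zeros per
# constraint (an illustrative guess), the value array alone is tiny in comparison.
nonzeros = n_cons * 10
print(f"sparse values (~10 nnz per row): {nonzeros * 8 / 1e6:.0f} MB")  # ~80 MB
```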
You also need to take into account the time budget your problem has to run in. It’s not the same to have just a few seconds as to have 30 minutes or 24 hours. The less time available, the fewer iterations you can run; and the fewer the iterations, the harder the problem becomes, since each iteration has to improve the solution a lot.
❌ But beware: sometimes we face problems with small formulations but huge run times.
As a general rule of thumb, we can think of those as hard problems, but not necessarily large-scale ones.
It can also happen that the solver stalls at the root node with no incumbent. That is often due to strong symmetry or a weak formulation, not scale either.
Now that we have identified our problem as a large-scale one, the next question arises quite naturally: what can I do to solve it?
Also, I’ll address some first checks at the end of the article.
🧰 The toolbox: 4 ways to tackle large-scale problems
During my career, I’ve developed (or seen used in projects) some interesting methods to tackle large-scale optimization problems.
Today I want to present four different strategies (plus one bonus, since that technique is not that common yet). Two of them are the ones I like the most due to their simplicity of implementation, scalability, and maintainability. They come with a cost, though: you lose optimality guarantees.
Let’s start with the one I like the most (that you can also use to warm-start your solver so you don’t lose those optimality guarantees):
🎨 Heuristics and metaheuristics approaches
You know I’m a metaheuristics lover.
I started with them in a Master’s course, then continued working with them in my PhD. Just in case you missed it:
But if you take a look at how to address large-scale optimization problems with them, it’s easy to see why this approach succeeds.
The ingredients you need: heuristics to create an initial solution, local search methods to enhance it, and metaheuristics to generate high-quality solutions. You can then feed these solutions to your math model as a warm start.
If your problem size increases dramatically, this approach reduces the effective search space and guides the exact solver toward promising regions, faster (see the sketch below).
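Here is a minimal sketch of that warm-start plumbing on a toy knapsack, assuming a recent PuLP where setInitialValue and the CBC warmStart flag are available (the instance and the greedy heuristic are made up; in practice the starting solution would come from your local search or metaheuristic):

```python
import pulp

# Toy knapsack instance (hypothetical data).
values   = [10, 7, 5, 9, 3, 8]
weights  = [ 4, 3, 2, 5, 1, 4]
capacity = 10

prob = pulp.LpProblem("knapsack", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(values))]
prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= capacity

# 1) Cheap greedy heuristic: take items by value/weight ratio while they fit.
order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
chosen, load = set(), 0
for i in order:
    if load + weights[i] <= capacity:
        chosen.add(i)
        load += weights[i]

# 2) Feed the heuristic solution to the solver as a MIP start, then solve.
for i, xi in enumerate(x):
    xi.setInitialValue(1 if i in chosen else 0)
prob.solve(pulp.PULP_CBC_CMD(msg=False, warmStart=True))

print("greedy value :", sum(values[i] for i in chosen))
print("solver value :", pulp.value(prob.objective))
```

If you’re on Gurobi instead, the equivalent is setting each variable’s Start attribute before optimizing; the pattern is the same.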
It’s very common in practice, especially for large, time-constrained problems, as these approaches arise quite naturally in most problems. But why?