📈 #77 Where did the time go? Part I: benchmark before you fix
Why your optimization solution is slow (and how to find out without guessing)
I made a promise in my last post:
And I hate breaking promises.
But as I was thinking more deeply about the issue (large-scale optimization problems), I realized it touched several topics around the same bigger problem: tractability.
The first time I heard that word, I kept looking at the teacher with the same poker face for a few minutes. Then I understood it was pretty simple:
Tractability refers to whether a problem can be solved efficiently; meaning, in a reasonable amount of time as the problem size grows, considering your real-world business constraints. Sorting a list stays tractable as it grows; brute-forcing every route of a traveling salesman problem does not.
So going back to the promise: yes, I broke it. The topic has developed into something bigger as I shared the other day:
And today I’ll share with you my vision on how to solve problems efficiently.
I see it as a complex topic that affects all of us: we want to deliver high-quality solutions to our problems on time, and most of the time several issues affect the way you solve them at once.
But I’ll do my best to simplify as much as possible without losing rigor.
The point of view of this series will be: I built a mathematical model / algorithm to solve a problem, and it’s not giving high-quality solutions in short running times. What should I do?
So today in Feasible I’ll take a step back and start with the basics to talk about:
🛠️ Profile first, fix second
📏 What to measure (and how)
📌 Your roadmap forward
I’ll share my own experience profiling, guessing, and selecting the best approach.
You’ll also walk away with a prompt to feed ChatGPT to better understand solver logs, and a flowchart to make better decisions on where to put your efforts.
Are you ready? Let’s dive in… 🪂
🛠️ Profile first, fix second
Picture this:
You developed a MIP considering several real-world constraints that affect the project, like the maximum execution time, the need for optimality, and the hardware to run it.
You established, together with the stakeholders, a definition of what acceptable results mean for this project: they don’t need optimal solutions, but running times cannot exceed one hour on a machine with 32 GB of RAM.
That MIP usually solves in 10-15 minutes depending on the data. But today it kept running and running, and suddenly the execution broke. It previously handled 250 resources easily, but crashed at 300 with a longer planning horizon.
Memory? Scaling? Symmetry explosion? Without profiling, you're guessing. You need to classify where the bottleneck is before proposing a solution (see the profiling sketch after this list):
🚀 Is it a problem of scale? Did your problem grow so much that it has become a large-scale optimization problem the algorithm wasn’t prepared for?
🧱 Is it a problem of structure? Does your problem have symmetries you didn’t detect, and now that the problem has scaled they’ve become important?
🧩 Is it the algorithmic approach? Did you use heuristics that are now trapped in local optima, and you need something extra to escape them?
💾 Is it purely a hardware problem? Would it be enough to throw a better CPU and/or more memory at it to get results in the needed time?
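Before classifying anything, see where the time actually goes. Here’s a minimal profiling sketch using Python’s standard-library cProfile; build_model and solve are hypothetical stand-ins for your own pipeline, so swap in your real code:

```python
import cProfile
import pstats
import time

def build_model(data):
    """Hypothetical stand-in for your model construction; replace with your own."""
    time.sleep(0.2)  # simulate work
    return {"num_vars": len(data)}

def solve(model):
    """Hypothetical stand-in for the solver call; replace with your own."""
    time.sleep(0.5)  # simulate work
    return {"objective": 42.0}

def run(data):
    model = build_model(data)  # is the time going into building the model...
    return solve(model)        # ...or into the solve itself?

profiler = cProfile.Profile()
profiler.enable()
run(data=list(range(1000)))
profiler.disable()

# Show the 10 most expensive calls, sorted by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

Even a rough breakdown like this tells you whether to attack the formulation, the data preparation, or the solve itself.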
You can guess which one is affecting your problem the most, and several issues might be at play at the same time.
But you need to prioritize your work and attack the most striking problem.
That’s why it’s better to install a diagnostic mindset in your team: it gets you improvements with low(er) effort.
I’ve seen the benefits of that several times:
→ When the problem was growing and the current approach didn’t work well, we added a pre-processing layer to shrink the search space, reducing running times by a factor of 10.
→ When the problem was not that big but running times were making it impossible to solve, we looked at the problem from a different angle, added a couple of constraints, and reduced running times by 30%.
→ When the solutions were good enough but we needed faster execution, we parallelized the algorithm to get the same results in a fraction of the time.
See the pattern? Understand the issue first, then propose a solution.
That starts with the diagnostic mindset. Practice profiling as much as you can, so whenever you hit a wall, you’re better prepared to solve the issue.
You don’t only need optimal solutions to your problems; you also need to optimize the way you solve them. But more importantly, you need to see what’s happening inside your runs.
So let’s start with what and how to measure important metrics.
📏 What to measure (and how)
You have a problem, and now you know you need to measure something.
But what?
I see three things as the most common ones to understand the problem firsthand (a minimal sketch of measuring all three follows the list):
CPU time: either you get it from your solver’s logs or you measure it yourself if you built the algorithm. Either way, this is important information, as you need your algorithm to run in time.
Solution quality: the value of the objective function and, ideally, the best bound of your problem. If you’re using a solver, the logs will tell you; if you built the algorithm, log it!
Memory footprint: this is something you’d usually need to measure yourself, as solver logs rarely report it, and it gives you critical information about how the problem’s computational needs grow as the input size varies.
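Here’s a minimal sketch of capturing all three in one run, in plain Python. The solve function is a hypothetical stand-in that returns an objective and a bound; note that tracemalloc only tracks Python-side allocations, so a solver’s native memory won’t show up here (you’d need something like psutil for that):

```python
import time
import tracemalloc

def solve(data):
    """Hypothetical stand-in for your algorithm or solver call.
    Assumed to return (incumbent objective, best bound)."""
    objective = float(sum(data))        # simulate work
    return objective, objective * 1.05  # made-up values

data = list(range(1_000_000))

tracemalloc.start()               # track Python memory allocations
wall_start = time.perf_counter()  # wall-clock time
cpu_start = time.process_time()   # CPU time (excludes waits)

objective, bound = solve(data)

cpu_time = time.process_time() - cpu_start
wall_time = time.perf_counter() - wall_start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Relative optimality gap: how far the incumbent is from the best bound
gap = abs(bound - objective) / max(abs(objective), 1e-10)

print(f"wall: {wall_time:.2f}s | cpu: {cpu_time:.2f}s")
print(f"objective: {objective:.2f} | bound: {bound:.2f} | gap: {gap:.2%}")
print(f"peak Python memory: {peak / 1e6:.1f} MB")
```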
Those are common across any kind of optimization problem and technique used to solve them, especially the first two.
If you need to focus on something, I’d select CPU time and solution quality. You could build charts to see the evolution of solutions for your problem.
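If you log (time, incumbent, bound) snapshots during the run, a chart like that takes a few lines of matplotlib. A sketch follows; the numbers are made up purely for illustration:

```python
import matplotlib.pyplot as plt

# Made-up sample points: (seconds elapsed, incumbent objective, best bound).
# In practice you'd parse these from your solver's log or your own logging.
history = [
    (60, 1500.0, 900.0),
    (300, 1320.0, 1010.0),
    (900, 1250.0, 1090.0),
    (2400, 1210.0, 1150.0),
    (5400, 1190.0, 1165.0),
]

t = [row[0] for row in history]
incumbent = [row[1] for row in history]
bound = [row[2] for row in history]

plt.plot(t, incumbent, label="best solution (incumbent)")
plt.plot(t, bound, label="best bound")
plt.xlabel("time (s)")
plt.ylabel("objective value")
plt.legend()
plt.title("Solution quality over time")
plt.show()
```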
For example, here you have the evolution of the gap and the solution quality (best solution and best bound found) over time. It’s not fake data; it comes from an execution solving our problem one year ago, capped at 1.5 hours (5,400 seconds):