📈 #75 Breaking to build better: Implementing Test-Driven Development for Operations Research

Why comprehensive testing is not just important, but essential for maintaining solution quality and reliability

May 12, 2025

Seven years ago, on a day like today, I was sitting at my desk, deep inside a large project.

More than 8 years of development and more than half a million lines of code were staring me in the face.

The user detected an issue with one solution in production. I need to be honest: it was an easy-to-fix issue. No more than a couple of hours of development and it was ready.

Well, ready until I ran the suite of 2200+ tests we had at that time.

3 tests were red. 3 tests failed. 3 tests were showing that something was wrong.

True, they were corner cases, but ones that came up in the real-life day-to-day of the business. So they mattered.

Do you think that someone like me can hold more than 2200 cases in his head? That’s not realistic. Now imagine that only about 0.1% of those cases fail: 3 out of 2200+.

That day I learned that having a big suite of tests is a safety net. A way to make sure you’re delivering good-quality work to the business.

So today I come here to Feasible to talk about Test-Driven Development applied to OR, or what I call Test-Driven Modeling:

Test‑Driven Modeling (TDM) = applying the red‑green‑refactor loop to mathematical models rather than just code.
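To make that loop concrete, here is a minimal sketch of what a first TDM cycle could look like. Everything in it is an assumption for illustration: the toy knapsack data, the brute-force `solve_knapsack` function, and the test names are hypothetical, and a real project would call its own model and solver instead. The idea is that the two tests are written first (red), the model is built until they pass (green), and they stay in place while you refactor.

```python
from itertools import combinations

# Hypothetical toy data: (name, weight, value) for a small knapsack model.
ITEMS = [("a", 3, 4), ("b", 4, 5), ("c", 2, 3)]
CAPACITY = 5


def solve_knapsack(items, capacity):
    """Brute-force stand-in for a real model: best (value, names) over all subsets."""
    best_value, best_names = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(w for _, w, _ in subset)
            value = sum(v for _, _, v in subset)
            # Keep only subsets that respect the capacity constraint.
            if weight <= capacity and value > best_value:
                best_value = value
                best_names = tuple(n for n, _, _ in subset)
    return best_value, best_names


def test_capacity_is_respected():
    # Written first: it fails (red) until the capacity constraint is modeled.
    _, chosen = solve_knapsack(ITEMS, CAPACITY)
    weights = {n: w for n, w, _ in ITEMS}
    assert sum(weights[n] for n in chosen) <= CAPACITY


def test_known_optimum():
    # Hand-checked instance: "a" and "c" together weigh 5 and are worth 7.
    value, chosen = solve_knapsack(ITEMS, CAPACITY)
    assert value == 7
    assert set(chosen) == {"a", "c"}
```

Tests like these are small enough to verify by hand, which is what makes them useful as a fixed reference when the model behind them changes.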

We’ll see:

🔍 Why tests are so important in OR
🧰 A selection of tests you can run right away
🚀 How to start leveraging TDD for your next project

Are you ready? Let’s dive in… 🪂

🔍 Why tests are so important in OR

Operations Research is a discipline tied to business outcomes.

Its solutions often drive critical business decisions involving significant resources (supply chains, financial portfolios, resource allocation, you name it), and errors can have substantial financial or operational consequences.

Not only that, but solution correctness is hard to verify by eye. Small errors in algorithms or constraints can lead to subtly incorrect or suboptimal solutions that look reasonable at first sight.
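One way to replace visual inspection is a small checker that validates a solution against the business rules, independently of the model that produced it. The sketch below assumes a hypothetical task-to-worker assignment problem; `check_assignment` and the sample data are made up for illustration.

```python
def check_assignment(assignment, demand, capacity):
    """Return a list of violated rules for a task-to-worker assignment.

    assignment: dict task -> worker
    demand:     dict task -> hours required
    capacity:   dict worker -> hours available
    """
    violations = []
    # Rule 1: every task must be assigned.
    for task in demand:
        if task not in assignment:
            violations.append(f"task {task} is unassigned")
    # Rule 2: only known workers may be used; accumulate their load.
    load = {}
    for task, worker in assignment.items():
        if worker not in capacity:
            violations.append(f"task {task} uses unknown worker {worker}")
            continue
        load[worker] = load.get(worker, 0) + demand.get(task, 0)
    # Rule 3: no worker may exceed the available hours.
    for worker, hours in load.items():
        if hours > capacity[worker]:
            violations.append(
                f"worker {worker} overloaded: {hours} > {capacity[worker]}"
            )
    return violations


# Hypothetical data: both plans look plausible, only one is feasible.
demand = {"t1": 4, "t2": 3, "t3": 2}
capacity = {"ana": 6, "ben": 4}
plan_ok = {"t1": "ben", "t2": "ana", "t3": "ana"}
plan_bad = {"t1": "ana", "t2": "ana", "t3": "ben"}
```

Here `check_assignment(plan_ok, demand, capacity)` returns an empty list, while `plan_bad` gives ana 7 hours against a capacity of 6. That is exactly the kind of error that is easy to miss when you only look at the output.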

The combinatorial explosion of possible scenarios in OR makes comprehensive testing essential for maintaining solution quality and reliability.

Testing an optimization model is a three-fold game where every green check marks a victory for a different player:

Developer confidence ⚙️ Green bars mean you can refactor, swap solvers, or add constraints without fear of silent breakage. Your model code becomes a playground instead of a minefield.
Business guarantees 💼 Executives care about trucks leaving on time and budgets holding steady. Golden-case tests and KPI-drift checks turn those worries into a simple pass/fail signal the business can bank on.
Stakeholder trust 🤝 Operators press “Run plan” at any time because they believe the system won’t blow up. Your social proof is a visible, automated test suite: “We break it long before you can.”
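As a rough idea of what the golden-case tests and KPI-drift checks mentioned above could look like, here is a minimal sketch. The route costs, the `total_cost` stand-in, and the 2% tolerance are all assumptions for illustration; in a real project the golden input would be a frozen production-like instance and `total_cost` would run your actual model.

```python
import math

# Hypothetical golden case: a frozen input with a hand-verified total cost.
GOLDEN_ROUTE_COSTS = [120.0, 95.5, 210.25]
GOLDEN_TOTAL_COST = 425.75


def total_cost(route_costs):
    """Stand-in for the model's objective; a real project would solve here."""
    return math.fsum(route_costs)


def kpi_drift(baseline, current):
    """Relative change of a KPI against its stored baseline."""
    return abs(current - baseline) / abs(baseline)


def test_golden_case():
    # The frozen input must keep producing the verified result.
    cost = total_cost(GOLDEN_ROUTE_COSTS)
    assert math.isclose(cost, GOLDEN_TOTAL_COST, rel_tol=1e-9)


def test_kpi_drift_within_tolerance(tolerance=0.02):
    # Assumed tolerance of 2%; the right threshold is a business decision.
    current_cost = total_cost(GOLDEN_ROUTE_COSTS)
    assert kpi_drift(GOLDEN_TOTAL_COST, current_cost) <= tolerance
```

The golden case demands an exact match on a known instance. The drift check accepts small movements in a KPI and only fails when the change is larger than the business is willing to tolerate.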

When the CI dashboard lights up green, all three layers (engineers, leadership, and end-users) get a simultaneous thumbs-up.

A failing test isn’t bad news; it’s a low‑cost cue that one of those layers would have been burned later.

So the earlier you catch a bug, the better for everyone in the loop, since the consequences of errors in OR solutions can be severe.

→ Miscalculations in supply chain optimization could lead to stockouts and lost revenue.

→ Flaws in financial portfolio models could result in substantial investment losses.

→ Inaccuracies in resource allocation models could cause operational disruptions and angry customers.

These types of errors are not simply academic; they can have very real and costly impacts on businesses and their stakeholders.

🧰 A selection of tests you can run right away

Today I want you to focus on tests you can write and run, not on the benefits of following TDD practices.

Why? Because there’s already plenty of material out there about that.

But there’s not much on how to apply it to OR projects.

So let’s start from the beginning. Let’s start by focusing on a layered pyramid that represents several important aspects of a testing strategy, like:

🕐 Quantity & frequency: Typically, you have more tests at the bottom of the pyramid and fewer as you move up. Lower-level tests run more frequently during development.
⚡ Execution speed: Tests at the bottom are faster to run (milliseconds to just a few seconds) while tests at the top can take minutes or hours.
🔍 Scope & isolation: Lower tests focus on isolated components, while higher tests evaluate entire systems.
🏗️ Cost of creation & maintenance: Tests at the bottom are relatively inexpensive to create and maintain, while comprehensive tests at the top require significant investment.
🔄 Feedback speed: Lower-level tests provide immediate feedback during development, while higher-level tests might run only in nightly builds or pre-release phases.

Start implementing from the bottom of the pyramid and work your way up. This approach builds confidence in your foundational components before tackling more complex concerns. I’ll give you specific suggestions in the next section, though.

Ok, but… What’s the pyramid then? And what does each layer cover? 👇
