
TDD for building ML models

Updated: Jul 15

At ReWorked.ai, we aim to deliver consistent results for our customers, allowing for significant time and dollar savings. We do that by following the mantra “make your products observable and testable before actually building the product”. In our first blog, we spoke about the importance of observability and our journey in identifying the right metrics. In this blog, we’ll talk about the importance of testability while building ML models.

The ML models, aka “Betty”, power ReWorked.ai and provide our customers with 50% or more in savings on marketing expenses.

In the journey of building ML models, we naturally want to test various algorithms, various sampling techniques, various configurations within those, etc. to arrive at the right model for our needs. So having a repeatable test that demonstrates the success of the current code is important.




You might have heard about “Test Driven Development” or TDD in the context of general software engineering and it’s certainly applicable to model building as well.

In our previous blog we identified the right metric, a weighted average of PTIP (Percentage of True Inquiries Predicted) and model accuracy. We now continue our journey into building the right model by taking a testing-first approach.

To begin with, I wrote this helper function:

def gather_print_results(total_number_of_true_inquiries, PredictedDF, testDF, algorithm):

That way, I could do something like this within the function:

model_accuracy = round((2*(100/total_number_of_true_inquiries*total_number_of_inquiries_predicted_correctly)+(cm[0][0] + cm[1][1])/(cm[0][0]+cm[1][0]+cm[1][1] + cm[0][1])*100)/3,2)
print("FINAL METRIC::: Weighted average accuracy =", model_accuracy)

Where cm is: cm = confusion_matrix(testDF['Inquiry'], PredictedDF). If folks are unfamiliar with what a confusion matrix is, I touched on it in the previous blog; you can also find several online resources that cover the confusion matrix.
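Putting those pieces together, here is a minimal sketch of what such a helper could look like. It assumes scikit-learn’s confusion_matrix, that the true labels live in the 'Inquiry' column, that cm[1][1] is the count of correctly predicted inquiries, and that the helper returns the metric so a test can use it; those details are illustrative, not the exact production code.

from sklearn.metrics import confusion_matrix

def gather_print_results(total_number_of_true_inquiries, PredictedDF, testDF, algorithm):
    # Rows = actual labels, columns = predicted labels
    cm = confusion_matrix(testDF['Inquiry'], PredictedDF)

    # Assumption: cm[1][1] holds the true inquiries predicted correctly (true positives)
    total_number_of_inquiries_predicted_correctly = cm[1][1]

    # PTIP: percentage of true inquiries that the model predicted
    ptip = 100 / total_number_of_true_inquiries * total_number_of_inquiries_predicted_correctly

    # Plain accuracy from the confusion matrix, as a percentage
    accuracy = (cm[0][0] + cm[1][1]) / (cm[0][0] + cm[1][0] + cm[1][1] + cm[0][1]) * 100

    # Weighted average: PTIP counted twice, plain accuracy once
    model_accuracy = round((2 * ptip + accuracy) / 3, 2)
    print(algorithm, "FINAL METRIC::: Weighted average accuracy =", model_accuracy)
    return model_accuracy  # returned so the test below can assert on it (assumption)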

Now, all we need to do is actually write the test. Python has a handy unit testing framework called unittest, which provides native assertions like assertGreater that we’ll use to check whether the accuracy returned from the model is greater than our expected accuracy. For the initial failing test, we started with a 65% baseline accuracy, knowing that an initial model we’d written was giving us that number.

So the inputs to our test script are:

  • Test data

  • Model to use

Invoking our model class with the above inputs, we get the model accuracy numbers, which we then pass to the assertGreater function to see whether our test passes (the test passes when the accuracy, as defined by us, is greater than our expected accuracy). If the accuracy we got back is greater, we make it the new baseline; otherwise we tweak things in the model.
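A minimal sketch of what that test could look like with unittest is below. The load_test_data and run_model helpers, the "random_forest" algorithm name, and the assumption that an inquiry is labelled 1 are placeholders for illustration; gather_print_results is the helper from above, assumed to return the weighted-average accuracy.

import unittest

BASELINE_ACCURACY = 65.0  # starting baseline from our initial model

class TestBettyModel(unittest.TestCase):
    def test_weighted_accuracy_beats_baseline(self):
        # Hypothetical helpers: load the held-out test data and run the
        # chosen model over it to get predictions
        testDF = load_test_data()
        PredictedDF = run_model(testDF, algorithm="random_forest")

        # Assumption: an inquiry is labelled 1 in the 'Inquiry' column
        total_true_inquiries = (testDF['Inquiry'] == 1).sum()
        model_accuracy = gather_print_results(
            total_true_inquiries, PredictedDF, testDF, "random_forest")

        # Passes only when the weighted-average accuracy beats the baseline
        self.assertGreater(model_accuracy, BASELINE_ACCURACY)

if __name__ == "__main__":
    unittest.main()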

Continuous Integration: We also leveraged GitHub Actions to run on every commit, so any code change on any branch triggers a GitHub Action that invokes the tests and tells us how our model is performing relative to the previous baseline.
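A hypothetical workflow sketch is below (e.g. a file like .github/workflows/model-tests.yml); the file paths, Python version, and test layout are illustrative assumptions, not our exact setup.

# Illustrative GitHub Actions workflow: run the model tests on every push
name: model-tests
on: [push]  # any commit on any branch

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python -m unittest discover -s tests  # runs the accuracy test above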

That’s it! With the above approach we have now followed the mantra “make your products observable and testable before actually building the product” and made our yet-to-be-built product both observable and testable. Now we can actually start building the product and iterate on it quickly.

In the current and previous blogs we talked about topics and steps you don’t often find in tech blogs or general tech discourse, certainly not for building machine learning models. In subsequent blogs, however, we’ll talk about the more standard machine learning build process: gathering training data, data cleansing, feature selection, model selection, etc.

By following a test-driven development (TDD) approach, ReWorked.ai is able to use scientifically proven processes to deliver accurate, consistent results in finding the clients that are ready to make a deal.



