Why You Need a Machine Learning Strategy

AllCloud Blog: Cloud Insights and Innovation

Machine learning models get more sophisticated every day, and with this sophistication comes complexity. Newer models have even more parameters and are more difficult to design and tune. Even seasoned machine learning engineers can get lost in the maze of techniques and possibilities for improving their models.

For example, let’s imagine that you designed a bird classifier with 85% accuracy. Unfortunately, this is not good enough for your application.

What should you do to improve your model?

You could add more training data or decide to train your algorithm for longer. You could also choose to change the activation function in your neural network or the network architecture. And what about adding dropout or changing the optimizer?

It may come as a surprise, but the engineers in charge of developing these models often go with their gut feeling and test the waters. You could get lucky and solve your problem immediately, but it is more likely that you will lose a great deal of time.

It’s time for a strategy

This is where having a machine learning strategy can make all the difference. Having a strategy does not mean knowing in advance which solution will be best suited for your model. It is more about being able to quickly rule out the options that will not improve your model. This can translate into massive savings in development time and cost.

Over the next few posts, we’ll review the different diagnostics you can run. But for the moment, let’s focus on the underlying philosophy of this method.

The basic idea could be summed up in one phrase: you need to have a clear idea of what to tune in order to achieve a precise effect. This is what we call “orthogonalization”.

The most common way to illustrate orthogonalization is the example of the old television set. These television sets had a lot of knobs to tune the image display. One knob adjusted the image’s height, another adjusted its width, and so on. Each knob was responsible for one dimension only. Imagine a TV where one knob tuned 20% of the width and 50% of the height, while another knob was responsible for the remaining 80% of the width and 50% of the height. That would have made tuning the image a lot more difficult!
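To make the knob example concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each knob changes width and height linearly, so the knob turns needed for a given change can be found by solving a 2x2 system. The function name solve_knobs and the linear model are assumptions made for this sketch, not part of the original example.

```python
# Hypothetical model: each unit turn of a knob changes width and height
# by fixed amounts, so the two knobs form a 2x2 linear system.

def solve_knobs(effects, target):
    """Return the turns of (knob_1, knob_2) that produce the target change.

    effects[i][j] is how much dimension i (0 = width, 1 = height)
    changes per unit turn of knob j.
    target is the desired (width_change, height_change).
    """
    (a, b), (c, d) = effects
    det = a * d - b * c
    if det == 0:
        raise ValueError("these knobs cannot reach every width/height combination")
    w, h = target
    # Cramer's rule for a 2x2 system
    knob_1 = (w * d - b * h) / det
    knob_2 = (a * h - w * c) / det
    return knob_1, knob_2


# Orthogonal TV: knob 1 controls width only, knob 2 controls height only.
orthogonal = [[1.0, 0.0],
              [0.0, 1.0]]

# Coupled TV from the text: knob 1 moves 20% of the width and 50% of the
# height, knob 2 moves 80% of the width and 50% of the height.
coupled = [[0.2, 0.8],
           [0.5, 0.5]]

# Goal: widen the image by one unit without touching the height.
target = (1.0, 0.0)

print("orthogonal:", solve_knobs(orthogonal, target))
print("coupled:   ", solve_knobs(coupled, target))
```

On the orthogonal TV, a single knob does the job. On the coupled TV, both knobs have to be turned, in opposite directions and by about 1.67 units each, just to change the width alone.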

Well, the same lesson should drive our machine learning model tuning. If we take the example of supervised learning, we have to tune our system to achieve four distinct goals:

  • First, we have to perform well on the training set
  • Then, on the validation set
  • Third, on the test set
  • Finally, our system has to perform well on the data it will meet in production

According to the principle of orthogonalization, each time we decide to tune a parameter of our model, we should focus on one of these steps and this step only!
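The four goals above can be read as an ordered checklist. The sketch below is one possible way to express that, assuming a single accuracy target applies to every stage (a simplification) and using made-up numbers. The names STAGES and first_stage_to_fix are hypothetical.

```python
# Hypothetical helper: find the earliest stage that misses the target,
# so that tuning effort goes into one stage at a time.

STAGES = ("training", "validation", "test", "production")


def first_stage_to_fix(accuracies, target):
    """Return the earliest stage whose accuracy is below target, or None."""
    for stage in STAGES:
        if accuracies[stage] < target:
            return stage
    return None


# Illustrative numbers only, not measurements from a real model.
accuracies = {
    "training": 0.97,
    "validation": 0.85,
    "test": 0.84,
    "production": 0.80,
}

print(first_stage_to_fix(accuracies, target=0.95))  # validation
```

Here the training goal is already met, so the next change should aim at validation performance only, leaving the later stages for afterwards.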

Remember to focus on one thing at a time

In order to better understand this idea, let’s take an example that violates orthogonalization. You have probably heard of the regularization technique called early stopping. The idea is very simple. Although your training accuracy could increase further, you stop training as soon as your validation accuracy starts decreasing and thus prevent your model from overfitting.
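As a rough sketch of the stopping rule just described, the function below scans a list of per-epoch validation accuracies and reports the epoch at which training would stop. The patience parameter and the accuracy values are assumptions added for illustration. With patience=1 the rule matches the description above: stop at the first epoch where validation accuracy fails to improve.

```python
def early_stopping_epoch(val_accuracies, patience=1):
    """Return the index of the epoch at which training would stop.

    Training stops once validation accuracy has failed to improve on
    its best value for `patience` consecutive epochs. If that never
    happens, the last epoch index is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_accuracies) - 1


# Made-up learning curves: training accuracy keeps rising while
# validation accuracy peaks at epoch 3 and then declines.
train_acc = [0.70, 0.80, 0.88, 0.93, 0.96, 0.98]
val_acc = [0.68, 0.77, 0.83, 0.85, 0.84, 0.82]

stop = early_stopping_epoch(val_acc)
print("stopped at epoch", stop)
print("training accuracy at stop:", train_acc[stop])
print("training accuracy if run to the end:", train_acc[-1])
```

In these illustrative curves, stopping early leaves training accuracy below what a full run would have reached, which is the trade-off in question.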

It’s a good technique that is quite widely used, and the point here is not to dismiss it but rather to draw your attention to something we often overlook: early stopping violates orthogonalization. It deliberately hurts training performance to improve validation performance. It’s as if you were simultaneously tuning the height and the width of the image on the old TV.

Although early stopping can help with overfitting, focusing on both the training set and the validation set at the same time can make the overall process of hyperparameter optimization harder.

Now that the underlying philosophy is clearer, we are going to go over the different diagnostics you can run to improve your model. The first thing you have to find out is whether your model is suffering from high bias or high variance. This will be the subject of our next post. Stay tuned!

Jonathan Chemama

Data Scientist
