Why You Should Invest in Machine Learning Talent, Not Infrastructure


AllCloud Blog: Cloud Insights and Innovation

The benefits of Machine Learning (ML) in a business context are clear. Most, if not all, companies can use ML in their respective domains. Unfortunately, ML is not cheap. You have to collect data, label it, and maintain it. You have to hire data engineers, analysts and scientists to work with it, and build infrastructure for everybody to work on. Then you train models, tune them, and deploy them. The list goes on, and all of it costs money. In my previous two posts, I touched on reducing costs when using Amazon SageMaker as your ML platform. Here, I'll show just how much this platform can really save.

Starting from the Bottom Up: Storage and Labeling

As mentioned previously, storage in the cloud is cheap. This is doubly true with Amazon Simple Storage Service (S3) on Amazon Web Services (AWS). Out of the box, you get 99.999999999% ("eleven nines") object durability, with data stored redundantly across multiple Availability Zones, among other features. Compared with buying and maintaining on-premises storage, this can significantly reduce operational costs. Once your data is in the cloud, labeling it is easy with SageMaker Ground Truth. Ground Truth can reduce costs further through Automated Data Labeling. As mentioned in my previous post, this feature can cut labeling costs by up to 70% because human annotators need to see fewer samples.
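To make the labeling claim concrete, here is a back-of-the-envelope cost model. It is a minimal sketch. The per-label prices and the auto-labeled fraction are hypothetical placeholders, not AWS pricing. The function names (`labeling_cost`, `savings_pct`) are illustrative and not part of any AWS API. Automated labels are not free in practice, so the sketch lets you give them their own price. With that price set above zero, savings come out below the auto-labeled fraction.

```python
# Back-of-the-envelope labeling cost model, with and without automated labeling.
# All prices and fractions are HYPOTHETICAL placeholders, not AWS pricing.


def labeling_cost(num_samples: int, price_per_human_label: float,
                  auto_labeled_fraction: float = 0.0,
                  price_per_auto_label: float = 0.0) -> float:
    """Total cost of labeling num_samples items.

    auto_labeled_fraction is the share of samples labeled by a model instead
    of a human annotator (0.0 means every sample goes to a human).
    """
    if not 0.0 <= auto_labeled_fraction <= 1.0:
        raise ValueError("auto_labeled_fraction must be between 0 and 1")
    auto = num_samples * auto_labeled_fraction
    human = num_samples - auto
    return human * price_per_human_label + auto * price_per_auto_label


def savings_pct(baseline: float, optimized: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100.0 * (baseline - optimized) / baseline


if __name__ == "__main__":
    n = 100_000
    manual = labeling_cost(n, price_per_human_label=0.08)
    assisted = labeling_cost(n, price_per_human_label=0.08,
                             auto_labeled_fraction=0.7,  # hypothetical share
                             price_per_auto_label=0.01)  # hypothetical price
    print(f"manual: ${manual:,.2f}  assisted: ${assisted:,.2f}  "
          f"saved: {savings_pct(manual, assisted):.1f}%")
```

With these made-up numbers, 70% of samples are auto-labeled but the saving is about 61%. Treat the "up to 70%" figure as a ceiling rather than a guarantee.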

Ready to Go: A Machine Learning Platform

Amazon SageMaker provides instances with Jupyter Notebooks, CUDA support for GPUs, and industry-standard ML frameworks such as TensorFlow, PyTorch, and MXNet already installed. Some of these packages are notoriously difficult to set up and maintain correctly, which often becomes a huge time sink for IT and engineering teams. On SageMaker, it takes only a few clicks to set up an instance and start a new project.
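If you want to confirm what a given notebook environment actually provides, a quick check like the one below can help. It is a sketch, not a SageMaker feature. It only asks Python whether the framework modules are importable (`tensorflow`, `torch`, `mxnet` are their import names) and whether the `nvidia-smi` tool is on the PATH. I'm treating the latter as a rough proxy for a GPU driver. What is importable can depend on the kernel you select.

```python
# Sanity check for a fresh notebook: which ML frameworks can be imported here,
# and does an NVIDIA driver tool appear to be installed?
import importlib.util
import shutil

# Import names for TensorFlow, PyTorch and MXNet.
FRAMEWORKS = ("tensorflow", "torch", "mxnet")


def available_frameworks(names=FRAMEWORKS) -> dict:
    """Map each top-level module name to True if it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def nvidia_smi_present() -> bool:
    """True if nvidia-smi is on PATH; a rough proxy for a working GPU driver."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    for name, ok in available_frameworks().items():
        print(f"{name:<12}{'installed' if ok else 'missing'}")
    print("nvidia-smi found:", nvidia_smi_present())
```

`find_spec` locates a module without importing it, so the check stays fast even when a heavy framework is installed.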

Training on the Spot: Using Spot Instances to Reduce Training Costs

ML model training costs can be cut by up to 90% by using Spot Instances instead of On-Demand Instances. Spot Instances may occasionally interrupt your training job. This is not a real issue if your training loop saves checkpoints (for example, after each epoch), because an interrupted job can then resume from the last checkpoint instead of starting over. In practice, most training jobs are never interrupted, so there is no real reason not to use Spot Instances for training.
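The checkpoint-and-resume pattern is what makes Spot interruptions cheap. Here is a minimal, framework-free sketch of it. The "model" is a stand-in dictionary saved as JSON. In a real job you would use your framework's own save/load utilities. My understanding is that in SageMaker the Python SDK's `Estimator` exposes `use_spot_instances`, `max_wait` and `checkpoint_s3_uri` for this. Files written to the local checkpoint directory (by default `/opt/ml/checkpoints`) are synced to S3 and restored on restart. Please verify against the SDK documentation for your version. The function names below are hypothetical.

```python
# Checkpoint-and-resume training loop (sketch). The "model" is a stand-in dict;
# swap the JSON save/load for your framework's checkpoint utilities.
import json
import os
import tempfile


def latest_checkpoint(ckpt_dir: str):
    """Path of the newest checkpoint in ckpt_dir, or None if there is none."""
    if not os.path.isdir(ckpt_dir):
        return None
    # Zero-padded epoch numbers make lexicographic order match epoch order.
    files = sorted(f for f in os.listdir(ckpt_dir)
                   if f.startswith("epoch_") and f.endswith(".json"))
    return os.path.join(ckpt_dir, files[-1]) if files else None


def save_checkpoint(ckpt_dir: str, epoch: int, state: dict) -> str:
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"epoch_{epoch:04d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, path)  # rename into place so readers never see a half-written file
    return path


def train(num_epochs: int, ckpt_dir: str, stop_after=None):
    """Run or resume training. Returns (state, epochs_run_in_this_call).

    stop_after simulates a Spot interruption: stop once that many epochs
    (counted from epoch 0) have completed and been checkpointed.
    """
    state, start_epoch = {"weight": 0.0}, 0
    ckpt = latest_checkpoint(ckpt_dir)
    if ckpt is not None:  # resume where the previous run left off
        with open(ckpt) as f:
            saved = json.load(f)
        state, start_epoch = saved["state"], saved["epoch"] + 1
    ran = 0
    for epoch in range(start_epoch, num_epochs):
        state["weight"] += 1.0  # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)  # checkpoint after each epoch
        ran += 1
        if stop_after is not None and epoch + 1 >= stop_after:
            break  # "interrupted" right after a checkpoint was written
    return state, ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        _, first = train(10, d, stop_after=4)  # interrupted after 4 epochs
        state, second = train(10, d)           # resumes and finishes the rest
        print(f"first run: {first} epochs, resumed run: {second} epochs, state: {state}")
```

The interruption only costs the work done since the last checkpoint, which is at most one epoch here.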

Don’t Overpay for Inference

Training ML models on large amounts of data requires strong compute power. However, deploying your models on the same computational power used for training is a common mistake. Serving predictions usually doesn't need such powerful, costly instances, so they are often a waste of money here. Amazon Elastic Inference lets you choose the instance type best suited to your inference needs and "attach" GPU-powered acceleration to it only when you need it. This can save up to 75% on inference costs when using the supported TensorFlow or MXNet frameworks.
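Right-sizing is mostly simple arithmetic, and a quick comparison is worth doing before you deploy. The sketch below compares the monthly cost of a large GPU instance with a smaller instance plus an attached accelerator. Every hourly rate in it is a hypothetical placeholder, so look up current pricing for your region and instance types. It also assumes an always-on endpoint billed for about 730 hours a month.

```python
# Monthly serving-cost comparison: large GPU instance vs. a smaller instance
# with an attached accelerator. Hourly rates are HYPOTHETICAL placeholders.
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


def monthly_cost(hourly_rate: float, instances: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` endpoints at `hourly_rate` for `hours`."""
    return hourly_rate * instances * hours


def percent_saved(baseline: float, alternative: float) -> float:
    """Percentage saved by the alternative relative to the baseline."""
    return 100.0 * (baseline - alternative) / baseline


if __name__ == "__main__":
    gpu_instance = monthly_cost(3.00)               # hypothetical GPU rate
    small_plus_accel = monthly_cost(0.60 + 0.60)    # hypothetical host + accelerator
    print(f"GPU: ${gpu_instance:,.2f}/month, "
          f"small + accelerator: ${small_plus_accel:,.2f}/month, "
          f"saved: {percent_saved(gpu_instance, small_plus_accel):.0f}%")
```

The same two functions work for any pair of options you are weighing, including scaling the instance count up or down.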

AllCloud Optimizes Customer Cloud Costs

Here at AllCloud, we have the AWS expertise, data engineering experience, and Machine Learning know-how to produce real value and actionable insights from customer data. We always keep our customers' costs in mind and apply all of these cost-reduction techniques, and more, when building ML solutions.

The Bottom Line

If you want to use ML but are afraid of the associated costs, we recommend using AWS as your cloud provider and SageMaker as your ML platform. You'll benefit from a variety of cost-saving features at every stage of the Machine Learning pipeline. AllCloud's data professionals can help you start your Machine Learning journey on AWS and optimize your costs along the way. Contact us to learn more!


Ido Nissim

Data Engineer
