Whatever the application, AI needs rigor and methodology

The Covid-19 crisis has given AI a new opportunity to prove its worth and showcase its capabilities.

All over the world, the AI community produced models to fight the virus by helping to spot infections, predict the likelihood of hospitalization or predict patients’ outcomes.

The speed at which models were created and shared was impressive and reflected a healthy culture of open research. Some even started hoping that AI could bring a breakthrough in fighting the coronavirus. However, a recent study published in the British Medical Journal (BMJ) poured cold water on these hopes.

In the study, Dr. Laure Wynants, an epidemiologist at Maastricht University, led a team of clinicians, scientists and engineers to review and analyze 31 published models. The authors declared the models so “uniformly poor” that “none can be recommended for clinical use.”

AI’s incredible success in medical imaging brought it to the forefront and finally convinced major players across all industries that it meant business. So why such a failure in the fight against the coronavirus?

An AI application is only as good as the data it was trained on

Deep learning for computer vision as a whole has made huge strides in recent years, and AI for medical imaging simply benefited from these breakthroughs.

Perhaps more significantly, if we take a closer look at Dr. Wynants’s article, we can see that most of the mistakes researchers made relate to the data itself, such as bad annotations and flawed data sampling.

On a practical level, the study published in the British Medical Journal recommended that ML engineers adopt the TRIPOD checklist (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) as a standard for developing AI for the medical field. This checklist was designed by physicians and data scientists to help engineers report their work clearly and thereby reduce the risk of developing biased models.

This points to one of the critical rules of applied AI. We can never stress it enough: an AI application is only as good as the data it was trained on.

Most likely, the urgency around the Covid-19 situation made some engineers rush through the data exploration and data preparation steps they usually follow. They also probably released their models earlier than they usually do, and with fewer tests.
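To make this concrete, here is a minimal sketch of the kind of checks that are easy to skip under time pressure. It is not taken from the study. The record layout (`patient_id`, `label`) and the function name `audit_dataset` are hypothetical, chosen only for illustration, and a real project would need far more thorough validation.

```python
from collections import Counter


def audit_dataset(train, test, id_key="patient_id", label_key="label"):
    """Run a few basic sanity checks on a train/test split.

    The field names are assumptions made for this example only.
    """
    train_ids = {record[id_key] for record in train}
    test_ids = {record[id_key] for record in test}

    # Sampling problem: the same patient appears in both sets,
    # so test results would look better than they really are.
    leaked_ids = sorted(train_ids & test_ids)

    # Annotation problem: records that have no label at all.
    missing_labels = [
        record[id_key]
        for record in train + test
        if record.get(label_key) is None
    ]

    # Class balance in the training set, ignoring unlabeled records.
    train_label_counts = Counter(
        record[label_key]
        for record in train
        if record.get(label_key) is not None
    )

    return {
        "leaked_ids": leaked_ids,
        "missing_labels": missing_labels,
        "train_label_counts": dict(train_label_counts),
    }


# Toy data, invented for the example.
train = [
    {"patient_id": "p1", "label": 1},
    {"patient_id": "p2", "label": 0},
    {"patient_id": "p3", "label": 0},
    {"patient_id": "p4", "label": None},
]
test = [
    {"patient_id": "p3", "label": 0},
    {"patient_id": "p5", "label": 1},
]

report = audit_dataset(train, test)
print(report)
```

On this toy data, the report flags one patient shared between the two sets and one record without a label. Checks like these take minutes to run, which is why skipping them rarely saves meaningful time.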

It is not unusual for the AI community to publish preliminary results. As a matter of fact, this considerably speeds up research. However, when it comes to health, rigorous methodology and peer review are essential to avoid causing harm.

The same principles apply to your business. As we work with clients to incorporate AI into applications, many are excited about the benefits of AI and tempted to jump to conclusions. A healthy balance between urgency and accuracy is critical to the long-term success of any program.

Nowhere is this clearer than in the fast-evolving Covid-19 situation. Sharing results early is critical, but the AI community needs to put new mechanisms in place so that preliminary models don’t cause harm and AI doesn’t lose its credibility.

Jonathan Chemama

AI Tech Lead