Convolutional Neural Networks (CNNs) are state-of-the-art neural network architectures used primarily for computer vision tasks. CNNs can be applied to a number of different tasks, such as object classification (also known as image recognition), object localization, image segmentation, and change detection, as well as similar tasks applied to video footage.
Earlier this year, a company contacted Data Insights to ask whether we could develop a computer vision application that, given an image, identifies the model of the car shown in it. The request is certainly a challenge. Different car models can appear quite similar, and any car can look very different depending on the angle from which it is photographed. Cars can also be photographed in a wide range of settings, sometimes with complex backgrounds or with other objects in the foreground. In fact, until quite recently, such problems were considered practically unsolvable. Nobody had been able to design a traditional algorithm that could handle them.
However, in 2012, Deep Neural Networks made it possible to accomplish complex computer vision tasks. Instead of having the concept of a car explained to them, computers could repeatedly study pictures and learn such concepts themselves. In the past few years, additional neural network innovations have resulted in AI that can perform image classification tasks with human-level accuracy.
Building on such developments, it was possible to train a Deep CNN to classify cars by their model. The neural network was trained on the Stanford Cars Dataset (https://ai.stanford.edu/~jkrause/cars/car_dataset.html), which contains over 16,000 pictures of cars, comprising 196 different models (roughly 80 pictures per car model). The CNN was then left to run through these pictures repeatedly. Over time, the accuracy of its predictions begins to improve, as the neural network learns the concept of a car and how to distinguish between different models. The figure below shows this. On the horizontal axis, each ‘epoch’ is one complete pass through the training data. The purple line shows the accuracy of the network when classifying training images (images it has already practised on). The green line shows what is called the ‘validation accuracy’, which is the accuracy when classifying images that it has not seen before. The green line is a better indication of how well the network would perform if it were actually deployed.
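To make the difference between the two lines concrete, here is a minimal sketch in plain Python of how a dataset can be split into training and validation images, and how accuracy is computed on either set. The function names, the 20% validation fraction and the fixed seed are assumptions chosen for illustration; they are not taken from the project described here.

```python
import random


def split_train_validation(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction that is never trained on.

    validation_fraction=0.2 is an illustrative assumption, not the split
    used in the article.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    # Everything after the held-out slice is used for training.
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)
```

Computing `accuracy` on the training images corresponds to the purple line, while computing it on the held-out images corresponds to the green line. A large gap between the two usually means the network has memorised its training images rather than learned features that generalise.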
We see that, after 100 epochs (100 practice attempts using all training images), the validation accuracy (the green line) is at roughly 50%. To improve the accuracy further, we moved the CNN into the cloud and began training on Databricks. Databricks is optimized to train networks quickly and efficiently, and it also allows us to try many different CNN configurations much more quickly.
Here we see the huge improvement that Databricks makes. The CNN trains much faster: even after only a few epochs, accuracy rises above 80%. This begins to approach human-level performance (current record-setting attempts by large teams tend to fall between 88% and 92%).
If you would like to know more about how to set up such a neural network, all files, as well as a very accessible step-by-step ‘how to’ guide, can be found here: https://github.com/EvanEames/Cars
Once everything is set up, the neural network can be used to identify car models. Just input an image or a URL, and it will return a model (‘label’), as well as a confidence estimate (‘prob’). Here’s an example:
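The sketch below is not the code from the repository. It is a minimal illustration, in plain Python, of how a classifier's raw output scores are commonly turned into a ‘label’ and a ‘prob’ using a softmax. The function name `top_prediction`, the example scores and the placeholder class names are all assumptions made for this illustration.

```python
import math


def top_prediction(scores, class_names):
    """Return the most likely class and its softmax probability.

    `scores` are the raw network outputs, one per class (hypothetical
    values here); `class_names` are the matching labels.
    """
    if not scores or len(scores) != len(class_names):
        raise ValueError("need one non-empty score list matching class_names")
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index of the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": class_names[best], "prob": probs[best]}


# Placeholder scores and class names, purely for illustration.
result = top_prediction([2.0, 0.5, 0.1], ["Model A", "Model B", "Model C"])
print(result["label"], round(result["prob"], 3))
```

The ‘prob’ value always lies between 0 and 1, and the probabilities over all classes sum to 1, which is why it can be read as a confidence estimate.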