Training Machine Vision Systems with Limited Labeled Datasets

The age of AI is upon us, and AI-powered intelligent automation systems are the need of the hour. Early adopters are reaping significant benefits in lower costs and higher productivity. Consider the potential impact of AI on machine vision. Traditionally, the approach here has been to use dedicated cameras, illumination systems and pattern projectors. The lighting systems often needed careful tuning and calibration, and each product-defect scenario had to be designed for separately. For example, the system used to find defects in bolts is very different from the one used to find defects in a crankshaft.

In many of these scenarios, a human expert could simply have looked at the item of interest and identified the issue, without any special lighting. The promise of AI and deep learning is that it is now possible to replicate this intelligence in machine vision systems.

In our previous blog, we talked about some of the challenges of training intelligent systems for autonomous robots and industrial automation in general. One key challenge we discussed is that deep learning systems need a lot of training data to reach good accuracy. For image recognition, this means a large number of images with associated metadata and tags. Generating this labeled data is expensive. Current AI projects involve a dedicated team of people who are trained to tag and annotate images. This is cost-prohibitive for several reasons, including the large amount of labeled data needed and the need to hire experts for many important tasks in industrial automation.

In this blog, we present results from our research showing that high-quality results can be achieved without a large labeled dataset. We promise to keep technical details to a minimum.

The case study covers surface defects encountered during metal processing in manufacturing. We use 300 samples each of 6 different types of defects, for a total of 1800 samples. The images are 32 by 32 pixels, which corresponds to a very grainy, low-end embedded vision system.

Out of 300 samples per defect type, only 100 are assigned labels. The remaining 200 images are used for training but without any labels. In most practical scenarios, images are available in good quantity; the challenge is assigning accurate labels to tens of thousands of samples, which is expensive and often infeasible. By significantly reducing the amount of labeled data required, while still feeding in the remaining images without labels, we address this problem.
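For readers who like to see the bookkeeping, here is a minimal sketch of how such a split could be set up in Python. It is not our training code: the function name, the use of -1 as a "no label" marker, and the stand-in list of class labels are all assumptions made for illustration. Only the counts (6 defect types, 300 images each, 100 labeled per type) come from the case study.

```python
import random

NUM_CLASSES = 6          # defect types in the case study
SAMPLES_PER_CLASS = 300  # images per defect type
LABELED_PER_CLASS = 100  # images per type that keep their label
UNLABELED = -1           # hypothetical marker meaning "no label available"


def split_labeled_unlabeled(true_labels, labeled_per_class, seed=0):
    """Return a copy of true_labels in which all but `labeled_per_class`
    randomly chosen samples of each class are replaced by UNLABELED."""
    rng = random.Random(seed)

    # Group sample indices by their class.
    by_class = {}
    for index, label in enumerate(true_labels):
        by_class.setdefault(label, []).append(index)

    # Start with everything unlabeled, then reveal a few labels per class.
    training_labels = [UNLABELED] * len(true_labels)
    for label, indices in by_class.items():
        for index in rng.sample(indices, labeled_per_class):
            training_labels[index] = label
    return training_labels


# Stand-in for the real dataset: only the class of each image matters here.
true_labels = [c for c in range(NUM_CLASSES) for _ in range(SAMPLES_PER_CLASS)]
training_labels = split_labeled_unlabeled(true_labels, LABELED_PER_CLASS)

num_labeled = sum(1 for y in training_labels if y != UNLABELED)
num_unlabeled = len(training_labels) - num_labeled
print(num_labeled, num_unlabeled)  # 600 1200
```

All 1800 images would still be passed to training; the labels are simply hidden for 1200 of them.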

With this setup, we achieve over 90% accuracy in identifying the correct defect class among 6 different classes. The detection task is complicated by the fact that several of these defects are visually similar to each other. Also, in this experiment, we use a total of only 600 labeled and 1200 unlabeled images. In practice, the amount of unlabeled data available is much higher, which we expect to improve the accuracy of the system. Higher image resolution should also improve results. We did not attempt to use any image augmentation techniques to expand the dataset, nor did we attempt to optimize the neural nets for performance. The idea is to show that even with limited data and optimization, impressive results can be achieved. With more unlabeled data, image augmentation, tuning and optimization, higher accuracy should be achievable.

To illustrate further how our neural nets are trained, we show examples of what the neural network thinks the defects look like. In Figure 1 below, we have actual samples of the defects we trained on. In Figure 2, we show the neural network's 'mental model' of these defects.

Figure 1. Real images of defects (surface defects in metal processing)

Tile of real defect images

Figure 2. Images from the 'mental model' of the neural network. These are generated/painted by the neural nets based on their understanding of the dataset.

Tile of the neural network's mental model of the defects

As we can see, the neural nets are able to understand the nuances of the defects visually and reproduce them, and even paint new images of defects based on their understanding. This mental model plays a big role in their ability to achieve high accuracy. With traditional approaches (also known as discriminative approaches), such a small amount of labeled data leads to a significant drop in accuracy, to 60% with a similar setup.

Our mission at Reflective AI is to help companies access the advanced technology required to make industrial automation as simple as possible. For further information on this work and on projects related to artificial intelligence for machine vision and industrial automation, please contact
