Microsoft Cognitive Services provides several APIs for image recognition, but if you want to build your own recognizer (or create one that works offline), you can use the new Image Featurizer capabilities of Microsoft R Server.
Training an image recognition system requires LOTS of images — millions and millions of them. It involves feeding those images into a deep neural network, and during that process the network generates "features" from the image. These features might be versions of the image including just the outlines, or maybe the image with only the green parts. You could further boil those features down into a single number, say the length of the outline or the percentage of the image that is green. With enough of these "features", you could use them in a traditional machine learning model to classify the images, or perform other recognition tasks.
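To make the idea of boiling a feature down to a single number concrete, here is a minimal sketch in base R. The function name green_fraction and the tiny hand-built image are hypothetical, and this hand-made feature is only an illustration of the concept, not what a neural network actually computes.

```r
# A tiny 2x2 RGB "image" stored as a height x width x channel array.
img <- array(0, dim = c(2, 2, 3))
img[1, 1, 2] <- 1  # pure green pixel
img[1, 2, 2] <- 1  # pure green pixel
img[2, 1, 1] <- 1  # pure red pixel
img[2, 2, 3] <- 1  # pure blue pixel

# Hypothetical hand-made feature: the share of pixels in which green
# is strictly the strongest channel.
green_fraction <- function(image) {
  r <- image[, , 1]
  g <- image[, , 2]
  b <- image[, , 3]
  mean(g > r & g > b)
}

green_fraction(img)  # two of the four pixels are green, so 0.5
```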
But if you don't have millions of images, it's still possible to generate these features from a model that has already been trained on millions of images. ResNet is a very deep neural network model trained for image recognition, and it has been used to win major computer-vision competitions. With the rxFeaturize function in Microsoft R Client and Microsoft R Server, you can generate a vector of features from a pretrained model like this for any image you provide (4096 of them with the AlexNet model used in the example below). The features themselves are meaningful only to a computer, but that vector of numbers between zero and one is (ideally) a distillation of the unique characteristics of that image as a human would recognize it. You can then use that features vector to create your own image-recognition system without the burden of training your own neural network on a large corpus of images.
On the Cortana Intelligence and ML blog, Remko de Lange provides a simple example: given a collection of 60-or-so images of chairs like those below, how can you find the image that most looks like a deck chair?
First, you need a representative image of a deck chair:
Then, you calculate the features vector for that image using rxFeaturize.
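A call along these lines could do that. Treat it as a sketch under assumptions rather than the script from the original post: the helper name featurize_images and the file name deckchair.jpg are hypothetical, the 227x227 size is what the AlexNet model is understood to expect, and the code only does real work where the MicrosoftML package and its pretrained models are installed.

```r
# Hypothetical helper wrapping rxFeaturize from the MicrosoftML package.
# 'paths' is a character vector of image file paths.
featurize_images <- function(paths, model = "Alexnet") {
  stopifnot(requireNamespace("MicrosoftML", quietly = TRUE))
  MicrosoftML::rxFeaturize(
    data = data.frame(Image = paths, stringsAsFactors = FALSE),
    mlTransforms = list(
      # Load each file named in the Image column into a new Features column.
      MicrosoftML::loadImage(vars = list(Features = "Image")),
      # Assumed input size for AlexNet; the ResNet variants are assumed to use 224x224.
      MicrosoftML::resizeImage(vars = "Features", width = 227, height = 227),
      MicrosoftML::extractPixels(vars = "Features"),
      # Replace the raw pixels with the pretrained network's features.
      MicrosoftML::featurizeImage(var = "Features", dnnModel = model)
    )
  )
}

# Run only when MicrosoftML is installed and the (hypothetical) file exists.
if (requireNamespace("MicrosoftML", quietly = TRUE) && file.exists("deckchair.jpg")) {
  deck_features <- featurize_images("deckchair.jpg")
}
```

The result should be a data frame with one row per image. Its exact column layout is not shown in this post, so inspect it before relying on particular column names.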
Note that when featurizing an image, you need to shrink it down to a specified size (the built-in function resizeImage handles that). There are also several pretrained models to choose from: three variants of ResNet and also AlexNet, which we use here. This gives us a features vector of 4096 numbers to represent our deck chair image. Then, we just need to use the same process to calculate the features vector for each of our 60 other images, and find the one that's closest to our deck chair. The dist function in R can do that, for example, and that's exactly what Remko's script on GitHub does. The image with the features vector closest to that of our representative image is returned as the best match.
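The matching step itself needs nothing beyond base R. The sketch below uses randomly generated stand-in vectors instead of real image features, and the file names are made up, so it shows only the distance logic and is not Remko's script.

```r
set.seed(1)
n_features <- 4096

# Stand-in features vectors: one query image and five candidate images.
query <- runif(n_features)
candidates <- matrix(
  runif(5 * n_features), nrow = 5,
  dimnames = list(paste0("chair", 1:5, ".jpg"), NULL)
)
# Make candidate 3 a slightly perturbed copy of the query, so it should win.
candidates[3, ] <- query + rnorm(n_features, sd = 0.01)

# dist() computes all pairwise Euclidean distances between rows;
# keep only the distances from the query to each candidate.
d <- as.matrix(dist(rbind(query = query, candidates)))
to_query <- d["query", -1]

# The candidate with the smallest distance is the best match.
best_match <- names(which.min(to_query))
best_match
```

Euclidean distance is the default for dist. Other metrics are available through its method argument if they suit the features better.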
So, even with a relatively small collection of images, it's possible to build a capable image recognition system using image featurization and the pretrained image recognition neural networks provided with Microsoft R. The complete code and sample images used in this example are available on GitHub. (Note: to use the image featurization functions in the script, you'll need a license for Microsoft R Server, or the free Microsoft R Client installed with the pretrained models option.) For more details on creating this recognizer, check out the blog post below.
Cortana Intelligence and Machine Learning Blog: Find Images With Images