by Anusua Trivedi, Microsoft Data Scientist
This is a multi-part blog series in which I describe my experiences and go deep into the reasons behind my choices. In Part 1, I discussed the pros and cons of different symbolic frameworks, and my reasons for choosing Theano (with Lasagne) as my platform of choice.
Part 2 of this blog series is based on my upcoming talk at The Data Science Conference, 2016. Here in Part 2, I describe Deep Convolutional Neural Networks (DCNNs) and how transfer learning and fine-tuning help improve the training process for domain-specific images.
Please feel free to email me at firstname.lastname@example.org if you have questions.
The eye disease Diabetic Retinopathy (DR) is a common cause of vision loss. Screening diabetic patients using fluorescein angiography images can potentially reduce the risk of blindness. Current research trends have demonstrated that DCNNs are very effective in automatically analyzing large collections of images and identifying features that can categorize images with minimal error. DCNNs are rarely trained from scratch, as it is relatively uncommon to have a domain-specific dataset of sufficient size. Since modern DCNNs take 2-3 weeks to train across multiple GPUs, the Berkeley Vision and Learning Center (BVLC) has released some final DCNN checkpoints. In this blog, we use such a pre-trained network: GoogLeNet. This GoogLeNet network is pre-trained on a large collection of natural ImageNet images. We transfer the learned ImageNet weights as initial weights for the network, and fine-tune this pre-trained generic network to recognize fluorescein angiography images of eyes and improve DR prediction.
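The transfer-learning idea can be sketched in plain numpy: a frozen "pre-trained" feature extractor (a stand-in for GoogLeNet's ImageNet weights) feeds a small classification head, and only that head is retrained on the new domain's data. Everything here, from the random projection to the toy labels, is illustrative and not the blog's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for pre-trained early layers: in the real
# pipeline these would be GoogLeNet's ImageNet weights, kept frozen.
pretrained_W = rng.standard_normal((64, 16))

def extract_features(images):
    # Frozen "early layers": a fixed projection followed by a ReLU.
    return np.maximum(images @ pretrained_W, 0.0)

def train_head(X, y, lr=0.1, epochs=200):
    # Fine-tuning in its simplest form: only the final classification
    # layer (logistic regression) is trained on domain-specific data.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        grad = p - y                             # logistic-loss gradient
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy "domain" dataset: 64-pixel flattened images with binary labels.
images = rng.standard_normal((200, 64))
true_w = rng.standard_normal(16)
labels = (extract_features(images) @ true_w > 0).astype(float)

feats = extract_features(images)
w, b = train_head(feats, labels)
preds = (feats @ w + b > 0).astype(float)
print("training accuracy:", (preds == labels).mean())
```

The point of the sketch is the split: the expensive generic representation is reused as-is, while only the cheap final layer is fit to the small domain dataset.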
Using explicit feature extraction to predict Diabetic Retinopathy
Much work has been done in developing algorithms and morphological image processing techniques that explicitly extract features prevalent in patients with DR. The generic workflow used in a standard image classification technique is as follows:
- Image preprocessing techniques for noise removal and contrast enhancement
- Feature extraction techniques
- Classification based on the extracted features
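As a toy illustration of the preprocessing and feature-extraction steps (not the actual methods used in the studies cited below), the following numpy sketch applies a simple contrast stretch and then computes a crude gradient-based feature of the kind explicit pipelines hand-engineer:

```python
import numpy as np

def stretch_contrast(img):
    """Contrast enhancement: linearly rescale intensities to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def vessel_like_feature(img):
    """A crude explicit feature: mean gradient magnitude, a rough
    proxy for the amount of vessel/edge structure in the image."""
    gy, gx = np.gradient(img)
    return float(np.sqrt(gx**2 + gy**2).mean())

# Toy 8x8 "fundus image" with a bright diagonal "vessel".
img = np.zeros((8, 8))
np.fill_diagonal(img, 0.5)
img += 0.1  # background offset

enhanced = stretch_contrast(img)
feature = vessel_like_feature(enhanced)
print("edge feature:", feature)
```

Real DR pipelines use far more elaborate morphological operators, but the shape of the workflow is the same: normalize the image, then reduce it to hand-chosen numbers a classifier can consume.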
Faust et al. provide a very comprehensive analysis of models that use explicit feature extraction for DR screening. Vujosevic et al. build a binary classifier on a dataset of 55 patients by explicitly forming single-lesion features. Other authors use morphological image processing techniques to extract blood vessel and hemorrhage features and then train an SVM on a dataset of 331 images; an accuracy of 90% and a sensitivity of 90% have been reported on a binary classification task with a dataset of 140 images.
However, all these processes are time- and effort-intensive, and further improvements in prediction accuracy require large quantities of labeled data. Because explicit image processing and feature extraction on large image datasets are complex and slow, we choose to automate these steps using DCNNs.
Deep convolutional neural network (DCNN)
Extracting key features from image data normally requires subject-matter expertise. DCNNs, in contrast, extract features automatically from domain-specific images, without any manual feature engineering. This makes DCNNs well suited to image analysis:
- DCNNs train networks with many layers
- Multiple layers work together to build an improved feature space
- Initial layers learn first-order features (e.g. color, edges)
- Later layers learn higher-order features specific to the input dataset
- Finally, the last layer's features are fed into classification layer(s)
Convolution: A convolution layer consists of a rectangular grid of neurons, and the weights are the same for every neuron in the layer. This shared set of weights specifies the convolution filter.
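A minimal numpy sketch of this weight sharing (in the cross-correlation form that deep-learning libraries actually compute): one small filter slides over the image, and the same weights are applied at every position.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: the same kernel (shared weights) is
    applied at every position, which defines a convolution layer."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter, one example of the first-order features
# early layers tend to learn.
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

image = np.zeros((5, 5))
image[:, :2] = 1.0          # bright left half, dark right half
response = conv2d(image, edge_kernel)
print(response)             # strong responses along the vertical edge
```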
Pooling: The pooling layer takes small rectangular blocks from the convolutional layer's output and subsamples each block to produce a single output.
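In numpy, 2x2 max pooling, the most common subsampling choice, can be sketched as:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max pooling: each non-overlapping size-x-size block is
    subsampled to the single largest value it contains."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size           # drop any ragged edge
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 0., 5., 6.],
                 [1., 2., 7., 8.]])
print(max_pool(fmap))  # each 2x2 block collapses to its maximum
```

Besides shrinking the feature map, pooling makes the representation slightly more tolerant of small translations in the input.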
In this post, we use the GoogLeNet DCNN, which was developed at Google. GoogLeNet won the ImageNet challenge in 2014, achieving the best results of its time. The motivation behind this model was an architecture that is simultaneously deeper and computationally inexpensive.