Given a short review of a product, like "I couldn't put it down!", can you predict what the product is? In that case it's pretty easy (it's a book), but this general problem of text categorization comes up in many natural language analysis tasks. In his talk at useR!2017 (shown below), Microsoft data scientist Angus Taylor demonstrates how to build a text categorization model in R. He applies a convolutional neural network (trained using the R interface to the MXNet deep learning platform) to Amazon review data, and creates a small Shiny app to categorize previously unseen reviews. The talk also provides a brief introduction to convolutional neural networks and one-hot encoding, if you haven't come across those concepts before.
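If one-hot encoding is new to you, here is a minimal sketch of the idea. It is written in plain Python rather than R and is not the code from the talk. It assumes a character-level encoding, and the alphabet, the `max_len` value, and the `one_hot_encode` helper are all hypothetical choices made for illustration. Each character of the review becomes a row containing a single 1 in the column for that character.

```python
import string

# Hypothetical alphabet (an assumption, not taken from the talk):
# lowercase letters, digits, space, and a little punctuation.
ALPHABET = string.ascii_lowercase + string.digits + " !'.,?"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_encode(text, max_len=32):
    """Return a max_len x len(ALPHABET) matrix of 0s and 1s.

    Row i has a 1 in the column of the i-th character of `text`.
    Text is lowercased and truncated to max_len; shorter text is
    padded with all-zero rows.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:  # characters outside the alphabet stay all-zero
            matrix[pos][idx] = 1
    return matrix


encoded = one_hot_encode("I couldn't put it down!")
print(len(encoded), "rows x", len(encoded[0]), "columns")
```

A fixed-size matrix like this is the kind of input a convolutional network can slide its filters over, in much the same way it would scan the pixels of an image.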
The model Angus uses in this example is described in more detail in the blog post, Cloud-Scale Text Classification with Convolutional Neural Networks on Microsoft Azure. If you'd like to implement something similar yourself, you may also want to check out this tutorial on deep learning for text classification, which provides code and sample data and explains how to use GPU-accelerated clusters in Azure to speed up the training process. The GPU-enabled Data Science Virtual Machine on Azure contains everything you need to implement the text categorization models described there.
Channel 9: Deep Learning for Natural Language Processing in R