by Michael Helbraun
The software business includes travel, and that means hotels. The news that Marriott was acquiring Starwood was of particular interest to me – especially since more than 75% of my 95 nights so far this year on the road have been spent with one of those two companies.
While other folks can evaluate whether the deal makes sense financially, I was just curious how it might affect a business traveler. Looking at the news, some people are optimistic and plenty are concerned. Granted, many of the details on how the loyalty programs will be combined won’t be known for some time, but what we do know is where each company maintains properties.
With 4200+ Marriott and 1700+ Starwood properties, I was curious where there might be overlap and how well the deal would help Marriott grow in new markets. Luckily, R can help in this regard.
The first thing to do is to put together a data set. It would have been nice if the companies had clean spreadsheets available publicly, but as is normally the case, a good portion of the time goes to gathering and preparing data. In this case that meant scraping the data from SPG and Marriott and formatting it into a spreadsheet with all their property locations. While I won’t go into data cleaning here, for a one-time effort on just a few thousand rows of data this was pretty straightforward to do in Excel.
After I had all locations for all properties it was time to bring that data into R to start the analysis. First I was curious where each firm had the most properties – simple to do with a cross tab. NYC seems a logical top 5, but Houston and Atlanta, interesting:
Top 10 Marriott Locations
Top 10 Starwood Locations
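As a rough illustration of the cross tab step, here is a minimal sketch using base R's table(). The data frame and its `city` column are toy stand-ins I made up for the example; the real spreadsheet's column names may differ.

```r
# Toy stand-in for the cleaned property list; `city` is an assumed column name.
marriott <- data.frame(
  city = c("New York", "Houston", "Atlanta", "New York", "Houston", "New York"),
  stringsAsFactors = FALSE
)

# Count properties per city and sort so the busiest cities come first
cityCounts <- sort(table(marriott$city), decreasing = TRUE)

# Keep the top 10 (here there are only three cities in the toy data)
topCities <- head(cityCounts, 10)
print(topCities)
```

The same two lines, run once per firm, would give the two top 10 lists.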
So far so good, but to actually put these on a map it’s much easier if the data has latitude and longitude. The geocode function within the ggmap package makes this easy; resolution is done against the Google API and is limited to 2,500 requests/day, so be sure to use save/load to keep the results you have already fetched. (Note: there are more than 2,500 locations here, so I split the task across a couple of machines. There are other free geocoding options with higher daily limits if you have more data points, such as Bing, but that’s a REST-based approach.)
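library(ggmap)  # provides geocode()
locations <- marToGeo  # assumed name for the Marriott addresses, by analogy with hotToGeo below
marGeocoded <- cbind(locations, geocode(locations))
locations <- hotToGeo
hotGeocoded <- cbind(locations, geocode(locations))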
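Since the daily request limit makes repeat lookups expensive, one way to follow the save/load advice is to geocode in batches and cache each batch on disk. The sketch below is only one possible approach: `geocode_cached` is a hypothetical helper, not part of ggmap, and it uses saveRDS/readRDS rather than save/load. It does not enforce the daily limit itself; it just means a rerun on a later day picks up from the batches already saved. A fake geocoder stands in for the real one so the example runs without network access.

```r
# Hypothetical helper: geocode in batches and cache each batch as an .rds file.
geocode_cached <- function(addresses, geocoder, cache_dir, batch_size = 2500) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  batch_id <- ceiling(seq_along(addresses) / batch_size)
  batches <- split(addresses, batch_id)
  results <- lapply(names(batches), function(id) {
    cache_file <- file.path(cache_dir, paste0("batch_", id, ".rds"))
    if (file.exists(cache_file)) {
      return(readRDS(cache_file))  # already fetched on an earlier run
    }
    out <- data.frame(locations = batches[[id]],
                      geocoder(batches[[id]]),
                      stringsAsFactors = FALSE)
    saveRDS(out, cache_file)
    out
  })
  do.call(rbind, results)
}

# Stand-in geocoder for the demo; with ggmap you would pass ggmap::geocode instead.
fake_geocoder <- function(x) data.frame(lon = nchar(x), lat = -nchar(x))

cache_dir <- file.path(tempdir(), "geocode_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start the demo from an empty cache
addresses <- c("New York, NY", "Houston, TX", "Atlanta, GA",
               "Paris, France", "Rome, Italy")

first_run <- geocode_cached(addresses, fake_geocoder, cache_dir, batch_size = 2)

# The second run never calls the geocoder: every batch comes from the cache.
second_run <- geocode_cached(addresses,
                             function(x) stop("geocoder should not be called"),
                             cache_dir, batch_size = 2)
```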
Once the lat/long coordinates are merged back into our data set there are a number of ways to plot the results. I’m a fan of the globe plots within Bryan Lewis’s excellent rthreejs package. This allows you to stretch a 2D image over a globe which you can then plot on top of and interact with. Here I’ve plotted all the Marriott properties in orange and the Starwood properties in yellow:
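For readers who want to try the globe, here is a rough sketch of how the two geocoded sets might be combined and passed to globejs(). The toy coordinates, the firm labels "Marriott" and "Starwood", and the way Counts is derived are my assumptions for illustration; the original script may build its combined data differently.

```r
# Toy stand-ins for the geocoded data; the real frames come from geocode().
marGeocoded <- data.frame(
  locations = c("New York, NY", "New York, NY", "Paris, France"),
  lon = c(-74.006, -74.006, 2.352),
  lat = c(40.713, 40.713, 48.857),
  stringsAsFactors = FALSE
)
hotGeocoded <- data.frame(
  locations = c("New York, NY", "Rome, Italy"),
  lon = c(-74.006, 12.496),
  lat = c(40.713, 41.903),
  stringsAsFactors = FALSE
)

# Label each row with its firm (assumed labels) and stack the two sets
marGeocoded$firm <- "Marriott"
hotGeocoded$firm <- "Starwood"
allProps <- rbind(marGeocoded, hotGeocoded)

# One row per firm and location, with Counts = number of properties there
allProps$Counts <- 1
combGeocoded <- aggregate(Counts ~ firm + locations + lon + lat,
                          data = allProps, FUN = sum)

# Orange for Marriott, yellow for Starwood, as hex colours
firmColours <- c(Marriott = "#FFA500", Starwood = "#FFFF00")
pointColours <- unname(firmColours[as.character(combGeocoded$firm)])

# Only build the globe if the threejs package is installed
if (requireNamespace("threejs", quietly = TRUE)) {
  globe <- threejs::globejs(lat = combGeocoded$lat,
                            long = combGeocoded$lon,
                            value = combGeocoded$Counts * 10,  # bar height
                            color = pointColours,
                            atmosphere = TRUE)
  # Typing `globe` at the console opens the interactive widget
}
```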
From the globe, the most overlap appeared to be in the US and Europe. For a static plot, ggmap is very quick:
# Europe map with ggmap
library(ggmap)
eurPlot <- qmap(location = "Europe", zoom = 4, legend = "bottomright", maptype = "terrain", color = "bw", darken = 0.01)
# alpha is a fixed value, not a mapped variable, so it belongs outside aes()
eurPlot <- eurPlot + geom_point(data = combGeocoded, aes(x = lon, y = lat, colour = firm, size = Counts), alpha = 0.2)
(eurPlot <- eurPlot + scale_size_continuous(range = c(3, 10)))
If we want something with an interactive zoom, the leaflet package is another useful option. It uses OpenStreetMap tiles and allows you to pan and zoom:
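A minimal sketch of what such a leaflet map could look like is below. The toy data frame reuses the column names from the ggmap example (firm, lon, lat, Counts); the rows themselves, the firm labels and the marker sizing are assumptions for illustration, not the original script.

```r
# Toy stand-in for the combined data; the values are made up for the example.
combGeocoded <- data.frame(
  firm = c("Marriott", "Starwood", "Marriott"),
  lon = c(-74.006, -74.006, 2.352),
  lat = c(40.713, 40.713, 48.857),
  Counts = c(120, 45, 30),
  stringsAsFactors = FALSE
)

m <- NULL
# Only build the map if the leaflet package is installed
if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  # Map each firm to a colour (orange for Marriott, yellow for Starwood)
  firmPal <- colorFactor(c("orange", "yellow"),
                         domain = c("Marriott", "Starwood"))
  m <- leaflet(combGeocoded)
  m <- addTiles(m)  # default OpenStreetMap tiles
  m <- addCircleMarkers(m, lng = ~lon, lat = ~lat,
                        color = ~firmPal(firm),
                        radius = ~3 + sqrt(Counts),  # bigger marker, more properties
                        popup = ~paste0(firm, ": ", Counts))
  # Typing `m` at the console opens the interactive map
}
```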
Aggregating low-value information and deriving value from it is a great use of R, and this sort of analysis is fun as it gives some additional perspective on a current event. If you would like to play around with this, the script (Merger analysis) and the relevant data files (HotGeocoded and MarGeocoded) are available for download. Let us know what you find in the comments.