December 30, 2013

Great post. I took the liberty of building a little app around it (just for a demo). http://www.econinfo.de/2013/12/31/clustering-whiskies-by-taste/

Very nice analysis and graphics. We had a bottle of 18 year old Talisker at my wedding and it was wonderful; pleasant amount of peat with a lot of complexity underneath. Hope you enjoy :)

You correctly discovered that Talisker is, in fact, the perfect Whisky.

All kidding aside, this is awesome work and just the right mixture of nerdiness and tastiness.

Nice work! Looking forward to playing around with your code examples.

I'm assuming the data set is several years old, since it doesn't have Kilchoman on Islay. They opened in 2005, and while I haven't tried all their expressions, their Machir Bay should fit well in cluster 4. (And I wonder if flavor profiles for any of these 86 scotches have evolved/changed since the data was put together?)

But ah yes, Talisker! It was to me what Laphroaig was to you--the one that first introduced me to the wonders of peat. You're going to love it!

Playing around with the number of clusters, it's amazing how stable the Ardbeg cluster is: it has the same members with 4 or 9 clusters.

Yes, Talisker is your next stop. Either that, or you need to explore the landscape in a different way to bound your tastes: go as far away as possible, find something you dislike, and then close in on the boundaries of your taste. To achieve that you may need to go well beyond Scotch, if it turns out that you like all Scotches or even all whisky/ey.

great job :-p and it happens at the right time!! :-D
I am a little sad, because in almost all cases I'm not able to read the end of the command lines!

It'd be interesting to have tasting notes for all expressions for one distillery and do the same thing for them - how are they grouped, or does the additional ageing (or dare I say, blending) change their categories?

Awesome, I ran into this while drinking a Laphroaig. Funny how all the bottles I have right now are peaty. Laphroaig, Talisker, Lagavulin.

There is at least one other scotch tasting site with hundreds (possibly thousands) of individuals' tasting notes with which I believe a similar analysis could be performed. I know the site's admin personally and a few years ago even suggested he consider a "genome"-type analysis, like the Music Genome Project, to characterize the subjective experiences in a good dram. If you're interested in an introduction to the admin, hit me with an email.

This analysis will not tell me anything about the difference between Ledaig and Tobermory. Who knows how it would choke on all the outputs of Bruichladdich over the last decade!

Garbage in, garbage out. First off, Laphroaig is not *just* peaty; it's also got strong notes of iodine, seaweed and such. And that's just the house style. Laph 10 OB, Laph new make, and a sherry-finished independent bottling of Laphroaig at 25 years, say, are wildly different even within that house style. At least you figured out that Islay malts might be a bit smoky, so yay that I guess.

(The main way one figures out peat levels in a bottle of malt is the peating level as measured in ppm phenol. Do that and you'll see a closer correlation between sensed peat and ppm phenols. This is not going to be a perfect correlation, as aged malt tends to whack you less with the peat reek than younger malts.)

Secondly, the data is not granular enough and is of unknown provenance. There is a list of malts with a series of categories rated 0-4. Which bottlings of the distillery? Who did the sensory analysis? Why those categories? I realize that you're just experimenting with a readily available data set, but you'll miss out if you rely on it exclusively. The scorings at the Malt Maniacs' website (here) make a better data set, though there are only twelve evaluators and only an aggregate score without location (hint: location is not necessarily causation). Another option would be the word clouds that pop up on this nose's website (here, and make sure to check the SGP scores for more useful details).

If you like Laphroaig, then Islay malts might be up your alley. If you just want to get hit in the head with a brick of dried peat, then try younger peated malts, and don't forget off-island ones like Ledaig or weird ones like Lost Spirits' Leviathan or Balcones Brimstone. Perhaps this analysis can indicate the limitations of the "plug and chug" method before someone tries to do this with, say, Abstract Expressionist art...

Another lesson implied by this analysis: It's better sometimes to have someone who knows the subject deeply get a quant to crunch data than the other way around. You should see some of the odd number crunching the Maniacs and related friends do...

I think you would like to have a look at http://www.amazon.de/gp/aw/d/B0092L4ZXQ/

My friend and I liked Bunnahabhain. It was just the regular one. I am sure the 12 and 25 year olds are better.

Shouldn't we start with the question, "Are there clusters of whiskies by flavor at all?" before asking what characteristics each cluster has and the geographical distributions of them? Nothing you've shown refutes the null hypothesis that each flavor component is independently distributed amongst the whiskies. If that's true, then the whole cluster analysis is a waste of time, no? Something as simple as a multi-dimensional scaling plot of all the whiskies should quickly reveal whether there appears to be evidence of clustering or not.
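One way to put the null hypothesis in the comment above to a rough test is a permutation check: shuffle each flavor column independently (which destroys any association between flavors but keeps each column's distribution) and see how often the shuffled data looks as structured as the real data. The sketch below is only an illustration in plain Python. The statistic (sum of squared pairwise correlations), the function names, and the tiny `toy_rows` table are all assumptions made up for the example; they are not the post's data or method, and this tests dependence between flavor columns rather than clustering as such.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def dependence_stat(columns):
    """Sum of squared correlations over all pairs of flavor columns."""
    return sum(pearson(a, b) ** 2 for a, b in combinations(columns, 2))


def permutation_p_value(rows, n_perm=500, seed=0):
    """Compare the observed statistic with column-wise shuffled copies of the data."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    observed = dependence_stat(columns)
    hits = 0
    for _ in range(n_perm):
        # Shuffling each column on its own enforces the "independent flavors" null.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if dependence_stat(shuffled) >= observed:
            hits += 1
    # The +1 terms keep the estimated p-value away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 0-4 scores (body, smoky, sweet) for eight made-up whiskies.
toy_rows = [
    (4, 4, 0), (4, 3, 1), (3, 4, 0), (4, 4, 1),
    (0, 1, 4), (1, 0, 3), (0, 0, 4), (1, 1, 4),
]

observed, p_value = permutation_p_value(toy_rows)
print(f"observed statistic = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value only says the flavor columns are not independent; whether the structure is best described as discrete clusters is a separate question, which is where a multi-dimensional scaling plot would still help.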

Evan--I would love to have a look at that site, whether or not the data are used in this manner. I'm currently expanding this (fantastic) idea to use fuzzy c-means clustering (so each scotch is not locked into only one cluster), and would love to access that dataset.

When I finish that code (if I ever get to it) I'll post a link here to the source for everyone, though it will probably be in Python instead of R. Thanks Luba for a great post/idea (I'll cite you on all future work of course)!
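For readers unfamiliar with the fuzzy c-means idea mentioned above, here is a minimal plain-Python sketch of the standard update rules, in which every point gets a membership weight in every cluster instead of a single label. It is not the commenter's code; the function names, the fuzzifier `m=2.0`, the fixed iteration count, and the two-dimensional toy points are all assumptions chosen for illustration.

```python
import math


def memberships(point, centers, m=2.0):
    """Membership of one point in each cluster; the weights sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point sits exactly on a center: give that center all the weight.
        on_center = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [z / sum(on_center) for z in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


def update_centers(points, U, m=2.0):
    """Each center is the mean of all points, weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(U[0])):
        weights = [u[j] ** m for u in U]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(n_iter):
        U = [memberships(p, centers, m) for p in points]
        centers = update_centers(points, U, m)
    return centers, [memberships(p, centers, m) for p in points]


# Hypothetical (smoky, body) scores; the last point lies between the two groups.
toy_points = [(4, 4), (4, 3), (3, 4), (0, 1), (1, 0), (2, 2)]
final_centers, U = fuzzy_c_means(toy_points, centers=[toy_points[0], toy_points[3]])
for point, row in zip(toy_points, U):
    print(point, [round(u, 2) for u in row])
```

With hard k-means the in-between point would be forced into one group; here its membership row shows how it is split between the two.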

This is very nice. I tried this using spherical k-means, which uses the cosine distance, instead of kmeans. The cosine distance might make more sense if you want to match each of those 12 qualities. Choosing k=4 clusters and your criteria of full bodied, smoky and medicinal, I got the same six whiskies you get plus a few more. This is the full list:
Ardbeg Balblair Bowmore Bruichladdich Caol Ila
Clynelish GlenGarioch GlenScotia HighlandPark Isle of Jura Lagavulin Laphroig Oban OldFettercairn OldPulteney
Springbank Talisker Teaninich Tormore

The mean for this group was Isle of Jura. The grouping does not seem to depend on geo coordinates. I don't drink whisky but someone who does might be able to make sense of this.
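To illustrate why cosine distance can group whiskies differently from the Euclidean distance used by ordinary k-means, the sketch below compares the two on made-up flavor vectors. Cosine distance looks only at the shape of a profile (the direction of the vector), so a mild whisky and a stronger one with the same proportions count as identical. The vectors and names are hypothetical, not taken from the data set.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two profiles point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero profile")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical (body, smoky, sweet) scores on the 0-4 scale.
mild_peated = (1, 1, 0)
strong_peated = (2, 2, 0)   # same proportions as mild_peated, twice the intensity
sweet_light = (1, 0, 2)

for name, other in [("strong_peated", strong_peated), ("sweet_light", sweet_light)]:
    print(
        f"mild_peated vs {name}: "
        f"euclidean = {math.dist(mild_peated, other):.3f}, "
        f"cosine = {cosine_distance(mild_peated, other):.3f}"
    )
```

Under Euclidean distance the mild and strong peated profiles are some distance apart; under cosine distance they coincide, which may be one reason a spherical k-means cluster picks up extra members.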

My favourite for many years was Lagavulin, and I've always loved the Islay malts, so when I visited Scotland in 2007 I was utterly surprised to discover Mortlach, from Dufftown, eclipsed it for me.

So we bought a bottle of Mortlach and brought it home to New Zealand for some theoretical special occasion that was never quite good enough and a few days ago I was looking at it, still unopened after seven years in the cupboard.

In a few months we're moving to the other side of the world with Scotland only a short ferry journey away. Friends were around so I decided to crack it open and it is just absolutely as I remember it, once again blitzing all memory of Bowmore, Laphroaig, Talisker, Scapa or even Lagavulin.

I suspect the smokier "Talisker Storm" would be a better fit for cluster 4, and probably more in line with your other tastes.

Nice stats! I have a bottle of Caol Ila, really good whisky! So just try the 12 year old: good quality for little money.
