Scrabble is a game that involves both skill and luck. There's skill in knowing the words you can play and — especially — the most advantageous ways to play them. But there's also luck in the tiles you draw randomly from the bag: get saddled with a rack containing four I's and there's usually not much you can do. That's why professional Scrabble tournaments are decided by playing multiple games between each pair of players. Tournaments do this to average out the variability in tile draws between players, to make the deciding factor skill rather than luck.
But how much does luck affect a typical Scrabble game? Andrew C Thomas, a professor in Statistics at Carnegie Mellon University, came up with an ingenious idea to test this. (Thomas's research will soon be published in a paper, a draft of which is now available on arXiv.org.) What if we could observe a game between a couple of equally-matched players, where we "fix" the luck factor by determining the tiles each player gets in advance? Then, we can average out the "skill" factor by having those players re-play the fixed game many times: they'll get the same letters, but might make different strategic plays during the game. Each player's score will vary over the series of games, but the player with the better "luck" (in other words, the better pre-determined sequence of letters) will have the higher average score.
There are two problems with this approach. First of all, it's not practical to get expert players to play the same game over and over with consistent results. Thomas solves this by using open-source Scrabble AI software instead. (To avoid the robot players making exactly the same moves each time, he adds a small random factor to the AI decision-making process, by weighting the future value of any given move up or down a point or two.)
The second problem is more of a mechanical one: how can you guarantee that each robot player will get the same sequence of letters each time? In Scrabble, each player may play anywhere between one and seven tiles each move (with a 50-point "bingo" bonus for all seven), or play none at all and exchange some tiles for a new set randomly selected from the pool. The scheme Thomas comes up with to address this is very clever: rather than have both players draw from the same end of the sequence, he pre-generates one sequence of tiles and has each player draw from opposite ends, as shown in this diagram from his paper:
In this diagram, Player 1 drew seven tiles from the left of the sequence to create the rack; Player 2 drew from the right. This way, each player gets the same sequence of tiles in the repeated games, regardless of the number of tiles played each move. Or nearly so: if Player 1 plays many long words in a game, he may reach a tile further toward the right of the sequence than he usually gets. And exchanged tiles, which are mixed back into the reserve sequence, add more variability. But in general, the further a letter lies toward the left of the sequence, the more likely Player 1 is to get to play it, and vice versa.
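To make the draw-from-opposite-ends idea concrete, here is a minimal sketch in Python (not Thomas's R code). The function names `make_reserve` and `draw`, the toy bag, and the seed are all my own assumptions for illustration, and exchanges are left out entirely.

```python
import random
from collections import deque

def make_reserve(tiles, seed):
    """Shuffle the bag once, up front, into a fixed double-ended sequence."""
    rng = random.Random(seed)
    order = list(tiles)
    rng.shuffle(order)
    return deque(order)

def draw(reserve, player, n):
    """Player 1 takes tiles from the left end, Player 2 from the right end."""
    drawn = []
    for _ in range(min(n, len(reserve))):
        drawn.append(reserve.popleft() if player == 1 else reserve.pop())
    return drawn

# Hypothetical 29-tile toy bag ("?" is a blank); a real bag has 100 tiles.
TOY_BAG = "AAEEIIOOUQZXJSSTTRRNNLLDDGG??"

reserve = make_reserve(TOY_BAG, seed=42)
layout = list(reserve)  # remember the order before anyone draws
rack1 = draw(reserve, player=1, n=7)
rack2 = draw(reserve, player=2, n=7)
print("Player 1 rack:", "".join(rack1))
print("Player 2 rack:", "".join(rack2))
```

Because each player only ever consumes tiles from his own end, the order in which he receives tiles does not depend on how many tiles the other player uses per move, at least until the two ends meet in the middle.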
Again, in a real Scrabble game it would be impractical to lay out the tiles in a pre-determined sequence like this (especially without the players seeing them!). Thomas solves this problem by simulating the games in software: code in the R programming language simulates the sequence of tiles, hands new tiles to the AI players, and then observes their final score. 100 simulated matches are played for each sequence: the average score difference between Player 1 and Player 2 is then a measure of how "lucky" that sequence is for Player 1. And Thomas repeats this process for 10,000 different random sequences, which allows him to do statistical analysis in R on how the tile sequence (or "luck") affects Scrabble games on average. For example, Thomas noted that most sequences where the Q was towards the left led to a point advantage for Player 2, and so in that sense Q is an "unlucky" tile to get.
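The "replay one sequence many times and average the score difference" step can be sketched as follows, again in Python and not R. Everything here is a toy assumption: `play_once` is a hypothetical stand-in for a full bot-vs-bot game (it just sums simplified tile values for each half of the sequence and adds random noise), so it illustrates the averaging only and does not reproduce the paper's findings.

```python
import random
import statistics

# Simplified toy scoring: face values for the four power tiles,
# and 1 point for every other tile (an assumption for brevity).
TOY_VALUES = {"Q": 10, "Z": 10, "J": 8, "X": 8}

def play_once(sequence, rng):
    """Hypothetical stand-in for one game; returns Player 1 score minus Player 2 score."""
    half = len(sequence) // 2
    # Player 1 is credited with the left half, Player 2 with the right half;
    # the Gaussian noise mimics the bots making different plays each game.
    p1 = sum(TOY_VALUES.get(t, 1) for t in sequence[:half]) + rng.gauss(0, 5)
    p2 = sum(TOY_VALUES.get(t, 1) for t in sequence[half:]) + rng.gauss(0, 5)
    return p1 - p2

def sequence_luck(sequence, n_games=100, seed=0):
    """Replay one fixed sequence n_games times; return mean and SD of the score difference."""
    rng = random.Random(seed)
    diffs = [play_once(sequence, rng) for _ in range(n_games)]
    return statistics.mean(diffs), statistics.stdev(diffs)

sequence = list("QZXJAEIOAEIOAEIO")  # toy sequence, power tiles on Player 1's side
mean_diff, sd_diff = sequence_luck(sequence)
print(f"mean difference: {mean_diff:.1f}, standard deviation: {sd_diff:.1f}")
```

The mean over the replays is the "luck" of that one sequence for Player 1; repeating this for many random sequences gives the data set that the statistical analysis is run on.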
Thomas takes this analysis even further: the point in the game at which you get a high-value "power tile" like a Q or a Z also makes a difference. Getting a Q early in the game when there are few options to play it is bad; getting it later in the game when the board has more options is better; letting your opponent draw it is best. These options are reflected in where in the initial sequence (used by both players) the Q falls: towards the left, in the middle, or on the right. Using this method, Thomas maps average player scores for Q, J, X, and Z depending on where in the sequence they fall:
To the left of the chart, Player 1 has each tile early in the game; towards the right, Player 2 has it. In contrast to the Q, the X is generally beneficial to the player who draws it. Using these techniques, Thomas draws the following conclusions about tiles:
- The blank is worth about 30 points to a good player, mainly by making 50-point "bingo" plays possible.
- Each S is worth about 10 points to the player who draws it.
- The Q is a burden to whichever player receives it, effectively serving as a 5-point penalty for having to deal with it due to its effect in reducing bingo opportunities, needing either a U or a blank for a chance at a bingo and a 50-point bonus.
- The J is essentially neutral pointwise.
- The X and the Z are each worth about 3-5 extra points to the player who receives them. Their difficulty in playing in bingoes is mitigated by their usefulness in other short words.
(Of course, all of these conclusions will depend on exactly which Scrabble dictionary you're using: there are a lot more words available to play from the OED Chambers-based SOWPODS dictionary, and I presume this analysis is based on the official TWL dictionary used in the American Scrabble game. I'd love to see the effect on this chart of using British rules.)
Thomas also finds that the player who goes first generally has an advantage, to the tune of about 14 points. So if you're a gracious Scrabble player, let your opponent go first.
AC Thomas: Statistics and Scrabble, Together At Last
Thanks for the shout-out, David! This is using TWL; I'll have to try it with the SOWPODS dictionary for sure.
Posted by: Andrew C. Thomas | July 26, 2011 at 11:18
This is a nice summary of the paper. I was going to blog about this, but now I don't have to!
I'll add that the simulation of the two Scrabble "Bots" is accomplished by writing an R wrapper to the underlying C++ code. Rather than re-implement a difficult bit of analysis in R, he just calls it directly, which is always a smart technique.
Maybe you or Andrew can answer a question that puzzled me. On the histograms on p. 7, the first graph shows the variation in score that is attributed to variation in the pulled tiles. But what is the second? The paper says "standard deviation of score differences," but why are all the standard deviations greater than 30? Weren't there any close games in the 10,000 simulations? I'm missing something....
Posted by: Rick Wicklin | July 26, 2011 at 12:25
What happens to that pregenerated double-ended sequence of tiles when one player throws back a bunch of his letters? Or do the AI players never throw anything back, no matter how bad it is?
Posted by: Anonymous Cowherd | July 26, 2011 at 22:56
SOWPODS is based on OSPD and Chambers Dictionary, not the OED. Its successor, the Official Tournament and Club Word List, is based on SOWPODS and Collins Dictionary.
Posted by: Paul G | July 27, 2011 at 07:00
@AC - the AI players do exchange when it makes strategic sense. When one of the bots exchanges, the replacement tiles are randomly selected from the Reserve list (see Fig 2 above) and the old tiles randomly inserted back into it. It's explained in the arXiv paper.
Posted by: David Smith | July 27, 2011 at 07:53
@Paul - thanks for the correction. I have a copy of Chambers sitting here on my desk that I used to use when playing in the UK, so I should have remembered that. I miss ZO and KI.
Posted by: David Smith | July 27, 2011 at 07:54
It seems to be an experiment to show the Q is unlucky (by 2 points?) in the grand average.
Some players might be better at dealing with the Q than others, so it suggests the idea of Q-skill.
Not only are there 26 different letters, but there are a large number of racks to draw, so Scrabble players have skills that might be measured by a long list of different rack combinations. Someone might have learned all the FISH words while someone else might have spent time studying words with IEST. So FISH is luckier for the first player, but IEST is more likely so the second player probably benefits more overall.
Posted by: Kevin Leeds | July 27, 2011 at 10:49
I might be distracted and missing something, but isn't this paper implying that Scrabble is merely about maximizing your points? Something akin to what neoclassical economists say about humans as rational profit maximizers?
I've played lots of Scrabble and think this seems naive. You are trying to make sure that your score > your opponent's score. In other words, defense matters as well as offense. As much as I enjoy these simulations, I don't think they pick up on that idea and so miss enormous nuance.
I agree luck matters, though.
Posted by: Tobin's Q | July 27, 2011 at 18:25
This is very interesting analysis, mind if I cite some of it (linking back here naturally) to my Scrabble mailing list?
The negative effect of the Q fits in with a lot of advice I've read from advanced players, who tend to ditch it as soon as they can. The X is relatively versatile, as you can use it as a two-letter word with any vowel tile.
I wonder what weightings for 'luck' of the tile vs skill of the player look like - is the simulation playing the 'best possible' word each time?
Posted by: Scrabble Word Finder | July 28, 2011 at 12:30
I don't think the end result of the experiment (luck vs. skill in scrabble) is very interesting or surprising. I would rather see where Scrabble falls in the "how much of it is luck" scale vs. other popular games, poker, backgammon, chess, Go and so on.
Posted by: DB | August 02, 2011 at 13:59
I'm working on a nano module that plugs into your brain to give you access to every known word but for now you will have to use http://scrabblecheat.com
Posted by: steve | August 03, 2011 at 03:55
This is all very well and good, and duplicates a lot of results already known to professional Scrabble® players. But I think the study is flawed in one important area -- bots and expert players don't play in the same way. For one small example, bots will never bluff, nor will they fall for a phoney, no matter how plausible.
Posted by: turnip | August 05, 2011 at 04:39
These are interesting numbers, but based on the comments, people want to know more. DB makes a valid point, it would be interesting to play this data against other games, like backgammon online.
Posted by: Alissa | December 27, 2011 at 22:08
I really enjoyed reading this awesome blog post.
Inspiring!
Posted by: Scrabble Wordfinder | February 02, 2012 at 02:45
Hey, everybody that has criticism, or doesn't think this is interesting, why don't you
1. carry out your own experiments on scrabble with said criticism or
2. read something else.
I found it very interesting and was thinking about this problem for a few weeks before I found it.
Posted by: Evelyn | April 11, 2012 at 10:14
Factoring out luck is easy: just show all players the same set of randomly drawn letters! Then let every player privately enter his best solution on the board. Each player adds his own score to his personal total. The solution with the highest score is put on the common board, and all players continue with the same rack of remaining letters plus the newly drawn ones. You can play with an infinite number of players as long as you fix a time limit per draw. Leaving the game causes no disturbance. Entering a new game can happen at fixed times, every hour or so.
I'd like an iPhone version in Dutch, please!
Copyright Bart Viaene - Leuven, Belgium if I'm the first with this idea :-)
Posted by: Bart Viaene | April 17, 2012 at 14:20
Wow, I did some work on Scrabble algorithms and must have missed this one. Thank you for the great article. While reading I was thinking about other approaches to solve this.
In reality I recently talked with a finalist from a national Scrabble tournament and she kind of admitted that she lost against a very good opponent and not because of bad luck :)
Posted by: Thomas | May 12, 2012 at 11:01
Great Article! Luck def is a huge component if you have players of similar skill levels. If you had 2 robots going against each other, it 100% would come down to luck of the draw.
Posted by: Word Finder | April 01, 2013 at 13:15