At the JSM conference last week, I stopped by a great poster by Steve Salaga and Brian Mills, graduate students at the University of Michigan's Department of Sport Management. The guys were clearly hockey fans, and had channelled their enthusiasm for the sport into an interesting statistical analysis of game and player data from the NHL. One analysis, based on a random forest model implemented in the R statistical language, looked at the characteristics of players selected for the NHL Hall of Fame:
We are attempting to gauge how well each player aligns with the views of the Hall of Fame Voting Committee and whether or not they were 'snubbed' based on how the committee would be predicted to vote.
According to their model, the key criteria for forwards and defensemen selected for the Hall of Fame are All-Star game appearances, assists, goals and (to a lesser extent) plus/minus and Stanley Cup wins. (Factors that appear to have little influence on the committee's decision include shot percentage and whether or not the player was French-speaking. Goalies were not included in the analysis.) On this basis, players who appear to have been "snubbed" by the selection committee include Kevin Lowe, Alexander Mogilny, Dave Taylor and Theoren Fleury.
Steve and Brian will be publishing their analysis in a paper soon, but in the meantime you can see more details in their poster (PDF, 3.1 MB) or at their blog post below.
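To make the idea concrete, here is a minimal sketch of how a random forest can flag "snubbed" players. This is not the authors' code (theirs was written in R), and everything in it is an assumption for illustration: the data is synthetic, the player names are placeholders, the induction rule is invented, and the small pure-Python forest stands in for a real library implementation. Each player is scored using only the trees that never saw that player during training (out-of-bag prediction), and anyone not inducted whose predicted probability is at least 0.5 gets flagged.

```python
import random

def gini(labels):
    # Gini impurity for 0/1 labels.
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, rng, depth=0, max_depth=4, n_try=2):
    # A leaf stores the fraction of inducted players that reached it.
    if depth == max_depth or len(set(labels)) == 1:
        return sum(labels) / len(labels)
    best = None
    # Random forests consider only a random subset of features per split.
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return sum(labels) / len(labels)
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       rng, depth + 1, max_depth, n_try),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       rng, depth + 1, max_depth, n_try))

def tree_predict(node, row):
    # Internal nodes are tuples; leaves are plain floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng)
        forest.append((tree, set(idx)))  # remember which rows this tree saw
    return forest

def forest_prob(forest, row):
    # Average leaf value across all trees.
    return sum(tree_predict(tree, row) for tree, _ in forest) / len(forest)

def oob_prob(forest, rows, i):
    # Out-of-bag estimate: use only trees that did not train on row i.
    votes = [tree_predict(tree, rows[i]) for tree, seen in forest if i not in seen]
    return sum(votes) / len(votes) if votes else None

# Synthetic careers (hypothetical): (All-Star appearances, assists, goals).
data_rng = random.Random(1)
players = [(f"player_{k}", (data_rng.randint(0, 10),
                            data_rng.randint(0, 1000),
                            data_rng.randint(0, 600)))
           for k in range(60)]
rows = [stats for _, stats in players]
# Invented induction rule, used only to generate labels for this sketch.
inducted = [1 if 40 * a + s + g > 1000 else 0 for a, s, g in rows]

forest = fit_forest(rows, inducted)
snubbed = []
for i, (name, _) in enumerate(players):
    p = oob_prob(forest, rows, i)
    if p is not None and p >= 0.5 and inducted[i] == 0:
        snubbed.append(name)
print("Predicted in, but not inducted:", snubbed)
```

The out-of-bag step matters: scoring players with trees that were trained on them would mostly reproduce the committee's actual decisions rather than reveal disagreements with the model.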
The Prince of Slides: More on JSM
I have some issues with the selection process of the HHOF but I also wonder whether the people doing this analysis were really hockey fans. It isn't the NHL HOF. It's the Hockey HOF. It includes people - yes, in the Players category - who never played in the NHL, or whose playing careers included significant time outside the NHL.
The fact that they couldn't understand why Slava Fetisov and Igor Larionov were inducted makes this analysis very, very questionable. Didn't they ever see (or hear of, if they are too young) the Soviet Red Army team? Also, they questioned the induction of Bob Gainey, one of the greatest, if not the greatest, defensive forwards in NHL history. We aren't talking about his GM career here, we're talking about his playing career.
I can't tell you why Clark Gillies is in and Kevin Lowe is not (both played on dynasty teams, although different ones, and Lowe was a better player than Gillies, even though Lowe was a defenseman and Gillies was a forward). The HHOF votes are secret and nobody has a real clue what they are doing or thinking. The model they have put together does give us some clue, but I'm not sure the people who did it understand that's not all there is. Certainly, the fact that they don't even understand that the NHL isn't all there is presents a real issue.
Posted by: Sue Natan | August 10, 2011 at 13:55
@Sue That was me, David, not the poster authors, who used the term "NHL HOF" -- you've exposed my hockey naïveté :). There's a thread on Reddit where Brian Mills responds to some other questions about the study, which you might find informative.
Posted by: David Smith | August 10, 2011 at 14:48
Hi Sue.
Thanks for your comments about the project. The purpose of the analysis was to use the Random Forests classification algorithm to model voting behavior for induction into the Hockey Hall of Fame. Additionally, we wanted to see how the technique would perform in a scenario where performance statistics are not 100% indicative of the value of the player to the club. We feel professional hockey fits.
In regard to your specific comments:
1) We were not stating that we could not understand why Fetisov and Larionov were inducted. One potential limitation of the analysis is that it only includes NHL playing statistics, as statistics from the European leagues and international competition are not complete over the entire examination period. We were simply showing that, based on their career NHL statistics, the model does not predict either player as a HOF inductee.
2) Nobody is debating that Gainey was an outstanding defensive player. Unfortunately, outside of +/-, which is highly dependent on overall team quality, there is no statistic available that accurately proxies for defensive ability. Including post-season trophies won (e.g., the Selke Trophy) as an additional input is an option, but these awards are largely being proxied by All-Star game appearances.
Thanks again.
S
Posted by: Steve | August 10, 2011 at 15:11