Scientific Blogging has a nice article today detailing the data behind the Netflix Prize and stepping the reader through the tree of plausible models, from the simple 1-parameter model (assign all movies the same rating) through increasingly elaborate models. The article also reveals some interesting anomalies in the data, such as the discontinuity shown in the average movie ratings assigned over time:
A second, somewhat more appealing hypothesis is that the text accompanying the star ratings changed from an objective scale (excellent, good, fair, …) to a subjective scale (loved it, liked it, …), and people are less accommodating when they think they need to be objective.
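
To make the starting point of that model tree concrete, here is a minimal sketch of the one-parameter model mentioned above, the one that assigns every movie the same rating. It assumes the single parameter is fit as the mean of the training ratings (the constant that minimizes squared error) and that predictions are scored by RMSE, the metric the Netflix Prize used. The function names and the toy ratings are hypothetical and are not taken from the article or the Netflix data.

```python
from math import sqrt


def fit_constant_model(ratings):
    """Return the model's single parameter: the mean of the training ratings."""
    return sum(ratings) / len(ratings)


def rmse(predictions, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return sqrt(sum(squared_errors) / len(actual))


# Hypothetical toy ratings on a 1-5 star scale (illustrative only).
train = [4, 3, 5, 2, 4, 3]
test = [5, 3, 4]

mu = fit_constant_model(train)      # the one fitted parameter
predictions = [mu] * len(test)      # every movie gets the same rating
baseline_rmse = rmse(predictions, test)

print(f"constant rating: {mu:.2f}, baseline RMSE: {baseline_rmse:.4f}")
```

Any more elaborate model, such as one with a separate mean per movie or per user, can be judged by how much it lowers this baseline error.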


