Rating Is Futile
My boss Shane Buettner has been taking the heat recently for eliminating our product review rating system. In addition to the comments on his blog post of last week, he's also been fielding a torrent of opinion in the magazine's regular stream of reader mail. When Shane notes that he dropped the ratings with the "complete support of HT's staff," that definitely includes me. I had been trying to persuade my editors to kill the ratings long before Shane took over.
The memo below dates from April 2007. It was originally a behind-the-scenes editors-only email, but I'm going public with it because I want readers to understand what someone who has bestowed many HT ratings actually thinks of rating systems. For the moment, I don't have much to add to my original opus, except to say that I work like the devil to write reviews for the magazine--and I want people to actually read them, not just scan a few numbers. Nothing depresses a professional writer more than a system that encourages people not to read what I write.
I should also note that I wrote the following memo under the assumption that the rating system would not be killed but might be revamped. My overwhelming preference, as stated in the first graf, was always to discontinue the ratings, and I'm glad it finally happened.
All rating systems are inherently flawed because they try to put an objective sheen on something that's inherently subjective. In the best of all possible worlds, we would not use ratings at all, especially since we already provide objective measurements. Since we don't live in the best of all possible worlds, let's move on.
There are at least two serious problems with all rating systems. One we're all aware of is Rating Creep. We want to be nice to our industry contacts, especially since many of them are advertisers, so our 100-point scale doesn't really have 100 points. If the reviewer never bestows anything below 85, it is a 15-point scale. If the threshold is 88, a 12-point scale. Etc.
The other problem is the Linearity Trap. Ratings are inherently linear and therefore misleading. For example, let's say there are only two factors in a build-quality rating. The enclosure is very well-made and is worth 95 points in itself. But the drivers are mediocre and therefore worth 85 points. If the reviewer gives them equal weight, she averages them out to 90 points. But that 90 is just plain wrong, because there's nothing 90-points-ish about any of the product's build-quality characteristics.
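The Linearity Trap is easy to see in code. Here's a toy sketch (the function name and equal-weight default are my own illustration, not anything from an actual ratings workflow):

```python
# Toy illustration of the Linearity Trap: averaging honest sub-ratings
# produces a number that describes none of them.
def build_quality_rating(subratings, weights=None):
    """Weighted average of sub-ratings (equal weights by default)."""
    if weights is None:
        weights = [1] * len(subratings)
    total = sum(w * s for w, s in zip(weights, subratings))
    return total / sum(weights)

# Enclosure rated 95, drivers rated 85, equal weight:
score = build_quality_rating([95, 85])
print(score)  # 90.0 -- yet no part of the product is actually "90-ish"
```

The average lands at 90 precisely because the scale is linear; no weighting scheme rescues it when the underlying qualities genuinely diverge.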
There is nothing we can do about the Linearity Trap except to radically increase the number of rating categories or (always my preference) abandon them altogether. However, there is a cure for Rating Creep, and that's to reduce the rating range. Some examples:
Pass/Fail System: Make a binary decision. You're done.
Three-Point System: Fail, pass, and wow this is the greatest thing ever. In operation, since we would tend to avoid products that fail, this is another binary decision: Is it the greatest thing ever, or just OK? The dominant rating is pass. This system would work best if reviewers were strongly discouraged from giving the top rating except on special occasions.
Five-Point System: A, B, C, D, F, with no pluses or minuses. In operation this is far less than a five-point system, since anything C or below would be interpreted by readers and manufacturers as a failure. Again, the A rating is only for special occasions. The dominant rating is B.
Ten-Point System: In practice, four points: fail, 8, 9, and the rarely used 10. The dominant ratings are 8 and 9. Resisting Rating Creep is more difficult but not impossible.
100-Point System: Finally, here is a retooled 100-point system that Stewart Wolpin and I came up with at etown.com (we were overruled). The midpoint is 50. Everything above 50 is above average and everything below 50 is below average. The benefit of using a midpoint instead of a minimum is that it allows greater dynamic range. In practice there would be a floor and a ceiling but this system still allows a lot more latitude, and simply breaking old habits might make it ... interesting.
The argument in favor of a 100-point school-style rating system (like what we use now) is that people recognize immediately how it works. But Rating Creep reduces the dynamic range. At etown, using the system that was unwisely shoved down our throats, the official dictate was not to go below 70 or above 95, reducing the official dynamic range to 25 points in theory, and it was much less in practice. The argument against this system is that it is inherently prone to collapse. The more familiar it is, the more likely it is to slip.
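The compression described above is simple arithmetic, but it's worth making explicit. A trivial sketch, using the numbers from the memo (the function name is mine):

```python
# Rating Creep shrinks a nominal 100-point scale: the usable range
# is ceiling minus floor, not 100.
def effective_range(floor, ceiling):
    return ceiling - floor

print(effective_range(70, 95))   # 25: the etown dictate's range, in theory
print(effective_range(85, 100))  # 15: a reviewer who never goes below 85
print(effective_range(88, 100))  # 12: a reviewer whose floor is 88
```

The more familiar the scale, the higher the floor tends to drift, and the smaller this number gets in practice.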