Does Scrabble Need To Be Fixed?

An experiment in controlling how much of Scrabble is luck.

You can find Lynda Woods Cleary playing Scrabble every Tuesday at a Panera in Princeton, NJ. Cleary, a 68-year-old retired financial consultant, has been playing every week for 20 years since founding the Princeton Scrabble Club in 1998. When I asked her if she’s ever disappointed to draw certain tiles, she looked surprised, even hurt. “Oh no,” she said with an Alabama twang. “I want each and every one.”

It’s a sweet sentiment, but according to a 2014 statistical program written by Joshua Lewis, then a Ph.D. candidate at the University of California, San Diego, it isn’t a sensible one. His study showed that there are “lucky” tiles in Scrabble: A “Q” is harder to place on a board than a “Z,” and yet both are worth 10 points. Therefore, it’s luckier to draw a “Z” than a “Q.” Lewis argued that the traditional values associated with each letter diminish the role of skill in the game, and recommended changing them to make Scrabble scores more indicative of skill.

The suggestion was picked up by the BBC, the Huffington Post, and TIME, among others. As you can imagine, traditionalists like Cleary were dismissive of Lewis’ suggestion. Some of them were passionately opposed. John Chew, then the co-president of the North American Scrabble Players Association, responded with a two-part, 2,600-word diatribe titled “Catastrophic Outrage.”

Recently, I conducted my own tests to see if Lewis’ values really make Scrabble more fair. In short, Lewis was wrong. His values don’t reduce the element of luck in Scrabble. The tests also show, however, that traditional tile scoring isn’t more fair than random tile values. If we want to make Scrabble scoring more indicative of skill, we’ve been looking at the wrong part of the game for years.

The professional Scrabble world is well aware of how important luck is to the game. “The community mostly agrees it’s around 15 percent luck,” said John Williams, the former head of the National Scrabble Association. He mentions the same number in his book Word Nerd. “But some people say it’s more like 20 percent. Somewhere in that range.” Lewis’ hypothesis is that the degree of luck can be tuned by changing the point values of tiles.

The creator of Scrabble, Alfred Butts, first determined the values by hand in the 1930s by studying letter frequency, including in articles published on the front page of The New York Times. Lewis calculated new tile values, reasoning that after 80 years, with new words added to the dictionary, Scrabble could use an update. He wrote a program that used multiple metrics, considering a letter’s frequency in the dictionary and the typical word length, to calculate the difficulty of placing it. His results recommended, for example, that “Q” remain at 10 points, but that “Z” drop to 6.

In theory, the slimmer the average difference in score over many games between two identical opponents, the less luck is inherent in a game. This principle can be used to test whether Lewis’ new tile values actually reduce the role of luck, and increase the role of skill, in Scrabble scores. I set up a Scrabble AI program called Quackle to play against itself with various tile values. The program was developed by competitive Scrabble players at the Massachusetts Institute of Technology and is commonly used in the professional Scrabble community to train. Williams’ estimate turns out to be pretty accurate. Thousands of self-play Scrabble games revealed that the traditional tile values produce an average difference of approximately 18 percent in final scores between two identical players. But Lewis’ suggested values not only fail to reduce this difference, they actually cause a small (but statistically significant) increase: from 18 to 19 percent.
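The luck measure described above can be sketched in a few lines of Python. The article does not spell out exactly how the percent difference was computed, so the definition used here (the score gap divided by the two players' mean score) is an assumption, and the sample scores are made up for illustration rather than taken from the Quackle runs.

```python
from statistics import mean


def percent_difference(score_a, score_b):
    """Score gap as a percentage of the two players' mean score.

    Assumed definition; the article does not state the exact formula.
    """
    midpoint = (score_a + score_b) / 2
    if midpoint == 0:
        return 0.0  # avoid dividing by zero if both scores are zero
    return abs(score_a - score_b) / midpoint * 100


def average_percent_difference(games):
    """Average the per-game percent difference over (score_a, score_b) pairs."""
    return mean(percent_difference(a, b) for a, b in games)


# Hypothetical final scores for three self-play games (not real results).
sample_games = [(400, 400), (450, 350), (380, 420)]
luck_estimate = average_percent_difference(sample_games)
print(round(luck_estimate, 2))
```

With identical players on both sides, a lower average means the draw of the tiles is doing less of the work.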

Why did the amount of luck in Scrabble stay approximately the same even after shifting to Lewis’ tile values? The tests don’t give a definitive answer to this question: Scrabble’s complexity makes any clear-cut answer inherently difficult. But some Scrabble enthusiasts think it’s because of players’ adaptive strategy. “You learn how to use [your tiles] even if you have a rack with five vowels on it,” Cleary explains. “That’s what you do.” Stefan Fatsis, author of Word Freak and expert-rated Scrabbler, was one of the first to point out that Lewis’ findings, though interesting, misinterpreted how a typical Scrabble player plays. He was unsurprised by what my Quackle experiment showed. “It doesn’t matter what values you give me,” he said. “I’m going to work with what I have and make the best of it.” In other words, players recognize the difference between the nominal point value of a tile and its real value, and strategize around that. It is part of the game.

If that’s true, we should be able to assign quasi-random values to Scrabble tiles and still get the same results. Naturally, if “Q” were worth 500 points and the rest of the tiles worth 1 point, Scrabble would become purely a game of luck. But after assigning random tile values between 1 and 10 (“A” equal to 5, for example, and “Q” equal to 1) with a similar standard deviation and average to the traditional tiles, we get the same 18 percent difference, on average.
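One simple way to produce scrambled values with the same average and standard deviation as the traditional set is to shuffle the existing point values among the letters. This is a stand-in technique, not necessarily how the values in the experiment were generated (those are only described as having a "similar" spread). The function name and seed below are hypothetical.

```python
import random
from statistics import mean, pstdev

# Traditional point values for the 26 letters (blank tiles omitted).
TRADITIONAL = {
    "A": 1, "B": 3, "C": 3, "D": 2, "E": 1, "F": 4, "G": 2, "H": 4, "I": 1,
    "J": 8, "K": 5, "L": 1, "M": 3, "N": 1, "O": 1, "P": 3, "Q": 10, "R": 1,
    "S": 1, "T": 1, "U": 1, "V": 4, "W": 4, "X": 8, "Y": 4, "Z": 10,
}


def shuffled_values(values, seed=None):
    """Reassign the same point values to letters in random order.

    A permutation keeps the mean and standard deviation exactly the same.
    """
    rng = random.Random(seed)
    letters = sorted(values)
    points = [values[letter] for letter in letters]
    rng.shuffle(points)
    return dict(zip(letters, points))


scrambled = shuffled_values(TRADITIONAL, seed=0)

# Both lines print the same mean and standard deviation.
print(round(mean(list(TRADITIONAL.values())), 2), round(pstdev(list(TRADITIONAL.values())), 2))
print(round(mean(list(scrambled.values())), 2), round(pstdev(list(scrambled.values())), 2))
```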

So is there not much luck to a game of Scrabble? There is—but it has to do with a completely different part of the game.

A Scrabble bingo happens when one player plays all seven tiles in one turn, earning an extra 50 points.

Most Scrabble amateurs don’t need to worry about bingos bringing an unfair element of luck into their game. Most of us go dozens of matches without witnessing a single bingo appear on the board. For professional Scrabblers, however, bingos are a lot more common, and they may be more about luck than skill. In the Quackle tests, only in 1 of 4 games do both players get an equal opportunity to play bingos. Lewis’ values may have slightly increased the role luck plays simply because they had a lower average value (2.74) than traditional tiles (3.22). It’s harder for a player with one fewer bingo to catch up if tiles are simply worth less.
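The 3.22 figure is consistent with a plain average over the 26 letters plus the blank tile (worth 0), ignoring how many copies of each tile are in the bag. That reading is an assumption about how the number was computed; Lewis' full set of values is not listed here, so the 2.74 figure is not reproduced.

```python
# Traditional point values, with "?" standing in for the blank tile.
traditional_values = {
    "A": 1, "B": 3, "C": 3, "D": 2, "E": 1, "F": 4, "G": 2, "H": 4, "I": 1,
    "J": 8, "K": 5, "L": 1, "M": 3, "N": 1, "O": 1, "P": 3, "Q": 10, "R": 1,
    "S": 1, "T": 1, "U": 1, "V": 4, "W": 4, "X": 8, "Y": 4, "Z": 10, "?": 0,
}

# Unweighted mean: 87 points spread over 27 distinct tile types.
mean_face_value = sum(traditional_values.values()) / len(traditional_values)
print(round(mean_face_value, 2))  # 3.22
```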

If we devalue the bingo by setting all tiles equal to 50, the average percent difference falls to 14 percent. Furthermore, when players earn an equal number of bingos, the percent differences in scores for every tile-value scheme (the traditional values, Lewis’ values, all tiles equal to 50, even random values) are roughly equal.
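To see why raising every tile to 50 points devalues the bingo, compare how much of a seven-tile play comes from the fixed 50-point bonus under each scheme. This is a rough sketch that ignores premium squares and any tiles already on the board; the word RETAINS and the function name are illustrative choices, not part of the experiment.

```python
BINGO_BONUS = 50  # fixed bonus for playing all seven tiles in one turn

# Traditional point values for the 26 letters.
TRADITIONAL = {
    "A": 1, "B": 3, "C": 3, "D": 2, "E": 1, "F": 4, "G": 2, "H": 4, "I": 1,
    "J": 8, "K": 5, "L": 1, "M": 3, "N": 1, "O": 1, "P": 3, "Q": 10, "R": 1,
    "S": 1, "T": 1, "U": 1, "V": 4, "W": 4, "X": 8, "Y": 4, "Z": 10,
}
# Alternative scheme from the experiment: every letter worth 50.
FLAT_FIFTY = {letter: 50 for letter in TRADITIONAL}


def bonus_share(word, values):
    """Fraction of a bingo play's score that comes from the 50-point bonus.

    Simplification: face values only, no double or triple squares.
    """
    face_value = sum(values[letter] for letter in word)
    return BINGO_BONUS / (face_value + BINGO_BONUS)


share_traditional = bonus_share("RETAINS", TRADITIONAL)  # 50 / 57
share_flat = bonus_share("RETAINS", FLAT_FIFTY)          # 50 / 400
print(round(share_traditional, 3), round(share_flat, 3))
```

Under traditional values the bonus dominates the play, while with every tile at 50 it becomes a small fraction of it, so one extra bingo matters far less.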

Therefore, Scrabble would be more indicative of skill if we removed the bingo bonus altogether, or devalued it by proportionally increasing all tile values. But after making all our “fixes,” we would no longer be playing Scrabble.

Some, like Cleary, won’t be interested in that change: “I like the tiles just the way they are.”

Kevin McElwee is a software engineer based in Washington, D.C. As a freelance journalist, his articles have appeared in the Columbia Journalism Review and Discovery, Princeton University’s research magazine. In 2017, he reported from Moscow for The GroundTruth Project.

