Scrabble is a volatile game. It’s not uncommon for underdogs to pull off tournament upsets. Why? Luck. It plays a large role in Scrabble, and efforts to reduce it (by changing tile values, for instance) have mostly been in vain.

Still, Scrabble skills matter. For those looking to improve them, it’s natural to wonder how much they can trust their wins or losses as indicators of progress. If a player wins by 20 percent, was it just a fluke? What about 50 percent? It turns out that players shouldn’t get too confident when winning by these margins. So how do you tell when you’ve leveled up? An AI-driven Scrabble game can help give us concrete answers, and a web app can help us apply those results.

By having a computer play against itself, we can see how much scores can vary when two players are equally skilled. Using the Scrabble AI software package called Quackle, I have collected more than 7,700 AI-v-AI Scrabble games. The results starkly show how players that are identical in every way can still have a dramatic range of scores.

On average, the tests show that equally skilled players should expect an 18 percent difference in their scores. For example, if Alice beats Bob 545 to 455 (winning by 90 points, or 18 percent of their 500-point average score), it is entirely within the realm of possibility that Alice and Bob are actually equally talented. The largest difference in scores was 81.3 percent; despite using the same strategy, one player earned 623 points, and the other earned 263. If I were to lose by more than 90 points, I might be left feeling that the person I was playing was definitely better than me, but these tests suggest that it’s not unlikely that, were we to play again, I’d have a fighting chance.
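To make the percentages concrete, here is a minimal sketch of how such a margin could be computed. The formula is an assumption inferred from the two examples above (the point gap divided by the average of the two scores); the function name `percent_difference` is illustrative and does not come from Quackle or the web app.

```python
def percent_difference(score_a, score_b):
    """Point gap as a percentage of the two players' average score.

    Assumed formula: it reproduces the 18 percent and 81.3 percent
    figures quoted in the article, but the article does not state it.
    """
    average = (score_a + score_b) / 2
    if average == 0:
        return 0.0  # both players scored nothing, so there is no gap
    return abs(score_a - score_b) / average * 100


# The Alice-and-Bob example: a 90-point win on a 500-point average.
print(round(percent_difference(545, 455), 1))  # 18.0

# The most lopsided simulated game mentioned in the article.
print(round(percent_difference(623, 263), 1))  # 81.3
```

Dividing by the average rather than by the winner’s score keeps the measure symmetric, so it does not matter which player is listed first.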

Another quirk revealed in the analysis is that the computer that went first won more than 56 percent of games, suggesting that there is a notable advantage to making the first move. (Tied games made up only 0.02 percent of the dataset.) It’s hard to know for sure what specifically causes this advantage, but perhaps giving one player initial control of the board has some beneficial effect.
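One way to check that a 56 percent win rate is more than noise is to compare it with what a fair coin would produce over the same number of games. The sketch below is a back-of-the-envelope check, not the analysis behind the article: it rounds the figures to 56 percent and 7,700 games, ignores ties, assumes the games are independent, and uses a normal approximation. The function name is hypothetical.

```python
import math


def z_score_vs_fair_coin(win_rate, n_games):
    """How many standard errors a win rate sits above 50 percent.

    Under the assumption that each game is an independent coin flip,
    the standard error of the observed win rate is sqrt(0.25 / n).
    """
    standard_error = math.sqrt(0.25 / n_games)
    return (win_rate - 0.5) / standard_error


# Rounded figures from the article: 56 percent over 7,700 games.
z = z_score_vs_fair_coin(0.56, 7700)
print(round(z, 1))
```

With these rounded inputs the result is a little above 10 standard errors, far beyond what chance alone would plausibly produce, which is consistent with reading the first-move edge as real.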

There’s no straightforward way to apply these results to real-life scenarios; however, some statistical methods can help Scrabblers place their scores in context. The amount of luck in Scrabble can make it difficult to judge one’s own skill level, so by visiting this website, players can plug in their Scrabble scores to get an idea of how confident they should be in their wins and losses. Hopefully, the web app can give players a data-driven perspective when judging where they are and how they can improve.

This online calculator doesn’t consider who went first or any kind of complicated metrics. It simply scores each win against the distribution of score differences from the simulated games. For example, consider a game where player one beats player two by 60 percent. Since fewer than 1 percent of our AI-v-AI test games ended with a percent difference higher than 60 percent, we can say with 99 percent confidence that player one is better than player two.
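The idea behind that kind of lookup can be sketched in a few lines. This is an illustration under stated assumptions, not the web app’s actual code: the function name is hypothetical, and the list of margins is made-up placeholder data, not the 7,700-game dataset.

```python
from bisect import bisect_left


def confidence_from_margin(observed_pct, simulated_pcts):
    """Share of equal-skill simulated games with a smaller margin.

    A high value means few evenly matched games end this lopsided,
    so luck alone is a less likely explanation for the win.
    """
    ordered = sorted(simulated_pcts)
    games_below = bisect_left(ordered, observed_pct)
    return games_below / len(ordered)


# Placeholder margins (percent differences) invented for illustration only.
placeholder_margins = [2.0, 5.5, 9.0, 12.0, 18.0, 21.5, 30.0, 44.0, 52.0, 75.0]

# Nine of the ten placeholder games were closer than a 60 percent win.
print(confidence_from_margin(60.0, placeholder_margins))  # 0.9
```

Fed the real simulated margins instead of the placeholders, the same comparison would produce the kind of 99 percent figure described above for a 60 percent win.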

The patterns described above would likely reveal themselves over the course of analyzing many real-life Scrabble games, but when data is scarce, this technique can be helpful. For those interested in predicting game results of any kind (as many in the data-science community tend to do), running in-house simulations can offer an idealized environment to learn the inherent nature of a game. Gauging the variance of “perfect” opponents can help us parse the talented from the lucky, and make predictions more confidently.

Kevin McElwee is a software engineer based in Washington, D.C. As a freelance journalist, his articles have appeared in the Columbia Journalism Review and Discovery, Princeton University’s research magazine. In 2017, he reported from Moscow for The Ground Truth Project.