Here’s a really interesting writeup by [Mike] that has two parts. He shows that not only is it possible to load wooden dice by placing them in a dish of water, but that when using these dice to get an unfair advantage in *Settlers of Catan*, observation of dice rolls within the game is insufficient to prove that the cheating is taking place.

[Mike] first shows that his pair of loaded dice does indeed roll totals above seven more often than fair dice would. He then shows how a *Settlers of Catan* player can exploit this to gain an average of 5-15 additional resource cards in a typical game by taking actions that target the skewed distribution of the loaded dice.

The second part highlights shortcomings and common misunderstandings in current statistical analysis. While it’s possible to prove that the loaded dice do have a skewed distribution by rolling them an arbitrary number of times, as [Mike] and his wife do, it is *not* possible to detect this cheating in a game. How’s that? There are simply not enough dice rolls in a game of *Settlers* to show, with statistical significance, that the distribution is skewed.

Our staff of statistics Ph.D.s would argue that [Mike] overstates the shortcomings of the classical hypothesis testing framework, but the point remains that it’s possible to slip past any given statistical test by making the effect *just small enough*. And we still think it’s neat that he can cheat at Settlers by soaking wooden dice in water overnight.

This isn’t the first time we’ve seen *Settlers of Catan* at the center of some creative work. There’s this deluxe, hand-crafted reboot, and don’t forget the electroshock-enabled version.

[via Reddit; images from official Catan site]

Wait, am I missing a place where he did control rolls of his dice before putting them in water? Gaming dice are often not quite as well balanced as casino dice.

You’re right, they didn’t.

And wooden dice soak pretty much all the way through even though they appear dry on the surface, so the water may not in fact have loaded the dice at all; they may have come out of the factory like that.

Quote from reddit: “Good point. I thought about doing a before measurement, but there was no way my wife would have agreed to that!”-PokerPirate

The question is, why didn’t they just leave more dice unmodified as a control group?

I don’t know much about Catan. Would the other players benefit from the skewed probability also?

The other players would probably not benefit. Loading the dice so each one is more likely to come up 6 means that high numbers (8,9,10,11,12) will be rolled more often and those tiles will pay out more resources. The cheater would therefore build his settlements on those tiles. Unsuspecting players would distribute their settlements evenly between high and low numbers and would therefore get a much greater share of the infrequently rolled low numbers.
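To put rough numbers on that, here is a minimal sketch of the two-dice totals. The bias is an assumption picked for illustration (each die shows a 6 with probability 1/5 instead of 1/6, with the other five faces sharing the rest equally); it is not a figure measured by [Mike], and the helper names are made up for this example.

```python
from fractions import Fraction
from itertools import product

def die(p_six):
    """Face probabilities for a die that shows 6 with probability p_six.

    The other five faces split the remaining probability equally
    (an assumption made only for this illustration).
    """
    rest = (1 - p_six) / 5
    return {face: (p_six if face == 6 else rest) for face in range(1, 7)}

def sum_distribution(faces):
    """Distribution of the total of two independent, identical dice."""
    dist = {total: Fraction(0) for total in range(2, 13)}
    for (a, pa), (b, pb) in product(faces.items(), repeat=2):
        dist[a + b] += pa * pb
    return dist

def p_high(dist):
    """Probability of rolling one of the high totals (8 through 12)."""
    return sum(p for total, p in dist.items() if total >= 8)

fair = sum_distribution(die(Fraction(1, 6)))
loaded = sum_distribution(die(Fraction(1, 5)))  # hypothetical bias

print(p_high(fair))    # 5/12
print(p_high(loaded))  # 281/625
```

Under that assumed bias, the high tiles go from paying out on about 41.7% of rolls to about 45% of rolls, which is the edge a player who knows about it can build around.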

Only if they also optimize their strategy for the distribution. If they put it together you’ll still probably be ahead by a couple turns.

I assume it’s a bit like card counting. Everyone has the same information but not everyone uses it the same.

Cheating is just a more efficient way of accomplishing a goal. ;)

Unless the goal is to win a fair game.

It would also be possible to make the 2, 4, 6, 8, 10 and 12 tiles much more valuable by using controlled rolls to make rolling doubles more likely. Alternatively you could do a controlled roll to make 7 more likely if you want to move the robber. See any guide for controlled throws in craps for the details.

Any observation [of rolls] is insufficient for proving that dice are loaded.

The best you can do is compare the dice rolls to a true random source and come up with a probability that the true random would have outputs like the dice in question.

While that probability gets ever smaller, it never actually hits zero. A coin that comes up heads ten times in a row *might* still be fair, because we know that in about 1/1000 tries even fair coins will do that.

The trick is to pick a probability threshold that you consider significant. In the scientific literature 5% is the usual threshold: if results at least as extreme as yours would turn up less than 5% of the time by pure chance, they’re considered significant.

That value, 5%, is just a gentleman’s agreement among researchers. There’s no fundamental or compelling reason for it, and depending on the circumstances you may want to use a different number. Medical studies sometimes use 1% as an extra margin of safety, because the damages from making a mistake are so high.
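The coin example and the two thresholds above can be put side by side with a few lines of arithmetic. This is only a sketch; the function names are made up for the illustration.

```python
from fractions import Fraction

def p_all_heads(n):
    """Chance that a fair coin lands heads n times in a row."""
    return Fraction(1, 2) ** n

def streak_needed(alpha):
    """Shortest all-heads streak whose fair-coin probability is below alpha."""
    n = 1
    while p_all_heads(n) >= alpha:
        n += 1
    return n

print(p_all_heads(10))                  # 1/1024
print(streak_needed(Fraction(5, 100)))  # 5
print(streak_needed(Fraction(1, 100)))  # 7
```

So ten heads in a row is a 1-in-1024 event for a fair coin, five in a row is already enough to get under the 5% cutoff, and the stricter 1% cutoff asks for seven.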

Then there’s also four, five, six sigma…

The size of the test (the 1% or 5% or whatever cutoff) determines the probability that a non-effect will be declared significant — that fair dice will be misclassified as loaded.

The power of the test, which is what is relevant in the OP’s situation, is the probability that a true effect — our loaded dice — will be detected. Power depends on the chosen size, the number of samples you’ve got, and the mathematical formulae used for the test statistic itself. Power always decreases as the size decreases, and increases as the number of samples increases.

The whole gimmick in the OP’s swindle is that he’s got an effect that’s so small that it’s not likely to be detected when the power of the test is low due to having very few dice rolls in a game. He _can_ find the effect by sitting around throwing dice with his wife all night.
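A rough sketch of that power/sample-size relationship, under assumptions that are mine and not the OP’s: the test just counts sixes on a die and rejects "fair" when there are too many (one-sided exact binomial test), the loaded die shows a 6 with probability 0.20 instead of 1/6, a game yields about 120 single-die results, and an evening of deliberate rolling yields 4000.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def critical_count(n, p0, alpha):
    """Smallest count k such that P(X >= k) <= alpha when the die is fair."""
    tail = 0.0
    for k in range(n, -1, -1):
        pmf = binom_pmf(k, n, p0)
        if tail + pmf > alpha:
            return k + 1
        tail += pmf
    return 0

def power(n, p0, p1, alpha):
    """Chance the test flags a die whose true probability of a 6 is p1."""
    return upper_tail(critical_count(n, p0, alpha), n, p1)

FAIR, LOADED = 1 / 6, 0.20           # LOADED is a hypothetical bias
for n in (120, 4000):                # hypothetical roll counts
    for alpha in (0.05, 0.01):
        print(n, alpha, round(power(n, FAIR, LOADED, alpha), 3))
```

With these made-up numbers the test misses the loaded die most of the time at game-sized samples and catches it almost every time at 4000 rolls, and tightening the cutoff from 5% to 1% lowers the power at either sample size.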

(This isn’t new to anyone who’s taken college-level statistics classes, but it’s a great demo of the power/sample-size relationship, IMO.)

Well, that depends on what exactly he’s trying to detect.

Without a control group/test he cannot determine whether the dice are -loaded- or just -biased- i.e. whether deliberate cheating is happening or not.

That is, by the way, another point to remember in the discussion about over-reliance on statistics.

Just because your hypothesis predicts a certain skew, doesn’t mean observing that skew supports your hypothesis. It can actually support a number of different hypotheses, so you may fall victim to a false dichotomy.

Margin of error. Any statistic that comes without one is probably something trying to manipulate you. Or its authors have misled themselves into thinking something.