COVID-19 Statistics: Reading The Tea Leaves

If you’ve been tracking the spread of the COVID-19 pandemic around the world, as we have, you’ve doubtless seen a lot of statistics. The raw numbers look shocking, and in many cases they are, but as always it’s crucially important to ask yourself what the numbers mean.

For instance, our own Tom Nardi put together a counter that displays the total number of COVID-19 cases in the US. It’s a cool project that puts together some web-scraping, a nice OLED screen, and a 3D-printed network display. When this is all over, it can be easily re-trained to show some other statistic of interest, and it’s a great introduction to a number of web APIs. However, it’s looking at the wrong number.

Let me explain. Diseases spread exponentially: the more people who have it, the more people are spreading it. And exponential curves all look the same when you plot out their instantaneous values — the raw number of COVID-19 cases. Instead, what distinguishes one exponential from another is the growth parameter, and this is related to the number of new cases per day, or more correctly, to the day-to-day change in new cases.

If left unchecked, and especially in the early stages of spread, the number of new cases grows every day. But as control efforts, mainly social distancing, take effect, the rate at which the number of new cases can slow, or even go negative. That’s the plan, anyway.

As is very well explained by this video from 3 Blue, 1 Brown, if this were a naturally spreading epidemic, the point at which the new cases just starts to decline marks the halfway point in the course of the disease. Here, we’re hoping that particularly strict quarantining procedures will cut this run even shorter, but if you’re interested in how the disease is spreading, the point when daily new infections turns around is what you’re looking for.

Why not put the daily difference in new cases on your desktop, then? These numbers are noisy, and the difference jumps all around. To be serious, you would probably want to put a moving average on the new cases figure, and look at that difference. Or simply show the new cases instead and look for it to drop for a few days in a row.

Still, this won’t be a perfect measure. For starters, COVID-19 seems to incubate for roughly a week without symptoms. This means that whatever numbers we have, they’re probably a week behind the actual situation. We won’t see the effects of social distancing for at least a week, and maybe more.

Further complicating things is the availability of tests, human factors like weekends when more people get tested but fewer government reporting offices are open, timezones, etc. (What happened on Feb. 13?)

I’m not going to go so far as to say that the COVID-19 stats that we see are useless — actually far from it. But if you’re going to armchair quarterback this pandemic, do it right. Plot out the daily new cases, maybe apply a little smoothing, at least in your head, and realize that whatever you’re seeing now probably represents what happened last week.

When you finally see the turning point, you may celebrate a little, because that means the halfway point was a week ago. We’ve seen it happen in China around Feb 2, and I’m looking forward to it happening here. I hope it happens wherever you are, and soon.

We will get through this. Stay safe, all. And keep yourself uninfected to keep others uninfected.

This article is part of the Hackaday.com newsletter, delivered every seven days for each of the last 212 weeks or so. It also includes our favorite articles from the last seven days that you can see on the web version of the newsletter.

Want this type of article to hit your inbox every Friday morning? You should sign up!

Florence Nightingale: The Lady With The Data

When you think of Florence Nightingale, you probably imagine a nurse with a lamp, comforting soldiers. Indeed, Florence is considered the mother of modern nursing. But she also made serious contributions in statistical data analysis, and used the diagram named after her, the Nightingale rose diagram, to convince the British Parliament to enact sanitation reforms that saved hundreds of thousands, if not millions, of lives.

During the Crimean war, Florence worked around the clock as head nurse in an overcrowded field hospital. But she also found time to create graphs to illustrate the terrible conditions of that field hospital to members of British Parliament. The sanitation reforms she led greatly improved the life of the soldiers in battle, and widespread adoption of her hygienic practices vastly reduced mortality rates of humanity in general.

Continue reading “Florence Nightingale: The Lady With The Data”

How Random Is Random?

Many languages feature a random number generator library for help with tasks like rolling a die or flipping a coin. Why, you may ask, is this necessary when humans are perfectly capable of randomly coming up with values?

[ex-punctis] was curious about the same quandary and decided to code up an experiment to test the true randomness of human. A script guesses the user’s next input from two choices, keeping a tally in the JavaScript backend that holds on to the past five choices. If the script guesses correctly, they take $1 from the user. Otherwise, the user earns $1.05.

The data from gathered from running the script with 200 pseudo-random inputs 100,000 times resulted in a distribution of correct guess approximately normal (µ=50% and σ=3.5%). The probability of the script correctly guessing the user’s input is >57% from calculating µ+2σ. The result? Humans aren’t so good at being random after all.

It’s almost intuitive why this happens. Finger presses tend to repeat certain patterns. The script already has a database of all possible combinations of five presses, with a counter for each combination. Every time a key is pressed, the latest five presses is updated and the counter increases for whichever combination of five presses this falls under. Based on this data, the script is able to make a prediction about the user’s next press.

In a follow-up statistic analysis, [ex-punctis] notes that with more key presses, the accuracy of the script tended to increase, with the exception of 1000+ key presses. The latter was thought to be due to the use of a psuedo random number generator to achieve such high levels of engagement with the script.

Some additional tests were done to see if holding shorter or longer sequences in memory would account for more accurate predictions. While shorter sequences should theoretically work, the risk of players keeping a tally of their own presses made it more likely for the longer sequences to reduce bias.

There’s a lot of literature on behavioral models and framing effects for similar games if you’re interested in implementing your own experiments and tricking your friends into giving you some cash.

Abraham Wald’s Problem Solving Lesson Is To Seek What’s Not There

You may not know the name Abraham Wald, but he has a very valuable lesson you can apply to problem solving, engineering, and many other parts of life. Wald worked for the Statistical Research Group (SRG) during World War II. This was part of a top secret organization in the United States that applied elite mathematical talent to help the allies win the war. Near Columbia University, mathematicians and computers — the human kind — worked on problems ranging from how to keep an enemy plane under fire longer to optimal bombing patterns.

One of Wald’s ways to approach problem was to look beyond the data in front of him. He was looking for things that weren’t there, using their absence as an additional data point. It is easy to critique things that are present but incorrect. It is harder to see things that are missing. But the end results of this technique were profound and present an object lesson we can still draw from today.

Continue reading “Abraham Wald’s Problem Solving Lesson Is To Seek What’s Not There”

Automated Dice Tester Uses Machine Vision To Ensure A Fair Game

People take their tabletop games very, very seriously. [Andrew Lauritzen], though, has gone far above and beyond in pursuit of a fair game. The game in question is Star War: X-Wing, a strategy wargame where miniature pieces are moved according to rolls of the dice. [Andrew] suspected that commercially available dice were skewing the game, and the automated machine-vision dice tester shown in the video after the break was the result.

The rig is a very clever design that maximizes the data set with as little motion as possible. The test chamber is a box with clear ends that can be flipped end-for-end by a motor; walls separate the chamber into four channels to test multiple dice on each throw, and baffles within the channels assure randomization. A webcam is positioned below the chamber to take a snapshot of each “throw”, which is then analyzed in OpenCV. This scheme has the unfortunate effect of looking at the dice from the table’s perspective, but [Andrew] dealt with that in true hacker fashion: he ignored it since it didn’t impact the statistics he was interested in.

And speaking of statistics, he generated a LOT of them. The 62-page report of results from his study is an impressive piece of work, which basically concludes that the dice aren’t fair due to manufacturing variability, and that players could use this fact to cheat. He recommends pooled sets of dice to eliminate advantages during competitive play. 

This isn’t the first automated dice roller we’ve seen around these parts. There was the tweeting dice-bot, the Dice-O-Matic, and all manner of electronic dice throwers. This one goes the extra mile to keep things fair, and we appreciate that.

Continue reading “Automated Dice Tester Uses Machine Vision To Ensure A Fair Game”

How Provably Loaded Dice Lead To Unprovable Cheating

Here’s a really interesting writeup by [Mike] that has two parts. He shows that not only is it possible to load wooden dice by placing them in a dish of water, but that when using these dice to get an unfair advantage in Settlers of Catan, observation of dice rolls within the game is insufficient to prove that the cheating is taking place.

[Mike] first proves that his pair of loaded dice do indeed result in a higher chance of totals above seven being rolled. He then shows how this knowledge can be exploited by a Settlers of Catan player to gain an average 5-15 additional resource cards in a typical game by taking actions that target the skewed distribution of the loaded dice.

The second part highlights shortcomings and common misunderstandings in current statistical analysis. While it’s possible to prove that the loaded dice do have a skewed distribution by rolling them an arbitrary number of times, as [Mike] and his wife do, it is not possible to detect this cheating in a game. How’s that? There are simply not enough die rolls in a game of Settlers to provide enough significant data to prove that dice distribution is skewed.

Our staff of statistics Ph.D.s would claim that [Mike] overstates his claims about shorcomings in the classical hypothesis testing framework, but the point remains that it’s possible to pass through any given statistical testing process by making the effect just small enough. And we still think it’s neat that he can cheat at Settlers by soaking wooden dice in water overnight.

This isn’t the first time we’ve seen Settlers of Catan at the center of some creative work. There’s this deluxe, hand-crafted reboot, and don’t forget the electroshock-enabled version.

[via Reddit; images from official Catan site]

Statistics And Hacking: A Stout Little Distribution

Previously, we discussed how to apply the most basic hypothesis test: the z-test. It requires a relatively large sample size, and might be appreciated less by hackers searching for truth on a tight budget of time and money.

As an alternative, we briefly mentioned the t-test. The basic procedure still applies: form hypotheses, sample data, check your assumptions, and perform the test. This time though, we’ll run the test with real data from IoT sensors, and programmatically rather than by hand.

The most important difference between the z-test and the t-test is that the t-test uses a different probability distribution. It is called the ‘t-distribution’, and is similar in principle to the normal distribution used by the z-test, but was developed by studying the properties of small sample sizes. The precise shape of the distribution depends on your sample size. Continue reading “Statistics And Hacking: A Stout Little Distribution”