Intuitive Explanation Of Arithmetic, Geometric & Harmonic Mean

The simple definition of a mean is a numeric quantity that represents the center of a collection of numbers. The trick lies in deciding what kind of center suits a particular collection, as beyond the arithmetic mean (AM for short, the sum of all values divided by their count) there are many more, with the other two classical Pythagorean means being the geometric mean (GM) and the harmonic mean (HM).

The question that many start off with is what the GM and HM are and why you’d want to use them, which is why [W.D.] wrote a blog post on the topic, one that they figure should be somewhat more intuitive than digging through search results or consulting the Wikipedia entries.

Compared to the AM, the GM uses the product of the values rather than their sum, which makes it a good fit for data like percentage changes. One thing that [W.D.] argues for is using logarithms to grasp the GM, since the GM is just the exponential of the AM of the logarithms, which makes the connection between the two means obvious. Finally, the HM is useful for something like the average speed across multiple trips over the same distance, and is perhaps the easiest to grasp.
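To make the relationships concrete, here's a quick Python sketch (ours, not from [W.D.]'s post) computing all three means for a handful of made-up growth factors, including the logarithm trick for the GM:

```python
import math

values = [1.10, 0.85, 1.30]  # e.g. year-over-year growth factors

# Arithmetic mean: the sum divided by the count
am = sum(values) / len(values)

# Geometric mean: the nth root of the product, or equivalently the
# exponential of the arithmetic mean of the logarithms
gm = math.exp(sum(math.log(v) for v in values) / len(values))

# Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals
hm = len(values) / sum(1 / v for v in values)

print(f"AM={am:.4f}  GM={gm:.4f}  HM={hm:.4f}")  # AM >= GM >= HM always holds
```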

Ultimately, the Pythagorean means and their non-Pythagorean brethren are useful for things like data analysis and statistics, where using the right mean can reveal interesting features in the data, much like how a different measure of center, such as the median, can sometimes make a lot more sense. The latter mostly comes up in the hazy field of statistics, obviously.

No matter what approach works for you to make these concepts ‘click’, they’re all very useful things to comprehend, as much of everyday life revolves around them, including concepts like ‘mean time to failure’ for parts.


Top image: Cycles of sunspots for the last 400 years as an example data set to apply statistical interpretations to. (Credit: Robert A. Rohde, CC BY-SA 3.0)

A Second OctoPrint Plugin Has Been Falsifying Stats

The ongoing story of bogus analytical data being submitted to the public OctoPrint usage statistics has taken a surprising turn with the news that a second plugin was being artificially pushed up the charts. At least this time, the developer of the plugin has admitted to doing the deed personally.

Just to recap, last week OctoPrint creator [Gina Häußge] found that somebody had been generating fictitious OctoPrint usage stats since 2022 in an effort to make the OctoEverywhere plugin appear more popular than it actually was. It was a clever attempt, and if it weren’t for the fact that the fake data reported itself as coming from a significantly outdated build of OctoPrint, there’s no telling how long it would have continued. When the developers of the plugin were confronted, they claimed it was an overzealous user operating on their own initiative, and denied any knowledge that the stats were being manipulated in their favor.

Presumably it was around this time that Obico creator [Kenneth Jiang] started sweating bullets. It turns out he’d been doing the same thing, for just about as long. When [Gina] contacted him about the suspicious data she was seeing regarding his plugin, he owned up to falsifying the data and published what strikes us as a fairly contrite apology on the Obico blog. While this doesn’t absolve him of making a very poor decision, we respect that he didn’t try to shift the blame elsewhere.

That said, there’s at least one part of his version of events that doesn’t quite pass the sniff test for us. According to [Kenneth], he first wrote the script that generated the fake data back in 2022 because he suspected (correctly, it turns out) that the developers of OctoEverywhere were doing something similar. But after that, he says he didn’t realize the script was still running until [Gina] confronted him about it.

Now admittedly, we’re not professional programmers here at Hackaday. But we’ve written enough code to be suspicious when somebody claims a script they whipped up on a lark was able to run unattended for two years and never once crashed or otherwise bailed out. We won’t even begin to speculate where said script could have been running since 2022 without anyone noticing…

But we won’t dwell on the minutiae here. [Gina] has once again purged the garbage data from the OctoPrint stats, and hopefully things are finally starting to reflect reality. We know she was already angry about the earlier attempts to manipulate the stats, so she’s got to be seething right about now. But as we said before, these unfortunate incidents are ultimately just bumps in the road. We don’t need any stat tracker to know that the community as a whole greatly appreciates the incredible work she’s put into OctoPrint.

The Guinness Brewery Invented One Of Science’s Most Important Statistical Tools

The Guinness brewery has a long history of innovation, but did you know that it was the birthplace of the t-test? A t-test is usually what underpins a declaration of results being “statistically significant”. Scientific American has a fascinating article all about how the Guinness brewery (and one experimental brewer in particular) brought it into being, with ramifications far beyond that of brewing better beer.

William Sealy Gosset (aka ‘Student’), self-trained statistician. [source: user Wujaszek, Wikipedia]

Head brewer William Sealy Gosset developed the technique in the early 1900s as a way to more effectively monitor and control the quality of stout beer. At Guinness, Gosset and other brilliant researchers measured everything they could in their quest to optimize and refine large-scale brewing, but there was a repeated problem. Time and again, existing techniques of analysis were simply not applicable to their gathered data, because sample sizes were too small to work with.

While the concept of statistical significance was not new at the time, Gosset’s key contribution was finding a way to effectively and economically interpret data in the face of small sample sizes. That contribution was the t-test: a practical and logical approach to dealing with uncertainty.

As mentioned, t-testing had ramifications and applications far beyond that of brewing beer. The basic question of whether to consider one population of results significantly different from another population of results is one that underlies nearly all purposeful scientific inquiry. (If you’re unclear on how exactly the t-test is applied and how it is meaningful, the article in the first link walks through some excellent and practical examples.)
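For the curious, here's a minimal sketch of a two-sample t-test in Python using SciPy, with invented small-sample data rather than anything from the article; small n is exactly the situation Gosset's t-distribution was built for:

```python
from scipy import stats

# Hypothetical small-sample data: extract yields from two barley strains.
strain_a = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89]
strain_b = [4.17, 3.05, 5.18, 4.01, 6.11, 4.10, 5.17, 3.57]

# Two-sample t-test: is the difference between the means significant,
# or plausibly just noise from two small samples of the same population?
t_stat, p_value = stats.ttest_ind(strain_a, strain_b)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value below the usual 0.05 threshold would suggest the strains differ.
```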

Dublin’s Guinness brewery has a rich heritage of innovation, so maybe spare them a thought the next time you indulge in statistical inquiry, or in a modern “nitro brew” style beverage. But if you prefer to keep things ultra-classic, there’s always beer from 1574, Dublin castle-style.

Full Self-Driving, On A Budget

Self-driving is currently the Holy Grail of the automotive world, with a number of companies racing to build general-purpose autonomous vehicles that can get from point A to point B with no user input. While no one has brought one to market yet, at least one company has promised this feature, and had customers pay for it, while continually moving the goalposts for delivery because of how challenging the problem has turned out to be. But it doesn’t need to be that hard or expensive to solve, at least in some situations.

The situation in question is driving on a single stretch of highway, and the build focuses only on steering, so it doesn’t handle the accelerator or brake pedals. The highway is driven normally while a webcam takes images of the route and an Arduino captures data about the steering angle. The idea here is that with enough training, the Arduino could eventually steer the car. But first some math needs to happen on the training data: since the steering wheel spends most of the drive not actually turning the car, the data has to be rebalanced so that genuine steering events aren’t written off as statistical anomalies. After the training, the system does a surprisingly good job at “driving” based on this data, and does it on a budget not much larger than the cost of a laptop, a microcontroller, and a webcam.
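The project's actual code isn't reproduced here, but a rebalancing step of the kind described might look something like this sketch, with made-up steering angles and an assumed "straight ahead" threshold:

```python
import random

# Hypothetical training samples: (frame_id, steering_angle_in_degrees).
# On a highway, the vast majority of frames are near zero, so a naive
# model just learns "go straight"; downsampling those frames rebalances it.
random.seed(0)
samples = [(i, random.gauss(0, 1) if random.random() < 0.95 else random.gauss(0, 15))
           for i in range(10_000)]

STRAIGHT_THRESHOLD = 2.0      # degrees; assumed cutoff for "not really steering"
KEEP_STRAIGHT_FRACTION = 0.1  # keep only 10% of the straight-ahead frames

balanced = [(i, angle) for i, angle in samples
            if abs(angle) > STRAIGHT_THRESHOLD
            or random.random() < KEEP_STRAIGHT_FRACTION]

print(f"{len(samples)} samples -> {len(balanced)} after rebalancing")
```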

Admittedly, this project was a proof of concept to investigate machine learning, neural networks, and other statistical algorithms used in these sorts of systems, and it doesn’t actually drive any cars on any roadways. Even the creator says he wouldn’t trust it himself, but that he was pleasantly surprised by the results of such a simple system. It could also be expanded to handle the brake and accelerator pedals with separate neural networks. It’s not our first budget-friendly self-driving system, either; that one made it happen with the enormous computing resources of a single Android smartphone.


Putting A Cheap Laser Rangefinder Through Its Paces

Sometimes a gizmo seems too cheap to be true. You know there’s just no way it’ll work as advertised — but sometimes it’s fun to find out. Thankfully, if that gadget happens to be a MILESEEY PF210 Hunting Laser Rangefinder, [Phil] has got you covered. He recently got his hands on one (for less than 100 euros, which is wild for a laser rangefinder) and decided to see just how useful it actually was.

The instrument in question measures distances via the time-of-flight method; it bounces a laser pulse off of some distant (or not-so-distant) object and measures how long the pulse takes to return. Using the speed of light, it can calculate how far the pulse has traveled, and half of that round trip is the range to the target.
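The arithmetic is about as simple as it gets; a quick sketch with a made-up round-trip time:

```python
C = 299_792_458  # speed of light in m/s

def distance_from_round_trip(t_seconds: float) -> float:
    """Range to the target from a round-trip time-of-flight measurement.
    The pulse travels out and back, hence the division by two."""
    return C * t_seconds / 2

# A target ~800 m away returns the pulse after roughly 5.34 microseconds.
print(f"{distance_from_round_trip(5.34e-6):.1f} m")  # ~800 m
```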

As it turns out, it worked surprisingly well. [Phil] decided to focus his analysis on accuracy and precision, arguably the most important qualities you’d look for in such an instrument. We won’t get into the statistical nitty-gritty here, but suffice it to say that [Phil] did his homework. To evaluate the instrument’s precision, he took ten measurements against each of ten different targets at ranges between 2.9 m and 800 m. He found that it was incredibly precise (almost perfectly repeatable) at short distances, and still pretty darn good way out at 800 m (±1 m repeatability).

To test the accuracy, he took a series of measurements and compared them against their known values (pretty straightforward, right?). He found that the instrument was accurate to within 3% at worst, and usually did even better than that.
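[Phil]'s raw numbers aren't reproduced here, but both metrics take only a couple of lines of Python; this sketch uses invented readings against a known 800 m target:

```python
import statistics

# Ten invented measurements of a target at a known 800 m range.
readings = [799.0, 800.0, 800.0, 801.0, 800.0, 799.0, 800.0, 800.0, 801.0, 800.0]
KNOWN_RANGE = 800.0  # metres

# Precision: how repeatable the readings are, regardless of the true value.
spread = statistics.stdev(readings)

# Accuracy: how far the average reading sits from the known value.
error_pct = abs(statistics.mean(readings) - KNOWN_RANGE) / KNOWN_RANGE * 100

print(f"repeatability ~ +/-{spread:.1f} m, mean error {error_pct:.2f}%")
```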

While this may not be groundbreaking science, it’s really nice to be reminded that sometimes a cheap instrument will do the job, and we love that there are dedicated folks like [Phil] out there who are willing to put the time in to prove it.

Using Statistics Instead Of Sensors

Statistics often gets a bad rap in mathematics circles for being less than concrete at best, and downright misleading at worst. While these sentiments might ring true for things like political polling, they hide the fact that statistical methods can be put to good use in engineering systems with fantastic results. [Mark Smith], for example, has been working on an espresso machine that can pull the perfect shot, and turned to one of the tools in the statistics toolbox to solve a problem rather than adding yet another sensor to his already complex coffee-brewing machine.

To make espresso, hot water is forced at high pressure through finely ground coffee. [Mark] found that his espresso machine was often pouring too much or too little coffee, and to improve the machine’s accuracy in this area he turned to the linear regression statistic R², also known as the coefficient of determination. R² measures how much of the variation in a data set is explained by a fitted model; by using an algorithm tuned to this value, a computer can more easily tell when the coffee begins pouring out of the portafilter and into the espresso cup based on the pressure and water flow in the machine itself, rather than on some other input such as the weight of the cup.
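[Mark]'s implementation isn't shown here, but the gist can be sketched: fit a line to a sliding window of flow readings and watch R² jump once a steady pour begins. Everything below, from the synthetic data to the window size and threshold, is an assumption for illustration:

```python
import numpy as np

def r_squared(y: np.ndarray) -> float:
    """Coefficient of determination of a linear fit over equally spaced samples."""
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic flow readings: noisy and flat before the pour, ramping steadily after.
rng = np.random.default_rng(1)
flow = np.concatenate([
    rng.normal(0.0, 0.05, 50),                               # pre-pour sensor noise
    np.linspace(0.0, 5.0, 50) + rng.normal(0.0, 0.05, 50),   # steady pour
])

WINDOW, THRESHOLD = 10, 0.8  # assumed window size and R-squared cutoff
for i in range(len(flow) - WINDOW):
    if r_squared(flow[i:i + WINDOW]) > THRESHOLD:
        print(f"Pour detected around sample {i}")
        break
```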

We have seen in the past how seriously [Mark] takes his coffee-making, and this is another step in a series of improvements he has made to his equipment. In this iteration, he has additionally produced a simulation in JupyterLab to better assist him in modeling the system and making even more accurate predictions. It’s quite a bit more effort than adding a sensor, but since his espresso machine already packs plenty of computing power, it’s not too big a leap for him to make.

Hackers, Fingerprints, Laptops, And Stickers

A discussion ensued about our crazy hacker ways the other night. I jokingly suggested that with as many stickers as we each had on our trusty companion machines, they might literally be as unique as a fingerprint. Cut straight to nerds talking too much math.

First off, you could wonder about the chances of two random hackers having the same sticker on their laptops. Say, for argument’s sake, that globally there are 2,000 stickers per year that are cool enough to put on a laptop. (None of us will see them all.) If a laptop lasts five years, that’s a pool of 10,000 stickers to draw from. If you’ve only got one sticker per laptop, that’s a one-in-10,000 chance of a match: pretty slim odds, even when the laptops are of the same vintage.

Real hackers have 20-50 stickers per laptop — call it 30 on average, at least in our sample of “real hackers”. Here, the Birthday Paradox kicks in and helps us out. Each additional sticker provides another shot at matching, and an extra shot at being matched. So while you and I are unlikely to share a birthday, in a room of 42 people it’s over 90% likely that two of them will. With eight of us in the room, that’s 240 stickers in play: each one could match any of the 210 stickers on the other seven laptops, and halving that to avoid double counting gives 240 × 210 / 2 = 25,200 possible pairs. (9999 / 10000) ^ 25200 works out to about an eight percent chance of no match at all, so a better than 90% chance that we’d have at least one matching sticker.
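If you want to check the back-of-the-envelope math yourself (assuming 30 stickers per laptop drawn uniformly from the 10,000-sticker pool):

```python
POOL, LAPTOPS, PER_LAPTOP = 10_000, 8, 30
stickers = LAPTOPS * PER_LAPTOP                  # 240 stickers in the room

# Each cross-laptop pair misses with probability 9999/10000; count the pairs:
# every sticker against the 210 stickers on the other seven laptops, halved
# to avoid double counting.
cross_pairs = stickers * (LAPTOPS - 1) * PER_LAPTOP // 2   # 25,200 pairs
p_no_match = (1 - 1 / POOL) ** cross_pairs

print(f"P(no match)           = {p_no_match:.3f}")       # about 0.08
print(f"P(at least one match) = {1 - p_no_match:.3f}")   # better than 90%
```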

But that doesn’t answer the original question: are our be-stickered laptops unique, like fingerprints or snowflakes? There, you have to match each and every sticker on the laptop — a virtually impossible task, and while there were eight of us in the room, that’s just not enough to get any real juice from the Birthday Paradox. (1/10,000) ^ 30 is a number with 120 in the negative exponent; the odds against a full match dwarf the number of atoms in the universe (a mere 10^80 or so), much less the number of hackers in a room, whether you take things to the eighth power or not.

I hear you mumbling “network effects”. We’ve all gone to the same conferences, and we have similar taste in stickers, and maybe we even trade with each other. Think six degrees of separation type stuff. Indeed, this was true in our room. A few of us had the same stickers because we gave them to each other. We had a lot more matches than you’d expect, even though we were all unique.

So while the math for these network effects is over my head, I think it says something deeper about our trusty boxen, their stickers, and their hackers. Each sticker comes with a memory, and our collected memories make us unique, like our laptops. But matching stickers are more than pure Birthday Paradox coincidences; they represent the shared history of friends.

Wear your laptop stickers with pride!