Reservoir Sampling, Or How To Sample Sets Of Unknown Size

Selecting a random sample from a set is simple. But what about selecting a fair random sample from a set of unknown or indeterminate size? That’s where reservoir sampling comes in, and [Sam Rose] has a beautifully-illustrated, interactive guide to how it works. As far as methods go, it’s as elegant as it is simple, and it’s particularly suited to fairly sampling dynamic datasets, like sipping from a firehose of log events.

While reservoir sampling is simple in principle, it’s not entirely intuitive to everyone. That’s what makes [Sam]’s interactive essay so helpful; he first articulates the problem before presenting the solution in a way that makes it almost self-evident.

[Sam] uses an imaginary deck of cards to illustrate the problem. If one is being dealt cards one at a time from a deck of unknown size (there could be ten cards, or a million), how can one choose a single card in a way that gives each an equal chance of having been selected? Without collecting them all first?

In a nutshell, the solution is to make a decision every time a new card arrives: hold onto the current card, or replace it with the new one. Each new card is given a 1/n chance of becoming held, where n is the number of cards we’ve seen so far. That’s all it takes. No matter when the dealer stops dealing, each card that has been seen will have had an equal chance of ending up the one selected.
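In code, the whole trick fits in a handful of lines. Here’s a minimal Python sketch of the single-card case (essentially the classic Algorithm R), with the deck standing in for any stream of unknown length:

```python
import random

def reservoir_sample(stream):
    """Pick one item uniformly at random from a stream of unknown length."""
    held = None
    for n, item in enumerate(stream, start=1):
        # The nth item replaces the held one with probability 1/n;
        # the very first item is always held (probability 1/1).
        if random.randrange(n) == 0:
            held = item
    return held

# Every card ends up with the same 1/5 chance of being the one we hold.
print(reservoir_sample(["ace", "king", "queen", "jack", "ten"]))
```

To see why it’s fair: the nth card is picked with probability 1/n, and then survives all later draws with overall probability n/N, because the product (n/(n+1)) × ((n+1)/(n+2)) × … × ((N−1)/N) telescopes. That works out to a 1/N chance no matter where in the deck the card appeared.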

There are a few variations which [Sam] also covers, and practical ways of applying it to log collection, so check it out for yourself.

If [Sam]’s knack for illustrating concepts in an interactive way is your jam, we have one more to point out. Our own Al Williams wrote a piece on Turing machines, the original “universal machine” being a theoretical device with a read/write head and an infinite paper tape. A wonderful companion to that article is [Sam]’s piece illustrating, in an interactive way, exactly how such a Turing machine would work.

Open Source Hiding In Plain Sight

On the podcast, [Tom] and I were talking about the continuing saga of the libogc debacle. [Tom] has been interviewing some of the principals involved, so he’s got some first-hand perspective on it all – you should really go read his pieces. But the short version is that an old library that many Nintendo game emulators use appears to have cribbed code from both an open-source real-time operating system called RTEMS and the Linux kernel itself.

You probably know Linux, but RTEMS is a high-reliability RTOS for aerospace. People in the field tell me that it’s well-known in those circles, but it doesn’t have a high profile in the hacker world. Still, satellites run RTEMS, so it’s probably a good place to draw inspiration from, or even to use as-is. Since it’s BSD-licensed, you can also borrow entire functions wholesale if you attribute them properly.

In the end, an RTOS is an RTOS. It doesn’t matter if it’s developed for blinking LEDs or for guiding ICBMs. This thought got [Tom] and me thinking about what other high-reliability open-source code is out there, hidden away in obscurity because of the industry it was developed for. NASA’s Core Flight System came instantly to mind, but NASA makes much of its code available for you to use if you’re interested. There are surely worse places to draw inspiration!

What other off-the-beaten-path software sources do you know of that might be useful for our crowd?

If You’re 3D Scanning, You’ll Want A Way To Work With Point Clouds

3D scanning is becoming much more accessible, which means it’s more likely that the average hacker will use it to solve problems — possibly odd ones. That being the case, a handy tool to have in one’s repertoire is a way to work with point clouds. We’ll explain why in a moment, but that’s where CloudCompare comes in (GitHub).

Not all point clouds are destined to be 3D models. A project may call for watching for changes in a surface, for example.

CloudCompare is an open-source tool for loading point clouds and performing various operations on them, including generating mesh models. Point clouds are what 3D scanners create when an object is scanned, and to become useful, those point clouds are usually post-processed into 3D models (specifically, meshes) like an .obj or .stl file.

We’ve gone into detail in the past about how 3D scanning works, what to expect from it, and taken a hands-on tour of what an all-in-one wireless scanner can do. But what do point clouds have to do with getting the most out of 3D scanning? Well, if one starts to push the boundaries of how and to what purposes 3D scanning can be applied, it sometimes makes more sense to work with the point clouds directly instead of the generated meshes, and that’s exactly what CloudCompare is for.

For example, one may wish to align and merge two or more different clouds, such as from two different (possibly incomplete) scans. Or, you might want to conduct a deviation analysis of how those different scans have changed. Alternatively, if one is into designing wearable items, it can be invaluable to be able to align a design to a 3D scan of a body part.
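CloudCompare does all of this through its GUI, but to make the align-and-merge idea concrete, here’s a rough sketch of the same workflow scripted with the Open3D Python library instead. The file names and the distance threshold are placeholders, and ICP assumes the two scans already roughly overlap:

```python
import open3d as o3d

# Two (possibly incomplete) scans of the same object -- file names are placeholders.
source = o3d.io.read_point_cloud("scan_a.ply")
target = o3d.io.read_point_cloud("scan_b.ply")

# Align source to target with point-to-point ICP. The 2 mm correspondence
# threshold is an assumption; tune it to your scanner's noise level.
result = o3d.pipelines.registration.registration_icp(
    source, target, max_correspondence_distance=0.002,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
source.transform(result.transformation)

# Deviation analysis: distance from each aligned point to the nearest point
# in the other scan -- large values flag where the surface has changed.
deviations = source.compute_point_cloud_distance(target)

# Merge the aligned clouds into one and save the result.
o3d.io.write_point_cloud("merged.ply", source + target)
```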

It’s a versatile tool with numerous tutorials, so if you find yourself into 3D scanning but yearning for more flexibility than you can get by working with the mesh models — or want an alternative to modeling-focused software like Blender — maybe it’s time to work with the point clouds directly.

Physical Computing Used To Be A Thing

In the early 2000s, the idea that you could write programs on microcontrollers that did things in the physical world, like run motors or light up LEDs, was kind of new. At the time, most people thought of coding as stuff that stayed on the screen, or in cyberspace. This idea of writing code for physical gadgets was uncommon enough that it had a buzzword of its own: “physical computing”.

You never hear much about “physical computing” these days, but that’s not because the concept went away. Rather, it’s probably because it’s almost become the norm. I realized this as Tom Nardi and I were talking on the podcast about a number of apparently different trends that all point in the same direction.

We started off talking about the early days of the Arduino revolution. Sure, folks were building hobby projects around microcontrollers before the Arduino came along, but the combination of a standardized board, a wide-ranging software library, and abundant examples to learn from brought embedded programming to a much wider audience. In particular, it reached an audience of beginners who were not only blinking an LED for the first time, but maybe even taking their first steps into coding. For many, the Arduino hello world was their coding hello world as well. These folks are “physical computing” natives.

Now, it’s to the point that when Arya goes to visit FOSDEM, an open-source software convention, there is hardware everywhere. Why? Because many successful software projects support open hardware, and many others run on it. People port their favorite programming languages to microcontroller platforms, and as the chips become more powerful, the lines between the “big” computers and the “micro” ones start to blur.

And I think this is awesome. For one, it’s somehow more rewarding, when you’re just starting to learn to code, to see the letters you type cause something in the physical world to happen, even if it’s just blinking an LED. At the same time, everything has a microcontroller in it these days, and hacking on these devices is another flavor of physical computing – there’s code in everything that you might think of as hardware. And with open licenses, everything under version control, and more openness in open hardware than we’ve ever seen before, the open-source hardware world reflects the open-source software ethos.

Are we getting past the point where the hardware / software distinction is even worth making? And was “physical computing” just the buzzword for the final stages of blurring out those lines?

UScope: A New Linux Debugger And Not A GDB Shell, Apparently

[Jim Colabro] is a little underwhelmed with the experience of low-level debugging of Linux applications using traditional debuggers such as GDB and LLDB. These programs have been around for a long time, developing alongside Linux and other UNIX-like OSs, and are still solidly in the CLI domain. Fed up with their stale user experience and lack of support for modern data structures, [Jim] has created UScope, a new debugger written from scratch with no code from the existing projects.

GDB, in particular, has quite a steep learning curve once you dig into its more advanced features. Many people side-step this learning curve by running GDB within Visual Studio or some other modern IDE, but it is still the same old debugger core at the end of the day. [Jim] gripes that existing debuggers don’t support commonly-used modern data structures and offer poor customizability. It would be nice, for example, to write a little code and have the debugger render a data structure graphically, to aid in visualizing the problem being investigated. We know that GDB, at least, can be customized with Python to create application-specific pretty printers, but perhaps [Jim] has bigger plans?
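For reference, the existing Python route looks something like this: a minimal sketch of a GDB pretty printer for a hypothetical vec3 C struct, loaded by running “source printers.py” inside GDB.

```python
# printers.py -- a minimal GDB pretty-printer sketch for a hypothetical
# C struct: struct vec3 { double x, y, z; };
import gdb

class Vec3Printer:
    """Render a vec3 as (x, y, z) instead of a raw member dump."""
    def __init__(self, val):
        self.val = val

    def to_string(self):
        return "({}, {}, {})".format(self.val["x"], self.val["y"], self.val["z"])

def lookup(val):
    # GDB calls this lookup for every value it is about to print.
    if val.type.strip_typedefs().name == "vec3":
        return Vec3Printer(val)
    return None

gdb.pretty_printers.append(lookup)
```

It works, but it’s text in, text out; rendering a structure graphically is exactly the sort of thing the classic tools make hard.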


Learn New Tools, Or Hone Your Skill With The Old?

Buried in a talk on AI from an artist who is doing cutting-edge video work was the following nugget that entirely sums up the zeitgeist: “The tools are changing so fast that artists can’t keep up with them, let alone master them, before everyone is on to the next.” And while you might think that this concern is only relevant to those who have to stay on the crest of the hype wave, the deeper question resounds with every hacker.

When was the last time you changed PCB layout software or refreshed your operating system? What other tools do you use in your work or your extra-curricular projects, and how long have you been using them? Are you still designing your analog front-ends with LM358s, or have you looked around to see that technology has moved on since the 1970s? “OMG, you’re still using STM32F103s?”

It’s not a simple question, and there are no good answers. Proficiency with a tool, for instance the audio editor with which I crank out the podcast every week, only comes through practice. And practice simply takes time and effort. When you put your time in on a tool, it really is an investment in that it helps you get better. But what about that newer, better tool out there?

Some of the reluctance to update is certainly sunk-cost fallacy; after all, you put so much sweat and tears into the current tool. But there is also a real cost to learning the new hotness, and that’s no fallacy. If you’re always learning a new way of doing something, you’re never going to get good at doing the thing itself, and that’s the lament of our artist friend. Honing your craft requires focus. You won’t know the odd feature set of that next microcontroller as well as you do old faithful, at least not without sitting down and reading the datasheet and doing a couple of finger-stretching projects first.

Striking the optimal balance here is hard. On a per-project basis, staying with your good old tool or swapping to the new hotness is a binary choice, but across your projects, you can do some of each. Maybe it makes sense to budget some of your hacking time into learning new tools? How about ten percent? What do you think?

38C3: Save Your Satellite With These Three Simple Tricks

BEESAT-1 is a 1U cubesat launched in 2009 by the Technical University of Berlin. Like all good satellites, it has redundant computers onboard, so when the first one failed in 2011, it just switched over to the second. And when the backup failed in 2013, well, the satellite was “dead” — or rather sending back all zeroes. Until [PistonMiner] took a look at it, that is.

Getting the job done required debugging the firmware remotely – like 700 km remotely. Because the satellite was sending back valid frames full of zeroes, the problem had to lie either in the data collection or in the assembly of the telemetry frames. A quick experiment confirmed that the assembly routine was firing off very infrequently, and its firing interval happened to be a parameter modifiable in SRAM. Setting a shorter assembly time led to success: a valid telemetry frame.

Then comes the job of patching the bird in flight. [PistonMiner] pulled the flash down and cobbled together a model of the satellite to practice with in the lab. And that’s when they discovered that the satellite doesn’t support software uploads to flash, but does allow writing parameter words. The hack was an abuse of the fact that the original code was written in C++: intercepting the vtables let them run their own commands without conflicting with the flash reads and writes.

Of course, nothing is that easy. Bugs upon bugs, combined with the short communication window, made it even more challenging. And then there was the bizarre bit with the camera firing off after every flash dump because of a missing break in a case statement. But the camera never worked anyway, because the firmware didn’t get finished before launch.

Challenge accepted: [PistonMiner] got it working, and after fifteen years in space, and ten years of being “dead”, BEESAT-1 was taking photos again. What caused the initial problem? NAND flash memory needs to be erased before it’s written, and a bug in the code led to a long pause between the two, during which a watchdog timeout fired and the satellite reset, blanking the flash.

This talk is absolutely fantastic, but may be of limited practical use unless you have a long-dormant satellite to play around with. We can nearly guarantee that after watching this talk, you will wish that you did. If so, the Orbital Index can help you get started.