Computers For Fun

The last couple years have seen an incredible flourishing of the cyberdeck scene, and probably for about as many reasons as there are individual ’deck designs. Some people get really into the prop-making, some into scrapping old tech or reusing a particularly appealing case, and others simply into the customization possibilities. That’s awesome, and they’re all different motivations for making a computer that’s truly your own.

But I really like the motivation and sentiment behind [Andreas Eriksen]’s PotatoP. (Assuming that his real motivation isn’t all the bad potato puns.) This is a small microcomputer that’s built on a commonly available microcontroller, so it’s not a particularly powerful beast – hence the “potato”. But what makes up for that in my mind is that it’s running a rudimentary bare-metal OS of his own writing. It’s like he’s taken the cyberdeck’s DIY aesthetic into the software as well.

What I like most about the spirit of the project is the idea of a long-term project that’s also a constant companion. Once you get past a terminal and an interpreter – [Andreas] is using LISP for both – everything else consists of small projects that you can check off one by one, that maybe don’t take forever, and that are limited in complexity by the hardware you’re working on. A simple text editor, some graphics primitives, maybe a sound subsystem. A way to read and write files in flash. I don’t love LISP personally, but I love that it brings interactivity and independence from an external compiler, making it possible to develop the system on the system, pulling itself up by its own bootstraps.

Pretty soon, you could have something capable, and completely DIY. But it doesn’t need to be done all at once either. With a light enough computer, and a good basic foundation, you could keep it in your backpack and play “OS development” whenever you’ve got the free time. A DIY play OS for a sandbox computing platform: what more could a nerd want?

ChatGPT, Bing, And The Upcoming Security Apocalypse

Most security professionals will tell you that it’s a lot easier to attack code systems than it is to defend them, and that this is especially true for large systems. The white hat’s job is to secure each and every point of contact, while the black hat’s goal is to find just one that’s insecure.

Whether black hat or white hat, it also helps a lot to know how the system works and exactly what it’s doing. When you’ve got the source code, either because it’s open-source, or because you’re working inside the company that makes the software, you’ve got a huge advantage both in finding bugs and in fixing them. In the case of closed-source software, the white hats arguably have the offsetting advantage that they at least can see the source code, and peek inside the black box, while the attackers cannot.

Still, if you look at the number of security issues raised weekly, it’s clear that even in the case of closed-source software, where the defenders should have the largest advantage, offense is a lot easier than defense.

So now put yourself in the shoes of the poor folks who are going to try to secure large language models like ChatGPT, the new Bing, or Google’s soon-to-be-released Bard. They don’t understand their machines. Of course they know how they work inside, in the sense of cross multiplying tensors and updating weights based on training sets and so on. But because the billions of internal parameters interact in incomprehensible ways, almost all researchers refer to large language models’ inner workings as a black box.

And they haven’t even begun to consider security yet. They’re still worried about how to construct obscure background prompts that prevent their machines from spewing hate speech or pornographic novels. But as soon as the machines start doing something more interesting than just providing you plain text, the black hats will take notice, and someone will have to figure out defense.

Indeed, this week, we saw the first real shot across the bow: a hack to make Bing direct users to arbitrary (bad) webpages. The Bing hack requires the user to already be on a compromised website, so it’s maybe not very threatening, but it points out a possible real security difference between Bing and ChatGPT: Bing gives you links to follow, and that makes it a juicy target.

We’re right on the edge of a new security landscape, because even the white hats are facing a black box in the AI. So far, what ChatGPT and Codex and other large language models are doing is trivially secure – putting out plain text – but Bing is taking the first dangerous steps into doing something more useful, both for users and black hats. Given the ease with which people have undone OpenAI’s attempts to keep ChatGPT in its comfort zone, my guess is that the white hats will have their hands full, and the black-box nature of the model deprives them of their best hope. Buckle your seatbelts.

Simultaneous Invention, All The Time?

As Tom quipped on the podcast this week, if you have an idea for a program you’d like to write, all you have to do is look around on GitHub and you’ll find it already coded up for you. (Or StackOverflow, or…) And that’s probably pretty close to true, at least for really trivial bits of code. But it hasn’t always been thus.

I was in college in the mid 90s, and we had a lab of networked workstations that the physics majors could use. That’s where I learned Unix, and where I had the idea for the simplest program ever. It took the background screen color, in the days before wallpapers, and slowly random-walked it around in RGB space. This was set to be slow enough that anyone watching it intently wouldn’t notice, but fast enough that others occasionally walking by my terminal would see a different color every time. I assure you, dear reader, this was the very height of wit at the time.
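
For the curious, the whole idea fits in a few lines. What follows is a minimal sketch in Python, not the original program: the function names, the starting color, and the step size are all assumptions made for illustration, and actually applying the color to the screen (for instance with a tool like xsetroot) is left out.

```python
import random


def step_color(rgb, max_step=1, rng=random):
    """Nudge each channel by a small random amount, clamped to 0..255."""
    return tuple(
        min(255, max(0, channel + rng.randint(-max_step, max_step)))
        for channel in rgb
    )


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string such as #1a2b3c."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    color = (64, 64, 96)  # arbitrary starting point (assumption)
    for _ in range(5):
        color = step_color(color)
        print(to_hex(color))
```

Keeping max_step tiny and calling step_color only every so often is what makes the drift too slow to notice while you are staring at it.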

With the late 90s came the World Wide Web and the search engine, and the world got a lot smaller. For some reason, I was looking for how to set the X terminal background color again, this time searching the Internet instead of reading up in a reference book, and I stumbled on someone who wrote nearly exactly the same random-walk background color changer. My jaw dropped! I had found my long-lost identical twin brother! Of course, I e-mailed him to let him know. He was stoked, and we shot a couple funny e-mails back and forth riffing on the bizarre coincidence, and that was that.

Can you imagine this taking place today? It’s almost boringly obvious that if you search hard enough you’ll find another monkey on another typewriter writing exactly the same sentence as you. It doesn’t even bear mentioning. Heck, that’s the fundamental principle behind Codex / Copilot – the code that you want to write has already been written so many times that it will emerge as the most statistically likely response from a giant pattern-matching, word-by-word completion neural net model.

Indeed, stop me if you’ve read this before.

Hackaday Berlin: In Praise Of Lightning Talks

We’re in full-on prep mode for our first event in Europe in four years: Hackaday Berlin. And while we’ve got a great slate of speakers lined up, and to be announced soon, I’m personally most excited for the lightning talks.

Why? Because the lightning talks give you all, the attendees, the chance to get up and let everyone know what you’re up to. They’re longer than an elevator pitch, so you have time to at least start to explain the most interesting detail or two, but they’re not long enough that you can cover every aspect of a project. And that’s the trick!

By being short enough that you couldn’t possibly cover everything, you don’t need to worry about covering everything. Just go for the highlights. And because you left a lot of the interesting details back, everyone in the audience is going to want to bend your ear about it for the rest of the conference. It’s like the ultimate icebreaker.

For the audience? Lightning talks, when they’re good, are like a fountain of non-stop great ideas and inspiration. And if you happen on one that just doesn’t tickle your hacker-bone, it’s probably over in another five minutes, so no worries.

We didn’t have time to run a full-on call for proposals for Berlin, but we’re hoping that you’ll ride the lightning. We’d all love to hear what you’ve got to say!

Copyright Data, But Do It Right

Copyright law is a triple-edged sword. Historically, it has been used to make sure that authors and rock musicians get their due, but it’s also been extended to the breaking point by firms like Disney. Strangely, a concept that protected creative arts got pressed into duty in the 1980s to protect the writing down of computer instructions: ironically, just a few bytes of BIOS code. But as long as we’re going down this strange road where assembly language is creative art, copyright law could also be used to protect the openness of software. And doing so has given tremendous legal backbone to the open and free software movements.

So let’s muddy the waters further. Looking at cases like the CDDB fiasco, or the most recent sale of ADSB Exchange, what I see is a community of people providing data to an open resource, in the belief that they are building something for the greater good. And then someone comes along, closes up the database, and sells it. What prevents this from happening in the open-software world? Copyright law. What is the equivalent of copyright for datasets? Strangely enough, that same copyright law.

Data, being facts, can’t be copyrighted. But datasets are purposeful collections of data. And just like computer programs, datasets can be licensed with a restrictive copyright or a permissive copyleft. Indeed, they must, because the same presumption of restrictive copyright is the default.

I scoured all over the ADSB Exchange website to find any notice of the copyright / copyleft status of their dataset taken as a whole, and couldn’t find any. My read is that this means that the dataset is the exclusive property of its owner. The folks who were contributing to ADSB Exchange were, as far as I can tell, contributing to a dataset that they couldn’t modify or redistribute. To be a free and open dataset, to be shared freely, copied, and remixed, it would need a copyleft license like Creative Commons or the Open Data Commons license.

So I’ll admit that I’m surprised to have not seen permissive licenses used around community-based open data projects, especially projects like ADSB Exchange, where all of the software that drives it is open source. Is this just because we don’t know enough about them? Maybe it’s time for that to change, because copyright on datasets is the law of the land, no matter how absurd it may sound on its face, and the closed version is the default. If you want your data contributions to be free, make sure that the project has a free data license.

Speak To The Machine

If you own a 3D printer, CNC router, or basically anything else that makes coordinated movements with a bunch of stepper motors, chances are good that it speaks G-code. Do you?

If you were a CNC machinist back in the 1980s, chances are very good that you’d be fluent in the language, and maybe even a couple different machines’ specialized dialects. But higher level abstractions pretty quickly took over the CAM landscape, and knowing how to navigate GUIs and do CAD became more relevant than knowing how to move the machine around by typing.

Reprap Darwin: it was horrible, but it was awesome.

Strangely enough, I learned G-code in 2010, as my hackerspace’s RepRap Darwin needed some human wranglers. If you want to print out a 3D design today, you have a wealth of convenient slicers that’ll turn abstract geometry into G-code, but back in the day, all we had was a mess of Python scripts. Given the state of things, it was worth learning a little G-code, because even if you just wanted to print something out, it was far from plug-and-play.

For instance, it was far easier to just edit the M104 value than to change the temperature and re-slice the whole thing, which could take an appreciable amount of time back then. Honestly, we were all working on the printers as much as we were printing. Knowing how to whip up some quick bed-levelling test scripts and/or demo objects in G-code was just plain handy. And of course the people writing or tweaking the slicers had to know how to talk directly to the machine.
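
That kind of M104 edit is easy to script, too. Here is a rough sketch, assuming Marlin-style G-code where the hotend temperature is the S parameter on an M104 line; the helper name and the sample G-code are made up for illustration, and it deliberately ignores M109 and any other temperature commands.

```python
import re

# Matches the S parameter on an M104 line, e.g. "M104 S200" or "M104 T0 S195.5".
# The [^;] part keeps the match from wandering into a trailing comment.
M104_TEMP = re.compile(r"^(\s*M104\b[^;]*?\bS)(\d+(?:\.\d+)?)", re.IGNORECASE)


def set_hotend_temp(gcode, new_temp):
    """Return the G-code text with every M104 S value replaced by new_temp."""
    patched = []
    for line in gcode.splitlines():
        # A function replacement avoids ambiguity between the group
        # reference and the digits of the new temperature.
        patched.append(
            M104_TEMP.sub(lambda m: f"{m.group(1)}{new_temp}", line, count=1)
        )
    return "\n".join(patched)


if __name__ == "__main__":
    sample = "G28\nM104 S200 ; set temp\nG1 X10 Y10 F3000"
    print(set_hotend_temp(sample, 210))
```

Everything that is not an M104 line passes through untouched, which is the whole point: no re-slicing required.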

Even today, I think it’s useful to be able to speak to the machine in its native language. Case in point: the el-quicko pen-plotter I whipped together two weekends ago was actually to play around with Logo, the turtle language, with my son. It didn’t take me more than an hour or so to whip up a trivial Logo-alike (in Python) for the CNC: pen-up, pen-down, forward, turn, repeat, and subroutine definitions. Translating this all to machine moves was actually super simple, and we had a great time live-drawing with the machine.
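
To give a flavor of how little is involved, here is a minimal sketch of a turtle that emits G-code. It is not the code I actually ran: the class name, the Z heights for pen-up and pen-down, and the feed rate are all assumptions, and it expects a machine that takes millimeters (G21) and absolute coordinates (G90). Subroutines are simply Python functions that take the turtle as an argument.

```python
import math


def _fmt(value):
    """Two decimals, with -0.00 normalized to 0.00."""
    return f"{round(value, 2) + 0.0:.2f}"


class GcodeTurtle:
    """Tracks position and heading, appending one G-code line per move."""

    def __init__(self, pen_up_z=5.0, pen_down_z=0.0, feed=1000):
        self.x = 0.0
        self.y = 0.0
        self.heading = 0.0  # degrees; 0 points along +X, positive is counter-clockwise
        self.pen_up_z = pen_up_z      # hypothetical safe height
        self.pen_down_z = pen_down_z  # hypothetical drawing height
        self.feed = feed              # hypothetical feed rate, mm/min
        self.is_down = False
        # Millimeters, absolute positioning, pen raised to start.
        self.lines = ["G21", "G90", f"G0 Z{_fmt(pen_up_z)}"]

    def pen_up(self):
        self.is_down = False
        self.lines.append(f"G0 Z{_fmt(self.pen_up_z)}")

    def pen_down(self):
        self.is_down = True
        self.lines.append(f"G1 Z{_fmt(self.pen_down_z)} F{self.feed}")

    def turn(self, degrees):
        self.heading = (self.heading + degrees) % 360

    def forward(self, distance):
        angle = math.radians(self.heading)
        self.x += distance * math.cos(angle)
        self.y += distance * math.sin(angle)
        if self.is_down:
            # Drawing move at the configured feed rate.
            self.lines.append(f"G1 X{_fmt(self.x)} Y{_fmt(self.y)} F{self.feed}")
        else:
            # Pen is up, so a rapid move is fine.
            self.lines.append(f"G0 X{_fmt(self.x)} Y{_fmt(self.y)}")

    def repeat(self, count, body):
        for _ in range(count):
            body(self)


def square_side(turtle):
    """A 'subroutine': one side of a 20 mm square."""
    turtle.forward(20)
    turtle.turn(90)


if __name__ == "__main__":
    t = GcodeTurtle()
    t.pen_down()
    t.repeat(4, square_side)
    t.pen_up()
    print("\n".join(t.lines))
```

Pipe the output to the machine's serial port with whatever sender you already use, and the turtle draws.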

So if you want to code for your machine, you’ll need to speak its language. A slicer is great for the one thing it does – turning an STL into G-code – but if you want to do anything a little more bespoke, you should learn G-code. And if you’ve got a 3D printer kicking around, certainly if it runs Marlin or similar firmware, you’ve got the ideal platform for exploration.

Does anyone else still play with G-code?

Irreproducible, Accumulative Hacks

Last weekend, I made an incredibly accurate CNC pen-plotter bot in just 20 minutes, for a total expenditure of $0. How did I pull this off? Hacks accumulate.

In particular, the main ingredients were a CNC router, some 3D-printed mounts that I’d designed and built for it, and a sweet used linear rail that I picked up on eBay as part of a set a few years back because it was just too good of a deal. If you had to replicate this build exactly, it would probably take a month or two of labor and cost maybe $2,000 on top of that. Heck, just tuning up the Chinese 6040 CNC machine alone took me four good weekends and involved replacing the stepper motors.