Software Project Pieces Broken Bits Back Together

April 13, 2025 by Donald Papp 18 Comments

With all the attention on LLMs (Large Language Models) and image generators lately, it’s nice to see some of the more niche and unusual applications of machine learning. GARF (Generalizable 3D reAssembly for Real-world Fractures) is one such project.

GARF may play fast and loose with acronym formation, but it certainly knows how to be picky when it counts. Its whole job is to look at the pieces of a broken object and accurately figure out how to fit the pieces back together, even if there are some missing bits or the edges aren’t clean.

Re-assembling an object from imperfect fragments is a nontrivial undertaking.

Efficiently and accurately figuring out how to re-assemble different pieces into a whole is not a trivial task. One may think it can in theory be brute-forced, but the complexity of such a job rapidly becomes immense. That’s where machine learning methods come in, as researchers created a system that can do exactly that. It addresses the challenge of generalizing from a synthetic data set (in which computer-generated objects are broken and analyzed for training) and successfully applying it to the kinds of highly complex breakage patterns that are seen in real-world objects like bones, recovered archaeological artifacts, and more.
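
To get a feel for why brute force falls over, here is a toy count under loudly simplified, hypothetical assumptions: fragments are placed one at a time in some order (n! orderings), and each fragment may only take one of k discrete orientations (k^n choices). Real fragments have continuous positions and rotations, so the actual search space is far larger; the function name and the choice of k = 24 are purely illustrative.

```python
from math import factorial


def count_assemblies(n_fragments: int, orientations: int) -> int:
    """Toy count of candidate assemblies for a naive brute-force search.

    Hypothetical model: n! placement orders times orientations**n
    orientation choices. Continuous poses would make this far larger.
    """
    return factorial(n_fragments) * orientations ** n_fragments


if __name__ == "__main__":
    # 24 = rotations of a cube, an arbitrary coarse discretization of 3D orientation
    for n in (3, 5, 10, 20):
        print(f"{n:2d} fragments: {count_assemblies(n, 24):.3e} candidates")
```

Even this crude model passes 10^20 candidates at ten fragments, before checking whether any pair of surfaces actually fits.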

The system is essentially a highly adept 3D puzzle solver, but an entirely different beast from something like this jigsaw puzzle solving pick-and-place robot. Instead of working on flat pieces with clean, predictable edges, it handles 3D scanned fragments with complex break patterns, even if the edges are imperfect or there are missing pieces.

GARF is exactly the kind of software framework that is worth keeping in the back of one’s mind just in case it comes in handy some day. The GitHub repository contains the code (although at this moment the custom dataset is not yet uploaded) but there is also a demo available for the curious.

Learning Linux Kernel Modules Using COM Binary Support

April 13, 2025 by Maya Posch 17 Comments

Have you ever felt the urge to make your own private binary format for use in Linux? Perhaps you have looked at creating the smallest possible binary when compiling a project, and felt disgusted with how bloated the ELF format is? If you are like [Brian Raiter], then this has led you down many rabbit holes, with the conclusion being that flat binary formats are the way to go if you want sleek, streamlined binaries. These are formats like COM, which many know from MS-DOS, but which was already around in the CP/M days. Here ‘flat’ means that the entire binary is loaded into RAM without any fuss or foreplay.

Although Linux does not (yet) support this binary format, the good news is that you can learn how to write kernel modules by implementing COM support for the Linux kernel. In the article [Brian] takes us down this COM rabbit hole, which involves setting up a kernel module development environment and exploring how to implement a binary file format. This leads us past familiar paths for those who have looked at e.g. how the Linux kernel handles the shebang (#!) and ‘misc’ formats.

On MS-DOS, the kernel identifies a COM file by its extension, after which it gives it 640 kB & an interrupt table to play with. The kernel module does pretty much the same, which still involves a lot of code.
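
The DOS side of that deal is small enough to sketch in user space. The snippet below is not [Brian]'s kernel module and not the Linux binfmt API; it only mimics, under our own assumptions, the two steps a COM loader performs: recognize the file by its extension and size, then copy the raw bytes into a single 64 KiB segment at offset 0x100, right after the 256-byte Program Segment Prefix (PSP). The function names are hypothetical.

```python
COM_LOAD_OFFSET = 0x100                         # code starts right after the 256-byte PSP
SEGMENT_SIZE = 0x10000                          # code, data and stack share one 64 KiB segment
MAX_COM_SIZE = SEGMENT_SIZE - COM_LOAD_OFFSET   # 0xFF00 bytes in theory (less in practice, for the stack)


def looks_like_com(filename: str, size: int) -> bool:
    """Hypothetical detection step: no header to check, only the name and size."""
    return filename.lower().endswith(".com") and 0 < size <= MAX_COM_SIZE


def load_com_image(code: bytes) -> bytearray:
    """Build the flat memory image a COM program expects: no headers, no relocation."""
    if len(code) > MAX_COM_SIZE:
        raise ValueError("too large for a COM image")
    segment = bytearray(SEGMENT_SIZE)
    # PSP offset 0 holds INT 20h (bytes CD 20); DOS pushes a zero word on the
    # stack, so a bare RET from the program lands here and terminates it.
    segment[0:2] = b"\xcd\x20"
    segment[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(code)] = code
    return segment
```

Because a flat format carries no metadata at all, the loader's "parsing" collapses to a size check and a single copy, which is exactly what makes it attractive for tiny binaries.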

Of course, this particular rabbit hole wasn’t deep enough yet, so the COM format was extended into the .♚ (Unicode U+265A) format, because this is 2025 and we have to use all those Unicode glyphs for something. This format extension allows for amazing things like automatically exiting after finishing execution (like crashing).

At the end of all these efforts we have not only learned how to write kernel modules and add new binary file formats to Linux, we have also learned to embrace the freedom of accepting the richness of the Unicode glyph space, rather than remain confined by ASCII. All of which is perfectly fine.

Top image: Illustration of [Brian Raiter] surveying the fruits of his labor by [Bomberanian]

The Incomplete JSON Pretty Printer (Brought To You By Vibes)

April 12, 2025 by Donald Papp 21 Comments

Incomplete JSON (such as from a log that terminates unexpectedly) doesn’t parse cleanly, which means anything that usually pretty-prints JSON won’t. Frustration with this is what led [Simon Willison] to make the Incomplete JSON Pretty Printer, a single-purpose web tool that pretty-prints JSON regardless of whether it’s complete or not.
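
[Simon]'s tool was written by an LLM and its code isn't described here, so the following is only one possible approach, sketched under our own assumptions: never parse at all, and instead re-indent character by character while tracking just two bits of state (string or not, escaped or not). A truncated document then simply produces truncated output. The function name is hypothetical.

```python
def pretty_print_partial_json(text: str, indent: int = 2) -> str:
    """Re-indent JSON text character by character, without ever parsing it.

    No parse tree is built, so truncated input is fine: the output
    simply stops where the input stops.
    """
    out: list[str] = []
    depth = 0
    in_string = False   # inside a "..." literal, braces and commas are just text
    escaped = False     # previous character inside a string was a backslash

    def newline() -> str:
        return "\n" + " " * (indent * depth)

    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch.isspace():
            continue  # drop the input's own whitespace; we emit our own
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch in "{[":
            depth += 1
            out.append(ch + newline())
        elif ch in "}]":
            depth = max(depth - 1, 0)  # tolerate stray closers
            out.append(newline() + ch)
        elif ch == ",":
            out.append("," + newline())
        elif ch == ":":
            out.append(": ")
        else:
            out.append(ch)  # digits, true/false/null and the like
    return "".join(out)
```

For example, `pretty_print_partial_json('{"a": [1, 2')` yields the first four lines of what the complete document would look like, with no error. The trade-off is that nothing is validated, and empty containers such as `{}` get spread over two lines.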

Making a tool to solve a particular issue is a fantastic application of software, but in this case it is also a good lead-in to some thoughts [Simon] has to share about vibe coding. The Incomplete JSON Pretty Printer is a perfect example of vibe coding, being the product of [Simon] directing an LLM to iteratively create a tool without once looking at the actual code.

Sometimes, however the machine decides to code something is *fine*.

[Simon] shares that the term “vibe coding” was first used in a social media post by [Andrej Karpathy]. We’ve seen [Andrej] share a “hello world” of GPT-based LLMs as well as how to train one in pure C, both of which are the product of a deep understanding of the subject (and fantastically educational), so he certainly knows how things work.

Anyway, [Andrej] had a very specific idea he was describing with vibe coding: engaging with the tool in almost a state of flow for something like a weekend project, just iterating one’s way to what one wants without fussing over the details. Why? Because doing so is new, engaging, and fun.

Since then, vibe coding as a term seems to get used to refer to any and all AI-assisted coding, a subject on which folks have quite a few thoughts (many of which were eagerly shared on a recent Ask Hackaday on the subject).

Of course, human oversight is critical to a solid and reliable development workflow. But not all software is the same. In the case of the Incomplete JSON Pretty Printer, [Simon] really doesn’t care what the code actually looks like. He got it made in a short amount of time, the tool does exactly what he wants, and it’s hard to imagine the stakes being any lower. To [Simon], however the LLM decided to do things is fine, and there’s a place for that.

A flowchart demonstrating the exploit described.

Vibe Check: False Packages A New LLM Security Risk?

April 12, 2025 by Tyler August 23 Comments

Lots of people swear by large language model (LLM) AIs for writing code. Lots of people swear at them. Still others may be planning to exploit their peculiarities, according to [Joe Spracklen] and other researchers at UTSA. At least, the researchers have found a potential exploit in ‘vibe coding’.

Everyone who has used an LLM knows they have a propensity to “hallucinate”, that is, to go off the rails and create plausible-sounding gibberish. When you’re vibe coding, that gibberish is likely to make it into your program. Normally, that just means errors. If you are working in an environment that uses a package manager, however (like npm for Node.js, PyPI for Python, or CRAN for R), that plausible-sounding nonsense code may end up calling for a fake package.

A clever attacker might be able to determine what sort of false packages the LLM is hallucinating, and inject them as a vector for malicious code. It’s more likely than you think: while CodeLlama was the worst offender, the most accurate model tested (ChatGPT4) still generated these false packages at a rate of over 5%. The researchers came up with a number of mitigation strategies in their full paper, but this is a sobering reminder that an AI cannot take responsibility. Ultimately it is up to us, the programmers, to ensure the integrity and security of our code, and of the libraries we include in it.

We just had a rollicking discussion of vibe coding, which some of you seemed quite taken with. Others agreed that ChatGPT is the worst summer intern ever. Love it or hate it, it’s likely this won’t be the last time we hear of security concerns brought up by this new method of programming.

Special thanks to [Wolfgang Friedrich] for sending this into our tip line.

Using Integer Addition To Approximate Float Multiplication

April 10, 2025 by Maya Posch 14 Comments

Once the domain of esoteric scientific and business computing, floating point calculations are now practically everywhere. From video games to large language models and kin, it would seem that a processor without floating point capabilities is pretty much a brick at this point. Yet integer-based approximations can be good enough to hit the required accuracy. One example is approximating floating point multiplication with integer addition, which [Malte Skarupke] recently had a poke at, based on an integer addition-only LLM approach suggested by [Hongyin Luo] and [Wei Sun].

As for the way this works, it does pretty much what it says on the tin: the bit patterns of the two floating point inputs are added as if they were integers, followed by an adjustment of the exponent, since adding two biased exponent fields counts the exponent bias twice. This adjustment is what gets you close to the answer, but as the article and its comments illustrate, there are plenty of issues and edge cases you have to concern yourself with. These include under- and overflow, but also specific floating point inputs such as zero.
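
A minimal sketch of the basic trick for 32-bit floats, assuming plain IEEE 754 single precision and omitting any extra correction terms the published approach may add. Subtracting the bit pattern of 1.0f (0x3F800000) removes the duplicated exponent bias; the mantissa fields simply add, so the result is (1 + m1 + m2) instead of (1 + m1)(1 + m2). That is exact when either mantissa is zero (e.g. 2.0 × 3.0 gives 6.0) and always underestimates otherwise, with the worst case around 11% (1.5 × 1.5 gives 2.0 instead of 2.25). Special values have to be handled by hand. The function names are hypothetical.

```python
import struct

FLOAT32_ONE_BITS = 0x3F800000   # bits of 1.0f: exponent field = bias (127), mantissa = 0
EXP_ALL_ONES = 0x7F800000       # exponent field of inf/NaN
SIGN_BIT = 0x80000000


def f32_bits(x: float) -> int:
    """Reinterpret a value rounded to float32 as its unsigned 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret an unsigned 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def approx_mul(a: float, b: float) -> float:
    """Approximate a*b with one integer add and one subtract on float32 bit patterns."""
    ba, bb = f32_bits(a), f32_bits(b)
    sign = (ba ^ bb) & SIGN_BIT               # sign of the product, handled separately
    ma, mb = ba & ~SIGN_BIT, bb & ~SIGN_BIT   # magnitudes only
    if ma == 0 or mb == 0:
        return bits_f32(sign)                 # zero input: the bit trick would give garbage
    if ma >= EXP_ALL_ONES or mb >= EXP_ALL_ONES:
        raise ValueError("inf/NaN not handled in this sketch")
    # Denormal inputs are not special-cased and give poor results.
    r = ma + mb - FLOAT32_ONE_BITS            # add exponents and mantissas, drop one bias
    if r <= 0:
        return bits_f32(sign)                 # underflow: flush to (signed) zero
    if r >= EXP_ALL_ONES:
        return bits_f32(sign | EXP_ALL_ONES)  # overflow: (signed) infinity
    return bits_f32(sign | r)
```

When the two mantissas sum past 1, the carry spills into the exponent field on its own, which keeps the result close to the true product.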

Unlike scientific calculations, where even minor inaccuracies tend to propagate and cause much larger errors down the line, graphics and LLMs do not care that much about floating point precision, so the roughly 7.5% inaccuracy of the integer approach is good enough. The question is whether it’s truly more efficient, as the paper suggests, rather than merely a fallback as seen with e.g. integer-only audio decoders for platforms without an FPU.

Since one of the nice things about FP-focused vector processors like GPUs and their derivatives (tensor, ‘neural’, etc.) is that they can churn through a lot of data quite efficiently, expecting (energy) improvements from shifting this work to the ALU of a CPU seems quite optimistic.

If You’re 3D Scanning, You’ll Want A Way To Work With Point Clouds

April 5, 2025 by Donald Papp 7 Comments

3D scanning is becoming much more accessible, which means it’s more likely that the average hacker will use it to solve problems — possibly odd ones. That being the case, a handy tool to have in one’s repertoire is a way to work with point clouds. We’ll explain why in a moment, but that’s where CloudCompare comes in (GitHub).

Not all point clouds are destined to be 3D models. A project may call for watching for changes in a surface, for example.

CloudCompare is an open source tool with which one can load up and do various operations on point clouds, including generating mesh models from them. Point clouds are what 3D scanners create when an object is scanned, and to become useful, those point clouds are usually post-processed into 3D models (specifically, meshes) like an .obj or .stl file.

We’ve gone into detail in the past about how 3D scanning works, what to expect from it, and taken a hands-on tour of what an all-in-one wireless scanner can do. But what do point clouds have to do with getting the most out of 3D scanning? Well, if one starts to push the boundaries of how and to what purposes 3D scanning can be applied, it sometimes makes more sense to work with point clouds directly instead of the generated meshes, which is exactly what CloudCompare is for.

For example, one may wish to align and merge two or more different clouds, such as from two different (possibly incomplete) scans. Or, you might want to conduct a deviation analysis of how a surface has changed between those scans. Alternatively, if one is into designing wearable items, it can be invaluable to be able to align something to a 3D scan of a body part.
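
The core quantity behind a deviation analysis is simple: for every point in one scan, how far away is the nearest point in the other? The sketch below computes that by brute force on two already-aligned clouds. It is an illustration under our own assumptions, not CloudCompare's implementation, which works on far larger clouds; a real tool would use a spatial index such as an octree. The function names are hypothetical.

```python
import math
from statistics import mean

Point = tuple[float, float, float]


def cloud_to_cloud_distances(compared: list[Point], reference: list[Point]) -> list[float]:
    """For each point in `compared`, the distance to its nearest neighbour in `reference`.

    Brute force, O(len(compared) * len(reference)): fine for a few thousand
    points. Assumes both clouds are already aligned in the same coordinates.
    """
    return [min(math.dist(p, q) for q in reference) for p in compared]


def deviation_summary(compared: list[Point], reference: list[Point]) -> dict[str, float]:
    """Collapse per-point deviations into a couple of headline numbers."""
    d = cloud_to_cloud_distances(compared, reference)
    return {"max": max(d), "mean": mean(d)}
```

Coloring each point by its distance, rather than summarizing, is what turns this into the familiar heat-map view of where a surface has changed.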

It’s a versatile tool with numerous tutorials, so if you find yourself into 3D scanning but yearning for more flexibility than you can get by working with the mesh models — or want an alternative to modeling-focused software like Blender — maybe it’s time to work with the point clouds directly.

Software Hacks Unlock Cheap Spectrometer

March 31, 2025 by Tom Nardi 25 Comments

A spectrometer is one of those tools that many of us would love to have, but just can’t justify the price of. Sure, there are some DIY options out there, but few of them have the convenience or capability of what’s on the commercial market. [Chris] from Zoid Technology recently found a portable spectrometer, complete with an Android application, for just $150 USD on AliExpress, which looked very promising…at least at first.

The problem is that the manufacturer, Torch Bearer, offers more expensive models of this spectrometer. In an effort to push users into those higher-priced models, arbitrary features such as data export are blocked in the software. [Chris] first thought he could get around this by reverse engineering the serial data coming from the device (interestingly, the spectrometer ships with a USB-to-serial adapter), but while he got some promising early results, he found that the actual spectrometer data was obfuscated — a graph of the results looked like stacks of LEGOs.
