Using Local AI On The Command Line To Rename Images (And More)

We all have a folder full of images whose filenames resemble line noise. How about renaming those images with the help of a local LLM (large language model) executable on the command line? All that and more is showcased on [Justine Tunney]’s bash one-liners for LLMs, a showcase aimed at giving folks ideas and guidance on using a local (and private) LLM to do actual, useful work.

This is built out from the recent llamafile project, which turns LLMs into single-file executables. This not only makes them more portable and easier to distribute, but the executables are perfectly capable of being called from the command line and sending to standard output like any other UNIX tool. It’s simpler to version control the embedded LLM weights (and therefore their behavior) when it’s all part of the same file as well.

One such tool (the multi-modal LLaVA) is capable of interpreting image content. As an example, we can point it to a local image of the Jolly Wrencher logo using the following command:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:'

Which produces the following response:

The image has a black background with a white skull and crossbones symbol.

With a different prompt (“What do you see?” instead of “The image has…”) the LLM even picks out the wrenches, but one can already see that the right pieces exist to do some useful work.

Check out [Justine]’s rename-pictures.sh script, which cleverly evaluates image filenames. If an image’s given filename already looks like readable English (also a job for a local LLM) the image is left alone. Otherwise, the picture is fed to an LLM whose output guides the generation of a new short and descriptive English filename in lowercase, with underscores for spaces.

What about the fact that LLM output isn’t entirely predictable? That’s easy to deal with. [Justine] suggests always calling these tools with the --temp 0 parameter. Setting the temperature to zero makes the model deterministic, ensuring that a same input always yields the same output.

There’s more neat examples on the Bash One-Liners for LLMs that demonstrate different ways to use a local LLM that lives in a single-file executable, so be sure to give it a look and see if you get any new ideas. After all, we have previously shown how automating tasks is almost always worth the time invested.

A Ham Radio Answering Machine

For those who grew up with a cell phone in their hand, it might be difficult to imagine a time where the phone wasn’t fully integrated with voicemail. It sounds like a fantastical past, yet at one point a separate machine needed to be attached to the phone to record messages if no one was home to answer. Not only that, but a third device, a cassette tape, was generally needed as a storage device to hold the messages. In many ways we live in a much simpler world now, but in the amateur radio world one group is looking to bring this esoteric technology to the airwaves and [saveitforparts] is demonstrating one as part of a beta test.

The device is called the Boondock Echo, and while at its core it’s an ESP32 there’s a lot going on behind the scenes. It has an audio interface which is capable of connecting to a radio given the correct patch cable; in this case with a simple Baofeng handheld unit. The answering machine can record any sounds that come in. However, with a network connection the recordings are analyzed with an AI which can transcribe what it hears and even listen for specific call signs, then take actions such as sending emails when it hears triggers like that. Boondock also plans for this device to be capable of responding as well, but [saveitforparts] was not able to get this working during this beta test.

While an answering machine might seem like a step backwards technologically, an answering machine like this, especially when paired with Google Voice-like capabilities from an AI, has a lot of promise for ham radio operators. Even during this test, [saveitforparts] lost a radio and a kind stranger keyed it up when it was found, which was recorded by the Boondock Echo and used to eventually recover the radio. Certainly there are plenty of other applications as well, such as using AI instead of something like an Arduino to do Morse decoding.

Continue reading “A Ham Radio Answering Machine”

Making Visual Anagrams, With Help From Machine Learning

[Daniel Geng] and others have an interesting system of generating multi-view optical illusions, or visual anagrams. Such images have more than one “correct” view and visual interpretation.

What’s more, there are quite a few different methods on display: 90 degree flips and other (orthogonal) image rotations, color inversions, jigsaw permutations, and more. The project page has a generous number of examples, so go check them out!

The team’s method uses pre-trained diffusion models — more commonly known as the secret sauce inside image-generating AIs — to evaluate and work to combine the differences between different images, and try to combine and apply it in a way that results in the model generating a good visual result. While conceptually straightforward, this process wasn’t really something that could work without diffusion models driven by modern machine learning techniques.

The visual_anagrams GitHub repository has code and the research paper goes into details on implementation, limitations, and gives guidance on obtaining good results. Image generation is just one of the rapidly-evolving aspects of recent innovations, and it’s always interesting to see unusual applications like this one.

Mozilla Lets Folks Turn AI LLMs Into Single-File Executables

LLMs (Large Language Models) for local use are usually distributed as a set of weights in a multi-gigabyte file. These cannot be directly used on their own, which generally makes them harder to distribute and run compared to other software. A given model can also have undergone changes and tweaks, leading to different results if different versions are used.

To help with that, Mozilla’s innovation group have released llamafile, an open source method of turning a set of weights into a single binary that runs on six different OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) without needing to be installed. This makes it dramatically easier to distribute and run LLMs, as well as ensuring that a particular version of LLM remains consistent and reproducible, forever.

This wouldn’t be possible without the work of [Justine Tunney], creator of Cosmopolitan, a build-once-run-anywhere framework. The other main part is llama.cpp, and we’ve covered why it is such a big deal when it comes to running self-hosted LLMs.

There are some sample binaries available using the Mistral-7B, WizardCoder-Python-13B, and LLaVA 1.5 LLMs. Just keep in mind that if you’re on a Windows platform, only the LLaVA 1.5 will run, because it’s the only one that squeaks under the 4 GB limit on executable files that Windows has. If you run into issues, check out the gotchas list for troubleshooting tips.

Falsified Photos: Fooling Adobe’s Cryptographically-Signed Metadata

Last week, we wrote about the Leica M11-P, the world’s first camera with Adobe’s Content Authenticity Initiative (CAI) credentials baked into every shot. Essentially, each file is signed with Leica’s encryption key such that any changes to the image, whether edits to the photo itself or the metadata, are tracked. The goal is to not only prove ownership, but that photos are real — not tampered with or AI-generated. At least, that’s the main selling point.

Although the CAI has been around since 2019, it’s adoption is far from widespread. Only a handful of programs support it, although this list includes Photoshop, and its unlikely anybody outside the professional photography space was aware of it until recently. This isn’t too surprising, as it really isn’t relevant to the casual shooter — when I take a shot to upload to Instagram, I’m rarely thinking about whether or not I’ll need cryptographic proof that the photo wasn’t edited — usually adding #nofilter to the description is enough. Where the CAI is supposed to shine, however, is in the world of photojournalism. The idea is that a photographer can capture an image that is signed at the time of creation and maintains a tamper-proof log of any edits made. When the final image is sold to a news publisher or viewed by a reader online, they are able to view that data.

At this point, there are two thoughts you might have (or, at least, there are two thoughts I had upon learning about the CAI)

  1. Do I care that a photo is cryptographically signed?
  2. This sounds easy to break.

Well, after some messing around with the CAI tools, I have some answers for you.

  1. No, you don’t.
  2. Yes, it is.

Continue reading “Falsified Photos: Fooling Adobe’s Cryptographically-Signed Metadata”

How Do You Prove An AI Didn’t Make Your Art?

In the world of digital art, distinguishing between AI-generated and human-made creations has become a significant challenge. Almost overnight, tool sets for generating AI artworks became commonly available to the public, and suddenly, every digital art competition had to contend with potential submissions. Some have welcomed AI, while others demand competitors create artworks by their own hand and no other.

The problem facing artists and judges alike is just how to determine whether an artwork was created by a human or an AI. So what can be done?

Continue reading “How Do You Prove An AI Didn’t Make Your Art?”

AI In A Box Envisions AI As A Private, Offline, Hackable Module

[Useful Sensors] aims to embed a variety of complementary AI tools into a small, private, self-contained module with no internet connection with AI in a Box. It can do live voice recognition and captioning, live translation, and natural language conversational interaction with a local large language model (LLM). Intriguingly, it’s specifically designed with features to make it hack-friendly, such as the ability to act as a voice keyboard by sending live transcribed audio as keystrokes over USB.

Based on the RockChip 3588S SoC, the unit aims to have an integrated speaker, display, and microphone.

Right now it’s wrapping up a pre-order phase, and aims to ship units around the end of January 2024. The project is based around the RockChip 3588S SoC and is open source (GitHub repository), but since it’s still in development, there’s not a whole lot visible in the repository yet. However, a key part of getting good performance is [Useful Sensors]’s own transformers library for the RockChip NPU (neural processing unit).

The ability to perform things like high quality local voice recognition and run locally-hosted LLMs like LLaMa have gotten a massive boost thanks to recent advances in machine learning, and it looks like this project aims to tie them together in a self-contained package.

Perhaps private digital assistants can become more useful when users can have the freedom to modify and integrate them as they see fit. Digital assistants hosted by the big tech companies are often frustrating, and others have observed that this is ultimately because they primarily exist to serve their makers more than they help users.

Continue reading AI In A Box Envisions AI As A Private, Offline, Hackable Module”