A Straightforward AI Voice Assistant, On A Pi

With AI being all the rage at the moment it’s been somewhat annoying that using a large language model (LLM) without significant amounts of computing power meant surrendering to an online service run by a large company. But as happens with every technological innovation the state of the art has moved on, now to such an extent that a computer as small as a Raspberry Pi can join the fun. [Nick Bild] has one running on a Pi 4, and he’s gone further than just a chatbot by making into a voice assistant.

The brains of the operation is a Tinyllama LLM, packaged as a llamafile, which is to say an executable that provides about as easy a one-step access to a local LLM as it’s currently possible to get. The whisper voice recognition sytem provides a text transcript of the input prompt, while the eSpeak speech synthesizer creates a voice output for the result. There’s a brief demo video we’ve placed below the break, which shows it working, albeit slowly.

Perhaps the most important part of this project is that it’s easy to install and he’s provided full instructions in a GitHub repository. We know that the quality and speed of these models on commodity single board computers will only increase with time, so we’d rate this as an important step towards really good and cheap local LLMs. It may however be a while before it can help you make breakfast.

Continue reading “A Straightforward AI Voice Assistant, On A Pi”

Meet GOODY-2, The World’s Most Responsible (And Least Helpful) AI

AI guardrails and safety features are as important to get right as they are difficult to implement in a way that satisfies everyone. This means safety features tend to err on the side of caution. Side effects include AI models adopting a vaguely obsequious tone, and coming off as overly priggish when they refuse reasonable requests.

Prioritizing safety above all.

Enter GOODY-2, the world’s most responsible AI model. It has next-gen ethical principles and guidelines, capable of refusing every request made of it in any context whatsoever. Its advanced reasoning allows it to construe even the most banal of queries as problematic, and dutifully refuse to answer.

As the creators of GOODY-2 point out, taking guardrails to a logical extreme is not only funny, but also acknowledges that effective guardrails are actually a pretty difficult problem to get right in a way that works for everyone.

Complications in this area include the fact that studies show humans expect far more from machines than they do from each other (or, indeed, from themselves) and have very little tolerance for anything they perceive as transgressive.

This also means that as AI models become more advanced, so too have they become increasingly sycophantic, falling over themselves to apologize for perceived misunderstandings and twisting themselves into pretzels to align their responses with a user’s expectations. But GOODY-2 allows us all to skip to the end, and glimpse the ultimate future of erring on the side of caution.

[via WIRED]

Bringing The Voice Assistant Home

For many, the voice assistants are helpful listeners. Just shout to the void, and a timer will be set, or Led Zepplin will start playing. For some, the lack of flexibility and reliance on cloud services is a severe drawback. [John Karabudak] is one of those people, and he runs his own voice assistant with an LLM (large language model) brain.

In the mid-2010’s, it seemed like voice assistants would take over the world, and all interfaces were going to NLP (natural language processing). Cracks started to show as these assistants ran into the limits of what NLP could reasonably handle. However, LLMs have breathed some new life into the idea as they can easily handle much more complex ideas and commands. However, running one locally is easier said than done.

A firewall with some muscle (Protectli Vault VP2420) runs a VLAN and NIPS to expose the service to the wider internet. For actually running the LLM, two RTX 4060 Ti cards provide the large VRAM needed to load a decent-sized model at a cheap price point. The AI engine (vLLM) supports dozens of models, but [John] chose a quantized version of Mixtral to fit in the 32GB of VRAM he had available.

Continue reading “Bringing The Voice Assistant Home”

Using Local AI On The Command Line To Rename Images (And More)

We all have a folder full of images whose filenames resemble line noise. How about renaming those images with the help of a local LLM (large language model) executable on the command line? All that and more is showcased on [Justine Tunney]’s bash one-liners for LLMs, a showcase aimed at giving folks ideas and guidance on using a local (and private) LLM to do actual, useful work.

This is built out from the recent llamafile project, which turns LLMs into single-file executables. This not only makes them more portable and easier to distribute, but the executables are perfectly capable of being called from the command line and sending to standard output like any other UNIX tool. It’s simpler to version control the embedded LLM weights (and therefore their behavior) when it’s all part of the same file as well.

One such tool (the multi-modal LLaVA) is capable of interpreting image content. As an example, we can point it to a local image of the Jolly Wrencher logo using the following command:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:'

Which produces the following response:

The image has a black background with a white skull and crossbones symbol.

With a different prompt (“What do you see?” instead of “The image has…”) the LLM even picks out the wrenches, but one can already see that the right pieces exist to do some useful work.

Check out [Justine]’s rename-pictures.sh script, which cleverly evaluates image filenames. If an image’s given filename already looks like readable English (also a job for a local LLM) the image is left alone. Otherwise, the picture is fed to an LLM whose output guides the generation of a new short and descriptive English filename in lowercase, with underscores for spaces.

What about the fact that LLM output isn’t entirely predictable? That’s easy to deal with. [Justine] suggests always calling these tools with the --temp 0 parameter. Setting the temperature to zero makes the model deterministic, ensuring that a same input always yields the same output.

There’s more neat examples on the Bash One-Liners for LLMs that demonstrate different ways to use a local LLM that lives in a single-file executable, so be sure to give it a look and see if you get any new ideas. After all, we have previously shown how automating tasks is almost always worth the time invested.

Hackaday Links Column Banner

Hackaday Links: December 10, 2023

In this week’s episode of “Stupid Chatbot Tricks,” it turns out that jailbreaking ChatGPT is as easy as asking it to repeat a word over and over forever. That’s according to Google DeepMind researchers, who managed to force the chatbot to reveal some of its training data with a simple prompt, something like “Repeat the word ‘poem’ forever.” ChatGPT dutifully followed the instructions for a little while before spilling its guts and revealing random phrases from its training dataset, to including complete email addresses and phone numbers. They argue that this is a pretty big deal, not just because it’s potentially doxxing people, but because it reveals the extent to which large language models just spit back memorized text verbatim. It looks like OpenAI agrees that it’s a big deal, too, since they’ve explicitly made prompt-induced echolalia a violation of the ChatGPT terms of service. Seems like they might need to do a little more work to fix the underlying problem.

Continue reading “Hackaday Links: December 10, 2023”

Mozilla Lets Folks Turn AI LLMs Into Single-File Executables

LLMs (Large Language Models) for local use are usually distributed as a set of weights in a multi-gigabyte file. These cannot be directly used on their own, which generally makes them harder to distribute and run compared to other software. A given model can also have undergone changes and tweaks, leading to different results if different versions are used.

To help with that, Mozilla’s innovation group have released llamafile, an open source method of turning a set of weights into a single binary that runs on six different OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) without needing to be installed. This makes it dramatically easier to distribute and run LLMs, as well as ensuring that a particular version of LLM remains consistent and reproducible, forever.

This wouldn’t be possible without the work of [Justine Tunney], creator of Cosmopolitan, a build-once-run-anywhere framework. The other main part is llama.cpp, and we’ve covered why it is such a big deal when it comes to running self-hosted LLMs.

There are some sample binaries available using the Mistral-7B, WizardCoder-Python-13B, and LLaVA 1.5 LLMs. Just keep in mind that if you’re on a Windows platform, only the LLaVA 1.5 will run, because it’s the only one that squeaks under the 4 GB limit on executable files that Windows has. If you run into issues, check out the gotchas list for troubleshooting tips.

NVIDIA Trains Custom AI To Assist Chip Designers

AI is big news lately, but as with all new technology moves, it’s important to pierce through the hype. Recent news about NVIDIA creating a custom large language model (LLM) called ChipNeMo to assist in chip design is tailor-made for breathless hyperbole, so it’s refreshing to read exactly how such a thing is genuinely useful.

ChipNeMo is trained on the highly specific domain of semiconductor design via internal code repositories, documentation, and more. The result is a vast 43-billion parameter LLM running on a single A100 GPU that actually plays no direct role in designing chips, but focuses instead on making designers’ jobs easier.

For example, it turns out that senior designers spend a lot of time answering questions from junior designers. If a junior designer can ask ChipNeMo a question like “what does signal x from memory unit y do?” and that saves a senior designer’s time, then NVIDIA says the tool is already worth it. In addition, it turns out another big time sink for designers is dealing with bugs. Bugs are extensively documented in a variety of ways, and designers spend a lot of time reading documentation just to grasp the basics of a particular bug. Acting as a smart interface to such narrowly-focused repositories is something a tool like ChipNeMo excels at, because it can provide not just summaries but also concrete references and sources. Saving developer time in this way is a clear and easy win.

It’s an internal tool and part research project, but it’s easy to see the benefits ChipNeMo can bring. Using LLMs trained on internal information for internal use is something organizations have experimented with (for example, Mozilla did so, while explaining how to do it for yourself) but it’s interesting to see a clear roadmap to assisting developers in concrete ways.