Mozilla Lets Folks Turn AI LLMs Into Single-File Executables

LLMs (Large Language Models) for local use are usually distributed as a set of weights in a multi-gigabyte file. These cannot be directly used on their own, which generally makes them harder to distribute and run compared to other software. A given model can also have undergone changes and tweaks, leading to different results if different versions are used.

To help with that, Mozilla’s innovation group have released llamafile, an open source method of turning a set of weights into a single binary that runs on six different OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) without needing to be installed. This makes it dramatically easier to distribute and run LLMs, as well as ensuring that a particular version of LLM remains consistent and reproducible, forever.

This wouldn’t be possible without the work of [Justine Tunney], creator of Cosmopolitan, a build-once-run-anywhere framework. The other main part is llama.cpp, and we’ve covered why it is such a big deal when it comes to running self-hosted LLMs.

There are some sample binaries available using the Mistral-7B, WizardCoder-Python-13B, and LLaVA 1.5 LLMs. Just keep in mind that if you’re on a Windows platform, only the LLaVA 1.5 will run, because it’s the only one that squeaks under the 4 GB limit on executable files that Windows has. If you run into issues, check out the gotchas list for troubleshooting tips.

NVIDIA Trains Custom AI To Assist Chip Designers

AI is big news lately, but as with all new technology moves, it’s important to pierce through the hype. Recent news about NVIDIA creating a custom large language model (LLM) called ChipNeMo to assist in chip design is tailor-made for breathless hyperbole, so it’s refreshing to read exactly how such a thing is genuinely useful.

ChipNeMo is trained on the highly specific domain of semiconductor design via internal code repositories, documentation, and more. The result is a vast 43-billion parameter LLM running on a single A100 GPU that actually plays no direct role in designing chips, but focuses instead on making designers’ jobs easier.

For example, it turns out that senior designers spend a lot of time answering questions from junior designers. If a junior designer can ask ChipNeMo a question like “what does signal x from memory unit y do?” and that saves a senior designer’s time, then NVIDIA says the tool is already worth it. In addition, it turns out another big time sink for designers is dealing with bugs. Bugs are extensively documented in a variety of ways, and designers spend a lot of time reading documentation just to grasp the basics of a particular bug. Acting as a smart interface to such narrowly-focused repositories is something a tool like ChipNeMo excels at, because it can provide not just summaries but also concrete references and sources. Saving developer time in this way is a clear and easy win.

It’s an internal tool and part research project, but it’s easy to see the benefits ChipNeMo can bring. Using LLMs trained on internal information for internal use is something organizations have experimented with (for example, Mozilla did so, while explaining how to do it for yourself) but it’s interesting to see a clear roadmap to assisting developers in concrete ways.

Social Engineering Chatbots With Sad-Sob Stories, For Fun And Profit

By this point, we probably all know that most AI chatbots will decline a request to do something even marginally nefarious. But it turns out that you just might be able to get a chatbot to solve a CAPTCHA puzzle (Nitter), if you make up a good enough “dead grandma” story.

Right up front, we’re going to warn that fabricating a story about a dead or dying relative is a really bad idea; call us superstitious, but karma has a way of balancing things out in ways you might not like. But that didn’t stop X user [Denis Shiryaev] from trying to trick Microsoft’s Bing Chat. As a control, [Denis] first uploaded the image of a CAPTCHA to the chatbot with a simple prompt: “What is the text in this image?” In most cases, a chatbot will gladly pull text from an image, or at least attempt to do so, but Bing Chat has a filter that recognizes obfuscating lines and squiggles of a CAPTCHA, and wisely refuses to comply with the prompt.

On the second try, [Denis] did a quick-and-dirty Photoshop of the CAPTCHA image onto a stock photo of a locket, and changed the prompt to a cock-and-bull story about how his recently deceased grandmother left behind this locket with a bit of their “special love code” inside, and would you be so kind as to translate it, pretty please? Surprisingly, the story worked; Bing Chat not only solved the puzzle, but also gave [Denis] some kind words and a virtual hug.

Now, a couple of things stand out about this. First, we’d like to see this replicated — maybe other chatbots won’t fall for something like this, and it may be the case that Bing Chat has since been patched against this exploit. If [Denis]’ experience stands up, we’d like to see how far this goes; perhaps this is even a new, more practical definition of the Turing Test — a machine whose gullibility is indistinguishable from a human’s.

A Hacker-Friendly Software Package For Your Next AI Project

If you’re interested in using Large Language Models (LLM) in a project, but aren’t plugged directly into the fast-developing world of artificial intelligence (AI), knowing what tool or software to use can be daunting. Luckily, [Max Woolf] created simpleaichat, which is complete with examples and documentation and minimal code complexity.

As [Max] puts it, the main motivations behind the project are to provide useful tools while making it easier for non-engineers to peer through the breathless hyperbole and see just how AI-based apps actually work. This project was directly inspired by [Max]’s own real-world software experiences in this area, particularly his frustrations with popular and much-hyped frameworks in which “Hello World” feels a lot more like Hell World.

simpleaichat is a Python package that provides easy and powerful ways to interface with the OpenAI API, makers of ChatGPT. Now, it is true that OpenAI’s models are not open source and access is not free, but they are easily one of the most capable and cost-effective services of their kind.

Prefer something a little more open, and a lot more private? There’s always the option to run an LLM locally on your own machine, possibly with the help of a tool like text-generation-webui or gpt4all. Running an LLM locally will not have the quality of OpenAI’s offerings, but it can still do the job. It’s also possible to give these local LLMs an interface that mimics OpenAI’s API, so there are loads of possibilities.

Are you getting ideas yet? Share them in the comments, or keep them to yourselves and submit a tip once your project is off the ground!

Text Compression Gets Weirdly Efficient With LLMs

It used to be that memory and storage space were so precious and so limited of a resource that handling nontrivial amounts of text was a serious problem. Text compression was a highly practical application of computing power.

Today it might be a solved problem, but that doesn’t mean it doesn’t attract new or unusual solutions. [Fabrice Bellard] released ts_zip which uses Large Language Models (LLM) to attain text compression ratios higher than any other tool can offer.

LLMs are the technology behind natural language AIs, and applying them in this way seems effective. The tradeoff? Unlike typical compression tools, the lossless decompression part isn’t exactly guaranteed when an LLM is involved. Lossy compression methods are in fact quite useful. JPEG compression, for example, is a good example of discarding data that isn’t readily perceived by humans to make a smaller file, but that isn’t usually applied to text. If you absolutely require lossless compression, [Fabrice] has that covered with NNCP, a neural-network powered lossless data compressor.

Do neural networks and LLMs sound far too serious and complicated for your text compression needs? As long as you don’t mind a mild amount of definitely noticeable data loss, check out [Greg Kennedy]’s Lossy Text Compression which simply, brilliantly, and amusingly uses a thesaurus instead of some fancy algorithms. Yep, it just swaps longer words for shorter ones. Perhaps not the best solution for every need, but between that and [Fabrice]’s brilliant work we’re confident there’s something for everyone who craves some novelty with their text compression.

[Photo by Matthew Henry from Burst]

Self-Hosted Chatbot Focuses On Privacy

Large language models (LLMs) have been all the rage lately, assisting from all kinds of tasks from programming to devising Excel formulas to shortcutting school work. They’re also relatively easy to access for the most part, but as the old saying goes, if something on the Internet is free the real product is you (and your data). Luckily there are ways of hosting LLMs on your own to avoid your personal data getting harvested, as well as taking advantage of open-source solutions, but building these systems takes a little bit of effort. [Stephen] and a team from Mozilla walk us through this process and show us a number of options currently available.

Working from the ground up, the group first decides on hosting, which (unsurprisingly) involves using Mozilla hosting services. The choice of runtime environment was a little bit more challenging. The project was time constrained, so they looked at two options here: Hugging Face and llama.cpp. Eventually deciding to move forward with llama.cpp largely due to its ability to run on more consumer-oriented hardware (especially Apple silicon) and the fact that it doesn’t need a powerful GPU, the next task was to choose the model. Settling on the LLaMa model that Facebook recently open-sourced, this model works well with the runtime environment and is essentially the only one that does.

From there, the team at Mozilla wanted to make sure their chat bot would be able to provide other Mozilla employees with information more readily pertinent to their jobs, so they trained their model with some internal Mozilla data as well as other more generic information. This doesn’t mean the job is done, though, there are a number of other factors that went in to designing this system before it was finally complete. Even then, since they built this in a week it’s not perfect; there are some issues with non-permissive licensing of some of the components and many of the design choices may not have been ideal. It’s impressive what’s out there if you’re hosting your own system, though, and while this might be a little more advanced for a self-hosted project, take a look at some other more beginner-friendly projects you can try if you’re just starting out on the self-hosted path.

The Right Benchmark For GPT

Dan Maloney wanted to design a part for 3D printing. OpenSCAD is a coding language for generating 3D objects. ChatGPT can write code. What could possibly go wrong? You should go read his article because it’s enlightening and hilarious, but the punchline is that it ran afoul of syntax errors, but also gave him enough of a foothold that he could teach himself enough OpenSCAD to get the project done anyway. As with many people who have asked the AI to create some code, Dan finds that it’s not as good as asking someone who knows what they’re doing, but that it’s also better than nothing.

And this is where I start grumbling. When you type your desires into the word-follower machine, your alternative isn’t nothing. Your alternative is to fire up a search engine instead and type “openscad tutorial”. That, for nearly any human endeavor, will get you a few good guides, written by humans who are probably expert in the subject in question, and which are aimed at teaching you the thing that you want to learn. It doesn’t get better than that. You’ll be up and running with your design in no time.

Indeed, if you think about the relevant source material that the LLM was trained on, it’s exactly these tutorials. It can’t possibly do better than the best of them, although the resulting average tutorial might be better than the worst you’ll find. (Some have speculated on what happens when the entire Internet is filled with these generated texts – what will future AIs learn from?)

In Dan’s case, though, he didn’t necessarily want to learn OpenSCAD – he just wanted the latch designed. But in the end, he had to learn enough OpenSCAD to get the AI code compiling without error. He spent an hour learning OpenSCAD and now he’s good to go on his next project too.

So the next time you hear someone say that they got an answer back from a large language model that wasn’t perfect, but it was “better than nothing”, think critically if “nothing” is really the right benchmark.

Do you really want to learn nothing? Do you really have no resources to get started with? I would claim that we have the most amazing set of tutorial resources the world has ever known at our fingertips. Compared to the ability to teach millions of humans to achieve their own goals, that makes the LLM party tricks look kinda weak, in my opinion.