A Hacker-Friendly Software Package For Your Next AI Project

If you’re interested in using Large Language Models (LLM) in a project, but aren’t plugged directly into the fast-developing world of artificial intelligence (AI), knowing what tool or software to use can be daunting. Luckily, [Max Woolf] created simpleaichat, which is complete with examples and documentation and minimal code complexity.

As [Max] puts it, the main motivations behind the project are to provide useful tools while making it easier for non-engineers to peer through the breathless hyperbole and see just how AI-based apps actually work. This project was directly inspired by [Max]’s own real-world software experiences in this area, particularly his frustrations with popular and much-hyped frameworks in which “Hello World” feels a lot more like Hell World.

simpleaichat is a Python package that provides easy and powerful ways to interface with the OpenAI API, makers of ChatGPT. Now, it is true that OpenAI’s models are not open source and access is not free, but they are easily one of the most capable and cost-effective services of their kind.

Prefer something a little more open, and a lot more private? There’s always the option to run an LLM locally on your own machine, possibly with the help of a tool like text-generation-webui or gpt4all. Running an LLM locally will not have the quality of OpenAI’s offerings, but it can still do the job. It’s also possible to give these local LLMs an interface that mimics OpenAI’s API, so there are loads of possibilities.

Are you getting ideas yet? Share them in the comments, or keep them to yourselves and submit a tip once your project is off the ground!

AI Assistant Translates Your Every Request For The Command Line

If you don’t live on the command line, it can be easy to forget the exact syntax of commands. It often leaves you running to the “/?” or “–help” switches, or else a quick Google search to find the proper incantations. Shell-AI is a machine-learning assistant that could change all that by helping you find the proper command for the job, right on the command line!

Shell-AI accepts natural-language inputs — simply type in “shai” followed by what you’re trying to do. It will then take in your request, run it through an OpenAI language model like GPT-3.5-Turbo, and then present you with three (or more) potential commands. You can then select which command to use and get on with your day.

As demonstrated, it’s more than capable of following commands like “download a random image” or “show only image files ls.” And, hilariously, it responds to the request “do something crazy” with just one suggestion: “rm -rf”. That seems rather fitting.

We wouldn’t blindly follow any commands coming out of a large language model, of course. But, if you know what you’re doing, it could prove a useful little tool to ease your regular duties on the command line.

Text Compression Gets Weirdly Efficient With LLMs

It used to be that memory and storage space were so precious and so limited of a resource that handling nontrivial amounts of text was a serious problem. Text compression was a highly practical application of computing power.

Today it might be a solved problem, but that doesn’t mean it doesn’t attract new or unusual solutions. [Fabrice Bellard] released ts_zip which uses Large Language Models (LLM) to attain text compression ratios higher than any other tool can offer.

LLMs are the technology behind natural language AIs, and applying them in this way seems effective. The tradeoff? Unlike typical compression tools, the lossless decompression part isn’t exactly guaranteed when an LLM is involved. Lossy compression methods are in fact quite useful. JPEG compression, for example, is a good example of discarding data that isn’t readily perceived by humans to make a smaller file, but that isn’t usually applied to text. If you absolutely require lossless compression, [Fabrice] has that covered with NNCP, a neural-network powered lossless data compressor.

Do neural networks and LLMs sound far too serious and complicated for your text compression needs? As long as you don’t mind a mild amount of definitely noticeable data loss, check out [Greg Kennedy]’s Lossy Text Compression which simply, brilliantly, and amusingly uses a thesaurus instead of some fancy algorithms. Yep, it just swaps longer words for shorter ones. Perhaps not the best solution for every need, but between that and [Fabrice]’s brilliant work we’re confident there’s something for everyone who craves some novelty with their text compression.

[Photo by Matthew Henry from Burst]

2023 Hackaday Prize: Two Bee Or More Bee Swarm Detection

In the bustling world of bees, swarming is the ultimate game of real estate shuffle. When a hive gets too crowded or craves a change of scenery, colonies scout out swarms for a new hive. [Captain Flatus O’Flaherty] is a beekeeper trying to capture more native honey bees, and a custom LoRa-enabled capture hive helps him do that.

A catch hive, perched high and mighty, lures scouting as potential new homes. If selected, a swarm of over a thousand bees can move in, where [Flatus]’s detector comes in. Many catch hives are scattered around, and manually checking them is difficult. While the breath of one bee is hard to see, a thousand bees produce enough CO2 to be detected by a sensor. A custom PCB with a solar-powered  +30dB LoRa radio measures CO2 and reports back. The PCB contains an ESP32 D4 and a 1-watt Ebyte E22-400M30S LoRa module. If the CO2 levels are still elevated at nightfall, [Flatus] can be pretty confident a swarm has moved in.

Using the data collected, he massaged it to create a dataset suitable for training on XGBoost. With weather data and other conditions, the model tries to predict when a swarm is more or less likely to happen. Apis Mellifera (the local honeybee around [Flatus]) loves sun-kissed, warm, humid afternoons with little wind.

We’ve seen beehive monitors before and love exploring what the data could be used for—video after the break.

Continue reading “2023 Hackaday Prize: Two Bee Or More Bee Swarm Detection”

Several video clips of a robot arm manipulating objects in a kitchen environment, demonstrating some of the 12 generalized skills

RoboAgent Gets Its MT-ACT Together

Researchers at Carnegie Mellon University have shared a pre-print paper on generalized robot training within a small “practical data budget.” The team developed a system that breaks movement tasks into 12 “skills” (e.g., pick, place, slide, wipe) that can be combined to create new and complex trajectories within at least somewhat novel scenarios, called MT-ACT: Multi-Task Action Chunking Transformer. The authors write:

Trained merely on 7500 trajectories, we are demonstrating a universal RoboAgent that can exhibit a diverse set of 12 non-trivial manipulation skills (beyond picking/pushing, including articulated object manipulation and object re-orientation) across 38 tasks and can generalize them to 100s of diverse unseen scenarios (involving unseen objects, unseen tasks, and to completely unseen kitchens). RoboAgent can also evolve its capabilities with new experiences.

Continue reading “RoboAgent Gets Its MT-ACT Together”

Noisy Keyboards Sink Ships

Many of us like a keyboard with a positive click noise when we type. You might want to rethink that, though, in light of a new paper from the UK that shows how researchers trained an AI to decode keystrokes from noise on conference calls.

The researchers point out that people don’t expect sound-based exploits. The paper reads, “For example, when typing a password, people will regularly hide their screen but will do little to obfuscate their keyboard’s sound.”

The technique uses the same kind of attention network that makes models like ChatGPT so powerful. It seems to work well, as the paper claims a 97% peak accuracy over both a telephone or Zoom. In addition, where the model was wrong, it tended to be close, identifying an adjacent keystroke instead of the correct one. This would be easy to correct for in software, or even in your brain as infrequent as it is. If you see the sentence “Paris im the s[ring,” you can probably figure out what was really typed.

We’ve seen this done before, but this technique raises the bar. As sophisticated as keyboard listening was back in the 1970s, you can only imagine what the three-letter agencies can do these days.

In the meantime, the mitigation for this particular threat seems obvious — just start screaming whenever you type in your password.

The AI Engine That Fits In 100K

Running your own AI models is possible, but it requires a giant computer, right? Maybe not. Researchers at NVidia are showing off Perfusion, a text-to-image model they say is 100KB in size and takes four minutes to train. The model specializes in customizing a photo. For example, the paper shows a picture of a teddy bear and a prompt to dress it as a wizard. In all fairness, the small size and quick training are a little misleading, we think, because the results are still using the usual giant model. What’s small and fast is the customization of the existing model.

Customizing models is a common task since you often want to work with something the model doesn’t contain. For example, you might want to alter a picture of your face or your pet, which probably isn’t in the original model. You can create a special keyword and partially train the model for what you want using something called textual inversion. The problem the researchers identified is that creating textual inversions often causes the new training to leak to unintended areas.

They describe “key locking,” a technique to avoid overfitting when fine-tuning an existing model. For example, suppose you want to add a specific dog picture to the model. With typical techniques, a special keyword like dog* will indicate the custom dog image, but the keyword has no connection with generic dogs, mammals, or animals. This makes it difficult for the AI to work with the image. For example, the prompts “a man sitting” and “a dog sitting” require very different image generations. But if we train a specific dog as “dog*” there’s no deeper understanding that “dog*” is a type of “dog” that the model already knows about. So what do you do with “dog* sitting?” Key locking makes that association.

Continue reading “The AI Engine That Fits In 100K”