A Milliwatt Of DOOM

The seminal 1993 first-person shooter from id Software, DOOM, has become well-known as a test of small computer platforms. We’ve seen it on embedded systems far and wide, but we doubt we’ve ever seen it consume as little power as it does on a specialized neural network processor. The chip in question is a Syntiant NDP200, and it’s designed to be the always-on component listening for the wake word or other trigger in an AI-enabled IoT device.

DOOM running on as little as a milliwatt of power makes for an impressive PR stunt at a trade show, but perhaps more interesting is that the chip isn’t simply running the game; it’s also playing it. As a neural network processor, it contains the smarts required to learn how to play, and in the simple circular level it’s soon picking off the targets with ease.

We’ve not seen any projects using these chips as yet, which is hardly surprising given their niche market. It is, however, worth noting that there is a development board for the lower-range sibling chip, the NDP101, which sells for around $35 USD. Super-low-power AI is within reach.

Teaching A Robot To Hallucinate

Training robots to execute tasks in the real world requires data: the more, the better. The problem is that creating these datasets takes a lot of time and effort, and existing collection methods don’t scale well. That’s where Robot Learning with Semantically Imagined Experience (ROSIE) comes in.

The basic concept is straightforward: augment training data with hallucinated elements that change details, add variations, or introduce novel distractions. The researchers show that a robot additionally trained on this augmented data performs tasks better than one without it.

This robot is able to deposit an object into a metal sink it has never seen before, thanks to hallucinating a sink in place of an open drawer in its original training data.

Suppose one has a dataset consisting of a robot arm picking up a Coke can and placing it into an orange lunchbox. That training data is used to teach the arm how to do the task. But in the real world, maybe there is distracting clutter on the countertop. Or, the lunchbox in the training data was empty, but the one on the counter right now already has a sandwich inside it. The further a real-world task departs from the training dataset, the less capable and accurate the robot becomes.

ROSIE aims to alleviate this problem by using image diffusion models (such as Imagen) to enhance the training data in targeted and direct ways. In one example, a robot has been trained to deposit an object into a drawer. ROSIE augments this training by inpainting the drawer in the training data, replacing it with a metal sink. A robot trained on both datasets competently performs the task of placing an object into a metal sink, despite the fact that a sink never actually appears in the original training data, nor has the robot ever seen this particular real-world sink. A robot without the benefit of ROSIE fails the task.
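We obviously can’t run ROSIE’s exact pipeline at home, since it leans on Google’s Imagen model, but the core trick translates directly to openly available inpainting models. Here’s a minimal Python sketch of the idea using the Stable Diffusion inpainting pipeline from Hugging Face’s diffusers library; the file names, mask, and prompt are our own illustrative stand-ins, not anything from the paper:

```python
# Minimal sketch of ROSIE-style augmentation with an open inpainting model.
# ROSIE itself uses Imagen; we substitute Stable Diffusion here, and the
# file names, mask, and prompt below are illustrative placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("episode_0042/frame_0000.png").convert("RGB")  # original training frame
mask = Image.open("episode_0042/drawer_mask.png").convert("L")    # white where the drawer is

# Only the masked region is redrawn; the arm, object, and background stay put,
# so the episode's recorded actions still line up with the augmented frame.
augmented = pipe(
    prompt="a metal sink with a faucet, set into a kitchen countertop",
    image=frame,
    mask_image=mask,
).images[0]

augmented.save("episode_0042_sink/frame_0000.png")
```

The important design point is that the inpainting only touches the masked region, so the recorded motor commands remain valid for the hallucinated scene.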

Here is a link to the team’s paper, and embedded below is a video demonstrating ROSIE both in concept and in action. It’s also somewhat reminiscent of a plug-in for Blender we saw recently, which uses an AI image generator to texture entire 3D scenes from a simple text prompt.


This Camera Produces A Picture, Using The Scene Before It

It’s the most basic function of a camera: point it at a scene, and it produces a photograph of what it sees. [Jasper van Loenen] has created a camera that does just that, but perhaps not in the way we might expect. Instead of committing pixels to memory, it takes a picture, uses AI to generate a text description of what is in the picture, and then uses another AI to generate an image from that description. It’s a curiously beautiful artwork as well as an ultimate expression of the current obsession with the technology, and we rather like it.
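We don’t know exactly which models or services [Jasper] wired together, but the caption-then-regenerate loop is easy to sketch with off-the-shelf parts. Here’s a rough Python take using BLIP for the captioning and Stable Diffusion for the redraw; both model choices are our assumptions, and the heavy lifting would realistically happen on a desktop or in the cloud rather than on the Pi itself:

```python
# Sketch of the photo -> caption -> regenerated image loop. The models named
# here are our stand-ins, not necessarily what [Jasper] used.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionPipeline

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

photo = Image.open("snapshot.jpg").convert("RGB")  # frame from the camera
inputs = processor(photo, return_tensors="pt")
caption = processor.decode(
    captioner.generate(**inputs, max_new_tokens=30)[0],
    skip_special_tokens=True)
print("Scene description:", caption)

# Redraw the scene from nothing but that text description.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
reimagined = pipe(caption).images[0]
reimagined.save("print_me.png")  # off to the Instax printer from here
```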

The camera itself is a black box with a simple twin-lens reflex viewfinder. Inside is a Raspberry Pi that takes the photo and sends it through the various AI services, plus a Fuji Instax Mini printer. Of particular note is the connection to the printer, which may interest quite a few others: [Jasper] has reverse engineered the Bluetooth protocol it uses and written Python code that makes printing easy. The images the camera produces are like so much AI-generated content: pretty to look at but otherworldly, weird parallels of the scenes they represent.
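We haven’t picked through [Jasper]’s protocol notes ourselves, but for the curious, driving a BLE gadget like the Instax from Python generally looks something like the sketch below, using the bleak library. The address, characteristic UUID, and payload here are placeholders, not the actual Instax protocol:

```python
# Generic BLE write loop with bleak; the MAC, UUID, and bytes are placeholders
# for illustration, not the real Instax Mini protocol.
import asyncio
from bleak import BleakClient

PRINTER_ADDRESS = "AA:BB:CC:DD:EE:FF"                 # placeholder MAC address
WRITE_CHAR = "00001234-0000-1000-8000-00805f9b34fb"   # placeholder characteristic

async def send_image(chunks):
    async with BleakClient(PRINTER_ADDRESS) as client:
        for chunk in chunks:                          # image data, pre-split to fit the MTU
            await client.write_gatt_char(WRITE_CHAR, chunk)

asyncio.run(send_image([b"\x00" * 20]))               # dummy payload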

It’s inevitable that consumer cameras will before long offer AI augmentation features for less-competent photographers; meanwhile, we’re pleased to see [Jasper] getting there first.

Tiny Machine Learning On As Little As 2 KB Of RAM

All of the machine learning stuff coming out lately doesn’t affect you if you are developing with embedded microcontrollers, right? Perhaps not. Microsoft Research India wants you to use their EdgeML tool to do machine learning tasks such as gesture recognition on tiny devices like an Arduino Uno. According to the developers, you might need as little as 2 KB of RAM. There’s no network connection required, and the work uses TensorFlow underneath, so it is compatible with much of what you’ll find for bigger computers.

If you add processing power, you can get more capability. For example, one of the demonstrations is a wake-word recognizer on a Raspberry Pi Zero (although the page for that demo seems to be missing at the moment; try the GesturePod, instead).

The system generally uses Python, but there are efficient C++ implementations for selected algorithms. The code lives on GitHub, along with a number of research papers about each tool. There’s also a recent paper on MinUn, an attempt to make things even more efficient on ARM microcontrollers. In particular, MinUn can store approximate numbers to save space, allows variable precision for tensors, and tries to reduce memory fragmentation, an important feature for CPUs that lack memory management units.
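We haven’t dug into MinUn’s actual number formats, but the space-saving idea behind approximate storage is easy to illustrate. Here’s a toy Python example of plain linear quantization, squeezing a float32 weight tensor into int8 codes plus a single scale factor:

```python
# Toy illustration of approximate number storage: keep int8 codes plus one
# scale, and dequantize on the fly. Plain linear quantization for clarity,
# not MinUn's actual format.
import numpy as np

weights = np.random.randn(64, 64).astype(np.float32)  # 16 KB as float32

scale = np.abs(weights).max() / 127.0                  # one scale per tensor
q = np.round(weights / scale).astype(np.int8)          # 4 KB as int8

approx = q.astype(np.float32) * scale                  # dequantize at runtime
print("max abs error:", np.abs(weights - approx).max())
print("bytes: %d -> %d" % (weights.nbytes, q.nbytes))
```

Four times less RAM for the weights, at the cost of a small bounded rounding error, which is exactly the kind of trade a 2 KB machine has to make.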

If you haven’t studied TensorFlow yet, start here. Why use something like this with a microcontroller? How about smarter robots?

How To Roll Your Own Custom Object Detection Neural Network

Real-time object detection, which uses neural networks and deep learning to rapidly identify and tag objects of interest in a video feed, is a handy feature with great hacker potential. Happily, it’s also possible to make customized CNNs (convolutional neural networks) tailored for one’s own needs, and that process just got easier thanks to some new documentation for the Vizy “AI camera” by Charmed Labs.

Raspberry Pi-based Vizy camera

Charmed Labs has been making hacker-friendly machine vision devices for a long time, and the Vizy camera impressed us mightily when we checked it out last year. Out of the box, Vizy has a perfectly functional object detector application that runs locally on the device, and can detect and tag many common everyday objects in real time. But what if that default application doesn’t quite meet one’s project needs? Good news: it’s possible to create a custom-trained CNN, and that process got a lot more accessible thanks to step-by-step examples of training a model to recognize hands playing rock-paper-scissors.

Person and cat with machine-generated tags identifying them
Default object detection works well, but sometimes one needs custom results.

The basic process is this: start with a variety of images that show the item of interest, then identify and label the item in each photo. These photos (a “training set”) are sent to Google Colab, where the neural network is trained. The resulting CNN model can then be downloaded and put to use to see how well it performs.

Of course things rarely work perfectly the first time around, so at this point it’s pretty common for some refinement to be needed to increase accuracy. Luckily there are a number of tools to help do this without creating a new model from scratch, so it’s just a matter of tweaking until things perform acceptably.

Google Colab is free and the resulting CNNs are implemented in the TensorFlow Lite framework, meaning it’s possible to use them elsewhere. So if custom object detection has been holding up a project idea of yours, this might be what gets you over that hump.
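As a taste of what “elsewhere” can look like, here’s a minimal Python sketch of loading one of these models with the tflite_runtime interpreter on, say, a Raspberry Pi. The model file name is our placeholder, and a real object detector will emit several output tensors (boxes, classes, scores) rather than the single one we peek at here:

```python
# Minimal sketch of reusing a Colab-trained TensorFlow Lite model elsewhere.
# The .tflite file name is a placeholder; swap in your own downloaded model.
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="rock_paper_scissors.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One camera frame, resized to the model's input shape, batch of 1.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in for real pixels

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])
print("raw detector output:", scores)
```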

Getty Images Is Suing An AI Image Generator For Using Its Images

As per the Getty Images legal complaint, the Stable Diffusion AI seems to reproduce gooey versions of the Getty Images watermark in some of its output. Credit: Getty Images

Many AI systems require huge training datasets in order to achieve their impressive feats. This applies whether the AI works with images, natural language, or just about anything else. AI developers are starting to come under scrutiny for where they source their datasets. Unsurprisingly, stock photo site Getty Images is at the forefront of this, and is now suing the creators of Stable Diffusion over the matter, as reported by The Verge.

Stability AI, the company behind Stable Diffusion, is the target of the lawsuit for one good reason: there’s compelling evidence the company used Getty Images content without permission. Stable Diffusion has been seen to generate output images that include blurry approximations of the Getty Images watermark. That’s something of a smoking gun, suggesting that Stability AI may have scraped Getty Images content for use as training material.

The copyright implications are unclear, but using any imagery from a stock photo database without permission is always asking for trouble. Various arguments will likely play out in court: Stability AI may claim that its activity falls under fair use, while Getty Images may argue that the appearance of mangled versions of its watermark breaks trademark rules. The lawsuit could have serious implications for AI image generators worldwide, and is sure to be watched closely by the nascent AI industry. As with any legal matter, just don’t expect a quick answer from the courts.

[Thanks to Dan for the tip!]

Does Programming A Robot With ChatGPT Work At All?

ChatGPT has been put to all manner of silly uses since it first became available online. [Engineering After Hours] decided to see if its coding skills were any chop, and put it to work programming a circular saw. Pun intended.

The aim was to build a line-following robot armed with a circular saw to handle lawn edging tasks. The circular saw itself consists of a motor with a blade on it, and precisely no safety features. It’s mounted on the front of a small RC car, with a rack and pinion to control its position. [Engineering After Hours] has some sage advice in this area: don’t try this at home.

ChatGPT was not only able to give advice on what parts to use, it told [Engineering After Hours] how to hook everything up to an Arduino and even wrote the code. The AI language model even recommended a PID loop to control the position of the circular saw. Initial tests were messy, but some refinement got things impressively functional.
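For the unfamiliar, the PID loop the chatbot suggested is a classic: compute the error between where the saw is and where it should be, then steer using the error itself, its running sum, and its rate of change. Here’s a minimal Python sketch of the idea; the gains and the one-line toy “plant” are made up for illustration, and the real robot runs this sort of thing as Arduino code:

```python
# Python sketch of the kind of PID loop ChatGPT proposed; the real build runs
# Arduino C++, and these gains and the toy "plant" are made-up placeholders.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt                   # accumulated error
        derivative = (error - self.prev_error) / dt   # rate of change
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

pid = PID(kp=2.0, ki=0.1, kd=0.05)  # placeholder gains; tune on hardware

# Toy simulation: drive the saw's lateral offset toward zero.
position = 5.0                       # start 5 units off the line
for _ in range(200):
    correction = pid.update(setpoint=0.0, measured=position, dt=0.01)
    position += correction * 0.01    # stand-in for the motor and linkage dynamics
print("final offset:", round(position, 3))
```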

As a line follower, the robot’s performance is pretty crummy. However, as a robot programmed by an AI, it does pretty okay. Obviously, it’s hard to say how much help the AI had, and how many corrections [Engineering After Hours] had to make to the code to get everything working. But the fact that this kind of project is even possible shows us just how far AI has really come.
