CUDA, But Make It AMD

Compute Unified Device Architecture, or CUDA, is a software platform for running big parallel computing tasks on NVIDIA GPUs. It’s been a big part of the push to use GPUs for general-purpose computing, and in some ways it has left competitor AMD out in the cold. However, with demand for GPU computation higher than ever, there’s a promising new option: SCALE from [Spectral Compute] lets you compile CUDA applications for AMD GPUs.

SCALE allows CUDA programs to run as-is on AMD GPUs, without source modifications. The SCALE compiler is also intended as a drop-in replacement for nvcc, right down to the command-line options. For maximum ease of use, it acts as though you’ve installed the NVIDIA CUDA Toolkit, so you can build with CMake just as you would on a normal NVIDIA setup. Currently, Navi 21 and Navi 31 (RDNA 2 and RDNA 3) targets are supported, while a number of other GPUs are undergoing testing and development.

The basic aim is to allow developers to use AMD hardware without having to maintain an entirely separate codebase. It’s still a work in progress, but it’s a promising tool that could help break NVIDIA’s stranglehold on parts of the GPGPU market.

Tabletop Handybot Is Handy, And Powered By AI

Decently useful AI has been around for a little while now, and robotic arms have been around much longer. Yet somehow, we don’t have little robot helpers on our desks yet! Thankfully, [Yifei] is working towards that reality with Tabletop Handybot.

What [Yifei] has developed is a robotic arm that accepts voice commands. The robot relies on a RealSense D435 RGB-D camera, which provides color images along with per-pixel depth information. Grounding DINO handles object detection on the RGB images, while Segment Anything and Open3D further process the visual and depth data to help the robot understand what it’s looking at. Meanwhile, voice commands are transcribed with OpenAI Whisper, and the resulting text can be fed to ChatGPT as a prompt for further processing.
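Helping the robot understand what it’s looking at usually comes down to back-projecting pixels into 3D using the depth data. Here’s a minimal sketch of the standard pinhole-camera math, not [Yifei]’s code: the intrinsics values and the `mask_centroid` helper are hypothetical, and we assume a depth of 0 marks a missing reading. Given the pixels of a segmented object, it returns the object’s centroid in the camera frame, a plausible target for the arm.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (example values are hypothetical)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject(u, v, depth_m, k):
    """Back-project pixel (u, v) at depth_m metres to a 3D point in the camera frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def mask_centroid(pixels, k):
    """Average the 3D points of a segmented object's pixels.

    pixels: iterable of (u, v, depth_m) tuples, e.g. taken from a segmentation mask.
    Pixels with depth 0 are treated as missing readings (an assumption).
    """
    points = [deproject(u, v, d, k) for u, v, d in pixels if d > 0]
    if not points:
        return None
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical intrinsics for a 640x480 stream.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(mask_centroid([(320, 240, 1.0), (620, 240, 2.0), (400, 300, 0.0)], k))
# prints (0.5, 0.0, 1.5)
```

The zero-depth pixel is skipped, so only two points contribute to the average.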

[Yifei] demonstrates his robot picking up markers on command, which makes for a pretty cool demo. With so many modern AI tools available, we’re getting closer to the ideal of robots that can understand and act on general spoken instructions, and this is a great example. We may not be all the way there yet, but perhaps soon. Video after the break.

Generative AI Hits The Commodore 64

Image-generating AIs are typically trained on huge arrays of GPUs and require great wads of processing power to run. Meanwhile, [Nick Bild] has managed to get something similar running on a Commodore 64 (via Tom’s Hardware).

A figure generated by [Nick]’s C64. We shall name him… “Sword Guy”!
As you might imagine, [Nick]’s AI image generator isn’t churning out 4K cyberpunk stills dripping in neon. Instead, he aimed at a smaller target, more befitting the Commodore 64 itself: his image generator creates 8×8 game sprites.

[Nick]’s model was trained on 100 retro-inspired sprites that he created himself. He did the training on a modern computer, so that the Commodore 64 didn’t have to sweat this difficult task on its feeble 6510 CPU (a variant of the 6502). However, the C64 is more than capable of generating sprites with the trained model, thanks to some BASIC code that makes use of the training results. Right now, it takes the C64 about 20 minutes to run through 94 iterations to generate a decent sprite.

8×8 sprites are generally simple enough that you don’t need to be an artist to create them. Nonetheless, [Nick] has shown that modern machine learning techniques can run on slow, archaic hardware, even if there is limited utility in doing so. Video after the break.

How AI Large Language Models Work, Explained Without Math

Large Language Models (LLMs) are everywhere, but how exactly do they work under the hood? [Miguel Grinberg] provides a great explanation of the inner workings of LLMs in simple (but not simplistic) terms, skipping the low-level mathematics in favor of laying bare what it is they actually do.

At their heart, LLMs are prediction machines: they work on tokens (small groups of letters and punctuation), repeatedly predicting which token should come next, and yet this makes them capable of great feats of human-seeming communication. Most technically minded people understand that LLMs have no idea what they are saying, and this peek at their inner workings will make that abundantly clear.
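To make the “prediction machine” idea concrete, here’s a deliberately tiny sketch. It is not how an LLM works internally: real models use subword tokens and a transformer network with billions of learned weights, while this toy splits text on whitespace and simply counts which word follows which. The generation loop has the same shape, though: predict a token, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in follows:
        return None
    # Ties go to the follower seen first (Counter.most_common ordering).
    return follows[token].most_common(1)[0][0]


def generate(follows, start, n):
    """Autoregressive loop: predict one token, append it, feed the result back in."""
    out = [start]
    for _ in range(n):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


model = train_bigram("the cat sat on the mat".split())
print(" ".join(generate(model, "the", 5)))  # prints "the cat sat on the cat"
```

The output strings together statistically plausible tokens with no notion of what a cat or a mat is, which is the point in miniature.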

Be sure to also review an illustrated guide to how image-generating AIs work. And if a peek under the hood of LLMs left you hungry for more low-level details, check out our coverage of training a GPT-2 LLM using pure C code.

The Perfect Desktop Kit For Experimenting With Self Driving Cars

When we think about self-driving cars, we normally think about big projects measured in billions of dollars, all funded by major automakers. But you can still dive into this world on a smaller scale, as [jmoreno555] demonstrates.

The build consists of a small RC car, an HSP 94123 to be exact. It’s got a simple brushed motor inside, driven by a conventional speed controller, plus servo-driven steering. A Raspberry Pi 4 is charged with driving the car, but it’s not alone: it’s paired with a Google Coral USB stick, a machine learning accelerator capable of 4 trillion operations per second. The car also has a Wemos D1 onboard, which interfaces with distance sensors to give the car a sense of its environment. Vision is courtesy of a 1.2-megapixel camera with a 160-degree lens and a stereoscopic camera with twin 75-degree lenses.

Software-wise, it’s early days yet. [jmoreno555] is exploring the use of Python and OpenCV to implement basic lane detection and other self-driving routines, while using Blender as a simulator.
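Once lane lines have been detected (for example with OpenCV edge detection and a Hough transform), the result still has to become a steering command. Here’s a minimal proportional-control sketch of that last step, not [jmoreno555]’s code. It assumes a typical hobby servo that centres at a 1500 µs pulse and spans roughly 1000 to 2000 µs; the function name, gain, and sign convention are all hypothetical.

```python
def steering_pulse_us(left_x, right_x, frame_width, gain=1.0,
                      center_us=1500, span_us=500):
    """Convert lane-line positions into a hobby-servo pulse width in microseconds.

    left_x, right_x: x positions (pixels) of the left and right lane lines,
    measured near the bottom of the frame. Assumes a larger pulse steers right.
    """
    lane_center = (left_x + right_x) / 2
    half_width = frame_width / 2
    # Normalised error: 0 when centred, -1/+1 at the left/right frame edge.
    error = (lane_center - half_width) / half_width
    # Proportional control, clamped to the servo's usable range.
    command = max(-1.0, min(1.0, gain * error))
    return round(center_us + command * span_us)


print(steering_pulse_us(200, 440, 640))  # centred in lane -> 1500
print(steering_pulse_us(360, 600, 640))  # lane centre right of image centre -> 1750
```

A pure proportional controller like this tends to oscillate at speed, which is exactly the sort of tuning a treadmill test rig makes painless.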

The real stroke of genius, though, is the treadmill. [jmoreno555] realized that one of the frustrations of working in this space is having to chase a car around a test track. Running the car on a desktop treadmill instead lets it be programmed and debugged with less fuss in the early stages of development.

If you’re looking for a platform to experiment with AI and self-driving, this could be a project to dive into. We’ve covered some other great builds in this space, too. Meanwhile, if you’ve cracked driving autonomy and want to let us know, our tipsline is always standing by!

Two assembled 1 dollar TinyML boards

$1 TinyML Board For Your “AI” Sensor Swarm

You might be under the impression that machine learning costs thousands of dollars to work with. That might be true in many cases, but there’s more to machine learning than you might think. For instance, what if you could blanket anything with a network of cheap, machine-learning-enabled sensors? The 1 dollar TinyML project by [Jon Nordby] allows you to do just that. These tiny boards host an STM32-like MCU, a BLE module, lithium-ion power circuitry, and some nice sensor options: an accelerometer, a pair of microphones, and a light sensor.

What could you do with these sensors? [Jon] has talked a bit about a few commercial and non-commercial applications he’s worked on in his ML career, and tells us that the accelerometer alone enables human presence detection, sleep tracking, personal activity monitoring, or vibration pattern sensing, for a start. As for the sound input, there are tasks ranging from gunshot or clapping detection to coffee roasting process tracking, voice and speech detection, and surely much more. A few years ago, we even saw machine learning used to comfort a barking dog while its owner was away.
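Most of those accelerometer applications start the same way: chop the signal into short windows and reduce each window to a few features that a small model can classify. Here’s a minimal sketch of that preprocessing step. It reflects our assumption about a typical pipeline rather than code from the project, and the fixed threshold in the hypothetical `is_moving` function merely stands in for a trained model of the kind you’d deploy with emlearn.

```python
import math


def window_features(samples):
    """Summarise a window of (x, y, z) accelerometer samples, in g."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    n = len(mags)
    mean = sum(mags) / n
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / n)
    return {"mean": mean, "std": std, "peak": max(mags)}


def is_moving(samples, std_threshold=0.05):
    """Hypothetical stand-in for a trained classifier: flag motion when the
    acceleration magnitude varies more than a fixed threshold."""
    return window_features(samples)["std"] > std_threshold


still = [(0.0, 0.0, 1.0)] * 32                      # resting flat: ~1 g on z
shaking = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.5)] * 16   # alternating readings
print(is_moving(still), is_moving(shaking))          # prints False True
```

On the real boards this would run in C on the MCU; using the vector magnitude keeps the features independent of how the board happens to be oriented.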

The bottom line is, you ought to get a few of these in your hands and start playing with ML. You might still need somewhat beefier hardware to train your models, but everything gets that much easier once you have a network of sensors waiting for your command. Plus, since it’s an open source project, you’ll have a much easier time adding any additional capabilities your particular application might need.

These boards are pretty cost-optimized, which makes it possible to order a couple dozen without breaking the bank. The $1 target refers to BOM cost, and is easiest to hit if you opt not to include one of the pricier sensors. You can assemble these boards yourself, or have them assembled by a fab of your choice for barely any cost increase. As for software, they work with the emlearn framework.

Everything is on GitHub, from KiCad sources to Jupyter notebooks. Over on Hackaday.io, there are five impressively insightful worklogs; the microphone worklog alone will teach you about microphone amplification in low-power conditions while keeping the cost low. Not as price-constrained and want to try some image processing tasks? Here’s a beautiful Pi Pico ArduCam board with a camera and a TFT screen.

Train A GPT-2 LLM, Using Only Pure C Code

[Andrej Karpathy] recently released llm.c, a project that focuses on LLM training in pure C, once again showing that working with these tools isn’t necessarily reliant on sprawling development environments. GPT-2 may be older, but it is still perfectly relevant: it is the granddaddy of modern LLMs (large language models), with a clear lineage to more modern offerings.

LLMs are fantastically good at communicating despite not actually knowing what they are saying, and training them usually relies on the PyTorch deep learning library, which is typically driven from Python. llm.c takes a simpler approach by implementing the neural network training algorithm for GPT-2 directly. The result is highly focused and surprisingly short: about a thousand lines of C in a single file. It is an elegant approach that does the same job as the bigger, clunkier methods. It can run entirely on a CPU, or it can take advantage of GPU acceleration where available.
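The loop that any from-scratch trainer has to spell out by hand (forward pass, loss, backward pass, parameter update) can be shown at toy scale without a framework. The sketch below is not llm.c’s code and not GPT-2: it swaps the transformer for a simple table of next-token scores (a bigram model) and uses plain gradient descent, whereas real GPT-2 training typically uses a more sophisticated optimizer. The softmax, the cross-entropy loss, and its `p - one_hot` gradient, however, are the same pieces that sit at the output of a GPT-2 model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram_logits(pairs, vocab_size, lr=1.0, steps=200):
    """Fit a table of logits W[prev][next] by plain gradient descent on the
    mean cross-entropy of (prev, next) token pairs, with hand-written gradients.
    Returns the table and the mean loss measured on the final forward pass."""
    W = [[0.0] * vocab_size for _ in range(vocab_size)]
    for _ in range(steps):
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(W[prev])                  # forward pass
            loss -= math.log(probs[nxt])              # cross-entropy
            for j in range(vocab_size):               # backward pass:
                target = 1.0 if j == nxt else 0.0     # dL/dlogit = p - one_hot
                grads[prev][j] += (probs[j] - target) / len(pairs)
        for i in range(vocab_size):                   # parameter update
            for j in range(vocab_size):
                W[i][j] -= lr * grads[i][j]
    return W, loss / len(pairs)


# Toy "text" over a 3-token vocabulary: 0 -> 1 -> 2 -> 0 ...
pairs = [(0, 1), (1, 2), (2, 0)]
W, final_loss = train_bigram_logits(pairs, vocab_size=3)
print(round(final_loss, 3))
```

With every logit starting at zero, the first loss is ln 3 ≈ 1.099 (a uniform guess over three tokens); training drives it toward zero as each row learns its successor.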

This isn’t the first time [Andrej Karpathy] has bent his considerable skills and understanding towards boiling down these sorts of concepts into bare-bones implementations. We previously covered a project of his that is the “hello world” of GPT, a tiny model that predicts the next bit in a given sequence and offers low-level insight into just how GPT (generative pre-trained transformer) models work.
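With a vocabulary of just two tokens (0 and 1) and a short, fixed context, a next-bit predictor can be viewed as a finite state machine: each possible context is a state, and each predicted bit moves it to a new one. The sketch below illustrates that view under our own assumptions. A 3-bit context (so eight states) is chosen for illustration, and the probabilities come from simple counting over a made-up training sequence rather than from a trained neural network as in [Andrej Karpathy]’s project.

```python
from collections import Counter

CONTEXT = 3  # bits of context; 2 ** 3 = 8 possible states (illustrative choice)


def next_state(state, bit):
    """Slide the context window: drop the oldest bit and append the new one."""
    return (state + bit)[-CONTEXT:]


def fit(sequence):
    """Estimate P(next bit is '1' | previous CONTEXT bits) by counting."""
    ones, totals = Counter(), Counter()
    for i in range(len(sequence) - CONTEXT):
        ctx, nxt = sequence[i:i + CONTEXT], sequence[i + CONTEXT]
        totals[ctx] += 1
        ones[ctx] += 1 if nxt == "1" else 0
    return {ctx: ones[ctx] / totals[ctx] for ctx in totals}


def generate(probs, state, n):
    """Greedily emit n bits, walking the state machine as we go.
    Unseen states default to 0.5, and ties resolve to '1' (arbitrary choices)."""
    bits = []
    for _ in range(n):
        bit = "1" if probs.get(state, 0.5) >= 0.5 else "0"
        bits.append(bit)
        state = next_state(state, bit)
    return "".join(bits)


probs = fit("0110110110")        # hypothetical training sequence
print(probs)                     # {'011': 0.0, '110': 1.0, '101': 1.0}
print(generate(probs, "011", 6)) # prints 011011
```

Here the machine settles into a loop, 011 → 110 → 101 → 011, reproducing the pattern it was trained on.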