Double-Dose Of AI Turns Daily Tasks Into Works Of Art

Not so long ago, “Magic Mirror” builds were all the rage, and we have to admit getting our daily reminders and newsfeeds on an LCD display sitting behind a partially reflective mirror is not without its charms. But styles ebb and flow, so we don’t see too many of those builds anymore. This e-ink daily calendar reminder hearkens back to those Magic Mirrors, only with a double twist of AI.

This project is the work of [Ilkka Turunen], and right up front we’ll say the results are just gorgeous. A lot of that has to do with the 10.3″ e-ink display used, but even more with the creative use of not one but two machine learning systems. The first is ChatGPT, which [Ilkka] uses to parse the day’s online calendar entries and grab the most significant events to generate a prompt for DALL-E. The generated DALL-E prompt includes specific instructions that guide the style of the image, which honestly is where most of the artistry lies: in [Ilkka]’s aesthetic choices, like suggesting that the images look like a 19th-century lithograph or a satirical comic from a turn-of-the-(last)-century newspaper. The prompt is then sent off to DALL-E for rendering, and the resulting image is displayed.

It has to be said that the prompts that ChatGPT generates based on the combination of [Ilkka]’s aesthetic preferences and the random events of the day are strikingly complex. The chatbot really seems to be showing some imagination these days; DALL-E is no slouch either in turning those words into images.
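If you’re curious how such a ChatGPT-to-DALL-E handoff can be wired up, here’s a minimal Python sketch using the OpenAI client library. The calendar data, model names, and style instructions below are placeholders for illustration, not [Ilkka]’s actual code.

```python
# Hypothetical sketch: summarize the day's calendar into a DALL-E prompt, then render it.
# Assumes OPENAI_API_KEY is set in the environment; event data here is stand-in text.
from openai import OpenAI

client = OpenAI()

def build_image_prompt(events: list[str]) -> str:
    """Ask a chat model to distill calendar entries into a single styled image prompt."""
    style = ("Describe the scene as a 19th-century lithograph, in the spirit of a "
             "satirical illustration from a turn-of-the-century newspaper.")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Pick the most significant event and write one image prompt for it. " + style},
            {"role": "user", "content": "\n".join(events)},
        ],
    )
    return response.choices[0].message.content

def render(prompt: str) -> str:
    """Send the generated prompt to DALL-E and return a URL for the resulting image."""
    image = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
    return image.data[0].url

if __name__ == "__main__":
    events = ["09:00 Dentist appointment", "13:00 Project review", "18:30 Badminton"]
    print(render(build_image_prompt(events)))
    # A real build would fetch events from an online calendar, then dither and
    # push the rendered image to the e-ink panel instead of printing a URL.
```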

Like the idea of an e-ink daily reminder but prefer a less artistic presentation? This should help.

Continue reading “Double-Dose Of AI Turns Daily Tasks Into Works Of Art”

3D Human Models From A Single Image

You’ve seen it in movies and shows — the hero takes a blurry still picture, and with a few keystrokes, generates a view from a different angle or sometimes even a full 3D model. Turns out, thanks to machine learning and work by several researchers, this might be possible. As you can see in the video below, using “shape-guided diffusion,” the researchers were able to take a single image of a person and recreate a plausible 3D model.

Of course, the work relies on machine learning. As you’ll see in the video, this isn’t a new idea, but previous attempts have been less than stellar. This new method uses shape prediction first, followed by an estimate of the back view appearance. The algorithm then guesses what the views between the initial photograph and the back view should look like, using the 3D shape estimate as a guideline. Even then, there is some post-processing to join the intermediate images together into a model.

The result looks good, although the video does point out some areas where the method still falls short. For example, unusual lighting can affect the results.

This beats spinning around a person or a camera to get many images. Scanning people in 3D is a much older dream than you might expect.

Continue reading “3D Human Models From A Single Image”

Synthesizing 360-Degree Views From Single Source Images

ZeroNVS is one of those research projects that is rather more impressive than it may look at first glance. On one hand, the 3D reconstructions — we urge you to click that first link to see them — look a bit grainy and imperfect. But on the other hand, it was reconstructed using a single still image as an input.

Most results look great, but some — like this bike visible through a park bench — come out a bit strange. A valiant effort for a single-image input, all things considered.

How is this done? It’s built on NeRFs (neural radiance fields), which leverage machine learning, but with yet another new twist. Existing methods mainly focus on single objects against masked backgrounds, but this approach makes the technique applicable to a variety of complex, in-the-wild images without the need to train new models.

There are a ton of sample outputs on the project summary page that are worth a browse if you find this sort of thing at all interesting. Some of the 360-degree reconstructions look rough, some are impressive, and some are a bit amusing. For example, indoor shots tend to reconstruct rooms that look good, but lack doorways.

There is a research paper for those seeking additional details and a GitHub repository for the code, but the implementation requires some significant hardware.

AI In A Box Envisions AI As A Private, Offline, Hackable Module

With AI in a Box, [Useful Sensors] aims to embed a variety of complementary AI tools into a small, private, self-contained module with no internet connection. It can do live voice recognition and captioning, live translation, and natural language conversational interaction with a local large language model (LLM). Intriguingly, it’s specifically designed with features to make it hack-friendly, such as the ability to act as a voice keyboard by sending live transcribed audio as keystrokes over USB.

Based on the RockChip 3588S SoC, the unit aims to have an integrated speaker, display, and microphone.
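The box itself types as a USB HID peripheral, but as a rough desktop-side analogue of the voice-keyboard idea, here’s a Python sketch that records a short clip, transcribes it locally with Whisper, and “types” the result. The model size and audio parameters are assumptions, not anything from [Useful Sensors]’ firmware.

```python
# Rough desktop analogue of the "voice keyboard" idea: transcribe locally, then emit keystrokes.
# Assumes the openai-whisper, sounddevice, soundfile, and pyautogui packages are installed.
import sounddevice as sd
import soundfile as sf
import whisper
import pyautogui

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio
SECONDS = 5            # length of each recording window

def record_clip(path: str = "clip.wav") -> str:
    """Grab a short clip from the default microphone and save it as a WAV file."""
    audio = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()
    sf.write(path, audio, SAMPLE_RATE)
    return path

def transcribe(path: str) -> str:
    """Run a small Whisper model entirely on the local machine."""
    model = whisper.load_model("base")  # placeholder model size
    return model.transcribe(path)["text"].strip()

if __name__ == "__main__":
    text = transcribe(record_clip())
    # The AI in a Box does this step as a USB HID keyboard; here we just synthesize
    # keystrokes on the local desktop instead.
    pyautogui.write(text + " ", interval=0.02)
```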

Right now it’s wrapping up a pre-order phase, and aims to ship units around the end of January 2024. The project is based around the RockChip 3588S SoC and is open source (GitHub repository), but since it’s still in development, there’s not a whole lot visible in the repository yet. However, a key part of getting good performance is [Useful Sensors]’s own transformers library for the RockChip NPU (neural processing unit).

Things like high-quality local voice recognition and locally-hosted LLMs like LLaMa have gotten a massive boost thanks to recent advances in machine learning, and it looks like this project aims to tie them together in a self-contained package.
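On the LLM side, running a model locally on a PC can be as simple as the following sketch using the llama-cpp-python bindings; the model file and parameters are assumptions, and the box itself leans on [Useful Sensors]’ NPU-accelerated transformers library rather than this approach.

```python
# Minimal local-LLM sketch using the llama-cpp-python bindings; not the AI in a Box firmware.
# Assumes a quantized GGUF model has already been downloaded (the file name is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=2048,    # context window
    n_threads=4,   # tune for the host CPU
)

result = llm(
    "Q: What could a fully offline voice assistant be useful for? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(result["choices"][0]["text"].strip())
```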

Perhaps private digital assistants can become more useful when users can have the freedom to modify and integrate them as they see fit. Digital assistants hosted by the big tech companies are often frustrating, and others have observed that this is ultimately because they primarily exist to serve their makers more than they help users.

Continue reading “AI In A Box Envisions AI As A Private, Offline, Hackable Module”

Full Self-Driving, On A Budget

Self-driving is currently the Holy Grail in the automotive world, with a number of companies racing to build general-purpose autonomous vehicles that can get from point A to point B with no user input. While no one has brought one to market yet, at least one company has promised this feature and taken customers’ money for it, only to continually move the goalposts for delivery because of how challenging the problem has turned out to be. But it doesn’t need to be that hard or expensive to solve, at least in some situations.

The situation in question is driving on a single stretch of highway, and only steering is handled, so there’s no accelerator or brake pedal input. The highway is driven normally, using a webcam to take images of the route and an Arduino to capture data about the steering angle. The idea here is that with enough training the Arduino could eventually steer the car. But first, some math needs to happen on the training data: since the steering wheel is almost always not turning the car, the data has to be balanced so that actual steering events aren’t treated as mere statistical anomalies. After the training, the system does a surprisingly good job at “driving” based on this data, and does it on a budget not much bigger than a laptop, a microcontroller, and a webcam.
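To give a flavor of the kind of model involved, here’s a hypothetical PyTorch sketch that regresses a steering angle from downscaled webcam frames, including a crude way of thinning out the straight-ahead samples that dominate the log. The CSV format and thresholds are stand-ins, not the project’s actual data pipeline.

```python
# Sketch of a steering-angle regressor in PyTorch; not the project's actual code.
# Assumes a log "drive_log.csv" with lines like: frames/0001.jpg,-0.03 (image path, steering angle).
import csv, random
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image
from torchvision.transforms.functional import resize

class DriveLog(Dataset):
    def __init__(self, csv_path: str, keep_straight: float = 0.1):
        self.samples = []
        with open(csv_path) as f:
            for path, angle in csv.reader(f):
                angle = float(angle)
                # Straight-ahead frames dominate the recording, so keep only a fraction
                # of them; otherwise real steering events look like statistical noise.
                if abs(angle) < 0.02 and random.random() > keep_straight:
                    continue
                self.samples.append((path, angle))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, angle = self.samples[idx]
        img = resize(read_image(path).float() / 255.0, [66, 200])
        return img, torch.tensor([angle], dtype=torch.float32)

# Tiny convolutional network: frame in, single steering angle out.
model = nn.Sequential(
    nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
    nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
    nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(48, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

loader = DataLoader(DriveLog("drive_log.csv"), batch_size=32, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(10):
    for frames, angles in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(frames), angles)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```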

Admittedly, this project was a proof-of-concept to investigate machine learning, neural networks, and other statistical algorithms used in these sorts of systems, and doesn’t actually drive any cars on any roadways. Even the creator says he wouldn’t trust it himself, but that he was pleasantly surprised by the results of such a simple system. It could also be expanded to handle the brake and accelerator pedals with separate neural networks. It’s not our first budget-friendly self-driving system, either. This one makes it happen with the enormous computing resources of a single Android smartphone.

Continue reading “Full Self-Driving, On A Budget”

Keeping Badgers At Bay With TensorFlow

Human-animal conflict is always a contentious issue, and finding ways to prevent damage without causing harm to the animals often requires creative solutions. [James Milward] needed a humane method to stop badgers and foxes from uprooting his garden, leading him to create the Furbinator 3000, a system that combines computer vision with audio deterrents.

[James] initially tried using scent repellents (which were ignored) and blocking access to his garden (which resulted in more digging), but found some success with commercial ultrasonic audio repellent devices. However, these had to be manually turned off during the day so the PIR motion sensors weren’t constantly triggered by [James] and his family, and the integrated solar panels couldn’t keep up with the load.

This presented a good opportunity to try his hand at practical machine vision. He already had a substantial number of sample images from the Ring cameras in his garden, which he turned into a functional TensorFlow Lite model with about 2.5 hours of training. He linked it with event-activated RTSP streams from his Ring cameras using the ring-mqtt library. To minimize false positives on stationary objects, he incorporated a motion filter into the processing pipeline. When it identifies a fox or badger with reasonable confidence, it generates an MQTT event.
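A stripped-down version of that detection loop might look something like the Python sketch below: pull frames from an RTSP stream, apply a crude frame-difference motion gate, run a TensorFlow Lite detector, and publish an MQTT event. The stream URL, model file, label indices, and topic are placeholders rather than [James]’s actual setup.

```python
# Simplified sketch of a camera -> TensorFlow Lite -> MQTT pipeline; placeholders throughout.
import cv2
import numpy as np
import paho.mqtt.client as mqtt
from tflite_runtime.interpreter import Interpreter

RTSP_URL = "rtsp://camera.local/stream"   # placeholder; ring-mqtt exposes per-event streams
ANIMAL_LABELS = {0: "badger", 1: "fox"}   # depends entirely on how the model was trained
CONFIDENCE = 0.6
MOTION_THRESHOLD = 5.0                    # mean absolute frame difference that counts as motion

interpreter = Interpreter(model_path="garden_animals.tflite")  # placeholder model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outputs = interpreter.get_output_details()

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x also wants a CallbackAPIVersion argument
client.connect("mqtt.local", 1883)

cap = cv2.VideoCapture(RTSP_URL)
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev_gray is not None and np.mean(cv2.absdiff(gray, prev_gray)) > MOTION_THRESHOLD:
        # Something moved: resize to the model's input shape and run the detector.
        h, w = inp["shape"][1], inp["shape"][2]
        tensor = np.expand_dims(cv2.resize(frame, (w, h)), 0).astype(inp["dtype"])
        interpreter.set_tensor(inp["index"], tensor)
        interpreter.invoke()
        # Typical SSD-style detectors emit classes and scores per detection;
        # the exact output ordering varies from model to model.
        classes = interpreter.get_tensor(outputs[1]["index"])[0]
        scores = interpreter.get_tensor(outputs[2]["index"])[0]
        for cls, score in zip(classes, scores):
            label = ANIMAL_LABELS.get(int(cls))
            if label and score > CONFIDENCE:
                client.publish("garden/intruder", label)  # downstream, this fires the deterrent
    prev_gray = gray
```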

[James] modified the ultrasonic devices to react to these events, using an ESP8266-based WeMos D1 Mini Pro development board, and added an external 5 V power supply for sustained operation. All development was performed in a Docker container, which simplified deployment on a Raspberry Pi 4.

After implementing the system, [James] woke up to the satisfying sight of his garden remaining untouched overnight, a victory that even earned him some coverage by the BBC.

Thanks for the tip, [Laurent]!

Compact, Gesture-Based Remote Control Over Bluetooth

[AlexMiller11] shared a project for a DIY gesture-sensing remote control that acts like a Bluetooth keyboard, capable of controlling media and presentations on a computer with a high degree of accuracy.

The device recognizes eight different gestures and controls a host PC over Bluetooth.

The hardware is a Silicon Labs xG24 dev kit, a small IoT-focused board that can be powered by a CR2032 cell. Part of what makes it all work is the six-axis IMU sensor, but the rest is the software that interprets the sensor data and figures out what motions the user is trying to make. That happens with a Neuton.AI model and SDK, a tiny but effective machine learning framework for small devices.

How does it actually work? The device acts as a Bluetooth HID, and gets connected to a PC in the same way as a regular Bluetooth keyboard. Once that’s done, recognized gestures are printed out over the serial port as well as sent via Bluetooth to the host machine. Media can then be played or paused, volume adjusted, presentations controlled, and more. More details are on the project’s GitHub repository. There’s also a demo video that explains exactly what’s going on, embedded below the page break.
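The heavy lifting on the board happens in C with the Neuton SDK, but the general shape of the problem translates to a quick desktop prototype: window the six-axis IMU samples, boil each window down to a few features, and feed them to a small classifier. The CSV format, gesture names, and choice of a random forest below are our own inventions for illustration, not the Neuton.AI model.

```python
# Desktop prototype of IMU gesture classification; not the Neuton.AI model running on the xG24.
# Assumes "gestures.csv" rows of label,ax,ay,az,gx,gy,gz with 100 consecutive rows per recording.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

WINDOW = 100  # IMU samples per gesture recording

def features(window: np.ndarray) -> np.ndarray:
    """Collapse a (WINDOW, 6) block of accel/gyro samples into a flat feature vector."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])

def load_dataset(path: str = "gestures.csv"):
    raw = np.loadtxt(path, delimiter=",")     # shape: (n_recordings * WINDOW, 7)
    labels = raw[::WINDOW, 0].astype(int)     # one label per recording
    windows = raw[:, 1:].reshape(-1, WINDOW, 6)
    return np.stack([features(w) for w in windows]), labels

X, y = load_dataset()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

GESTURES = {0: "swipe_left", 1: "swipe_right", 2: "double_tap"}  # invented names
print("example prediction:", GESTURES.get(int(clf.predict(X_test[:1])[0])))
```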

Machine learning is a way of using software to solve the kinds of problems humans are not very good at writing programs to solve, and accurate gesture recognition is a good example. Not all such applications require heaps of overheating GPUs, either. We’ve seen the concept of a neural network stripped down to its bare essentials running on an Arduino Uno, for those who would like to better appreciate the fundamentals.

Continue reading “Compact, Gesture-Based Remote Control Over Bluetooth”