Creating A Twisted Grid Image Illusion With A Diffusion Model

Images that can be interpreted in more than one way have existed for many decades, with the classical example being Rubin’s vase, which some viewers see as a vase and others as a pair of human faces.

When the duck becomes a bunny, if you ignore the graphical glitches that used to be part of the duck. (Credit: Steve Mould, YouTube)

Things get trickier if you want to create an image that changes into something else, and still looks realistic, when you rotate each section of it within a 3×3 grid. In a video, [Steve Mould] explains how this can be accomplished by using a diffusion model to identify shared characteristics of two images and to create an output image that effectively contains the essential features of both.

Naturally, this process can be done by hand too, with the goal always being to create a plausible image in either orientation, one with just enough detail to trick the brain into filling in the rest and to send the viewer down the path of interpreting what the eye sees as a duck, a bunny, a vase, or the outline of two faces.

Using a diffusion model to create such illusions is quite a natural fit, as these models work by refining noise until a plausible image begins to appear. Of course, whether the result is a viable illusion is ultimately determined not by the model but by the viewer: humans are susceptible to these illusions, while machine vision still struggles to distinguish a cat from a loaf and a raisin bun from a spotted dog. The imperfections of diffusion models would seem to be a benefit here, as the model will happily churn through abstractions and iterations with no understanding or interpretive bias of its own, while the human can steer it towards a viable interpretation.
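One published way to generate such illusions with a diffusion model, as in the “visual anagrams” line of work, is to estimate the noise for both readings of the image at every denoising step and average the two estimates. The following is a minimal sketch of that idea for the 3×3 twisted-grid case, not [Steve Mould]’s actual pipeline: `denoise` is a hypothetical stand-in for whatever noise-prediction call a real diffusion model exposes, and the helper assumes a square image whose side is divisible by three.

```python
import torch

def twist_grid(img: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Rotate every tile of a 3x3 grid by k * 90 degrees. img is (C, H, W) and square."""
    c, h, w = img.shape
    th, tw = h // 3, w // 3
    out = img.clone()
    for i in range(3):
        for j in range(3):
            tile = img[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            out[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw] = torch.rot90(tile, k, dims=(1, 2))
    return out

def illusion_noise(denoise, x, t, prompt_a, prompt_b):
    """One noise estimate that respects both orientations of the image.

    `denoise(x, t, prompt)` is a hypothetical stand-in for the model's
    noise-prediction call; it is not a real library function.
    """
    eps_a = denoise(x, t, prompt_a)                                 # noise as seen in view A
    eps_b = twist_grid(denoise(twist_grid(x, 1), t, prompt_b), -1)  # view B, mapped back to A
    return 0.5 * (eps_a + eps_b)                                    # compromise between both readings
```

Each reverse-diffusion step would then use `illusion_noise` in place of the usual single noise estimate, so the finished image has to look plausible in both orientations at once.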


Large Language Models On Small Computers

As technology progresses, we generally expect processing capabilities to scale up. Every year, we get more processor power, faster speeds, greater memory, and lower cost. However, we can also use improvements in software to get things running on what might otherwise be considered inadequate hardware. Taking this to the extreme, while large language models (LLMs) like GPT are running out of data to train on and having difficulty scaling up, [DaveBben] is experimenting with scaling down instead, running an LLM on the smallest computer that could reasonably run one.

Of course, some concessions have to be made to get an LLM running on underpowered hardware. In this case, the computer of choice is an ESP32, so the model had to shrink from the trillions of parameters of something like GPT-4, or even the hundreds of billions of GPT-3, down to only 260,000. The weights come from the tinyllamas checkpoint, and llama2.c is the implementation [DaveBben] chose for this setup, as it can be streamlined to run a bit better on something like the ESP32. The specific chip is the ESP32-S3FH4R2, chosen for its comparatively large amount of RAM, since even this small model needs a minimum of 1 MB to run. It also has two cores, which both work as hard as possible under (relatively) heavy loads like this, and the CPU clock can be maxed out at around 240 MHz.
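For a sense of scale, a quick back-of-envelope calculation (ours, not [DaveBben]’s) shows why even this tiny model brushes up against the ESP32’s memory limits:

```python
# Rough weight-storage estimate for the ~260,000 parameter tinyllamas model.
N_PARAMS = 260_000   # approximate parameter count quoted above

def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Raw storage needed for the weights alone, in megabytes."""
    return n_params * bytes_per_param / 1e6

print(f"fp32 weights: {weight_megabytes(N_PARAMS, 4):.2f} MB")  # ~1.04 MB
print(f"int8 weights: {weight_megabytes(N_PARAMS, 1):.2f} MB")  # ~0.26 MB
```

On top of the weights themselves, llama2.c also needs activation buffers and a key/value cache, so the 1 MB floor mentioned above is no surprise.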

Admittedly, [DaveBben] is mostly doing this just to see if it can be done since even the most powerful of ESP32 processors won’t be able to do much useful work with a large language model. It does turn out to be possible, though, and somewhat impressive, considering the ESP32 has about as much processing capability as a 486 or maybe an early Pentium chip, to put things in perspective. If you’re willing to devote a few more resources to an LLM, though, you can self-host it and use it in much the same way as an online model such as ChatGPT.

DIY Rabbit R1 Clone Could Be Neat With More Hardware

The Teenage Engineering badging usually appears on some cool gear that almost always costs a great deal of money. One such example is the Rabbit R1, an AI-powered personal assistant that retails for $199. It was also revealed that it’s basically a small device running a simple Android app. That raises the question: could you build your own dupe for $20? That’s what [Thomas the Maker] did.

Meet Rappit. It’s basically [Thomas]’s take on an AI friend that doesn’t break the bank. It runs on a Raspberry Pi Zero 2W, which has the benefit of integrated wireless connectivity on board. It’s powered by rechargeable AA batteries or a USB power bank to keep things simple. [Thomas] then wrapped it all up in a cute 3D printed enclosure to give it some charm.

It’s the software that makes the Rappit what it is. Rather than including a screen, microphone, or speakers on the device itself, [Thomas] interacts with the Pi-based device via smartphone. That makes it a less convincing dupe of the self-contained Rabbit R1, but the basic concept is the same. [Thomas] can send queries to the Rappit via a simple Android or iOS app he created called “Comfyspace,” and the Rappit responds with the aid of Google’s Gemini AI.
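The write-up doesn’t spell out how the Comfyspace app talks to the Pi, but the Gemini side of such a build could be as simple as the following sketch using Google’s `google-generativeai` Python package. The model name and plain-text API key are placeholders, and this is not [Thomas]’s actual code.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")      # placeholder: load this from somewhere safer
model = genai.GenerativeModel("gemini-1.5-flash")   # assumed model name; any available Gemini model works

def answer_query(question: str) -> str:
    """Take a text query relayed from the companion app and return Gemini's reply."""
    response = model.generate_content(question)
    return response.text

if __name__ == "__main__":
    print(answer_query("Give me a one-line, weather-appropriate outfit suggestion."))
```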

If you’re really trying to duplicate the AI assistant trend, you need standalone hardware. To that end, the Rappit design could benefit from a screen, microphone, speaker, and speech synth. Honestly, though, that would only take a few hours of extra work compared to what [Thomas] has already done here. As it is, [Thomas] could simply throw away the Raspberry Pi and use the smartphone with Gemini directly, right? But he chose the route of using the smartphone as an interface to keep costs down by minimizing the hardware outlay.

If you want a real Rabbit R1, you can order one here. We’ve discussed controversy around the device before, too. Video after the break.


Taco Bell To Bring Voice AI Ordering To Hundreds Of US Drive-Throughs

Drive-throughs are a popular feature at fast-food places, where you can get some fast grub without even leaving your car. For the fast-food companies running them, they are also a big focus of automation, with the ideal being a voice assistant that can take orders and pass them on to the (still human) staff. This is presumably in lieu of being able to make customers use the touchscreen-equipped order kiosks that are common these days. Now pushing for this drive-through automation is Taco Bell, or more specifically its parent company, Yum Brands.

Interestingly enough, this comes shortly after McDonald’s deemed its own drive-through voice assistant a failure and removed it. Meanwhile, multiple Taco Bell locations across 13 US states and five KFC restaurants in Australia are trialing the system, with results apparently encouraging enough to start expanding it. Company officials are cited as saying it has ‘improved order accuracy’, ‘decreased wait times’ and ‘increased profits’. Considering that the McDonald’s experience was pretty much the exact opposite in all of these categories, we will wait with bated breath. Feel free to share your Taco Bell or other voice-AI-enabled drive-through experiences in the comments. Maybe whoever Yum Brands contracted for their voice assistant did a surprisingly decent job, which would be a pleasant change.

Top image: Taco Bell – Vadnais Heights, MN (Credit: Gabriel Vanslette, Wikimedia)

AI Image Generator Twists In Response To MIDI Dials, In Real-time

MIDI isn’t just about music, as [Johannes Stelzer] shows by using dials to adjust AI-generated imagery in real-time. The results are wild, with an interactivity to them that we don’t normally see in such things.

[Johannes] uses Stable Diffusion’s SDXL Turbo to create a baseline image of “photo of a red brick house, blue sky”. The hardware dials act as manual controls for applying different embeddings to this baseline, such as “coral”, “moss”, “fire”, “ice”, “sand”, “rusty steel” and “cookie”.

By adjusting the dials, those embeddings are applied to the base image in varying strengths. The results are generated on the fly and are pretty neat to see, especially since there is no appreciable amount of processing time required.
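Conceptually, the control loop could look something like the sketch below: read MIDI control-change messages with `mido`, turn each dial position into a weight, and blend the modifier embeddings into the base prompt embedding accordingly. Note that `encode_prompt`, `generate_from_embedding`, and `show` are hypothetical stand-ins for whatever the actual pipeline (SDXL Turbo here) exposes, the CC-number mapping is assumed, and the simple weighted sum is just one way the blending might be done; this is not [Johannes]’s code.

```python
import mido

MODIFIERS = ["coral", "moss", "fire", "ice", "sand", "rusty steel", "cookie"]
dial_values = {cc: 0.0 for cc in range(len(MODIFIERS))}       # assumes the dials send CC 0..6

base_emb = encode_prompt("photo of a red brick house, blue sky")   # hypothetical helper
mod_embs = [encode_prompt(m) for m in MODIFIERS]                   # hypothetical helper

with mido.open_input() as port:                                # default MIDI input device
    for msg in port:
        if msg.type == "control_change" and msg.control in dial_values:
            dial_values[msg.control] = msg.value / 127.0       # normalize the 0..127 CC range
        # Weighted sum: the base embedding plus each modifier scaled by its dial.
        emb = base_emb + sum(w * e for w, e in zip(dial_values.values(), mod_embs))
        show(generate_from_embedding(emb))                     # SDXL Turbo is fast enough to keep up
```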

The MIDI controller is integrated with the help of lunar_tools, a software toolkit on GitHub to facilitate creating interactive exhibits. As for the image end of things, we’ve previously covered how AI image generators work.

Analyzing Feature Learning In Artificial Neural Networks And Neural Collapse

Artificial Neural Networks (ANNs) are commonly used for machine vision purposes, where they are tasked with object recognition. This is accomplished by taking a multi-layer network and using a training data set to configure the weights associated with each ‘neuron’. Due to the complexity of these ANNs for non-trivial data sets, it’s often hard to make heads or tails of what the network is actually matching in a given (non-training) input. In a March 2024 study in Science (preprint) by [A. Radhakrishnan] and colleagues, an approach is provided to elucidate and diagnose this mystery somewhat, using what they call the average gradient outer product (AGOP).

Defined as the uncentered covariance matrix of the ANN’s input-output gradients, averaged over the training dataset, this property provides information on which features of the data set are actually used for predictions. These turn out to be strongly correlated with repetitive information, such as the presence of eyes when recognizing whether lipstick is being worn, or star patterns in a car-and-truck data set rather than anything to do with the (highly variable) vehicles themselves. None of this was perhaps too surprising, but a number of the same researchers also used the AGOP to elucidate the mechanism behind neural collapse (NC) in ANNs.
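To make that definition concrete, here is a minimal sketch of computing the AGOP with PyTorch’s autograd: for every training input, take the Jacobian of the network’s outputs with respect to that input and average the resulting input-by-input outer products. The tiny network and random data are placeholders to keep the example self-contained, not anything from the papers.

```python
import torch

def agop(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Average gradient outer product: mean of J(x)^T J(x) over the inputs.

    inputs: (n, d) batch; returns a (d, d) matrix whose dominant directions
    indicate which input features the trained network relies on.
    """
    d = inputs.shape[1]
    total = torch.zeros(d, d)
    for x in inputs:
        J = torch.autograd.functional.jacobian(model, x)   # shape: (num_outputs, d)
        total += J.T @ J
    return total / inputs.shape[0]

# Toy usage: a small random network and random "training" inputs.
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
X = torch.randn(100, 8)
print(agop(model, X).shape)   # torch.Size([8, 8])
```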

NC occurs when an overparameterized ANN is trained well past the point of zero training error. In a preprint paper by [D. Beaglehole] et al., the AGOP is used to provide evidence for the mechanism behind NC during feature learning. Perhaps the biggest takeaway from these papers is that while ANNs can be useful, they’re also incredibly complex and poorly understood. The more we learn about their properties, the more appropriately we can use them.

Credit: Daniel Baxter

Mechanical Intelligence And Counterfeit Humanity

It would seem fair to say that the second half of the last century up to the present day has been firmly shaped by our relationship with technology, and with computers in particular. From the hulking behemoths at universities, to microcomputers at home, to today’s smartphones, smart homes and ever-looming compute cloud, we all have a relationship with computers in some form. One aspect of computers that has become increasingly underappreciated, however, is that the less we see them as physical objects, the more we seem inclined to accept them as humans. This is the point which [Harry R. Lewis] argues in a recent article in Harvard Magazine.

Born in 1947, [Harry R. Lewis] found himself at the forefront of what would become computer science and related disciplines, with some of his students being well-known to the average Hackaday reader, such as [Bill Gates] and [Mark Zuckerberg]. Suffice it to say, he has seen every attempt to ‘humanize’ computers, ranging from ELIZA to today’s ChatGPT. During this time, the line between humans and computers has become blurred, with computer systems becoming increasingly competent at imitating human interactions even as they vanished into the background of daily life.

These counterfeit ‘humans’ are not capable of learning, feeling, and experiencing the way that humans can, being at most a facsimile of a human, lacking what is often referred to as ‘the human experience’. More and more of us communicate these days via smartphone and computer screens, with little idea of, or regard for, whether we are talking to a real person or not. Ironically, it seems that by anthropomorphizing these counterfeit humans, we risk becoming less human in the process, while also opening the floodgates for blaming AI when the blame lies squarely with the humans behind it, as in the recent Air Canada chatbot case. Equally ridiculous, [Lewis] argues, is the notion that we could create a ‘superintelligence’ by training an ‘AI’ on nothing but data scraped off the internet, as there are many things in life which cannot be understood simply by reading about them.

Ultimately, the argument is made that humanistic learning should be the focal point of artificial intelligence, as only in this way could we create AIs that might truly be seen as our equals, and as beneficial to the future of all.