See What ‘They’ See In Your Photos

Once upon a time, a computer could tell you virtually nothing about an image beyond its file format, size, and color palette. These days, powerful image recognition systems are a part of our everyday lives. They See Your Photos is a simple website that shows you just how much these systems can interpret from a regular photo.

The website simply takes your image submission, runs it through the Google Vision API, and spits back out a description of the image. I tried it out with a photograph of myself, and was pretty impressed with what the vision model saw:
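For a sense of what "runs it through the Google Vision API" involves, here is a minimal sketch of the JSON body for Cloud Vision's `images:annotate` REST endpoint. Which features They See Your Photos actually requests, and how it turns the results into prose, is not stated. The feature list below is an assumption, and authentication is left out.

```python
import base64

# Public REST endpoint for Cloud Vision; authentication (API key or OAuth
# token) is omitted from this sketch.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Assumption: the site's actual feature selection is unknown; these are a few
# of the standard Cloud Vision feature types.
DEFAULT_FEATURES = (
    "LABEL_DETECTION",
    "FACE_DETECTION",
    "LANDMARK_DETECTION",
    "SAFE_SEARCH_DETECTION",
)


def build_annotate_request(image_bytes, features=DEFAULT_FEATURES, max_results=10):
    """Build the JSON body for a single-image images:annotate call."""
    return {
        "requests": [
            {
                # Inline image data is sent base64-encoded in "content".
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One entry per analysis the API should run on the image.
                "features": [
                    {"type": feature, "maxResults": max_results} for feature in features
                ],
            }
        ]
    }
```

To use the request, POST the body as JSON to `ANNOTATE_URL` with your credentials. The response carries per-feature results such as `labelAnnotations` and `faceAnnotations`, and a site like this one would summarize those results for the reader.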

Continue reading “See What ‘They’ See In Your Photos”

Render of life-size robot rat animatronic on blue plane

Robot Rodents: How AI Learned To Squeak And Play

In an astonishing blend of robotics and nature, SMEO, a robot rat designed by researchers in China and Germany, is fooling real rats into treating it like one of their own.

What sets SMEO apart is its rat-like adaptability. Equipped with a flexible spine, realistic forelimbs, and AI-driven behavior patterns, it doesn’t just mimic a rat; it learns and evolves through interaction. Researchers used video data to train SMEO to “think” like a rat, convincing its living counterparts to play, cower, or even engage in social nuzzling. This degree of mimicry could make SMEO a valuable tool for studying animal behavior ethically, minimizing stress on live animals by replacing some real-world interactions.

For builders and robotics enthusiasts, SMEO is a reminder that robotics can push boundaries while fostering a more compassionate future. Many have reservations about keeping intelligent creatures in confined cages or using them in experiments, so imagine applying this tech to non-invasive studies or even wildlife conservation. In a world where robotic dogs, bees, and even schools of fish have come to life, this animatronic rat sounds like an addition worth exploring further. SMEO’s development could, ironically, pave the way for reducing reliance on animal testing.

Continue reading “Robot Rodents: How AI Learned To Squeak And Play”

The Junk Machine Prints Corrupted Advertising On Demand

[ClownVamp]’s art project The Junk Machine is an interactive and eye-catching machine that, on demand, prints out an equally eye-catching and unique yet completely meaningless (one may even say corrupted) AI-generated advertisement for nothing in particular.

The machine is an artistic statement on how powerful software tools that have genuine promise and usefulness for creative types are finding their way into marketers’ hands, resulting in a deluge of, well, junk. This machine simplifies and magnifies that in a physical way.

We can’t help but think that The Junk Machine is, in a way, highlighting Sturgeon’s Law (paraphrased as ‘ninety percent of everything is crud’), which happens to be particularly applicable to the current AI landscape. In short, the ease of use of these tools means that crud is also being effortlessly generated at an unprecedented scale, swamping any positive elements.

As for the hardware and software, we’re very interested in what’s inside. Unfortunately, there aren’t any deep technical details, but the broad strokes are that The Junk Machine uses an embedded NVIDIA Jetson loaded up with Stability AI’s SDXL Turbo, a freely downloadable AI image generator that can be installed and run locally. When a user mashes a large red button, the machine generates a piece of AI junk mail in real time, without needing any network connection, and prints it from an embedded printer.
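The project doesn't publish its code, so the following is only a sketch of how the button-to-print flow could be wired up, under stated assumptions. It assumes the Hugging Face `diffusers` + PyTorch stack for SDXL Turbo. The prompt word lists, `handle_button_press`, and the `print_image` hook are all hypothetical stand-ins.

```python
import random
from typing import Any, Callable

# Hypothetical word lists for "an ad for nothing in particular"; the real
# project's prompting is not documented.
ADJECTIVES = ("glossy", "limited-time", "revolutionary", "premium")
PRODUCTS = ("smart toaster", "vitamin drink", "timeshare", "lawn gnome")


def random_junk_prompt(rng: random.Random) -> str:
    """Assemble a random junk-mail style prompt."""
    return (
        f"a {rng.choice(ADJECTIVES)} print advertisement for a "
        f"{rng.choice(PRODUCTS)}, bold headline, junk mail style"
    )


def make_sdxl_turbo_generator(device: str = "cuda") -> Callable[[str], Any]:
    """Load SDXL Turbo once and return a prompt -> PIL image function.

    Assumes PyTorch and Hugging Face diffusers are installed; the imports are
    deferred so the rest of this sketch runs without them.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to(device)

    def generate(prompt: str):
        # SDXL Turbo is meant to be sampled in a single step with guidance off.
        return pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

    return generate


def handle_button_press(
    generate: Callable[[str], Any],
    print_image: Callable[[Any], None],
    rng: random.Random,
) -> Any:
    """One press of the big red button: new prompt, local generation, print."""
    image = generate(random_junk_prompt(rng))
    print_image(image)  # hypothetical hook for whatever printer driver is used
    return image
```

Once the model weights are cached on the device, a call such as `handle_button_press(make_sdxl_turbo_generator(), send_to_printer, random.Random())` would need no network access. Here `send_to_printer` is a placeholder for the printer code. Passing the generator and printer in as callables keeps the button logic testable without a GPU or printer attached.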

Watch it in action in the video embedded below, just under the page break. There are a few more photos on [ClownVamp]’s X account.

Continue reading “The Junk Machine Prints Corrupted Advertising On Demand”

An Animated Walkthrough Of How Large Language Models Work

If you wonder how Large Language Models (LLMs) work and aren’t afraid of getting a bit technical, don’t miss [Brendan Bycroft]’s LLM Visualization. It is a step-by-step walkthrough of a GPT large language model, complete with an animated, interactive 3D block diagram of everything going on under the hood. Check it out!

The nano-gpt model used in the demonstration has only around 85,000 parameters, but its operating principles are the same as those of larger models.

The demonstration walks through a simple task and shows every step. The task is this: using the nano-gpt model, take a sequence of six letters and put them into alphabetical order.

A GPT model is a highly complex prediction engine, so the whole process begins with tokenizing the input (breaking up words and assigning numerical values to the chunks) and ends with choosing an appropriate output from a list of probabilities. There are of course many more steps in between, and different ways to adjust the model’s behavior. All of these are made quite clear by [Brendan]’s process breakdown.
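The two ends of that pipeline can be sketched in a few lines. Input text is turned into token ids. At the output, the model's raw scores (logits) become probabilities via softmax, and one token is chosen. The three-letter vocabulary is an assumption in the spirit of the sorting task, and the logits here are supplied by hand rather than produced by a trained network.

```python
import math
import random
from typing import List, Optional, Sequence

# Assumption: a three-letter character vocabulary suited to a letter-sorting
# task; production GPTs use sub-word tokenizers with far larger vocabularies.
VOCAB = ["A", "B", "C"]
TOKEN_ID = {ch: i for i, ch in enumerate(VOCAB)}


def tokenize(text: str) -> List[int]:
    """Map each character to its integer token id."""
    return [TOKEN_ID[ch] for ch in text]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw output scores into probabilities that sum to 1.

    Lower temperatures sharpen the distribution toward the top score.
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(
    logits: Sequence[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Greedy choice when rng is None, otherwise sample from the probabilities."""
    probs = softmax(logits, temperature)
    if rng is None:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Take an example input such as `tokenize("CBABBC")`, which returns `[2, 1, 0, 1, 1, 2]`. A trained sorting model would then emit the ids for "ABBBCC" one token at a time, each chosen by a step like `pick_next_token`. Greedy selection suits a task with one right answer, while sampling with a temperature is the usual knob for more varied text.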

We’ve previously covered an explanation of how LLMs work without math, which eschews gritty technical details in favor of focusing on functionality, but it’s also nice to see an approach like this one, which embraces the technical elements of exactly what is going on.

We’ve also seen a much higher-level peek at how a modern AI model like Anthropic’s Claude works when it processes requests, extracting human-understandable concepts that illustrate what’s going on under the hood.

Playing Chess Against LLMs And The Mystery Of Instruct Models

At first glance, trying to play chess against a large language model (LLM) seems like a daft idea, as its weighted nodes have, at most, been trained on some chess-adjacent texts. It has no concept of board state, stratagems, or even whatever a ‘rook’ or ‘knight’ piece is. This daftness is indeed demonstrated by [Dynomight] in a recent blog post (Substack version), where the Stockfish chess AI is pitted against a range of LLMs, from a small Llama model to GPT-3.5. Although the outcomes (see featured image) are largely as you’d expect, there is one surprise: the gpt-3.5-turbo-instruct model, which seems quite capable of giving Stockfish a run for its money, albeit on Stockfish’s lower settings.

Each model was given the same query, telling it to act as a chess grandmaster, to use standard notation, and to choose its next move. The stark difference between the instruct model and the others calls for investigation. OpenAI describes the instruct model as an ‘InstructGPT 3.5 class model’, which leads us to this page on OpenAI’s site and an associated 2022 paper that describes how InstructGPT is effectively the standard GPT model heavily fine-tuned using human feedback.
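The setup boils down to two steps: turning the game so far into a prompt, and pulling a move back out of whatever text the model returns. This sketch shows both. The prompt wording is a hypothetical paraphrase of the instructions described above, not the exact query used in the experiment, and the regular expression is a rough filter for standard algebraic notation (SAN) that does not check legality.

```python
import re
from typing import List, Optional

# Hypothetical wording that paraphrases the instructions described above; it is
# not the exact prompt used in the original experiment.
PROMPT_TEMPLATE = (
    "You are a chess grandmaster. Use standard algebraic notation.\n"
    "Game so far: {pgn}\n"
    "Choose your next move."
)

# Rough SAN matcher: castling, or optional piece letter, optional
# disambiguation, optional capture, target square and optional promotion,
# followed by an optional check/mate suffix. It does not check legality.
SAN_RE = re.compile(
    r"\b((?:O-O-O|O-O|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?)"
)


def build_prompt(moves: List[str]) -> str:
    """Render the move list as numbered PGN-style text, e.g. '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(moves):
        # White's moves (even indices) get a move number in front.
        parts.append(f"{i // 2 + 1}. {move}" if i % 2 == 0 else move)
    return PROMPT_TEMPLATE.format(pgn=" ".join(parts))


def extract_move(reply: str) -> Optional[str]:
    """Return the first SAN-looking token in the model's reply, if any."""
    match = SAN_RE.search(reply)
    return match.group(1) if match else None
```

A real harness would also have to decide what to do with illegal or unparseable replies. One option is to validate the extracted move with the python-chess library's `Board.push_san`, which raises an error for moves that are not legal in the current position.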

Continue reading “Playing Chess Against LLMs And The Mystery Of Instruct Models”

AI Face Anonymizer Masks Human Identity In Images

We’re all pretty familiar with AI’s ability to create realistic-looking images of people who don’t exist, but here’s an unusual use of that technology: masking people’s identities without altering the substance of the image itself. The content and “purpose” (for lack of a better term) of the photo remain unchanged, yet the actual person in it becomes impossible to identify. This invites some interesting privacy-related applications.

Originals on left, anonymized versions on the right. The substance of the images has not changed.

The paper for Face Anonymization Made Simple has all the details, but the method boils down to using diffusion models to take an input image, automatically pick out identity-related features, and alter them in a way that looks more or less natural. For this purpose, identity-related features essentially means key parts of a human face. Other elements of the photo (background, expression, pose, clothing) are left unchanged. As a concept it’s been explored before, but researchers show that this versatile method is both simpler and better-performing than others.

Diffusion models are the essence of AI image generators like Stable Diffusion. The fact that they can be run locally on personal hardware has opened the doors to all kinds of interesting experimentation, like this haunted mirror and other interactive experiments. Forget tweaking dull sliders like “brightness” and “contrast” for an image. How about altering the level of “moss”, “fire”, or “cookie” instead?

Here’s Code For That AI-Generated Minecraft Clone

A little while ago Oasis was showcased on social media, billing itself as the world’s first playable “AI video game” that responds to complex user input in real time. Code is available on GitHub for a down-scaled local version if you’d like to take a look. There’s a bit more detail and background in the accompanying project write-up, which discusses both the potential and the numerous limitations.

We suspect the focus on supporting complex user input (such as mouse look and an item inventory) is what the creators feel distinguishes it meaningfully from AI-generated DOOM. The latter was a concept that demonstrated AI image generators could (kinda) function as real-time game engines.

Image generators are, in a sense, prediction machines. The idea is that by providing a trained model with a short history of what just happened plus the user’s input as context, it can generate a pretty usable prediction of what should happen next, and do it quickly enough to be interactive. Run that in a loop, and you get some pretty impressive clips to put on social media.
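That loop can be sketched with the model treated as a black box. The `predict` function below is a hypothetical stand-in for the trained network: it receives the most recent frames plus the current user input and returns the next frame. The context length of four is an arbitrary assumption, not a figure from Oasis.

```python
from collections import deque
from typing import Any, Callable, List, Sequence

# Hypothetical predictor signature: (recent frames, current input) -> next frame.
# In Oasis a trained neural network fills this role; here it is a black box.
Predictor = Callable[[Sequence[Any], Any], Any]


def run_world_model(
    predict: Predictor,
    first_frame: Any,
    actions: Sequence[Any],
    context_len: int = 4,
) -> List[Any]:
    """Generate one frame per user action, feeding each output back in.

    Only the last `context_len` frames are kept as context, so anything older
    is forgotten, and a flawed frame feeds into the predictions that follow it.
    """
    history = deque([first_frame], maxlen=context_len)
    frames = [first_frame]
    for action in actions:
        next_frame = predict(tuple(history), action)
        history.append(next_frame)  # deque drops the oldest frame when full
        frames.append(next_frame)
    return frames
```

With a toy predictor such as `lambda h, a: h[-1] + a`, frames are just positions and actions are steps. Calling `run_world_model` with a start of `0` and actions `[1, 1, -1]` yields `[0, 1, 2, 1]`. The bounded window shows, in toy form, how a loop like this can lose track of anything that has scrolled out of its context.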

It is a neat idea, and we certainly applaud the creativity of bending an image generator to this kind of application, but we can’t help but notice the limitations. Sit and stare at something, or walk through dark or repetitive areas, and the system loses its grip; things rapidly spiral downward into a state we can only describe as “dreamily broken”.

It may be more a demonstration of a concept than a properly functioning game, but it’s still a very clever way to leverage image generation technology. If you’d prefer AI to leave the game itself untouched, take a look at neural networks trained to use the DOOM level creator tools.