AI And Savvy Marketing Create Dubious Moon Photos

Taking a high-resolution photo of the moon is a surprisingly difficult task. Not only is a long enough lens required, but the camera typically needs to be mounted on a tracking system of some kind, as the moon moves too fast for the long exposure times needed. That’s why plenty were skeptical of Samsung’s claims that their latest smart phone cameras could actually photograph this celestial body with any degree of detail. It turns out that this skepticism might be warranted.

Samsung’s marketing department is claiming that this phone is using artificial intelligence to improve photos, which should quickly raise a red flag for anyone technically minded. [ibreakphotos] wanted to put this to the test rather than speculate, so a high-resolution image of the moon was modified in such a way that most of the fine detail of the image was lost. Displaying this image on a monitor, standing across the room, and using the smartphone in question reveals details in the image that can’t possibly be there.

The image that accompanies this post shows the two images side-by-side for those skeptical of these claims, but from what we can tell it looks like this is essentially an AI system copy-pasting the moon into images it thinks are of the moon itself. The AI also seems to need something more moon-like than a ping pong ball to trigger the detail overlay too, as other tests appear to debunk a more simplified overlay theory. It seems like using this system, though, is doing about the same thing that this AI camera does to take pictures of various common objects.

ChatGPT, Bing, And The Upcoming Security Apocalypse

Most security professionals will tell you that it’s a lot easier to attack code systems than it is to defend them, and that this is especially true for large systems. The white hat’s job is to secure each and every point of contact, while the black hat’s goal is to find just one that’s insecure.

Whether black hat or white hat, it also helps a lot to know how the system works and exactly what it’s doing. When you’ve got the source code, either because it’s open-source, or because you’re working inside the company that makes the software, you’ve got a huge advantage both in finding bugs and in fixing them. In the case of closed-source software, the white hats arguably have the offsetting advantage that they at least can see the source code, and peek inside the black box, while the attackers cannot.

Still, if you look at the number of security issues raised weekly, it’s clear that even in the case of closed-source software, where the defenders should have the largest advantage, that offense is a lot easier than defense.

So now put yourself in the shoes of the poor folks who are going to try to secure large language models like ChatGPT, the new Bing, or Google’s soon-to-be-released Bard. They don’t understand their machines. Of course they know how the work inside, in the sense of cross multiplying tensors and updating weights based on training sets and so on. But because the billions of internal parameters interact in incomprehensible ways, almost all researchers refer to large language models’ inner workings as a black box.

And they haven’t even begun to consider security yet. They’re still worried about how to construct obscure background prompts that prevent their machines from spewing hate speech or pornographic novels. But as soon as the machines start doing something more interesting than just providing you plain text, the black hats will take notice, and someone will have to figure out defense.

Indeed, this week, we saw the first real shot across the bow: a hack to make Bing direct users to arbitrary (bad) webpages. The Bing hack requires the user to already be on a compromised website, so it’s maybe not very threatening, but it points out a possible real security difference between Bing and ChatGPT: Bing gives you links to follow, and that makes it a juicy target.

We’re right on the edge of a new security landscape, because even the white hats are facing a black box in the AI. So far, what ChatGPT and Codex and other large language models are doing is trivially secure – putting out plain text – but Bing is taking the first dangerous steps into doing something more useful, both for users and black hats. Given the ease with which people have undone OpenAI’s attempts to keep ChatGPT in its comfort zone, my guess is that the white hats will have their hands full, and the black-box nature of the model deprives them of their best hope. Buckle your seatbelts.

Teaching A Robot To Hallucinate

Training robots to execute tasks in the real world requires data — the more, the better. The problem is that creating these datasets takes a lot of time and effort, and methods don’t scale well. That’s where Robot Learning with Semantically Imagined Experience (ROSIE) comes in.

The basic concept is straightforward: enhance training data with hallucinated elements to change details, add variations, or introduce novel distractions. Studies show a robot additionally trained on this data performs tasks better than one without.

This robot is able to deposit an object into a metal sink it has never seen before, thanks to hallucinating a sink in place of an open drawer in its original training data.

Suppose one has a dataset consisting of a robot arm picking up a coke can and placing it into an orange lunchbox. That training data is used to teach the arm how to do the task. But in the real world, maybe there is distracting clutter on the countertop. Or, the lunchbox in the training data was empty, but the one on the counter right now already has a sandwich inside it. The further a real-world task differs from the training dataset, the less capable and accurate the robot becomes.

ROSIE aims to alleviate this problem by using image diffusion models (such as Imagen) to enhance the training data in targeted and direct ways. In one example, a robot has been trained to deposit an object into a drawer. ROSIE augments this training by inpainting the drawer in the training data, replacing it with a metal sink. A robot trained on both datasets competently performs the task of placing an object into a metal sink, despite the fact that a sink never actually appears in the original training data, nor has the robot ever seen this particular real-world sink. A robot without the benefit of ROSIE fails the task.

Here is a link to the team’s paper, and embedded below is a video demonstrating ROSIE both in concept and in action. This is also in a way a bit reminiscent of a plug-in we recently saw for Blender, which uses an AI image generator to texture entire 3D scenes with a simple text prompt.

Continue reading “Teaching A Robot To Hallucinate”

Simultaneous Invention, All The Time?

As Tom quipped on the podcast this week, if you have an idea for a program you’d like to write, all you have to do is look around on GitHub and you’ll find it already coded up for you. (Or StackOverflow, or…) And that’s probably pretty close to true, at least for really trivial bits of code. But it hasn’t always been thus.

I was in college in the mid 90s, and we had a lab of networked workstations that the physics majors could use. That’s where I learned Unix, and where I had the idea for the simplest program ever. It took the background screen color, in the days before wallpapers, and slowly random-walked it around in RGB space. This was set to be slow enough that anyone watching it intently wouldn’t notice, but fast enough that others occasionally walking by my terminal would see a different color every time. I assure you, dear reader, this was the very height of wit at the time.

With the late 90s came the World Wide Web and the search engine, and the world got a lot smaller. For some reason, I was looking for how to set the X terminal background color again, this time searching the Internet instead of reading up in a reference book, and I stumbled on someone who wrote nearly exactly the same random-walk background color changer. My jaw dropped! I had found my long-lost identical twin brother! Of course, I e-mailed him to let him know. He was stoked, and we shot a couple funny e-mails back and forth riffing on the bizarre coincidence, and that was that.

Can you imagine this taking place today? It’s almost boringly obvious that if you search hard enough you’ll find another monkey on another typewriter writing exactly the same sentence as you. It doesn’t even bear mentioning. Heck, that’s the fundamental principle behind Codex / CoPilot – the code that you want to write has been already written so many times that it will emerge as the most statistically likely response from a giant pattern-matching, word-word completion neural net model.

Indeed, stop me if you’ve read this before.

This Camera Produces A Picture, Using The Scene Before It

It’s the most basic of functions for a camera, that when you point it at a scene, it produces a photograph of what it sees. [Jasper van Loenen] has created a camera that does just that, but not perhaps in the way we might expect. Instead of committing pixels to memory it takes a picture, uses AI to generate a text description of what is in the picture, and then uses another AI to generate an image from that picture. It’s a curiously beautiful artwork as well as an ultimate expression of the current obsession with the technology, and we rather like it.

The camera itself is a black box with a simple twin-lens reflex viewfinder. Inside is a Raspberry Pi that takes the photo and sends it through the various AI services, and a Fuji Instax Mini printer. Of particular interest is the connection to the printer which we think may be of interest to quite a few others, he’s reverse engineered the Bluetooth protocols it uses and created Python code allowing easy printing. The images it produces are like so many such AI-generated pieces of content, pretty to look at but otherworldly, and weird parallels of the scenes they represent.

It’s inevitable that consumer cameras will before long offer AI augmentation features for less-competent photographers, meanwhile we’re pleased to see Jasper getting there first.

Let Machine Learning Code An Infinite Variety Of Pong Games

In a very real way, Pong started the video game revolution. You wouldn’t have thought so at the time, with its simple gameplay, rudimentary controls, some very low-end sounds, and a cannibalized TV for a display, but the legendarily stuffed coinboxes tell the tale. Fast forward 50 years or so, and Pong has been largely reduced to a programmer’s exercise to see how few lines of code can stand in for what [Ted Dabney] and [Allan Alcorn] accomplished. But now even that’s too much, as OpenAI Codex can generate a playable Pong from just a few prompts, at least most of the time. Continue reading “Let Machine Learning Code An Infinite Variety Of Pong Games”

How To Roll Your Own Custom Object Detection Neural Network

Real-time object detection, which uses neural networks and deep learning to rapidly identify and tag objects of interest in a video feed, is a handy feature with great hacker potential. Happily, it’s also possible to make customized CNNs (convolutional neural networks) tailored for one’s own needs, and that process just got easier thanks to some new documentation for the Vizy “AI camera” by Charmed Labs.

Raspberry Pi-based Vizy camera

Charmed Labs has been making hacker-friendly machine vision devices for a long time, and the Vizy camera impressed us mightily when we checked it out last year. Out of the box, Vizy has a perfectly functional object detector application that runs locally on the device, and can detect and tag many common everyday objects in real time. But what if that default application doesn’t quite meet one’s project needs? Good news, because it’s possible to create a custom-trained CNN, and that process got a lot more accessible thanks to step-by-step examples of training a model to recognize hands doing rock-paper-scissors.

Person and cat with machine-generated tags identifying them
Default object detection works well, but sometimes one needs custom results.

The basic process is this: Start with a variety of images that show the item of interest. Then identify and label the item of interest in each photo. These photos (a “training set”) are then sent to Google Colab, which will be used to generate a neural network. The resulting CNN model can then be downloaded and used, to see how well it performs.

Of course things rarely work perfectly the first time around, so at this point it’s pretty common for some refinement to be needed to increase accuracy. Luckily there are a number of tools to help do this without creating a new model from scratch, so it’s just a matter of tweaking until things perform acceptably.

Google Colab is free and the resulting CNNs are implemented in the TensorFlow Lite framework, meaning it’s possible to use them elsewhere. So if custom object detection has been holding up a project idea of yours, this might be what gets you over that hump.