Teaching A Robot To Hallucinate

February 26, 2023

Training robots to execute tasks in the real world requires data — the more, the better. The problem is that creating these datasets takes a lot of time and effort, and methods don’t scale well. That’s where Robot Learning with Semantically Imagined Experience (ROSIE) comes in.

The basic concept is straightforward: enhance training data with hallucinated elements to change details, add variations, or introduce novel distractions. The team's experiments show that a robot trained on this augmented data in addition to the original dataset performs tasks better than one trained without it.

This robot is able to deposit an object into a metal sink it has never seen before, thanks to hallucinating a sink in place of an open drawer in its original training data.

Suppose one has a dataset consisting of a robot arm picking up a Coke can and placing it into an orange lunchbox. That training data is used to teach the arm how to do the task. But in the real world, maybe there is distracting clutter on the countertop. Or, the lunchbox in the training data was empty, but the one on the counter right now already has a sandwich inside it. The further a real-world task differs from the training dataset, the less capable and accurate the robot becomes.

ROSIE aims to alleviate this problem by using image diffusion models (such as Imagen) to enhance the training data in targeted and direct ways. In one example, a robot has been trained to deposit an object into a drawer. ROSIE augments this training by inpainting the drawer in the training data, replacing it with a metal sink. A robot trained on both datasets competently performs the task of placing an object into a metal sink, despite the fact that a sink never actually appears in the original training data, nor has the robot ever seen this particular real-world sink. A robot without the benefit of ROSIE fails the task.
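
As a rough illustration of the idea, here is a minimal Python sketch of that kind of augmentation loop. It assumes each training episode is a sequence of camera frames, a language instruction, and the recorded robot actions; the frames are inpainted and the instruction rewritten, while the actions are reused unchanged. `placeholder_inpaint` is a hypothetical stand-in for a text-guided diffusion inpainter such as Imagen (it just paints the masked pixels a flat value so the code runs without a model), and `right_half_mask` is a hypothetical stand-in for whatever detector or segmenter would actually locate the drawer. None of these names come from the ROSIE paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Toy representations (assumptions for this sketch, not ROSIE's data format).
Image = List[List[int]]   # grayscale frame: rows of pixel values 0-255
Mask = List[List[bool]]   # True where pixels should be regenerated


@dataclass(frozen=True)
class Episode:
    frames: Tuple[Image, ...]                # camera images over time
    instruction: str                         # e.g. "place the can in the drawer"
    actions: Tuple[Tuple[float, ...], ...]   # recorded robot actions


def placeholder_inpaint(image: Image, mask: Mask, prompt: str) -> Image:
    """Hypothetical stand-in for a text-guided diffusion inpainter.

    A real system would synthesize `prompt` (e.g. "a metal sink") inside the
    masked region; here masked pixels are simply set to a value derived from
    the prompt so the example runs without a model. Returns a new image.
    """
    fill = sum(map(ord, prompt)) % 256
    return [
        [fill if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def augment_episode(
    ep: Episode,
    mask_fn: Callable[[Image], Mask],
    prompt: str,
    new_instruction: str,
    inpaint: Callable[[Image, Mask, str], Image] = placeholder_inpaint,
) -> Episode:
    """Return a copy of `ep` with inpainted frames and a rewritten instruction.

    The actions are kept as-is: the arm motion that put the can in the drawer
    is reused as the motion for putting it in the imagined sink.
    """
    new_frames = tuple(inpaint(f, mask_fn(f), prompt) for f in ep.frames)
    return replace(ep, frames=new_frames, instruction=new_instruction)


def build_training_set(
    episodes: List[Episode],
    mask_fn: Callable[[Image], Mask],
    prompt: str,
    rewrite: Callable[[str], str],
) -> List[Episode]:
    """Original episodes plus one augmented copy of each (train on both)."""
    augmented = [
        augment_episode(ep, mask_fn, prompt, rewrite(ep.instruction))
        for ep in episodes
    ]
    return list(episodes) + augmented


def right_half_mask(image: Image) -> Mask:
    """Hypothetical mask: pretend the drawer occupies the right half of the frame."""
    return [[col >= len(row) // 2 for col in range(len(row))] for row in image]


if __name__ == "__main__":
    frame = [[10, 10, 10, 10], [10, 10, 10, 10]]
    demo = Episode(
        frames=(frame,),
        instruction="place the can in the drawer",
        actions=((0.1, 0.0, -0.2),),
    )
    data = build_training_set(
        [demo], right_half_mask, "a metal sink",
        lambda s: s.replace("drawer", "sink"),
    )
    for ep in data:
        print(ep.instruction, ep.frames[0][0], ep.actions)
```

In this sketch, keeping the actions fixed is the point: the augmented copy pairs the same recorded motion with a new scene and a new instruction, so the training set grows without collecting any new robot demonstrations.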

Here is a link to the team’s paper, and embedded below is a video demonstrating ROSIE both in concept and in action. This is also somewhat reminiscent of a plug-in we recently saw for Blender, which uses an AI image generator to texture entire 3D scenes with a simple text prompt.

8 thoughts on “Teaching A Robot To Hallucinate”

michael says:

February 26, 2023 at 4:18 am

Sounds similar to a human taking a nap so the brain can digest new information during heavy studying or while learning complex new tasks. Only we call it dreaming instead of hallucinating…

1. Conor Stewart says:
  
  February 26, 2023 at 5:56 am
  
  Yeah, that was what I was thinking: this is definitely dreaming rather than hallucinating. Hallucinating would be if the robot were there physically and its camera feed were being altered with other objects added in. This is just altering its training data while the robot isn’t even powered on, so it is just dreaming.
  
  1. Dude says:
    
    February 26, 2023 at 11:00 pm
    
    It’s neither.
    
    The neural net is merely trained on simulated data AND real data, instead of just the real data set. It does not generate the imagery.
    
    It still carries the problem that you have to train the model for each special case. For example, placing the item in a basket instead of a drawer or a sink requires you to start over and add images of baskets to the training set, which would eventually have to contain the entire world, because the robot itself does not understand what it is seeing and cannot generalize.
    
QL2Z says:

February 26, 2023 at 7:26 am

Do Androids Dream of Electric Sheep? Wow, that’s some insane stuff. REALLY STRANGE TIMES!

Michael Black says:

February 26, 2023 at 7:41 am

Wasn’t there an SF story where the robot drank? I’m thinking of Eando Binder, but a search says no.

But maybe I’m thinking of Bender on Futurama.

rnjacobs says:

February 26, 2023 at 10:50 am

“Hallucinating” is the term of art used in neural network farming to mean “making up random crap”.

Twisty Plastic says:

February 27, 2023 at 6:01 am

Anyone else watch that animation and immediately think of BoJack Horseman?

psuedonymous says:

February 28, 2023 at 8:12 am

A very similar technique is used by Nvidia for SLNN training: simulate multiple environments and render them at high fidelity, then train the visual model on those (e.g. simulating driving footage from multiple virtual ‘cameras’ to train self-driving car object recognition algorithms). The advantage of using fully synthetic training environments is that by definition you have an exact baseline for every single item in the training dataset (i.e. unlike a captured training set you do not have to have people go through and tag them), with the disadvantage being that utility of the generated training sets is related to how realistically you can render them – something Nvidia has been pushing hard for e.g. raytracing.