Showing robots adversarial behavior may be the key to improving their performance, according to a study conducted at the University of Southern California. While generative adversarial networks (GANs), in which two neural networks compete in a game, have been demonstrated before, this is said to be the first time an adversarial human has been put into a robot's learning loop.
The report, presented at the International Conference on Intelligent Robots and Systems, describes an experiment in which reinforcement learning was used to train a robotic system toward general-purpose manipulation. For most robots, a huge amount of training data is needed before they can manipulate objects in a human-like way.
One line of research that has been successful in reducing this burden is putting a “human in the loop”, where a human gives the system feedback on its performance. Most algorithms have assumed a cooperative human assistant, but the researchers reasoned that a human acting against the system might push the robot to develop robustness to real-world complexities.
In the experiment, a robot attempts to grasp an object in a computer simulation. A human observer watches the simulated grasp and, if it succeeds, tries to snatch the object away from the robot. This teaches the robot to tell weak grasps from firm ones, a crazy idea on the researchers' part that actually worked. The system trained against the adversary rejected unstable grasps and quickly learned robust grasps for a variety of objects.
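The core idea is easy to sketch in code: only grant reward for grasps that also survive the adversarial snatch. The toy example below is not the USC team's implementation, just a minimal illustration in which the made-up functions simulate_grasp and adversary_snatches stand in for the physics simulator and the human adversary, and a single "grip force" number stands in for a full grasping policy.

```python
import random

def simulate_grasp(grip_force):
    # Hypothetical simulator stand-in: stronger grips are more likely to hold at all.
    return random.random() < grip_force

def adversary_snatches(grip_force):
    # Hypothetical human adversary: weak grasps are easy to pull away.
    return random.random() > grip_force

def reward(grip_force):
    # Only a grasp that both succeeds and survives the snatch attempt earns reward.
    if not simulate_grasp(grip_force):
        return 0.0   # the grasp failed outright
    if adversary_snatches(grip_force):
        return -1.0  # the adversary pulled the object away
    return 1.0       # robust grasp

def train(episodes=500, trials=50, step=0.05):
    # Crude hill climbing over the grip-force parameter; a real system would
    # learn a full grasping policy with reinforcement learning instead.
    force = 0.2
    for _ in range(episodes):
        candidate = min(1.0, max(0.0, force + random.uniform(-step, step)))
        score = lambda f: sum(reward(f) for _ in range(trials)) / trials
        if score(candidate) >= score(force):
            force = candidate
    return force

if __name__ == "__main__":
    print("Learned grip force:", round(train(), 2))
```

Even this toy shows the point: once the adversary is in the loop, a merely "successful" grasp is no longer good enough, so the learner is pushed toward grips that are hard to disturb.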
Experiments like these test the assumptions baked into robotic learning tasks, leading to better stress-tested systems that are more likely to work in real-world situations. Take a look at the interview in the video below the break.
[Thanks Qes for the tip!]
This may explain why the robots of the future want to destroy all human life… I for one welcome our new robot overlords.
Will they?
I haven’t been in the future yet and predictions have mostly been wrong … still waiting for my flying car …
Hmmm …. reminds me of something. :D
https://www.youtube.com/watch?v=y3RIHnK0_NE
Seems very obvious and straightforward to me, even though it’s always nice to see it work in practice.
Every software developer will test a system for corner cases and potentially unmodeled situations to see how robust it is.
First you write an ideally short description of a generic solution, then you try to break it. The only difference is that a learning system uses that negative information directly, while a developer has to revise the model manually.
The key is very likely in the timing of negative feedback, and the right choice of what kind to give.
Constant negative feedback will prevent the system from learning, because the “negative noise” will drown out any positive signal. In other words: you need a smart teacher, not tough love.
Right, Yolandi had the right idea, Ninja not so much.