For all the complexity involved in driving, responding to pedestrians, environmental conditions, and even the basic rules of the road becomes second nature. When it comes to AI, teaching machine learning algorithms how to drive in a virtual world makes sense when the real one is packed full of squishy humans and other potential catastrophes. So, why not use the wildly successful virtual world of Grand Theft Auto V to teach machine learning programs to operate a vehicle?
The hard problem with this approach is getting a sample large enough for the machine learning to be viable. The idea is this: the virtual world provides a far more efficient way to supply enough data to these programs than the time-consuming task of annotating object data in real-world images. In addition to scaling up the amount of data, researchers can manipulate weather, traffic, pedestrians, and more to create complex conditions with which to train AI, roughly along the lines of the sketch below.
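To make that concrete, here is a minimal sketch of enumerating synthetic capture conditions. The parameter names and values are our own illustration, not anything the game or the researchers actually expose:

```python
# Sketch: enumerating synthetic capture conditions.
# Parameter names and values are illustrative only; they are not a real API.
from itertools import product

weathers = ["clear", "rain", "fog"]
times_of_day = ["dawn", "noon", "dusk", "night"]
traffic_densities = [0.2, 0.5, 0.9]      # fraction of spawn slots filled
pedestrian_densities = [0.1, 0.5]

# Every combination becomes a capture session; each session yields many
# frames that arrive already paired with ground-truth labels.
sessions = [
    {"weather": w, "time": t, "traffic": traf, "pedestrians": ped}
    for w, t, traf, ped in product(
        weathers, times_of_day, traffic_densities, pedestrian_densities
    )
]
print(len(sessions), "distinct capture conditions")  # 3 * 4 * 3 * 2 = 72
```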
It’s pretty easy to teach the “rules of the road” — we do it with 16-year-olds all the time. But those earliest drivers have already spent a lifetime observing the real world and watching parents drive. The virtual world inside GTA V is fantastically realistic. Humans are great pattern recognizers, and fickle gamers would cry foul at anything that doesn’t analog real life. What we’re left with is a near-perfect source of test cases for machine learning to be applied to the hard part of self-driving: understanding the vastly variable world every vehicle encounters.
A team of researchers from Intel Labs and Darmstadt University in Germany created a program that automatically indexes the virtual world (as seen above), creating useful data for a machine learning program to consume. This isn’t a complete substitute for real-world experience, mind you, but the freedom to make a few mistakes before putting an AI behind the wheel of a vehicle has the potential to speed up the development of autonomous vehicles. Read the paper the team published, Playing for Data: Ground Truth from Video Games.
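In practice, “useful data” here means a rendered frame paired with a pixel-accurate label mask. A minimal sketch of loading such a pair for training, assuming color-coded masks on disk; the class list, colors, and file names are illustrative, not taken from the paper:

```python
# Sketch: loading one frame/label pair for semantic segmentation training.
# The class list, colors, and file names are illustrative, not the paper's.
import numpy as np
from PIL import Image

CLASS_COLORS = {          # RGB color in the mask -> class ID
    (128, 64, 128): 0,    # road
    (220, 20, 60): 1,     # pedestrian
    (0, 0, 142): 2,       # car
    (70, 70, 70): 3,      # building
}

def load_training_pair(frame_path, mask_path):
    """Return an (H, W, 3) image array and an (H, W) integer class-ID map."""
    frame = np.asarray(Image.open(frame_path).convert("RGB"))
    mask_rgb = np.asarray(Image.open(mask_path).convert("RGB"))

    labels = np.full(mask_rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unlabeled
    for color, class_id in CLASS_COLORS.items():
        labels[np.all(mask_rgb == color, axis=-1)] = class_id
    return frame, labels

# frame, labels = load_training_pair("frame_000123.png", "mask_000123.png")
```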
Before you think this could go horribly wrong, check out this mini-rally truck that taught itself how to powerslide and realize that this method of teaching AI how to drive could actually be totally awesome. Also realize that this research is just characterizing still-images at about 7 seconds per image. This is more than a couple orders of magnitude faster than real-world images — great for learning but we’re still very far away from real-time and real-world implementation.
We really hope that a team of research assistants was paid to play a lot of GTA V in a serious scientific effort to harvest this data set. That brings to mind one serious speed bump: the game is copyrighted, and you can’t just do anything you want with recordings of gameplay. The researchers do mention that gameplay footage is allowed under certain non-commercial circumstances. That means that Uber, Google, Apple, Tesla, every major auto company, and anyone else developing autonomous vehicles as a business model will be locked out of this data source unless Rockstar Games comes up with a licensing model. Maybe your next car will have a “Powered by GTA” sticker on it.
If we use GTA to teach behavior to AI humanoid robots, we could end up with this:
https://www.youtube.com/watch?v=Efbq8RsC_Rk
You beat me to it….
just imagine if Rockstar decided to create a mod where all this annotation is being done INSIDE the engine itself, since they already know what objects are what :)
They could sell it to all major companies and get another few hundred million out of their already successful game franchise AND further advance the progress of vehicular autonomy.
you can use the render pipeline and extract the ID buffer; it makes annotation easier
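Something roughly like this, assuming the engine hands you a per-pixel object ID each frame; the ID-to-class table is made up for illustration:

```python
# Sketch: mapping a per-pixel object-ID buffer to class labels.
# The object-ID -> class table is made up for illustration.
import numpy as np

OBJECT_TO_CLASS = {
    1001: "car",
    1002: "pedestrian",
    2000: "road",
}

def id_buffer_to_classes(id_buffer):
    """Map an (H, W) array of engine object IDs to class-name strings."""
    classes = np.full(id_buffer.shape, "unknown", dtype=object)
    for obj_id, class_name in OBJECT_TO_CLASS.items():
        classes[id_buffer == obj_id] = class_name
    return classes
```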
But I don’t want a car that randomly takes off and flies.
The trouble with putting an AI in the Grand Theft Auto world to teach it anything is that it may learn sociopathy there faster than anything else. The last thing we need is a rogue AI pulling people out of their cars to take a joyride on the sidewalk. :)
Yea, scary thought!
http://i.blogs.es/e69185/bender-car/original.jpg
At least you wouldn’t worry about a GTA-trained robocar trying to solve trolley problems by crashing and killing you to avoid hitting that bus full of kindergarteners and nuns. (Whether you have to worry about it activating the driver ejection seat and then aiming for the bus to get you more points is a separate problem.)
That isn’t how AI works, except in sci-fi movies and the ramblings of clickbait pop-sci writers.
…yet.
It’s a simple matter of switching the scoring rules in the AI’s mind from “Video game” to “Golf” (i.e. lower scores are better) and we’ll have the safest self driving car imaginable.
That’s pretty cool.
It would be even more awesome if we could have some help from the game developers and render each frame with annotations.
The problem with reaching out to the game developers is that they would probably want a slice of the pie, but it could still be worth it.
Well duh? People want to get paid for commercial use of their game, what a strange thing!
Easter egg in your next self driving car will be a “decrease wanted level” mode :)
the biggest “fun” will be when the AI car sees cars painted as advertising on the side of some van or bus :)
“Also realize that this research is just characterizing still-images at about 7 seconds per image. This is more than a couple orders of magnitude faster than real-world images — great for learning but we’re still very far away from real-time and real-world implementation.”
At 7 seconds per image, wouldn’t it be “couple orders of magnitude SLOWER than real-world images”?
**Annotation** of real-world images according to the video takes 60-90 minutes per frame (suspicious). So this is “a couple orders of magnitude faster”, but I agree, that paragraph was poorly worded.
I wonder how detailed the annotation of the image is if it took 60 minutes to do. I couldn’t spend more than 5-10 minutes describing and marking things in the pictures shown.
I’m a little late to the party, but reading the paper gives some more insight. They’re talking about annotating the image in the sense of the overlay above – creating a pixel-accurate mask of what’s what. Doing this very accurately is key, since it’s supposed to be a “perfect” dataset to train with. Creating these masks in this sense by hand on a frame by frame basis (like the other datasets they’re comparing against) could easily take that long.
ANALOG IS NOT A VERB.
Are you upset that he verbed an adjective?
I think we need to dictionarify stuff as soon as it’s thingified. These poor OCDers obviously have the pain.
Yea I guess his spell check got a bit confused there?
The whole self-driving car thing is taking on dangerous aspects, with ever-cheaper and more amateurish experiments and Kickstarters.
But at least in Europe you can say it was a terrorist if things go really foul I suppose.
It’s more of a ‘GTA V and a bunch of interns manually and painstakingly clicking on images, annotating every pixel, to manufacture a dataset that MIGHT someday be used to teach AI’.
I don’t understand why people think this sort of machine learning isn’t possible; just the other day some idiot who uses the nick “LOL” was making that claim. But of course it works: that is how our neural networks get trained too, in controlled and structured environments that are a microcosm of the real world, i.e. the family home and then schools, etc. The advantage AI has over humans is that you can duplicate the net once it is trained as many times as you like; imagine having true ancestral memory and being born with all the knowledge your parents gained before you were conceived.
I would’ve preferred they use Carmageddon… but sure GTA would work.