Modern Wizard Summons Familiar Spirit

May 24, 2018 by Roger Cheng 10 Comments

In European medieval folklore, a practitioner of magic may call for assistance from a familiar spirit who takes an animal form disguise. [Alex Glow] is our modern-day Merlin who invoked the magical incantations of 3D printing, Arduino, and Raspberry Pi to summon her familiar Archimedes: The AI Robot Owl.

The key attraction in this build is Google’s AIY Vision kit. Specifically the vision processing unit that tremendously accelerates image classification tasks running on an attached Raspberry Pi Zero W. It no longer consumes several seconds to analyze each image, classification can now run several times per second, all performed locally. No connection to Google cloud required. (See our earlier coverage for more technical details.) The default demo application of a Google AIY Vision kit is a “joy detector” that looks for faces and attempts to determine if a face is happy or sad. We’ve previously seen this functionality mounted on a robot dog.

[Alex] aimed to go beyond the default app (and default box) to create Archimedes, who was to reward happy people with a sticker. As a moving robotic owl, Archimedes had far more crowd appeal than the vision kit’s default cardboard box. All the kit components have been integrated into Archimedes’ head. One eye is the expected Pi camera, the other eye is actually the kit’s piezo buzzer. The vision kit’s LED-illuminated button now tops the dapper owl’s hat.

Archimedes was created to join in Google’s promotion efforts. Their presence at this Maker Faire consisted of two tents: one introductory “Learn to Solder” tent where people can create a blinky LED badge, and the other tent is focused on their line of AIY kits like this vision kit. Filled with demos of what the kits can do aside from really cool robot owls.

Hopefully these promotional efforts helped many AIY kits find new homes in the hands of creative makers. It’s pretty exciting that such a powerful and inexpensive neural net processor is now widely available, and we look forward to many more AI-powered hacks to come.

Continue reading “Modern Wizard Summons Familiar Spirit” →

Using TensorFlow To Recognize Your Own Objects

May 23, 2018 by Steven Dufresne 9 Comments

When the time comes to add an object recognizer to your hack, all you need do is choose from many of the available ones and retrain it for your particular objects of interest. To help with that, [Edje Electronics] has put together a step-by-step guide to using TensorFlow to retrain Google’s Inception object recognizer. He does it for Windows 10 since there’s already plenty of documentation out there for Linux OSes.

You’re not limited to just Inception though. Inception is one of a few which are very accurate but it can take a few seconds to process each image and so is more suited to a fast laptop or desktop machine. MobileNet is an example of one which is less accurate but recognizes faster and so is better for a Raspberry Pi or mobile phone.

You’ll need a few hundred images of your objects. These can either be scraped from an online source like Google’s images or you get take your own photos. If you use the latter approach, make sure to shoot from various angles, rotations, and with different lighting conditions. Fill your background with various other things and even have some things partially obscuring your objects. This may sound like a long, tedious task, but it can be done efficiently. [Edje Electronics] is working on recognizing playing cards so he first sprinkled them around his living room, added some clutter, and walked around, taking pictures using his phone. Once uploaded, some easy-to-use software helped him to label them all in around an hour. Note that he trained on 24 different objects, which are the number of different cards you get in a pinochle deck.

You’ll need to install a lot of software and do some configuration, but he walks you through that too. Ideally, you’d use a computer with a GPU but that’s optional, the difference being between three or twenty-four hours of training. Be sure to both watch his video below and follow the steps on his Github page. The Github page is kept most up-to-date but his video does a more thorough job of walking you through using the software, such as how to use the image labeling program.

Why is he training an object recognizer on playing cards? This is just one more step in making a blackjack playing robot. Previously he’d done an impressive job using OpenCV, even though the algorithm handled non-overlapping cards only. Google’s Inception, however, recognizes partially obscured cards. This is a very interesting project, one which we’ll be keeping an eye on. If you have any ideas for him, leave them in the comments below.

Continue reading “Using TensorFlow To Recognize Your Own Objects” →

Neural Networks Using Doom Level Creator Like It’s 1993

May 16, 2018 by Drew Risinger 12 Comments

Readers of a certain vintage will remember the glee of building your own levels for DOOM. There was something magical about carefully crafting a level and then dialing up your friends for a death match session on the new map. Now computers scientists are getting in on that fun in a new way. Researchers from Politecnico di Milano are using artificial intelligence to create new levels for the classic DOOM shooter (PDF whitepaper).

While procedural level generation has been around for decades, recent advances in machine learning to generate game content (usually levels) are different because they don’t use a human-defined algorithm. Instead, they generate new content by using existing, human-generated levels as a model. In effect they learn from what great game designers have already done and apply those lesson to new level generation. The screenshot shown above is an example of an AI generated level and the gameplay can be seen in the video below.

The idea of an AI generating levels is simple in concept but difficult in execution. The researchers used Generative Adversarial Networks (GANs) to analyze existing DOOM maps and then generate new maps similar to the originals. GANs are a type of neural network which learns from training data and then generates similar data. They considered two types of GANs when generating new levels: one that just used the appearance of the training maps, and another that used both the appearance and metrics such as the number of rooms, perimeter length, etc. If you’d like a better understanding of GANs, [Steven Dufresne] covered it in his guide to the evolving world of neural networks.

While both networks used in this project produce good levels, the one that included other metrics resulted in higher quality levels. However, while the AI-generated levels appeared similar at a high level to human-generated levels, many of the little details that humans tend to include were omitted. This is partially due to a lack of good metrics to describe levels and AI-generated data.

Example DOOM maps generated by AI. Each row is one map, and each image is one aspect of the map (floor, height, things, and walls, from left to right)

We can only guess that these researcher’s next step is to use similar techniques to create an entire game (levels, characters, and music) via AI. After all, how hard can it be?? Joking aside, we would love to see you take this concept and run with it. We’re dying to play through some gnarly levels whipped up by the AI from Hackaday readers!

Continue reading “Neural Networks Using Doom Level Creator Like It’s 1993” →

Google’s Duplex AI Has Conversation Indistinguishable From Human’s

May 10, 2018 by Steven Dufresne 106 Comments

First Google gradually improved its WaveNet text-to-speech neural network to the point where it sounds almost perfectly human. Then they introduced Smart Reply which suggests possible replies to your emails. So it’s no surprise that they’ve announced an enhancement for Google Assistant called Duplex which can have phone conversations for you.

What is surprising is how well it works, as you can hear below. The first is Duplex calling to book an appointment at a hair salon, and the second is it making reservation’s with a restaurant.

Note that this reverses the roles when talking to a computer on the phone. The computer is the customer who calls the business, and the human is on the business side. The goal of the computer is to book a hair appointment or reserve a table at a restaurant. The computer has to know how to carry out a conversation with the human without the human knowing that they’re talking to a computer. It’s for communicating with all those businesses which don’t have online booking systems but instead use human operators on the phone.

Not knowing that they’re talking to a computer, the human will therefore speak as it would with another human, with all the pauses, “hmm”s and “ah”s, speed, leaving words out, and even changing the context in mid-sentence. There’s also the problem of multiple meanings for a phrase. The “four” in “Ok for four” can mean 4 pm or four people.

The component which decides what to say is a recurrent neural network (RNN) trained on many anonymized phone calls. The input is: the audio, the output from Google’s automatic speech recognition (ASR) software, and context such as the conversation’s history and the parameters of the conversation (e.g. book places at a restaurant, for how many, when), and more.

Producing the speech is done using Google’s text-to-speech technologies, Wavenet and Tacotron. “Hmm”s and “ah”s are inserted for a more natural sound. Timing is also taken into account. “Hello?” gets an immediate response. But they introduce latency when responding to more complex questions since replying too soon would sound unnatural.

There are limitations though. If it decides it can’t complete a task then it hands the conversation over to a human operator. Also, Duplex can’t handle a general conversation. Instead, multiple instances are trained on different domains. So this isn’t the singularity which we’ve talked about before. But if you’re tired of talking to computers at businesses, maybe this will provide a little payback by having the computer talk to the business instead.

On a more serious note, would you want to know if the person you were speaking to was in fact a computer? Perhaps Google should preface each conversation with “Hi! This is Google Assistant calling.” And even knowing that, would you want to have a human conversation with a computer, knowing that it’s “um”s were artificial? This may save time for the person whom the call is on behalf of, but the person being called may wish the computer would be a little more computer-like and speak more efficiently. Let us know your thoughts in the comments below. Or just check out the following Google I/O ’18 keynote presentation video where all this was announced.

Continue reading “Google’s Duplex AI Has Conversation Indistinguishable From Human’s” →

Neural Network Names Nightshades

April 26, 2018 by Lewin Day 12 Comments

Neural networks are a core area of the artificial intelligence field. They can be trained on abstract data sets and be put to all manner of useful duties, like driving cars while ignoring road hazards or identifying cats in images. Recently, a biologist approached AI researcher [Janelle Shane] with a problem – could she help him name some tomatoes?

It’s a problem with a simple cause – like most people, [Darren] enjoys experimenting with tomato genetics, and thus requires a steady supply of names to designate the various varities produced in this work. It can be taxing on the feeble human brain, so a silicon-based solution is ideal.

[Janelle] decided to use the char-rnn library built by [Andrej Karpathy] to do the heavy lifting. After training it on a list of over 11,000 existing tomato varieties, the neural network was then asked to strike out on its own.

The results are truly fantastic – whether you’re partial to a Speckled Garfech or you prefer the smooth flavor of the Golden Pow, there’s a tomato to suit your tastes. When the network was retrained with additional content in the form of names of metal bands, the results get even better – it’s only a matter of time before Angels of Saucing reach a supermarket shelf near you.

On the surface, it’s a fun project with whimsical output, but fundamentally it highlights how much can be accomplished these days by standing on the shoulders of giants, so to speak. Now, if you need some assistance growing your tomatoes, the machines can help there, too.

Neural Networks… On A Stick!

April 24, 2018 by Al Williams 14 Comments

They probably weren’t inspired by [Jeff Dunham’s] jalapeno on a stick, but Intel have created the Movidius neural compute stick which is in effect a neural network in a USB stick form factor. They don’t rely on the cloud, they require no fan, and you can get one for well under $100. We were interested in [Jeff Johnson’s] use of these sticks with a Pynq-Z1. He also notes that it is a great way to put neural net power on a Raspberry Pi or BeagleBone. He shows us YOLO — an image recognizer — and applies it to an HDMI signal with the processing done on the Movidius. You can see the result in the first video, below.

At first, we thought you might be better off using the Z1’s built-in FPGA to do neural networks. [Jeff] points out that while it is possible, the Z1 has a lower-end device on it, so there isn’t that much FPGA real estate to play with. The stick, then, is a great idea. You can learn more about the device in the second video, below.

Continue reading “Neural Networks… On A Stick!” →

TensorFlow In Your Browser

April 16, 2018 by Al Williams 24 Comments

If you want to explore machine learning, you can now write applications that train and deploy TensorFlow in your browser using JavaScript. We know what you are thinking. That has to be slow. Surprisingly, it isn’t, since the libraries use Graphics Processing Unit (GPU) acceleration. Of course, that assumes your browser can use your GPU. There are several demos available, include one where you train a Pac Man game to respond to gestures in your webcam to control the game. If you try it and then disable accelerated graphics in your browser options, you’ll see just what a speed up you can gain from the GPU.